Re: That's nice and all but...
by Carbon on Monday 16/Jul/2001, @02:21
>universities of big corporations (see IBM)
Well, I knew it wouldn't take long for IBM to buy a university or two! :-)
Text to speech does require databases too (somewhat large, but not huge). To explain it in a little more detail: what I think you're referring to by "grammar" is the databases that Festival, Mbrola, and (I think) the Macintosh TTS use.
Basically, these databases (about 10 MB) consist of two parts. The first is a database of pronunciations for many words. The second is a sound database containing a sample for each sound the TTS system can play.
For every word it reads, the system looks up in the pronunciation database which sounds the word is made of, fetches those sounds from the sound database, and strings them together. If a word isn't found, it tries to guess how to pronounce it, often with hilarious results.
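To make the lookup/fallback/concatenate idea a bit more concrete, here's a toy sketch. Everything in it is made up for illustration: the tiny `LEXICON`, the phoneme names, the `SOUNDS` "samples" (a few numbers standing in for real audio), and the hypothetical one-letter-one-sound guessing rules. Real systems are far more sophisticated (Festival and Mbrola, for instance, join diphones rather than single sounds and use much better letter-to-sound rules), so treat this as a sketch of the general shape only, not how any of them actually does it.

```python
# Toy sketch of dictionary-plus-concatenation text to speech.
# All data below is hypothetical and only illustrates the process.

# Pronunciation database: word -> list of sound (phoneme) names.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Sound database: phoneme -> recorded sample.
# Real samples would be thousands of audio values; these are placeholders.
SOUNDS = {
    "HH": [0.1, 0.2],
    "AH": [0.3, 0.4],
    "L": [0.5],
    "OW": [0.6, 0.7],
    "W": [0.8],
    "ER": [0.9],
    "D": [1.0],
}

# Deliberately naive fallback rules: one letter maps to one phoneme.
LETTER_TO_PHONEME = {
    "h": "HH", "e": "AH", "l": "L", "o": "OW", "w": "W", "r": "ER", "d": "D",
}


def guess_pronunciation(word):
    """Guess phonemes letter by letter, silently skipping letters with no rule."""
    return [LETTER_TO_PHONEME[c] for c in word if c in LETTER_TO_PHONEME]


def pronounce(word):
    """Look the word up in the lexicon; fall back to guessing if it is missing."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return guess_pronunciation(word)


def synthesize(text):
    """Concatenate the samples for every phoneme of every word, in order."""
    output = []
    for word in text.split():
        for phoneme in pronounce(word):
            output.extend(SOUNDS[phoneme])
    return output
```

For example, `pronounce("held")` isn't in the lexicon, so it falls back to the guessing rules and returns `['HH', 'AH', 'L', 'D']`. A real synthesizer would also smooth the joins between samples and adjust pitch and timing; simply gluing samples end to end is exactly why crude systems sound so robotic.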
I don't really know all _that_ much about it (not nearly enough to code something like this myself, anyway), so if you really want more info on how this is done, go to the Festival homepage (listed above) and read their thesis-like explanation yourself.