[KDE Dot News]

Re: That's nice and all but...
by dmalloc on Sunday 15/Jul/2001, @23:11
Well, first of all, this is a text-to-speech plugin, not a voice-control plugin. Text-to-speech does use synthesized patterns of speech, but it does not depend on huge databases. Very smart people have found a way to describe a "language" in which a speech synthesizer, based on a "grammar", can actually produce valid-sounding output, which our brain recognizes as a word or a sentence.
Yet, even though it sounds stupid, many of the big voice recognition packages have come to the conclusion that a simple "comparison" between spoken text and known text is not good enough. There are a few more approaches to this by now, for example analyzing key parts of a word, recognizing the sequence of certain sound triplets, and other stochastic means of categorizing data. Basically, they are developing very complex, yet precise, heuristic algorithms for natural speech.
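The "sound triplets" above most likely refer to what speech recognition calls triphones: each sound is modeled together with its left and right neighbor, so the same sound gets a different model depending on its context. A minimal Python sketch of turning a sound sequence into triphone labels follows; the `left-center+right` label format, the `sil` padding at the word edges, and the phone symbols are assumptions for illustration, not the format of any particular recognizer.

```python
def to_triphones(phones: list[str]) -> list[str]:
    """Expand a phone sequence into context-dependent triphone labels.

    Each label is written as left-center+right. The "sil" (silence)
    padding at both ends is an assumed convention for this sketch.
    """
    padded = ["sil"] + phones + ["sil"]
    # One triphone per original phone, using its neighbors as context.
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# "cat" written as the hypothetical phone sequence k ae t
print(to_triphones(["k", "ae", "t"]))
```

This prints `['sil-k+ae', 'k-ae+t', 'ae-t+sil']`. A recognizer would then score sequences of such units statistically rather than comparing whole words against stored recordings.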
Since that requires analyzing gazillions of gigabytes of actual spoken data, this research is very expensive and is therefore mostly carried out by universities of big corporations (see IBM).

Re: That's nice and all but...
by Carbon on Monday 16/Jul/2001, @02:21
>universities of big corporations (see IBM)

Well, I knew it wouldn't take long for IBM to buy a university or two! :-)

Well, text-to-speech does require (somewhat large, but not huge) databases too. To explain it in a little more detail: what I think you're referring to by "grammar" are the databases that Festival, Mbrola, and (I think) the Macintosh TTS use.

Basically, these databases (about 10 MB) consist of two parts. The first is a database of the pronunciations of many words. The second is a sound database containing a sample of each sound that the TTS system can play.

For every word it tries to read, it looks up in the pronunciation database which sounds the word is composed of, fetches those sounds from the sound database, and strings them together. If a word isn't found, it attempts to guess how to pronounce it, often with hilarious results.
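The lookup-and-concatenate loop described above can be sketched in a few lines of Python. Everything in it is a hypothetical toy: the phone names, the tiny `LEXICON` and `SOUNDS` tables (a few made-up numbers stand in for recorded audio), and the one-letter-one-sound `LETTER_RULES` fallback. Real synthesizers use far larger databases, often concatenate diphones (sound-to-sound transitions) rather than single sounds, and smooth the joins.

```python
# Hypothetical pronunciation database: word -> sequence of sounds.
LEXICON: dict[str, list[str]] = {
    "hello": ["hh", "ah", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}

# Hypothetical sound database: sound -> audio samples (fake numbers).
SOUNDS: dict[str, list[float]] = {
    "hh": [0.1, 0.2], "ah": [0.3, 0.4], "l": [0.5], "ow": [0.6, 0.7],
    "w": [0.8], "er": [0.9], "d": [1.0],
    "k": [0.05], "ae": [0.15], "t": [0.25],
}

# Naive fallback: guess one sound per letter (hypothetical rules).
LETTER_RULES: dict[str, str] = {
    "a": "ae", "c": "k", "d": "d", "h": "hh", "k": "k",
    "l": "l", "o": "ow", "t": "t", "w": "w",
}


def phones_for(word: str) -> list[str]:
    """Look a word up in the lexicon, or guess its sounds letter by letter."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    # Unknown word: map each letter that has a rule, silently skip the rest.
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]


def synthesize(text: str) -> list[float]:
    """String together the stored samples for every sound of every word."""
    samples: list[float] = []
    for word in text.split():
        for phone in phones_for(word):
            samples.extend(SOUNDS[phone])
    return samples


print(synthesize("hello cat"))
```

Here "hello" comes straight from the lexicon while "cat" goes through the letter-by-letter guess. That fallback is where the hilarious results come from: English spelling rarely maps one letter to one sound.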

I don't really know all _that_ much about it (not nearly enough to code something like this myself, anyway), so if you really want more info on how this is done, go to the Festival homepage (listed above) and read their thesis-like explanation yourself.

  "Don't code today what you can't debug tomorrow." -- Ariya Hidayat
KDE®, "K Desktop Environment", "KDE Dot News", "got the dot?" and the KDE Logo® are trademarks or registered trademarks of KDE e.V. in the European Union, the United States and other countries. All other trademarks and copyrights on this page are owned by their respective owners. Comments are owned by the poster. The rest: Copyright © 2000-2008 KDE e.V. for The KDE Project. For further information or comments on this site, please contact the Webmaster.