Konqueror Gets Text-to-Speech Synthesis

George Russell today released Speaker, a first salvo at making Konqueror (the KDE web browser) synthesize text to speech, and hence usable by people with visual impairments and by people who are otherwise unable to view a screen. Speaker is a plugin for Konqueror which provides text-to-speech synthesis using the Festival Speech Synthesis System engine developed at Edinburgh University. Currently text has to be selected with the mouse and the Speak menu entry selected, but hopefully the interface will be improved so visually impaired users can surf the web with Konqueror. More information is on the homepage and at apps.kde.com. Note that this is a testing release and requires a KDE CVS tree.

Comments

Perhaps this will excite some development into Festival. I'm doing a (non-KDE related) project myself using it, and it hasn't received an update in years. Its sound quality is not nearly as understandable as the ubiquitous SimpleText.

However, a tip: the speech becomes a whole lot easier to understand if you set the duration_stretch property to 1.6 or higher.
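As a rough illustration of the tip above, the sketch below builds a small Festival Scheme script that sets the stretch factor before speaking and, if a `festival` binary happens to be on the PATH, pipes the script in through `festival --pipe`. The helper names (`scheme_string`, `festival_script`, `speak`) are made up for this example, and it assumes the parameter is spelled `Duration_Stretch`, as in Festival's own manual. Treat it as a sketch under those assumptions, not a tested recipe.

```python
import shutil
import subprocess


def scheme_string(text: str) -> str:
    """Quote text as a Scheme string literal (escape backslashes and double quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def festival_script(text: str, stretch: float = 1.6) -> str:
    """Build a two-line Festival script: slow the speech down, then speak the text."""
    return (
        f"(Parameter.set 'Duration_Stretch {stretch})\n"
        f"(SayText {scheme_string(text)})\n"
    )


def speak(text: str, stretch: float = 1.6) -> bool:
    """Pipe the script into Festival; return False if Festival is not installed."""
    festival = shutil.which("festival")
    if festival is None:
        return False
    subprocess.run(
        [festival, "--pipe"],
        input=festival_script(text, stretch),
        text=True,
        check=True,
    )
    return True


if __name__ == "__main__":
    print(festival_script("Konqueror can talk now."))
```

Values above 1.0 lengthen every segment, so 1.6 makes the voice noticeably slower; the right value is a matter of taste and of the voice in use.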

This can be easily solved if you use the logic of Festival but the actual speech synthesis of Mbrola. You can find the sources and instructions on how to use it with Festival on their homepage here:
http://tcts.fpms.ac.be/synthesis/mbrola.html

They even offer different speech types, such as someone with a UK accent or someone from France.

Enjoy.

I've used Mbrola myself, and it does provide some very good synthesis. Only problem: licensing. The Mbrola diphone databases (which are mostly what allows Mbrola to have such good speech quality) are for non-commercial use only. This is fine for your average home user, but this would deny many people access to Konqueror's speech-synth, including commercial organizations for which this sort of technology is exactly what's required (for example, a private school for disabled kids).

If the konqi speech-synth stuff connects to Festival using its standard Scheme interface, that makes installation of Mbrola optional. Could someone go check if that's the case?

Hello.
I know that this may sound harsh, but if a large corporation wanted to make use of Mbrola, they should pay a licensing fee to the universities involved. Research is important and very expensive at times.
Concerning private schools which provide help or education to disabled people, I am sure that the license will bend for such cases; that is only a matter of arrangement. There should be enough room for anyone who truly wants to use this system to get in touch with the maintainers and discuss this.

Well, that sounds fine. What I was getting at is that if we started making Mbrola a listed dependency for KDE, then we could run into annoying legal issues, but the quality of Festival is significantly lower without Mbrola.

by Ralph Clark (not verified)

Seriously there is no need for any dependency issue. I played with festival last year (courtesy of one kind individual who had posted SuSE-compatible RPMs) and along with the packaged binaries you could install various back ends including the MBROLA ones as files from a completely separate tarball. There is no static linking or recompilation involved and therefore there is no impact upon open source licences. The author should continue to ship "speaker" with the default speech synthesis but include a configuration option to pull in the MBROLA files (which really do give excellent results, near enough state-of-the-art in fact).

>There is no static linking or recompilation involved and therefore there is no impact upon open source licences.

The part that has a non-open-source license is not the Mbrola program itself, but the Mbrola diphone and lexicon databases, which are necessary for Mbrola to operate. Besides, what do static linking and recompilation have to do with licenses?

>MBROLA files (which really do give excellent results, near enough state-of-the-art in fact).

No argument there, mbrola does sound great!

by George Russell (not verified)

This was just a quick hack - to copy the Babel fish plugin as much as possible.

I know no C++ or Qt - any help on making this better would be appreciated.

You could help by telling me how to associate a keystroke with the plugin - so that Ctrl-A, Keystroke would start reading.

Also - you could tell me enough C++ to put the KProcess in class scope and allow a second action to stop reading. It'd be better than killall audsp for stopping.

Thanks
George Russell
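The request above (keep the process handle in class scope so that a second action can stop the reading) is a general pattern, and can be sketched outside of KDE. The Python below is only an illustration of that pattern: the `Reader` class and its method names are hypothetical, it does not use KProcess or any Qt API, and passing `["festival", "--tts"]` as the command is an assumption about how one might drive Festival from the command line.

```python
import subprocess
import sys


class Reader:
    """Owns at most one child process, so that stop() can reach it later."""

    def __init__(self, command):
        self._command = list(command)
        self._proc = None  # the handle lives on the instance, not in a local variable

    def is_reading(self) -> bool:
        return self._proc is not None and self._proc.poll() is None

    def start(self, text: str) -> None:
        """Start reading `text`, cancelling any reading already in progress."""
        self.stop()
        self._proc = subprocess.Popen(
            self._command, stdin=subprocess.PIPE, text=True
        )
        # Fine for short selections; a very large text could block on the pipe.
        self._proc.stdin.write(text)
        self._proc.stdin.close()

    def stop(self) -> None:
        """Terminate the child if it is still running, then forget the handle."""
        if self.is_reading():
            self._proc.terminate()
            self._proc.wait()
        self._proc = None


if __name__ == "__main__":
    # Stand-in child that reads stdin and then idles; swap in
    # ["festival", "--tts"] to try it with a real synthesizer (assumption).
    child = [sys.executable, "-c", "import sys, time; sys.stdin.read(); time.sleep(5)"]
    reader = Reader(child)
    reader.start("Hello from Konqueror")
    print("reading:", reader.is_reading())
    reader.stop()
    print("reading:", reader.is_reading())
```

Whether terminating the parent process also silences Festival's separate audio helper is not verified here; that may be why `killall audsp` was needed in the first place.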

by KDE User (not verified)

Isn't it funny how much you can do with a *quick hack* in KDE these days? Good job!

by Richard Moore (not verified)

This seems similar to my KTalkEdit app from last year (for those who don't know, it's kedit hacked to support Festival). We ought to try to work together to come up with something a bit more general that we could use throughout KDE.

> You could help by telling me howto associate a
> keystroke with the plugin - so that Ctrl-A,
> Keystroke would start reading.

See my plugins tutorial on developer.kde.org for this (you can find it by searching for previous articles on the Dot). Basically, you just call setAccel() on the action you want to add a key binding to.

Rich.

Offhand this would look like a good feature in klipper, but if you can't see what you select in the first place then what's the point. -:)

by George Russell (not verified)

It'd be nice to see a standard KDE-wide ability to read a text selection / HTML view aloud.

Of course, I can't actually write this ;-)

George Russell

Hi, I am having some problems compiling Speaker. While trying to make configure it says: no rule to make '../../configure.in' needed by makefile.in. Kindly suggest a solution. I am still a newbie.
Thanks,
ranjan simon

by dc (not verified)

...from what I understand, voice control is just comparing the sound input from the microphone with a database of pre-defined voices.
Is this also true for the open source KDE speech recognition software?
If yes, wouldn't they need some commercial backing to create such a huge database?

by dmalloc (not verified)

Well, first of all, this is a text TO speech plugin and not a voice control plugin. Text TO speech does use synthesized patterns of speech, but it is not dependent on huge databases. Very smart people have found a way to describe a "language" in which a speech synthesizer, based on a "grammar", can actually produce valid-sounding output, which our brain recognizes as a word or a sentence.
Yet, even though it sounds stupid, many big voice recognition packages have come to the conclusion that simple "comparison" between spoken text and known text is not good enough. There are a few more approaches to this by now, for example analyzing key parts of a word, recognizing the sequence of certain sound triplets, and other stochastic means of categorizing data. Basically they are developing very complex, yet precise, heuristic algorithms for natural speech.
Since that requires analyzing gazillions of gigabytes of actually spoken data, this research is very expensive and therefore mostly carried out by universities of big corporations (see IBM).

by Carbon (not verified)

>universities of big corporations (see IBM)

Well, I knew it wouldn't take long for IBM to buy a university or two! :-)

Well, text to speech does require (somewhat large, but not huge) databases too. What I think you're referring to by "grammar", to explain it in a little more detail, is the databases that Festival, Mbrola, and (I think) the Macintosh TTS use.

Basically, these (about 10MB) databases consist of two things. The first is a database of the pronunciation of many words. The other part is a sound database containing a sample for each sound that the TTS system can play.

For every word it tries to read, it looks in the pronunciation database for which sounds the word is composed of, gets all those sounds from the sound database, and strings them together. If a word isn't found, it attempts to guess how to pronounce it, often with hilarious results.
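The lookup-and-concatenate process just described can be sketched in a few lines. Everything here is a toy: the lexicon entries, the sound names, and the fake integer "samples" are invented for illustration, and the letter-by-letter fallback is far cruder than what a real synthesizer does. Real systems such as the diphone-based ones mentioned in this thread also smooth the joins between units, which this sketch ignores.

```python
import string

# Toy pronunciation database: word -> list of sound names (made-up entries).
LEXICON = {
    "hello": ["hh", "eh", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}

# Toy sound database: one short fake waveform (a list of ints) per sound name.
SOUND_NAMES = sorted(
    {sound for sounds in LEXICON.values() for sound in sounds}
    | set(string.ascii_lowercase)
)
SOUNDS = {name: [index] * 2 for index, name in enumerate(SOUND_NAMES)}


def guess_pronunciation(word: str) -> list:
    """Crude fallback for unknown words: treat every letter as its own sound."""
    return [ch for ch in word if ch in string.ascii_lowercase]


def pronounce(word: str) -> list:
    """Look the word up in the lexicon, guessing if it is not listed."""
    word = word.lower()
    return LEXICON.get(word) or guess_pronunciation(word)


def synthesize(text: str) -> list:
    """String together the samples for every sound of every word."""
    samples = []
    for word in text.split():
        for sound in pronounce(word.strip(string.punctuation)):
            samples.extend(SOUNDS[sound])
    return samples


if __name__ == "__main__":
    print(pronounce("hello"))   # found in the lexicon
    print(pronounce("kde"))     # not found, so spelled out letter by letter
    print(len(synthesize("Hello, KDE world!")), "samples")
```

The fallback is where the hilarious results come from: an unknown word is pronounced from its spelling alone.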

I don't really know all _that_ much about it (not nearly enough to code something like this myself, anyways), so if you really want more info on how this is done, go to the Festival homepage (listed above) and read their thesis-like explanation yourself.

by AC (not verified)

Totally offtopic, but whatever happened to that contest IBM was running for the best KDE themes?

by Karl Garrison (not verified)

Looks like it ended at the end of May, and they picked 3 unspecified winners. I wish they'd post the winning themes, however!

http://www-106.ibm.com/developerworks/linux/library/l-kde-c/?open&l=1974...

-Karl

by AC (not verified)

Yeah... That page says "June 2001: Top 10 themes available for download from the developerWorks Linux zone"... I've been waiting impatiently to see the new themes! ;-) Thanks for the pointer to the page.

by Speak to me sof... (not verified)

Like the dev part of it so you can use it to make commercial apps (like their own) or as the basis for further opensource tools and apps ...

??

by Gert Gunlaugson (not verified)

No but it is free ... so presumably you could develop an app that required customers to download the free runtime.

See:

http://www-4.ibm.com/software/speech/enterprise/te_5.html

by David Watson (not verified)

I was just thinking yesterday about writing a mod to KMail to do this sort of thing - using Festival, no less!

Guess I'll just have to work more on my projects that I've already got in the pipe :).

by kde-user (not verified)

Can't this technology be written as a KPart, so that ANY application can understand the text-to-speech process?

Or am I barking up the wrong tree?

All you then need is a generic way to select, start and stop the reading process which would be the same for all applications.

This is nice, but in my opinion, for the visually impaired you'd also need a specifically redesigned no-X Konqueror, as they neither use nor need X.

by Galvatron (not verified)

Okay, maybe it's not great for the disabled, but it's still useful. For example, you could have books from Gutenberg read to you, if you don't like staring at the computer screen. Or, you can use it for multitasking (have it read one thing while you do something else). Or, 10 year olds can get hours of enjoyment from making it read swear words. Just a couple ideas off the top of my head...

by Milan Svoboda (not verified)

A little OT: is it possible to run some program instead of playing a sound? I mean the sounds that are configurable in the KDE Control Center (?) (maximize, minimize, moving windows, starting some programs, etc.).

I have a text-to-speech synthesizer (it's named 'say') and I want to associate it with these events so it tells me what is happening, e.g. if an app crashes I want it to tell me 'it's dead'.

It would be cool if we could use the DCOP signals/slots mechanism for that. One problem: the DCOP 'interface' does not show the signals a program could emit. I think DCOP would be more useful if those signals were listed...
Then someone could come up with a 'waitforsignal' program you could use in shell scripts... (other ideas: cat/cp/ls... using KIO)

Hi,

I'm partially sighted and I'd like to see KDE/Konqueror become more accessible. I'm using KDE, but for people who need text-to-speech it isn't accessible yet.
The text-to-speech module is a very important thing, but if blind people are to use it, we need a powerful screen reader for KDE, as well as more zoom options, etc. Many other things would be useful but optional.
I hope more KDE users and developers will join the accessibility mailing list to start a discussion about these topics.

by dingodonkey (not verified)

I hope features like this are not included by default. It would only make Konqueror bloated with features 99% of people using it would never use. I've seen that happen to great programs before, would hate to see it happen to Konqueror.

It's a plugin, which means that if you don't use it, it doesn't get loaded. Konqueror's default size remains unchanged and bloat-free.

by A Sad Person (not verified)

Also, the actual .cpp file is only about 150 lines long - and that's with the indentation style that has brackets on separate lines, tons of whitespace, comments, etc. Also, some code is duplicated between the "Speak Selected"/"Speak All" code. So, fully optimized binaries are going to be tiny.

On another note, my jaw dropped when I saw how simple the code is. I always heard about how easy-to-use and powerful the KDE framework is, but this makes me think that such descriptions are an understatement. The KProcIO class seems particularly nice - doing the standard fork/dup/exec thing is extremely ugly.
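To give a feel for the "fork/dup/exec thing", here is a hedged sketch that feeds text to a child process and reads back its output, first by hand and then through a high-level wrapper. It is written in Python rather than C++, it does not show KProcIO's actual API, and `subprocess.run` merely stands in for the kind of convenience such a class provides. The function names are made up for this example, and the manual version assumes a POSIX system.

```python
import os
import subprocess
import sys


def run_filter_by_hand(argv, data: bytes) -> bytes:
    """Feed `data` to the child's stdin and return its stdout (POSIX only)."""
    to_child_r, to_child_w = os.pipe()
    from_child_r, from_child_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: wire the pipe ends onto stdin/stdout, then replace ourselves.
        try:
            os.dup2(to_child_r, 0)
            os.dup2(from_child_w, 1)
            for fd in (to_child_r, to_child_w, from_child_r, from_child_w):
                os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the exec failed
    # Parent: close the ends the child owns, or read() would never see EOF.
    os.close(to_child_r)
    os.close(from_child_w)
    with os.fdopen(to_child_w, "wb") as writer:
        writer.write(data)  # fine for small inputs; large ones can deadlock
    with os.fdopen(from_child_r, "rb") as reader:
        output = reader.read()
    os.waitpid(pid, 0)
    return output


def run_filter_wrapped(argv, data: bytes) -> bytes:
    """The same job through a high-level wrapper."""
    return subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True).stdout


if __name__ == "__main__":
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_filter_by_hand(upper, b"hello konqueror"))
    print(run_filter_wrapped(upper, b"hello konqueror"))
```

The manual version has to juggle four file descriptors and remember which ones to close in which process; getting any of that wrong usually shows up as a hang rather than an error, which is much of what makes it ugly.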

So, I must thank the KDE Developers not only for a great desktop environment, but for providing a great programming environment, which means we may be seeing more great apps soon.

by Rick (not verified)

OK, so how do I get the plugin to work?

Festival is installed and works fine. But I am unable to get Konqueror to recognize the plugin.

I've gone in, added its directory as a plugin, scanned for new plugins... Nothing.

My girlfriend is sight impaired and I'd love to be able to get this going for her.

Help!

by Richard Roseen (not verified)

Text to speech has uses other than computer access for the sight impaired.

In my profession I do a lot of writing in which accurate spelling and grammar are important.

There are several computer-aided methods of proofreading for spelling and grammar, such as grammar and spell checkers, online dictionaries and thesauri, and reviewing by reading the screen. There are also the traditional proofreading methods: printing out the draft and reading it, or reading the printed draft out loud.

Text to speech therefore provides a means to save paper and have the computer read the draft out loud. Of course, when someone other than the author reads the draft, out loud or not, even more errors are noticed. Thus, the computer with text to speech stands in for this person other than the author.