Feb 1, 2001

Trolltech, IBM and KDE to Demo Voice-Control

Trolltech, IBM (NYSE:IBM), and KDE have teamed up at LinuxWorld Expo in New York and are demonstrating IBM's ViaVoice speech-recognition technology running on Qt and KDE. With ViaVoice integrated into Qt/KDE, it will be possible to control Qt/KDE desktop applications with speech input, from launching applications to menu selections to text entry. Developers can easily integrate this technology into existing applications; in fact, in many cases no changes have to be made. The Trolltech press release follows.


Santa Clara, California -- Trolltech, IBM (NYSE:IBM), and KDE are teaming up at LinuxWorld to demonstrate IBM's ViaVoice speech-recognition technology running on Trolltech's Qt, the cross-platform C++ GUI framework underlying the K Desktop Environment.

The technology preview will be running for the entire show at Trolltech's booth (No. 1557) at LinuxWorld, which will be held at the Jacob Javits Convention Center from January 31 through February 2, 2001.

"This combination of technologies will greatly accelerate the creation and adoption of speech-enabled applications for the Linux desktop," says Patricia McHugh, Director, New Business Development, IBM Voice Systems.

Matthias Ettrich, a senior software engineer at Trolltech and the founder of KDE, elaborates: "When ViaVoice is integrated with Qt, it will be possible to control Qt-based Linux desktop applications with speech input that is as simple as -- if not more simple than -- keyboard input. Developers can build speech capability into the structure of their application from the beginning."

In other words, the two technologies running together eliminate several of the obstacles that have hampered widespread adoption of speech recognition on the desktop: inefficient resource use, suboptimal performance, and the difficulty of "bolting on" this functionality after a typical application has already been written.

ViaVoice has already shown that it can handle the two typical speech-recognition tasks: command and control, and dictation. In addition, ViaVoice on Qt supports TTS (text to speech), in which the system can read any kind of text input aloud, and a function that allows programmers to define a "grammar" in BNF (Backus-Naur Form). The engine will then recognize phrases that match the grammar, e.g., special input modes for dates or numbers such as "Monday, the first of June" or "two thousand one hundred and seventy five."
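To picture what a BNF grammar for spoken numbers accepts, here is a hypothetical sketch in Python. ViaVoice's actual grammar syntax and API are not described in the release, so this is not how ViaVoice does it: it is a small recursive-descent parser (the names `NumberParser` and `parse_number` are invented for illustration) that accepts number phrases from one to 999,999, like the example quoted above, and turns a matching phrase into an integer. The grammar it implements is spelled out in the comments.

```python
# Hypothetical illustration only: NOT ViaVoice's grammar syntax or API.
# Grammar implemented (BNF-style):
#   <number>   ::= <hundreds> [ "thousand" [ ["and"] <hundreds> ] ]
#   <hundreds> ::= <unit> "hundred" [ ["and"] <tens> ] | <tens>
#   <tens>     ::= <tens-word> [ <unit> ] | <teen> | <unit>

UNITS = {w: i for i, w in enumerate(
    "one two three four five six seven eight nine".split(), start=1)}
TEENS = {w: i for i, w in enumerate(
    "ten eleven twelve thirteen fourteen fifteen sixteen seventeen "
    "eighteen nineteen".split(), start=10)}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


class NumberParser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, phrase: str) -> None:
        # Treat hyphens ("seventy-five") and commas as word separators.
        self.tokens = phrase.lower().replace("-", " ").replace(",", " ").split()
        self.pos = 0

    def peek(self):
        # Next word, or None at the end of the phrase.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def skip_and(self) -> None:
        # The optional "and" in "one hundred and seventy five".
        if self.peek() == "and":
            self.pos += 1

    def number(self) -> int:
        value = self.hundreds()
        if self.peek() == "thousand":
            self.take()
            value *= 1000
            if self.peek() is not None:
                self.skip_and()
                value += self.hundreds()
        if self.peek() is not None:
            raise ValueError(f"unexpected word: {self.peek()!r}")
        return value

    def hundreds(self) -> int:
        value = self.tens()
        if self.peek() == "hundred":
            if not 1 <= value <= 9:
                raise ValueError("'hundred' must follow a single digit word")
            self.take()
            value *= 100
            if self.peek() not in (None, "thousand"):
                self.skip_and()
                value += self.tens()
        return value

    def tens(self) -> int:
        tok = self.take()
        if tok in TENS:
            value = TENS[tok]
            if self.peek() in UNITS:
                value += UNITS[self.take()]
            return value
        if tok in TEENS:
            return TEENS[tok]
        if tok in UNITS:
            return UNITS[tok]
        raise ValueError(f"expected a number word, got {tok!r}")


def parse_number(phrase: str) -> int:
    """Return the integer for a phrase matching <number>, else raise ValueError."""
    return NumberParser(phrase).number()


print(parse_number("two thousand one hundred and seventy five"))  # 2175
```

Phrases outside the grammar, such as "twenty two hundred", raise `ValueError` rather than producing a wrong number. That is the kind of restriction a recognition grammar is meant to give: the engine only has to choose among phrases the grammar allows.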

About Trolltech

Trolltech develops, supports, and markets Qt, a C++ cross-platform toolkit and windowing system. Qt and Qt/Embedded let programmers rapidly build state-of-the-art GUI applications for desktop and embedded environments using a "write once, compile anywhere" strategy. Qt has been used to develop hundreds of successful commercial applications worldwide, and is the basis of the K Desktop Environment (KDE). Trolltech is headquartered in Oslo, Norway, with offices in Santa Clara, California, and Brisbane, Australia. www.trolltech.com

CONTACT: Trolltech
Aron Kozak, 408/219-6303
aron@trolltech.com
or
Al Shugart International
Jessica Damsen, 831/464-4746
jdamsen@alshugart.com

Comments

Let me just say that this is beyond cool. Once again, the TT/KDE guys, as well as IBM, show their sheer brilliance. My hat's off to them all.


By Joe KDE User at Thu, 2001/02/01 - 6:00am

Hmm, text to speech looks really cool. It would save a lot of reading. I'm guessing the speech to text would require a pretty beefy machine though.


By Ashleigh G at Thu, 2001/02/01 - 6:00am

Do you really think it would save so much reading? I don't. The thing is, I (at least) can read much faster than I can speak, so ViaVoice reading something to me will always be slower than reading it myself.

Reading is great. All you have to do is move your eyes to the position that you want to read. How will ViaVoice handle moving to a different position in the text (maybe because you're not interested in this paragraph)? It's like comparing a tape with a CD (access-wise). Imagine reading Slashdot! It would probably read out the leftmost column first (faq, code, osdn, awards, privacy etc.) Wow, great! Then, with some bad luck, you're not interested in the first news item and have to spend 30 seconds listening to it?

Well, speech recognition and text2speech are cool, but I cannot imagine how to make the latter especially useful. I just hope they find a way!


By me at Thu, 2001/02/01 - 6:00am

This is about VOICE RECOGNITION

NOT

TEXT SYNTHESIS!


By anon at Thu, 2001/02/01 - 6:00am

No, it's not!

Read the darn article: "In addition, however, ViaVoice on Qt supports: TTS (text to speech)"

Then someone said: "That would save me a lot of reading." The above poster replied to that. Period.


By bah at Thu, 2001/02/01 - 6:00am

How about when you can have your computer tell you about something when you're not at your desk?
*phones home.
*computer answers (text2speech) "You have 3 email messages on your work account. One is marked urgent. Would you like me to list the subject lines?"
(voice recognition) "no. Tell me if the door is locked."
*computer answers: "Yes, Jim, the house is secured."
etc., etc.


By vod at Thu, 2001/02/01 - 6:00am

*computer answers: "Yes, Jim, the house is secured." etc., etc.

A more realistic conversation:

Dave: "Open the pod bay doors, HAL"

HAL: "I'm sorry, Dave, I'm afraid I can't do that..."

flounder


By flounder at Thu, 2001/02/01 - 6:00am

There is the issue of "accessibility", which is the point IBM usually brings up (not only for the deaf and blind, but also access through alternative methods, such as handheld devices or, as in the previous message, the telephone).


By Henry Izurieta at Thu, 2001/02/01 - 6:00am

This is off topic, but here goes:

Synthesized speech can be made a _lot_ faster than real human speech.

I have taken part in a project in which we created a synthesizer for blind people. We were able to make it speak at three times normal speed without any loss of intelligibility.

Contrary to what you may think, it is not possible to speak this fast normally; just record yourself speaking and then compare the files in a wave editor.

OTOH, if you are a fast reader, you are right - there is no way to make a synthesizer as fast as reading, but it can get pretty close.

The real problem is that browsing with read aloud text is really hard.


By Pirkka Jokela at Thu, 2001/02/01 - 6:00am

This is even more off topic... years ago I once won a contest by reciting the names of all 66 books of the Bible in 13 seconds. It's amazing how fast you can say a tongue twister with a few days of practice.

Generally, though, you cannot speak that fast. As someone who has done public speaking and taught it, I know that we can hear and process words something like 6 times faster than we can talk. For a speaker this means you must adjust vocal inflections to hold audience interest. If you think about it, you can look at something, think about something else and listen to someone talk, so you can process information much faster than anyone can speak...

However, I imagine very fast speech may take some getting used to. Classically it is considered good form to speak slower so that what is being said can be digested. Our biases about speech rate are that slower rates are perceived as lacking mental ability and faster rates are perceived as belonging to crooked types. These are relative to our own rate of speech. Speaking 5% faster is considered intelligent... so it would also be interesting to see how people perceive fast synthesized speech and how it might affect us.


By Eric Laffoon at Thu, 2001/02/01 - 6:00am

Text to speech is handy.

I work for an ISP, use KDE all day and have quite a few screens, so lots of problems in the logs get missed. I recently got text to speech working on my log files courtesy of Festival. It works pretty well: no matter what screen I'm working on, I know when things go wrong.

I have to admit I would feel like an idiot talking to the computer in the office, though ;-)

have fun,

aid


By aid at Fri, 2001/02/02 - 6:00am

A hammer is a hammer not a hoe!!!!

The point is not how a technology like a spreadsheet will do the same old things in the same old way... Killer apps redefine the way we get our jobs done... The successful ones are seldom incremental...

E.g., ask yourself HOW you could do WHAT was not practical before this technology existed, and you will likely see its true impact on man...

-tdh


By TDH at Tue, 2003/06/24 - 5:00am

It would be good for those who are disabled and need a way to interact with a system other than typing. I can't wait to see this. I would also like to know the planned cost of this and where we can download it....


By Richard Bollinger at Wed, 2007/01/17 - 6:00am

Hmm,

I was just thinking about license issues.
Anybody who knows more?

Max


By Max at Thu, 2001/02/01 - 6:00am

IBM's product is commercial...


By Jo at Thu, 2001/02/01 - 6:00am

Does this mean we'll have the whole license issue all over again?


By coba at Thu, 2001/02/01 - 6:00am

Yes, exactly. All of the KDE developers have unanimously decided to distribute KDE only in binary form from now on, so that ViaVoice can be integrated into KDE. Some people raised concerns about this issue, but were quickly convinced by the clear advantages that a speech-driven desktop would bring them.

The Gnome Foundation immediately reacted to this revolutionary improvement by also closing all of their sources to be able to add AOL's Instant Messenger to their distribution.

bah!


By Anonymous at Thu, 2001/02/01 - 6:00am

I'd like to know if IBM will support internationalization. Will IBM release some utilities to build localized phonemes and train voice recognition for languages other than English, Spanish, French, and German? Or will IBM localize the ViaVoice technology itself? I don't think so.


By Petr Husak at Thu, 2001/02/01 - 6:00am

You might be surprised: while living in Japan I wanted to buy ViaVoice (English) for a friend... couldn't find it anywhere, because the only version available was the *localised* one for the Japanese market. I was told by several people it did very poorly at recognising English, which is reasonable proof to me that it had been well localised (in Japanese there are no solitary consonant sounds other than 'n'; everything is consonant+vowel.)


By Julian Rendell at Thu, 2001/02/01 - 6:00am

Well, the Japanese market is really much bigger
than, e.g., the Czech, Slovak, or Hungarian ones...


By Milan Svoboda at Mon, 2001/02/05 - 6:00am

Just one quick troll:

Good work, boys!:) Only about a year and a half behind GTK this time.

GVoice, a wrapper library around ViaVoice that provides callback GTK+ signals, was initially announced sometime around June of 1999.

I normally wouldn't post trash such as this, but every article posted here is always accompanied by at least 3 comments to the effect of "What do you have to say about this, GNOME?:P" So I figured, if the KDE peeps troll on their own forum, I might as well too:) Happy replies:)


By james at Thu, 2001/02/01 - 6:00am

Oh, no more postings like this one, please...

So just use your GVoice and your call-back signals and feel happy, o.k. ?


By thomas at Thu, 2001/02/01 - 6:00am

The comment was in jest..:)


By james at Thu, 2001/02/01 - 6:00am

Ah... what does 'in jest' mean?

If you want to compare Gnome with KDE, please use
neutral ground (not dot.kde.org).
And a little bit of self-praise is allowed on both sides, I think :) (this includes statements like
'what's your answer, Gnome?', justified or not...)


By thomas at Thu, 2001/02/01 - 6:00am

Is there such a thing as a _neutral_ forum for talking about Gnome vs. KDE? :)


By Batard at Fri, 2001/02/02 - 6:00am

Sure there is... Microsoft web communities =)


By Divine at Fri, 2001/02/02 - 6:00am

Really?

Honestly, where can I find this? It's not on SourceForge!

Also, can I use every GNOME/Gtk app with it, or do they have to be modified? If so, how far along are they?

I don't think a speech-controlled desktop makes anyone work faster (except for dictation, maybe), but playing around with this feature surely is cool. Also, disabled^W differently abled people should get some advantages from this, I guess.


By me at Thu, 2001/02/01 - 6:00am

"...should get some advantages from this, I guess."

hmmm... depends if you consider being able to use a computer an advantage or not, I guess.

"I don't think a Speech-Controlled Desktop makes anyone work faster..."

it enables some of us to work, at all.


By disabifferently... at Thu, 2001/02/01 - 6:00am

yeah, even for those people afflicted with something as common as carpal-tunnel syndrome, this would be a huge benefit.


By will at Thu, 2001/02/01 - 6:00am

Freshmeat:)


By james at Thu, 2001/02/01 - 6:00am

A simple wrapper library around the native and very specific ViaVoice API is nice to have, but far from sufficient, unless you explicitly want to bind GPL'd applications to a non-free component.

I don't think you want that, James :)


By Matthias Ettrich at Thu, 2001/02/01 - 6:00am

Dunno, this is a good point. I would be curious to understand the difference, though, because in all honesty I don't. Evidently I'm missing something important here (and admittedly I just skimmed the article). But I really don't see the difference between GVoice and the Qt integration:)

- James


By james at Thu, 2001/02/01 - 6:00am

Are you suggesting that by accessing VV through Qt, KDE insulates itself from the license issue?

Please clarify.


By Forge at Fri, 2001/02/02 - 6:00am

> Are you suggesting that by accessing VV through Qt,
> KDE insulates itself from the license issue?

No, I'm not suggesting that.

Unfortunately, I can't speak openly about the technology, as it's still just a demo (I suggest you try to contact IBM at LinuxWorld).

What I can say is that all involved parties are aware of the licensing issues and that I personally won't do anything that might lock KDE into a non-free product (been there, done that).


By Matthias Ettrich at Fri, 2001/02/02 - 6:00am

And how on earth is that ever supposed to work properly?
Facts:

  • KDE2 needs ALSA sound drivers
  • ViaVoice needs OSS drivers
  • They just don't work together.
  • I just tried to use ViaVoice's cmdlinespeak example to say something to me. It hung after speaking; with strace I found a syscall that would not return, which I reported on the ALSA bugs page. Let's wait and see.

    Don't tell me about sound cards; I tried on different machines with different sound cards...


    By Mathias at Thu, 2001/02/01 - 6:00am

    KDE2 needs alsa sound drivers

    Huh? How did you come to this statement???

    Lukas


    By Lukas Tinkl at Thu, 2001/02/01 - 6:00am

    crystal ball?
    tea leaves?
    or just the fact that kde2 would not want to hoot at me when using the plain old kernel drivers, and anything KDE2 with sound is linked against the alsa libs???


    By Mathias at Thu, 2001/02/01 - 6:00am

    and anything KDE2 with sound is linked against the alsa libs
    Well, what can I say? If you had linked it against Alsa, it's linked against Alsa. Otherwise you can still compile kdelibs and kdemultimedia --with-oss and/or --with-alsa. It's up to you! :)
    Cheers,
    Lukas Tinkl [lukas@kde.org]


    By Lukas Tinkl at Thu, 2001/02/01 - 6:00am

    So maybe that's what's wrong: the guys @ SuSE (where I got my KDE 2.0.1 packages) just built against ALSA but not OSS... but I just don't have the time to rebuild my KDE 2, sorry.


    By Mathias Homann at Thu, 2001/02/01 - 6:00am

    Actually, I'm running a kernel driver for my Sound Blaster 128 (the es1371 driver) and it works great under KDE2. Also, from what I understand, ALSA supports OSS compatibility (it takes some extra entries in /etc/modules.conf). What kind of sound card are you working with?


    By Mr.Kill-9 at Fri, 2001/02/02 - 6:00am

    Note that the announcement specifically says they will be doing demos, not just talking about this.

    More importantly, since its antitrust headaches, IBM tends not to announce anything until it actually works.


    By Forge at Thu, 2001/02/01 - 6:00am

    Then they got that fixed within the last 8 days...


    By Mathias at Thu, 2001/02/01 - 6:00am

    Possible. Or maybe they just haven't released the working code yet :)


    By Forge at Fri, 2001/02/02 - 6:00am

    Hi!

    At least some of your facts are... well... wrong.

    aRts is the KDE sound daemon. If you look at this page, you'll find "support for audio output via ALSA" as a new feature for kde2.1beta2. So I guess OSS was supported before. Otherwise, how could anyone have had sound before?


    By ac at Thu, 2001/02/01 - 6:00am

    Same to you... ever tried to use kmixer with the old OSS driver???
    "No valid device found" is what you get in KDE2...


    By Mathias at Thu, 2001/02/01 - 6:00am

    Yep, and it works!
    I don't use ALSA on my Slackware, but KDE2, aRts and sound playback work really well.

    :-)


    By Michele at Thu, 2001/02/01 - 6:00am

    Since your statement is wrong (kde does accept OSS) I guess you never really tested this....


    By rinse at Thu, 2001/02/01 - 6:00am

    Yeah, as if I'd only been using Linux since yesterday...
    I've had OSS drivers up and running since kernel 1.0: the ossfree driver from the kernel on one machine, the commercial OSS on my laptop. Installed KDE2, no sound.
    Installed ALSA instead of the OSS drivers, sound. Any further questions?


    By Mathias at Thu, 2001/02/01 - 6:00am
