Trolltech, IBM and KDE to Demo Voice-Control

Trolltech, IBM (NYSE: IBM), and KDE have teamed up at LinuxWorld Expo in New York and are demonstrating IBM's ViaVoice speech-recognition technology running on Qt and KDE. With ViaVoice integrated into Qt/KDE, it will be possible to control Qt/KDE desktop applications with speech input -- from launching applications to menu selections to text entry. Developers can easily integrate this technology into existing applications; in fact, in many cases no changes have to be made. The Trolltech press release follows.

Santa Clara, California -- Trolltech, IBM (NYSE: IBM), and KDE are teaming up at LinuxWorld to demonstrate IBM's ViaVoice speech-recognition technology running on Trolltech's Qt, the cross-platform C++ GUI framework used in the K Desktop Environment.

The technology preview will be running during the entire show at Trolltech's booth, No. 1557, at LinuxWorld, which will be held at the Jacob Javits Convention Center January 31 through February 2, 2001.

"This combination of technologies will greatly accelerate the creation and adoption of speech-enabled applications for the Linux desktop," says Patricia
McHugh, Director, New Business Development, IBM Voice Systems.

Matthias Ettrich, a senior software engineer at Trolltech and the founder of KDE, elaborates: "When ViaVoice is integrated with Qt, it will be possible to control Qt-based Linux desktop applications with speech input that is as simple as -- if not more simple than -- keyboard input. Developers can build speech capability into the structure of their application from the beginning."

In other words, the two technologies running together eliminate several of the obstacles that have hampered widespread adoption of speech recognition on the desktop, including inefficient resource use, suboptimal performance, and the difficulty of "bolting on" this functionality after a typical application has already been written.

ViaVoice has already shown that it can handle the two typical speech-recognition tasks: command and control, and dictation. In addition, however, ViaVoice on Qt supports TTS (text-to-speech), in which the system can read any kind of text input and translate it into speech, and a function that allows programmers to define a "grammar" in BNF format. The engine will then recognize phrases that match the grammar, e.g., special input modes for dates or numbers such as "Monday, the first of June" or "two thousand one hundred and seventy-five."

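To make the grammar idea concrete, here is a small illustrative sketch. It is not the ViaVoice grammar syntax or the Qt binding shown at the demo (neither is public in this preview); it only imitates the kind of BNF-style date grammar the release describes and shows what it means for a recognized phrase to match it.

```cpp
// Toy illustration only: a real speech engine does the acoustic recognition
// and grammar matching itself. This just mimics a <date> production such as:
//   <date>    ::= <weekday> "the" <ordinal> "of" <month>
//   <weekday> ::= monday | tuesday | ... | sunday
//   <ordinal> ::= first | second | ... | fifth
//   <month>   ::= january | february | ... | june
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

int main() {
    const std::set<std::string> weekday{"monday", "tuesday", "wednesday",
                                        "thursday", "friday", "saturday", "sunday"};
    const std::set<std::string> ordinal{"first", "second", "third", "fourth", "fifth"};
    const std::set<std::string> month{"january", "february", "march",
                                      "april", "may", "june"};

    // A phrase as a recognizer might deliver it: lower-cased and tokenized.
    std::istringstream phrase("monday the first of june");
    std::vector<std::string> tokens;
    for (std::string word; phrase >> word; ) tokens.push_back(word);

    // "Matching" simply means the token sequence fits the <date> production.
    const bool matches = tokens.size() == 5 &&
                         weekday.count(tokens[0]) && tokens[1] == "the" &&
                         ordinal.count(tokens[2]) && tokens[3] == "of" &&
                         month.count(tokens[4]);

    std::cout << (matches ? "phrase matches <date> grammar\n" : "no match\n");
    return 0;
}
```

In the integrated setup the application would only register such a grammar and receive the matched phrase back from the engine; the hand-written matcher above exists purely to show what a match means.
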
About Trolltech

Trolltech develops, supports, and markets Qt, a C++ cross-platform toolkit and windowing system. Qt and Qt/Embedded let programmers rapidly build state-of-the-art GUI applications for desktop and embedded environments using a "write once, compile anywhere" strategy. Qt has been used to develop hundreds of successful commercial applications worldwide, and is the basis of the K Desktop Environment (KDE). Trolltech is headquartered in Oslo, Norway, with offices in Santa Clara, California, and Brisbane, Australia. www.trolltech.com

CONTACT: Trolltech
Aron Kozak, 408/219-6303
[email protected]
or
Al Shugart International
Jessica Damsen, 831/464-4746
[email protected]

Comments

by Joe KDE User (not verified)

Let me just say that this is beyond cool. Once again, the TT/KDE guys, as well as IBM, show their sheer brilliance. My hat's off to them all.

Hmm, text to speech looks really cool. It would save a lot of reading. I'm guessing the speech to text would require a pretty beefy machine though.

Do you really think it would save so much reading? I don't. The thing is, I (at least) can read much faster than I can speak, meaning ViaVoice reading something to me will always be slower than reading it myself.

Reading is great. All you have to do is move your eyes to the position that you want to read. How will ViaVoice handle moving to a different text position (maybe because you're not interested in this paragraph)? It's like comparing a tape with a CD (access-wise). Imagine reading Slashdot! It would probably read out the leftmost column first (faq, code, osdn, awards, privacy etc.) Wow, great! Then, with some bad luck, you're not interested in the first news item and have to spend 30 seconds listening to it?

Well, speech recognition and text2speech are cool, but I cannot imagine how to make the latter, in particular, very useful. I just hope they find a way!

This is about VOICE RECOGNITION

NOT

TEXT SYNTHESIS!

No it's not!

Read the darn article: "In addition, however, ViaVoice on Qt supports TTS (text-to-speech)"

Then someone said: "That would save me a lot of reading." The above poster replied to that. Period.

by vod (not verified)

How about when you can have your computer tell you about something when you're not at your desk?
*phones home.
*computer answers (text2speech) "You have 3 email messages on your work account. One is marked urgent. Would you like me to list the subject lines?"
(voice recognition) "no. Tell me if the door is locked."
*computer answers: "Yes, Jim, the house is secured."
etc etc.

by flounder (not verified)

*computer answers: "Yes, Jim, the house is secured." etc etc.

A more realistic conversation:

Dave: "Open the pod bay doors, HAL"

HAL: "I'm sorry, Dave, I'm afraid I can't do that..."

flounder

There is the issue of "accessibility", which is the point IBM usually brings up (not only the deaf and blind, but also access via alternative methods, such as handheld devices... or, as in the previous message, through the telephone).

by Pirkka Jokela (not verified)

This is off topic, but here goes:

Synthesized speech can be made a _lot_ faster than real human speech.

I have taken part in a project in which we created a synthesizer for blind people. We were able to make the thing speak at three times the normal speed without any deterioration of recognisability.

Contrary to what you may think, it is not possible to speak this fast naturally; just record yourself speaking and then compare the files in a wave editor.

OTOH, if you are a fast reader, you are right - there is no way to make a synthesizer as fast as reading, but it can get pretty close.

The real problem is that browsing with read aloud text is really hard.

by Eric Laffoon (not verified)

This is even more off topic... Once, years ago, I won a contest reciting the names of all 66 books of the Bible in 13 seconds. It's amazing how fast you can say a tongue twister with a few days' practice.

Generally, though, you cannot speak that fast. As someone who has done public speaking and taught it, I know that we can hear and process words something like 6 times faster than we can talk. For a speaker this means you must adjust vocal inflections to hold audience interest. If you think about it, you can look at something, think about something else, and listen to someone talk, so you can process information much faster than anyone can speak...

However, I imagine very fast speech may take some getting used to. Classically it is considered good form to speak slower so that what is being said can be digested. Our prejudices about speech rates are that slower rates are perceived as lacking mental ability and faster rates are perceived as belonging to crooked types. These are in relation to our own rate of speech; 5% faster is considered intelligent... so it would be interesting to see how people perceive fast synthesized speech and how it might impact us.

Text to speech is handy.

I work for an ISP, use KDE all day, and have quite a few screens, so log messages about problems get missed. I recently got text to speech on my log files courtesy of Festival. It works pretty well: no matter what screen I'm working on, I know when things go wrong.
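
For anyone curious, here is a minimal sketch of that kind of glue, assuming Festival is installed and that "festival --tts" with no file arguments reads plain text from stdin and speaks it; the log path and the "error" filter below are just example choices.

```cpp
// Tail a log file and speak lines that look like problems via Festival.
// Sketch under the assumptions stated above; not part of KDE or ViaVoice.
// Uses POSIX popen(), so this is Unix/Linux only.
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main() {
    const char *logPath = "/var/log/messages";    // example path; adjust to taste
    std::ifstream log(logPath);
    if (!log) { std::cerr << "cannot open " << logPath << "\n"; return 1; }
    log.seekg(0, std::ios::end);                  // start at the end: only speak new lines

    std::string line;
    for (;;) {
        if (std::getline(log, line)) {
            // Only speak lines that look like trouble; tune the filter as needed.
            if (line.find("error") != std::string::npos) {
                if (FILE *tts = popen("festival --tts", "w")) {
                    std::fputs(line.c_str(), tts);
                    std::fputc('\n', tts);
                    pclose(tts);                  // Festival speaks once its stdin closes
                }
            }
        } else {
            log.clear();                          // clear EOF and wait for more output
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
}
```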

I need to admit I would feel an idiot talking to the computer in the office though ;-)

have fun,

aid

A hammer is a hammer, not a hoe!!!!

The point is not how a technology like a spreadsheet will do the same old things in the same old way... Killer apps redefine the way we get our job done... The successful ones are seldom incremental...

E.g., ask yourself HOW you could do WHAT was not practical before this technology existed, and you will likely see its true impact on man...

-tdh

It would be good for those who are disabled and need a way to interact with a system other than typing. I can't wait to see this. I would also like to know the planned cost of doing this and where we can download it....

Hmm,

I was just thinking about license issues.
Does anybody know more?

Max

IBM's product is commercial...

Does this mean we'll have the whole license issue all over again?

Yes, exactly. All of the KDE developers have unanimously decided to distribute KDE only in binary form from now on, so that ViaVoice can be integrated into KDE. Some people raised concerns about this issue, but were quickly convinced by the clear advantages that a speech-driven desktop would bring them.

The Gnome Foundation immediately reacted to this revolutionary improvement by also closing all of their sources to be able to add AOL's Instant Messenger to their distribution.

bah!

I'd like to know if IBM will support internationalization. Will IBM release some utilities to make localized phonemes and train voice recognition for languages other than English, Spanish, French, and German? Or will IBM localize the ViaVoice technology itself? I don't think so.

by Julian Rendell (not verified)

You might be surprised -- while living in Japan I wanted to buy ViaVoice (English) for a friend... I couldn't find it anywhere, because the only version available was the *localised* one for the Japanese market. I was told by several people it did very poorly at recognising English -- reasonable proof to me that it had been well localised (in Japanese there are no solitary consonant sounds other than 'n'; everything is consonant+vowel).

by Milan Svoboda (not verified)

Well, the Japanese market is really much bigger than, e.g., the Czech, Slovak, or Hungarian ones...

Just one quick troll:

Good work, boys! :) Only about a year and a half behind GTK this time.

GVoice, a wrapper library around ViaVoice that provides callback gtk+ signals, was initially announced sometime around June of 1999.

I normally wouldn't post trash such as this, but every article posted here is always accompanied by at least 3 comments to the effect of "What do you have to say about this, GNOME? :P" So I figured, if the KDE peeps troll on their own forum, I might as well too :) Happy replies :)

Oh, no more postings like this one, please...

So just use your GVoice and your callback signals and feel happy, OK?

The comment was in jest..:)

Ah... what does 'in jest' mean?

If you want to compare Gnome with KDE, please use neutral ground (not dot.kde.org). And a little bit of self-praise is allowed on both sides, I think :) (this includes statements like "What's your answer, Gnome?" -- justified or not...)

Is there such a thing as a _neutral_ forum for talking about Gnome vs. KDE? :)

Sure there is... Microsoft web communities =)

really?

Honestly, where can I find this? It's not on SourceForge!

Also, can I use every GNOME/Gtk app with it, or do they have to be modified? If so, how far along are they?

I don't think a speech-controlled desktop makes anyone work faster (except for dictating, maybe), but playing around with this feature surely is cool. Also, disabled^W differently abled people should get some advantages from this, I guess.

by disabifferently... (not verified)

"...should get some advantages from this, I guess."

hmmm... depends on whether you consider being able to use a computer an advantage or not, I guess.

"I don't think a speech-controlled desktop makes anyone work faster..."

it enables some of us to work at all.

yeah, even for people afflicted with something as common as carpal tunnel syndrome, this would be a huge benefit.

Freshmeat:)

by Matthias Ettrich (not verified)

A simple wrapper library around the native and very specific ViaVoice API is nice to have, but far from sufficient -- unless you explicitly want to bind GPL'd applications to a non-free component.

I don't think you want that, James :)

Dunno, this is a good point. I would be curious to understand the difference, though, because in all honesty I don't. Evidently I'm missing something important here (and admittedly I just skimmed the article). But I really don't see the difference between GVoice and the Qt integration :)

- James

Are you suggesting that by accessing ViaVoice through Qt, KDE insulates itself from the license issue?

Please clarify.

by Matthias Ettrich (not verified)

> Are you suggesting that by accessing ViaVoice through Qt,
> KDE insulates itself from the license issue?

No, I'm not suggesting that.

Unfortunately, I can't speak openly about the technology, as it's just a demo yet (I suggest you try to contact IBM at LinuxWorld).

What I can say is that all involved parties are aware of the licensing issues and that I personally won't do anything that might lock KDE into a non-free product (been there, done that).

And how on earth is that ever supposed to work properly...
Facts:

  • KDE2 needs ALSA sound drivers
  • ViaVoice needs OSS drivers
  • They just don't work together.
  • I just tried to use ViaVoice's cmdlinespeak example to say something to me; it hung after speaking. With strace I found a syscall that would not finish, which I reported to the ALSA bugs page. Let's wait and see.

Don't tell me about sound cards; I tried on different machines with different sound cards...

by Lukas Tinkl (not verified)

"KDE2 needs ALSA sound drivers"

Huh? How did you come to this statement???

Lukas

Crystal ball? Tea leaves?
Or just the fact that KDE2 would not want to hoot at me when using the plain old kernel drivers, and that anything KDE2 with sound is linked against the ALSA libs???

by Lukas Tinkl (not verified)

"anything KDE2 with sound is linked against the ALSA libs"
Well, what can I say? If you linked it against ALSA, it's linked against ALSA. Otherwise you can still compile kdelibs and kdemultimedia --with-oss and/or --with-alsa. It's up to you! :)
Cheers,
Lukas Tinkl [[email protected]]

by Mathias Homann (not verified)

So maybe that's what's wrong: the guys at SuSE (where I got my KDE 2.0.1 packages) just built against ALSA but not OSS... but I just don't have the time to rebuild my KDE 2, sorry.

Actually, I'm running a kernel driver for my Sound Blaster 128 (the es1371 driver) and it works great under KDE2. ALSA also has support for OSS compatibility, from what I understand (it takes some extra entries in /etc/modules.conf). What kind of sound card are you working with?

by Forge (not verified)

Note that the announcement specifically says they will be doing demos, not just talking about this.

More importantly, given its antitrust headaches, IBM tends not to announce anything until it actually works.

by Mathias (not verified)

Then they had that fixed within the last 8 days...

by Forge (not verified)

Possible. Or maybe they just haven't released the working code yet :)

Hi!

At least some of your facts are... well... wrong.

aRts is the KDE sound daemon. If you look at this page, you'll find "support for audio output via ALSA" listed as a new feature for KDE 2.1beta2. So I guess OSS was supported before. Or how else could someone have had sound before?

Same to you... ever tried to use kmixer with the old OSS driver???
"No valid device found" is what you get in KDE2...

Yep, and it works!
I don't use ALSA on my Slackware, but KDE2, aRts, and sound playing work really fine.

:-)

Since your statement is wrong (KDE does accept OSS), I guess you never really tested this....

Yeah, as if I had only been using Linux since yesterday...
I have had OSS drivers up and running since kernel 1.0: the free OSS from the kernel on one machine, the commercial OSS on my laptop. Installed KDE2: no sound.
Installed ALSA instead of the OSS drivers: sound. Any further questions?