July 25, 2010

Simon at Akademy 2010: Interview with Peter Grasch

Troy: Peter, to begin, as a first-time attendee of Akademy, what was your initial impression of the event?

Peter: Okay, there were a lot more people than I expected. When I arrived at the university grounds, I met a guy who was also looking for the entrance, and we started talking. That was the starting point, and it just continued from there - talking and talking and meeting people everywhere. It was really nice that I could talk about an issue I was having with KMail. Many other people also use KMail and knew what I was talking about.

It was also nice putting names to faces - yourself, Jos and Aaron for example.


Peter Grasch - The Simon Project

Troy: You were specifically invited to give a talk about The Simon Project - what was your expectation as to how you'd be received by the KDE community, and how did it compare to the actual reception you got?

Peter: To be honest, I didn't really know. I came directly from the university and didn't really have time to form expectations. From my experiences reading the Dot and PlanetKDE, I thought it would just be a bunch of hackers doing their thing, in a somewhat formal setting. More specifically, I expected that the KDE team would be formally defined without much interaction with outsiders. However, I found that it is quite different and more informal than I expected. It is kind of nice to get direct access to developers to deal with bugs and so forth.

The reception has been overwhelmingly positive. The people that saw the talk received it quite well. Today at the workshop, people were surprised that it worked for them as well. It's nice that the project is being accepted by the community and people like what you are doing.

Troy: What features are new in Simon that were not there a year ago?

Peter: A year ago I was talking a lot about the scenario system, the package system, and now it's fully implemented. While it was hard to implement, as I expected, everyone is glad that the system is in place now. A lot of users at the workshop today really liked the concept, which confirms feedback from the mailing lists. We also now have the first user-contributed scenarios available on kde-files.org, so the community is starting to grow.

We also implemented base model support, so people can use precompiled base models. These are general acoustic models, not user-dependent models. So now they don't have to install HTK to get started. We also created and uploaded a German model which is available on VoxForge.

And we introduced a new application, called Sam (Simon Acoustic Modeler), which is the professional tool to manage Simon speech models. It is not necessarily geared towards end users, but instead allows more direct control over what is happening with speech models.

We have two more applications - we introduced a special set of tools for large sample acquisition. We have three teams touring Austria right now, just recording voice, in order to produce a better standard model.

Troy: How has the Simon community grown in the last year? Do you expect that coming to Akademy will help grow the community for the next year?

Peter: I don't think Alex was part of the team back then, so we have one more part-time contributor. Other than that, we had kind of a huge push because of Akademy. The mailing list is active right now, and so is git, which is directly related to Akademy. The accessibility BoF was very helpful too. I finally got to meet the KDE accessibility folks, which is nice since their mailing list is not very active.

Troy: Have more distros started to package Simon since we last talked?

Peter: Yes, we have an official repository on the openSUSE Build Service, which was not created by me (which makes me very proud). We also had a request to integrate Simon into Vinux, an unofficial accessibility testing ground for Ubuntu. This is perfect since they have to create Ubuntu packages as part of this process; it should make it easier for Ubuntu users to get these packages in the future.

Troy: In the previous article, you made a comment about the state of Linux audio systems. In your opinion, has this situation improved in the last year, and if so, what has made it better?

Peter: Well, sadly, many of the problems still exist. The essential thing is that we switched to QtMultimedia, not because it is the right solution, but because at the moment it's the best workaround. It still doesn't work well with PulseAudio, and it still breaks in many configurations. Unfortunately, we still can't use Phonon because it lacks recording functionality. Someone is working on that right now through this year's Summer of Code. We want to switch fully to Phonon soon, and have a real solution instead of just jumping from API to API.

Troy: During your presentation, you used a standard English language model. Is the availability of this model a new development?

Peter: No, but Simon's ability to use it is new. This is the base model functionality that I was talking about earlier. This makes a lot of difference for users who are getting started.

Troy: Does similar data exist for other languages?

Peter: Yes, we have one for German, and there is some source material for other languages, but to the best of my knowledge there are no other complete (and usable) models yet.

Troy: How does the availability of standard models make installation and use of Simon easier (for users not requiring custom models)?

Peter: It doesn't make it easier, it makes it easy. It wasn't easy before. As I said in the presentation, we developed the first-run wizard with the KDE Usability team. We managed to come up with a nice wizard that gets people started right away.

Troy: According to your presentation, dictation is still not supported in any real capacity. Are you still following the VoxForge project, and has there been any progress made?

Peter: Yeah, sure. The base models that we are using now are from VoxForge, and we compiled the German model for them using their data. I am still looking forward very much to working with VoxForge on dictation models; we're closer than we were last year. We are at very basic levels right now with, for example, the virtual keyboard being available.

Troy: One of the future goals mentioned in the previous article was integration with KDE's Get Hot New Stuff framework. Since this has now been successfully implemented, what new goals do you have for future releases?

Peter: One major point that keeps coming up is that we're trying to ensure that our scenarios are context-aware. So for example, if you're using Amarok, the Amarok scenario is loaded while others are disabled - this would improve recognition.
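To illustrate the idea Peter describes, here is a minimal Python sketch of context-dependent scenario selection. It is not Simon's code: the scenario table, the command phrases, the always-on "desktop" scenario, and both function names are hypothetical, made up purely for illustration. The point it shows is that enabling only the scenario for the focused application shrinks the set of phrases the recognizer has to distinguish.

```python
# Hypothetical scenario table: scenario name -> voice commands it contributes.
# None of these names or phrases come from Simon itself.
SCENARIOS = {
    "amarok": {"play", "pause", "next track"},
    "firefox": {"back", "reload", "new tab"},
    "desktop": {"close window", "switch window"},
}

# Assumption: some scenarios stay enabled whatever application has focus.
ALWAYS_ON = {"desktop"}


def active_scenarios(focused_app):
    """Return the names of the scenarios to enable for the focused application."""
    enabled = set(ALWAYS_ON)
    if focused_app in SCENARIOS:
        enabled.add(focused_app)
    return enabled


def active_vocabulary(focused_app):
    """Return the union of commands from every enabled scenario."""
    commands = set()
    for name in active_scenarios(focused_app):
        commands |= SCENARIOS[name]
    return commands
```

With Amarok focused, `active_vocabulary("amarok")` contains the Amarok and desktop commands but none of the browser commands, so there are fewer candidate phrases to confuse with one another.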

We're actually working on two big projects right now. One of them is a benefit project to create a system to be used in senior homes as well as individual installations in community centers. It will include features like Skype control, TV and message passing between the administration and the residents. We use XBMC for this - XBMC and Simon are working together closely on this.

The other project is developing a robot for seniors. A lot of them do not want a care worker in their home, so in some cases where people may need only a little bit of help, this robot would be able to phone authorities for help and so forth. The little stuff. This is an official EU research project called Astromobile. This is not our project - we are just doing the speech recognition while a university in Pisa handles the project itself.

Troy: Now to the sticky licensing questions (again). How has the availability of standard speech models affected the longstanding HTK licensing issues?

Peter: Well, there has not been a direct licensing effect. For English and German, you don't need to compile your speech model yourself; you can just install the base model without doing any training. This means that you can set up your Simon system without HTK.

We're in a unique position where if we decide to switch to Sphinx, which is free software, we actually expect an increase in recognition performance. This is due to the nature of HTK being a research project instead of a real-world implementation.