[KDE Dot News]


  FOSDEM 2005: Desktop Search Interview
Community and Events Posted by Jonathan Riddell on Wednesday 23/Feb/2005, @05:04
from the they-look-here-they-look-there dept.
The schedule for the KDE developers' room talks at FOSDEM is now online. Our final interview with the speakers is with Scott Wheeler, who will be giving a talk titled "KDE 4: Beyond Hierarchical Data, The Desktop as a Searchable Web of Context". FOSDEM is this weekend, see you there.

Please introduce yourself and your role in KDE.

I feel like I've been asked this question enough times that I should have an exciting answer by now. But well, I wrote JuK and TagLib as well as a couple of other small applications in KDE CVS and do some work on a handful of things in kdelibs and elsewhere across KDE.

What kind of search capabilities do you think a modern desktop should have?

Well, I think I'd like to step back a bit first and look at the problem, and the problem isn't the lack of a search tool; the problem is that it's hard to find things. Search tool or not, all of these ideas flow from trying to solve that problem rather than just creating a new tool. So, in a sense, I don't think a modern desktop should have a search tool; I think a modern desktop should make it easy to find stuff. What's left is figuring out how to get there.

And I suppose with all of the buzz around search tools these days, people have a much more concrete idea in mind when they hear about searching on the desktop. But that wasn't the case when I started kicking these ideas around a while back. Spotlight was announced a few days after I'd submitted my abstract for the KDE Developers' Conference, Beagle was relatively low-profile, Google for the Desktop and its successors hadn't entered the scene yet, etc.

So, I think — fundamentally "what sort of search should the desktop have" is almost the wrong question. "How should we make it easier to work with the data we accumulate on the desktop?" is closer to the right question. I think search is just part of the answer.

Where did the idea of integrating a search capability throughout KDE come from?

Well, a few things, actually. It mostly came from not being able to find things and asking some fundamental questions about how we organize and access information on the desktop. The first step, which is tied up with the first part of the title of both this talk (which is related to the one I gave at Linux Bangalore) and the one I gave at the KDE conference last summer, is realizing that hierarchical interfaces simply don't make sense in a lot of cases.

When I started looking around for examples of how this had played out in other domains of information, the most obvious example was the World Wide Web, where we've already moved from hierarchical interfaces to search-based interfaces. It seemed logical that we could learn from that metaphor.

On the technical side of things, I'd just written the list view search line class (used in JuK), which is now fairly prevalent in KDE and makes filtering information in lists much easier, so that played into things too.
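A search line that filters a list as you type can be sketched in a few lines. This is a hypothetical Python illustration, not the KDE class itself: it assumes a row matches when every whitespace-separated term occurs, case-insensitively, in at least one of its columns, and the function names are made up for the example.

```python
def matches(row, terms):
    """True if every search term occurs (case-insensitively) in some column of the row."""
    return all(any(term in str(column).lower() for column in row) for term in terms)


def filter_rows(rows, query):
    """Return the rows that match all whitespace-separated terms of the query.

    An empty query matches everything, so clearing the search line restores the full list.
    """
    terms = query.lower().split()
    return [row for row in rows if matches(row, terms)]
```

For example, with a playlist of (artist, title, album) tuples, typing "radiohead kid" keeps only the Radiohead tracks from Kid A. Requiring all terms to match is one design choice; a real search line might instead restrict matching to selected columns.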

What do you think of other search tools such as GNOME's Beagle and Google's Desktop Search?

Well, they're fundamentally different in scope. Again, right now the term "desktop search" actually means something; that wasn't really true when I started working on these ideas last summer. So while they have some things in common, they're really pretty different approaches.

Beagle, Spotlight, Google for the Desktop, and their relatives are more interested in static indexing and searching through that information. That's roughly where I was conceptually early last summer when I coded the first mock-up. Since then, however, the ideas have moved on quite a bit, and I think we've actually got something rather more interesting up our proverbial sleeves. (I should note, however, that I think the Beagle group is doing fine work; it's just something pretty different from what I'm interested in.)

The first difference is that this is a framework, not a tool. Beagle has some elements of this, but it's still not integrated into the core of the desktop. Google for the Desktop is, from what I know of it, mostly just a standalone tool. Honestly, I think it's really below the level of innovation that I tend to expect from Google.

What we're now looking for in the KDE 4 infrastructure is a general way of linking information and storing contextual information — that information can come from meta-data, usage patterns, explicit relationships and a host of other places.
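One way to picture "linking information and storing contextual information" is a graph of resources joined by labelled links, each tagged with where the link came from. The sketch below is purely illustrative: the `ContextStore` class, the `mail:`/`file:` identifiers and the three origin labels are assumptions made for this example, not the planned KDE 4 API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    target: str    # identifier of the linked resource (hypothetical URI scheme)
    relation: str  # e.g. "attached-to", "opened-with", "annotated-by"
    origin: str    # where the link came from: "metadata", "usage" or "explicit"


class ContextStore:
    """Hypothetical in-memory store of links between desktop resources."""

    def __init__(self):
        self._links = defaultdict(set)  # resource -> set of Link

    def link(self, a, b, relation, origin):
        # Record the link in both directions so context can be followed from either end.
        self._links[a].add(Link(b, relation, origin))
        self._links[b].add(Link(a, relation, origin))

    def related(self, resource, origins=None):
        """Resources linked to `resource`, optionally restricted to some link origins."""
        return sorted({
            link.target
            for link in self._links.get(resource, ())
            if origins is None or link.origin in origins
        })
```

Asking for everything related to a document would then return the mail it was attached to (from metadata), the application it was opened with (from usage) and a note the user attached to it (an explicit relationship), all through one query.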

There won't be a single interface to this set of contextual information; we'll provide some basic APIs for accessing the components in KDE applications, but we're quite interested in seeing what application authors will think to do with it. Really I think they'll surprise us.

We're looking at everything from reorganizing KControl to make search, related items and usage patterns more prominent, to annotating mail or documents with notes, to reworking file dialogs. Really, the scope is pretty broad.

Do you think Free Software solutions from KDE and GNOME can compete with the likes of Google and Microsoft?

Sure. I mean — I don't think the ability to compete with commercial players is significantly different with desktop search than it is with other components of the desktop. And honestly I think we've kind of got a head start here.

Has there been any progress on planning or coding search into KDE yet? Is anyone helping you? What problems are you facing?

There have been a number of cycles through some API and database design sketches. But right now we tend to write code, and as soon as it's done we realize its flaws and start rewriting. This will probably continue for a while, but I think we'll be able to have something pretty useful in KDE 4.

There are a number of folks from various sub-projects inside of KDE involved in discussing these issues. Thus far it's been mostly Aaron Seigo and me banging on the API, but others have contributed to the discussions.

I think the biggest problem that we're dealing with is moving from the abstract set of ideas that we're working with into real APIs — trying to keep things general enough to stay as extensible as we'd like them to be, but not so lofty that they're convoluted and useless.

What technologies do you plan on using, e.g. which database?

Well, we've gravitated towards Postgres, but mostly because of licensing. Other than that, well, uhm, we're using Qt. The Qt 4 SQL API seems much improved, so I've kind of been mentally stalling on really finishing up the current code until I can work with that, since otherwise everything would just have to be rewritten in a few weeks.

Is the KDE search tool likely to be cross-desktop compatible, so we could have a common base with GNOME?

Well, again, this really isn't about a "KDE search tool", and the chances of it being GNOME compatible out of the box aren't particularly high. That said, since the data store will just be a Postgres database and ideally we won't have to use too many complex serialized types, there's no reason a GNOME frontend couldn't be written. But generally speaking, I'd like to get the technology laid down and then see if we can convince others to adopt it rather than the other way around.
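To illustrate why plain column types keep the store readable by other frontends, here is a minimal schema sketch. It uses Python's built-in `sqlite3` only so the example is self-contained; the interview names Postgres as the intended store, and the table and function names are hypothetical. `INSERT OR IGNORE` is SQLite syntax and would need a different idiom in PostgreSQL.

```python
import sqlite3

# Only INTEGER and TEXT columns: any client with an SQL driver can read this.
SCHEMA = """
CREATE TABLE resources (
    id  INTEGER PRIMARY KEY,
    uri TEXT NOT NULL UNIQUE
);
CREATE TABLE links (
    from_id  INTEGER NOT NULL REFERENCES resources(id),
    to_id    INTEGER NOT NULL REFERENCES resources(id),
    relation TEXT NOT NULL,
    PRIMARY KEY (from_id, to_id, relation)
);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_resource(conn, uri):
    """Insert the resource if it is new and return its id either way."""
    conn.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    return conn.execute("SELECT id FROM resources WHERE uri = ?", (uri,)).fetchone()[0]


def add_link(conn, from_uri, to_uri, relation):
    a, b = add_resource(conn, from_uri), add_resource(conn, to_uri)
    # The composite primary key makes repeated identical links a no-op.
    conn.execute("INSERT OR IGNORE INTO links VALUES (?, ?, ?)", (a, b, relation))


def linked_to(conn, uri):
    """(target uri, relation) pairs for links leaving `uri`, sorted by target."""
    return conn.execute(
        """SELECT r2.uri, l.relation
           FROM links l
           JOIN resources r1 ON r1.id = l.from_id
           JOIN resources r2 ON r2.id = l.to_id
           WHERE r1.uri = ?
           ORDER BY r2.uri""",
        (uri,),
    ).fetchall()
```

Because nothing in these tables is a serialized blob, a frontend written with a different toolkit could query them directly.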

What does the project need most now?

Time. And I mean that in a few ways: we need time to finish fleshing out the ideas, time to let the stuff mature inside of KDE, and, well, the couple of us working on it could use more time for it. But really, as most of the framework for things like metadata collection and whatnot is already inside of KDE, this won't be a huge project on the framework side. What will take a good while is porting applications over to use it where appropriate.



The Fine Print: The following comments are owned by whomever posted them.
Video of presentations?
by LB on Wednesday 23/Feb/2005, @06:32
Is it possible to make videos of the KDE-related presentations? Unfortunately I'm not able to go to FOSDEM, but I'm very interested in the presentations.
Choice of database
by AC on Wednesday 23/Feb/2005, @06:33
Has anyone looked at Derby?
http://incubator.apache.org/derby/
It is licensed under the Apache license, supports ANSI SQL, is portable (pure Java), has a small footprint and is supposedly very easy to use. So it seems to fit the bill perfectly.
Lucene?
by Michael Schuerig on Wednesday 23/Feb/2005, @06:42
Have any of you had a look at the Java search framework Lucene (http://lucene.apache.org/java/docs/index.html) and, in particular, its C++ port, CLucene (http://sourceforge.net/projects/clucene/)?

Lucene is an excellent, sophisticated and yet easily usable framework for indexing and searching. It might be usable as is or for inspiration only.

Michael
search tool ??
by Jakob on Wednesday 23/Feb/2005, @06:48
And what is it all about? A kind of dashboard?
Search API progress?
by ac on Wednesday 23/Feb/2005, @07:11
Do you do the discussion on some particular mailing list or do you have some place where you show the current drafts (wiki?) or is a proof of concept already available in one of the numerous kdenonbeta modules?

I'd think catching all the metadata kfile has already been reading out for ages would be an excellent start. ;)
why postgres?
by Pat on Wednesday 23/Feb/2005, @07:44
Does this mean we'll all need a full installation of PostgreSQL? Isn't that a bit heavy? Why not SQLite or MySQL? (I know that they're GPL, not sure about SQLite.) They're faster and lighter than Postgres, which is great but maybe a bit too much. Just because MS is going to use some kind of reworked SQL Server with WinFS on Longhorn doesn't mean we should do the same with Postgres :)
reiser4?
by me on Wednesday 23/Feb/2005, @10:04
just thinking...

I know that using Reiser4 should probably not be a requirement for using the "search tool", but would it make sense to create a Reiser4 plugin to be used with your ideas? Maybe you could store some information not in the database, but right alongside the files. One could argue that this is where the information belongs: in the filesystem.

IIRC, Hans Reiser said that whenever you use a database, it's because of the shortcomings of your filesystem, and the now-released Reiser4 is supposed to fix that.
What I want.
by Derek Kite on Wednesday 23/Feb/2005, @19:35
The discussions about which DB backend to use are really irrelevant. Even the front end, the user interface, is not really the most important part. The middleware (what data is archived and indexed, and how contexts and patterns are matched) is the key.

I want something that recognizes contexts of activity. My work patterns usually come in blocks: I sit down and write and assemble the digest. I sit and read my favorite blogs. I read and sort my email. Those are the regular blocks. Then there are the projects that I work on, i.e. taxes, planning a trip, researching a specific subject, and work tasks such as proposals and product research.

For example, in June of last year I was researching travel in Europe since my daughter was travelling there, and I needed to figure out how to get her somewhere. I found interesting sites, and some helpful emails came in, including correspondence with my daughter. There was a pattern to that activity. Say I want to arrange a trip for myself and want to find all those sites I found helpful. So I start looking, and the keywords London, Paris, Europe, airline, low-cost come up. Same keywords, same context. An indexing/data retrieval system that recognizes the context could suggest how to replicate the previous context.

It isn't simply data that is indexed, but time, duration, frequency, context and which application. I can search my data files quite easily with grep. But I can't for the life of me remember what tax filing software I used last year, or where that interesting blog on the NHL strike was. The only time recently that I remember wishing I had an index of a bunch of data files was when reading product documentation PDFs on a CD-ROM where the filenames were six digits.

I don't want something that tells me I have Results 1 - 10 of about 220,000,000 for linux. I know I've got thousands of references to KDE on my hard drive. I want a maximum of 20 selections based on the context I am working in.
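The "maximum of 20 selections based on the context" idea can be sketched as ranking items by keyword overlap with the current activity, with recency as a tie-breaker. This is a hypothetical Python illustration: the `Item` record, the keyword sets and the scoring rule are assumptions, and a real system would draw on far richer signals (time, duration, frequency, application) than this.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    keywords: frozenset  # lower-case keywords recorded when the item was used
    last_used: float     # timestamp of the last access


def suggest(items, context, limit=20):
    """Return at most `limit` item names, best context match first.

    Items sharing no keyword with the current context are left out entirely;
    among items with the same overlap, the more recently used one wins.
    """
    context = {word.lower() for word in context}
    relevant = [
        (len(item.keywords & context), item.last_used, item.name)
        for item in items
        if item.keywords & context
    ]
    return [name for _, _, name in heapq.nlargest(limit, relevant)]
```

With the travel example above, a context of London, Paris and Europe would rank a saved mail tagged with all three ahead of a flight-booking page tagged only with Europe, and leave last year's tax files out entirely.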

This would obviously entail hooks into the various data streams. And some kind of realtime archiving and pattern matching. And possibly background data mining. An api is best since applications sometimes know the best way to work with the data that they produce.

This is neat stuff.

Derek
What I need/want and I don't think that it is SQL
by James Richard Tyrer on Wednesday 23/Feb/2005, @20:38
Perhaps I am missing something here, but what I want to start with is that I have a directory with a bunch of HTML files in it and I want to be able to search them for content just like Google searches the web.

Will your project do this, or am I talking about something else?

It seems like a KDE front end for ht://Dig would do what I want.

--
JRT
Wonderful (and some suggestions)
by jameth on Wednesday 23/Feb/2005, @20:44
"that information can come from meta-data, usage patterns, explicit relationships and a host of other places"

Whenever I talk to people or look into the stuff out there, they only seem concerned about meta-data and full-text search! I love that you are planning for both usage patterns and explicit relationships to be included.

Also, on that note, I hope you are considering how the data will be inputted by the user. It can be extremely useful if the user input method allows for them to understand the organizational system without adding complexity. The only model for this that I've found to be useful is that of categorization.

The user has categories which they define and place files into, allowing a file to be in as many categories as they desire. Then, the system can use those categories as a good way to narrow searches. For example, I would categorize everything into at least one of four categories: Work, Personal Work, Entertainment, or Belonging to Someone Else. But, at the same time, something in any one of those categories could be in several others. For example, for myself, a lot of things in all of those categories would also be in Writing, Gaming, and/or Art.

With a good categorization system (maybe visualized as a set of directories with check-boxes to determine which categories a file goes into), I could swiftly and easily place a file while saving it, at least as quickly as I can organize into directories right now. And, if I didn't categorize something right away, it could automatically be tossed into a category such as 'unsorted' or whatever, so I knew that I hadn't organized it yet.
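The categorization model described here (many categories per file, narrowing by intersection, and an 'unsorted' fallback) fits in a small data structure. A hypothetical Python sketch, with class and method names chosen only for the example:

```python
from collections import defaultdict

UNSORTED = "unsorted"  # fallback category for files the user hasn't filed yet


class Categorizer:
    """Hypothetical store mapping each file to any number of user-defined categories."""

    def __init__(self):
        self._categories = defaultdict(set)  # path -> set of category names

    def save(self, path, categories=()):
        # Saving without choosing a category drops the file into "unsorted".
        self._categories[path] = set(categories) or {UNSORTED}

    def add(self, path, category):
        # Filing a file anywhere removes it from "unsorted".
        cats = self._categories[path]
        cats.discard(UNSORTED)
        cats.add(category)

    def in_all(self, *categories):
        """Files that belong to every given category; each extra category narrows the result."""
        wanted = set(categories)
        return sorted(path for path, cats in self._categories.items() if wanted <= cats)
```

Asking for "Writing" alone returns everything written, while "Writing" plus "Gaming" narrows it to game-related writing, which is the narrowing effect the comment describes.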

Further, this is a portion of the organizational system that might be representable in a real filesystem, which means the save dialog wouldn't be completely useless. I asked someone who knows more about file-system performance than I do, and they said that it was perfectly feasible to have directories for each category and hard-link files throughout them. They even said there shouldn't be any performance issues if you used a modern OS. And, a search organization system which can also be somewhat used from a standard browser would be nifty.
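The hard-link layout suggested above can be sketched with the standard library's `os.link`. This assumes a hypothetical layout with one directory per category under a common root; hard links only work within a single filesystem and cannot point at directories.

```python
import os
from pathlib import Path


def file_into_categories(src, root, categories):
    """Hard-link `src` into root/<category>/ for each category and return the link paths.

    Every link points at the same inode, so editing the file through any
    category directory changes it everywhere.
    """
    src = Path(src)
    links = []
    for category in categories:
        target_dir = Path(root) / category
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / src.name
        if target.exists():
            # Already filed here is fine; a different file with the same name is not.
            if not os.path.samefile(src, target):
                raise FileExistsError(f"{target} already holds a different file")
        else:
            os.link(src, target)
        links.append(target)
    return links
```

A standard file browser then shows the same file under each category directory, and removing it from one category only unlinks that name; the data survives as long as another link remains.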

And, to go back to my original point, cool! I hope the entire system can be gotten working for KDE 4.0, because it sounds awesome.
Very Interesting
by jesusfish on Wednesday 23/Feb/2005, @21:01
I remember hearing Nat Friedman speak on something similar at RealWorld Linux last year, and he showed an app that demonstrated this (it may actually be Beagle now, I'm not quite sure). I think it would be a really big step to have technology and tools like this.

The whole concept revolves around connecting ideas, whether they be data, programs, time, etc. I think Derek hit it right where it's at. When I use my computer, my actions are all motivated by thoughts: I want to know about this, I want to do that, etc. Imagine how nice it would be to connect everything on your desktop relative to a particular thought. For example, say you're working on a website: you could find every program, file, search, etc. related to it at once. It would be like telling your desktop that you want to work on that site, and everything you need is neatly retrieved and organized for you. That is convenience.

Hopefully I've got this whole concept correct, or I sound quite dumb.
remember some of the architecture of BeOS?
by Ferdinand on Thursday 24/Feb/2005, @00:24
Not that a little anecdote about BeOS would lead us anywhere, but since Reiser4 was brought up, it inevitably reminded me of BeOS and its radical goal of building an OS around what may be called a database-driven storage paradigm rather than the hierarchical organization of file systems. This has been tried several times: technologically superior, but never widely and successfully disseminated. You may be forgiven for thinking that there are many software and hardware de facto standards out there that had better never have come into existence.

That much said, it may be helpful, and also a bit insane, to encourage efforts that implement more effective solutions by designing them with a user-centric focus rather than a feasibility-driven approach in mind. Disseminating such new systems may require overcoming seemingly insurmountable thresholds imposed by incompatibilities of existing applications with previous APIs, concepts, designs and, last but not least, architectures.

The ineffectiveness of machines is typically the result of fundamental design flaws. So, beware and balance the requirements to befriend software evolution!
Windows attempts at similar objective
by Dan Housman on Tuesday 01/Mar/2005, @05:52
I found this article/project interesting. We are a small software company working on a Windows based solution similar in scope. The product is called Viapoint (http://www.viapoint.com). I doubt the true Linux folks will want to play with it but it could be interesting to keep track of how the products evolve across operating systems. We have been calling the product a Smart Organizer and are looking to avoid building desktop search components by calling on Google's Desktop Search APIs or an equivalent so that we can focus on building the context sensitive part of the application as well as functionality to help a user actually work. You can download for free if you want to check it out. I'll have to get Linux or watch videos to see this tool.
Desktop Search
by Chris on Thursday 23/Mar/2006, @20:29
Anybody ever looked at Kat Desktop Search project?

http://kat.mandriva.com/

What's everyone's opinion on using this technology instead of a completely new one?

 
KDE®, "K Desktop Environment", "KDE Dot News", "got the dot?" and the KDE Logo® are trademarks or registered trademarks of KDE e.V. in the European Union, the United States and other countries. All other trademarks and copyrights on this page are owned by their respective owners. Comments are owned by the poster. The rest: Copyright © 2000-2008 KDE e.V. for The KDE Project. For further information or comments on this site, please contact the Webmaster.