Dec 15, 2005

Linux Magazine: Busy Kat

For users who want to better understand how the Kat desktop search program works, Roberto Cappuccio explains Kat's inner workings, the difficulties encountered during development, and the future of this long-awaited (and still heavily developed) piece of software in the article Busy Kat in Linux Magazine.

Comments

A KAT-io-slave would take the project a _big_ step forward. The benefits of an io-slave can be seen in the demo of kio-clucene (http://kioclucene.objectis.net/). It's an easy but very powerful way to embed the desktop search into the desktop.
I hope KAT's development moves forward at the same speed as in recent months - keep up the good work :-)


By birdy at Thu, 2005/12/15 - 6:00am

I don't remember where I read this, but won't KDE4 use postgresql to store metadata information? I realize this is a minor detail, but I was quite happy about that choice, provided this will be transparent for the user (no need to set up pgsql manually: create a default db for each user, and ask for a password once when you install KDE?).


By Csaba Molnar at Thu, 2005/12/15 - 6:00am

There has been no such decision on the subject.


By Thiago Macieira at Thu, 2005/12/15 - 6:00am

Ok.

Well, I had the same impression. Probably from this article.

Newsforge: Updating KDE at the Appeal initiative - http://software.newsforge.com/software/05/12/06/2042232.shtml?tid=130


By anders at Thu, 2005/12/15 - 6:00am

The new architecture of Kat (the one which will be published with Kat 0.7.0, codename Lilith) is based on plugins and is therefore fully expandable.
You can provide plugins for both the Repositories (the storage layer, for example SQLite3, PostgreSQL, Lucene or even Reiser4 or XML...) and the Spaces (the information layer, for example FileSystem which indexes files, Communication which indexes emails and contacts, or Links which indexes the connections that hold between objects of the other spaces).

So, if KDE incorporates a metadata layer, it will be easy to build a Kat Repository plugin for it.
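
The split between Repositories (storage) and Spaces (information) can be illustrated with a small sketch. This is not Kat's actual API: the method names (store, lookup, items), the MemoryRepository class and the index helper are assumptions made up for illustration. Only the Repository/Space concepts and the FileSystem space name come from the description above.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Repository(ABC):
    """Storage layer plugin (hypothetical interface, not Kat's real one)."""

    @abstractmethod
    def store(self, space: str, key: str, text: str) -> None:
        ...

    @abstractmethod
    def lookup(self, space: str, key: str) -> Optional[str]:
        ...


class Space(ABC):
    """Information layer plugin (hypothetical interface)."""

    name = "Unnamed"

    @abstractmethod
    def items(self):
        """Yield (key, text) pairs to be indexed."""


class MemoryRepository(Repository):
    """Toy backend; a real plugin might wrap SQLite3 or PostgreSQL instead."""

    def __init__(self):
        self._data = {}

    def store(self, space, key, text):
        self._data[(space, key)] = text

    def lookup(self, space, key):
        return self._data.get((space, key))


class FileSystemSpace(Space):
    """Toy space fed from a dict of path -> text instead of a real disk scan."""

    name = "FileSystem"

    def __init__(self, files):
        self._files = dict(files)

    def items(self):
        return self._files.items()


def index(space, repository):
    """Push every item of a space into a repository; return the item count."""
    count = 0
    for key, text in space.items():
        repository.store(space.name, key, text)
        count += 1
    return count
```

Because index only talks to the two abstract interfaces, swapping the storage backend or adding a new space would not require touching the other side, which is the point of the plugin design described above.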

Bye


By Roberto Cappuccio at Thu, 2005/12/15 - 6:00am

I can't see the PDF correctly with KPDF on KDE 3.5 :( the text is corrupted. With AcroRead it works perfectly.


By Davide Ferrari at Thu, 2005/12/15 - 6:00am

kghostview on kde 3.5 works, too


By hoirkman at Thu, 2005/12/15 - 6:00am

It works perfectly for me on KDE 3.5. Most likely you have a build of KPDF using the poppler PDF library. The rendering you get depends on the version of poppler you are using. Older versions are not particularly good and are known to have problems.


By Morty at Thu, 2005/12/15 - 6:00am

Try a newer version of freetype. I had a similar problem with another PDF file, and the nice guys over on the kpdf team pointed me in the right direction when I filed a bug. An updated version of freetype fixed the problem.


By Jon Scobie at Thu, 2005/12/15 - 6:00am

Also I was wondering, if two users choose to index the same dir, will Kat store it twice or once and use it twice?


By Hobbit HK at Thu, 2005/12/15 - 6:00am

For the moment Kat creates a repository for each user. We are planning to add the possibility to share the entire repository or single information spaces.
If you want to learn more about what repositories and information spaces are in Kat, please read the online API documentation at: http://kat.mandriva.com/apidox/
The documentation is under development but you will find some interesting information about Kat::Repository and Kat::Space.


By Roberto Cappuccio at Thu, 2005/12/15 - 6:00am

I like the basic concept of indexing the contents of your data, but in most of my Linux installs, the users' home directories are on a network drive. Keeping a copy of all the indexed files in a database (I'm assuming this will also be located in the home directory, but I couldn't determine that from the article) seems like a huge overhead on the network and on server disk space!

I could imagine the index and cache being managed by a central service running on the network, administered by the system admin rather than by the user. If it is not central or outside the network drive space, I would (as a sysadmin) have to disable the Kat functionality entirely (which I will do when we switch to Mandriva 2006.x).

/Simon


By Simon at Thu, 2005/12/15 - 6:00am

Good observation. For situations like the one you describe, we are planning to suggest the use of a centralized database (PostgreSQL, MySQL, MS SQL Server, whatever) running on a central machine.
Every user will have his own repository and will only have access to it and to the repositories that other users mark as shared.
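
A rough sketch of that access rule, under stated assumptions: the CentralStore and UserRepository names, the shared flag and the readable_by method are hypothetical and only model the rule "your own repository plus whatever others mark as shared". A real deployment would sit on top of one of the databases mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class UserRepository:
    """One repository per user; 'shared' is a hypothetical opt-in flag."""
    owner: str
    shared: bool = False
    documents: dict = field(default_factory=dict)


class CentralStore:
    """Stand-in for a central database holding every user's repository."""

    def __init__(self):
        self._repos = {}

    def repository_for(self, user):
        # Create the user's repository on first access.
        if user not in self._repos:
            self._repos[user] = UserRepository(owner=user)
        return self._repos[user]

    def readable_by(self, user):
        # A user sees their own repository plus any repository marked shared.
        return sorted(
            owner
            for owner, repo in self._repos.items()
            if owner == user or repo.shared
        )
```

For example, if bob marks his repository as shared and alice and carol do not, alice can read her own and bob's, while bob can read only his own.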

Bye


By Roberto Cappuccio at Thu, 2005/12/15 - 6:00am

i'm looking forward to the day KAT is integrated into our desktop environments...

maybe the Kuartet Superkaramba applet (http://www.kde-look.org/content/show.php?content=32541) can support kat...


By superstoned at Thu, 2005/12/15 - 6:00am

great to hear that lucene has been ported to Qt!


By ac at Thu, 2005/12/15 - 6:00am

How does this compare with Tenor? Are these complementary or competing initiatives?


By Avdi at Thu, 2005/12/15 - 6:00am

Yep, Tenor is the one I want to know about. I don't know why there's so much fuss about Beagle and Kat, or why GNOME's Dashboard project seems to have died. That was really astounding in its utility -- a true killer app. At this rate, Windows will have it (I believe they're working on it for Vista) before the Free Software community gets it from the drawing board and obscure projects to everyday users' desktops. That's a real tragedy, since the kind of integration needed for Dashboard/Tenor is something open source patches should enable easily.

But, to answer your question from what little I've heard... Tenor is coming in KDE 4, and the Kat folks are working with them. I think Tenor is going to be another backend for Kat. Someone mentioned those backends above. Personally, I'm hoping it really focuses on Tenor technology, rather than watering down the possibilities of Tenor for a generic search system. After all, with no disrespect to the Kat team, Kat's own technology isn't really much more than find or grep.


By Lee at Thu, 2005/12/15 - 6:00am

Maybe you should RTFA. It says that Kat will provide an API so that Tenor can build on top of it as a layer, because Kat is the perfect basis for Tenor (the PDF explains that further down).


By patcito at Thu, 2005/12/15 - 6:00am

Dashboard is not dead, it was renamed to Beagle.


By Anonymous at Fri, 2005/12/16 - 6:00am

> Dashboard is not dead, it was renamed to Beagle

But wasn't Dashboard (http://www.nat.org/dashboard/) much more than Beagle (http://beaglewiki.org/Main_Page)?

As I see it, Beagle is a search tool very similar to Kat. You have to search for your stuff.

Dashboard seemed to be a tool that showed data fitting the current context without the user actively searching for it.


By Christian Loose at Fri, 2005/12/16 - 6:00am

I really wanted to like Kat, in the absence of something like Tenor, but I'm sad to say that Kat was basically useless for me. When indexing things, it doesn't really do anything except say that it found it, and in what file. That's really not good enough, if you're trying to, say, search IRC conversations for a discussion that happened five hours in, when you only remember a few keywords. Likewise, when I search PDFs for text, I don't want to just know that the text is in that file *somewhere*.

For IRC, it would need to display READABLE context, preferably in the normal IRC log format, and have an "Open" or "View" button, or maybe even a plugin-aware one like "View discussion", which opens the appropriate app in a highly integrated way.

So, for instance, when I find an IRC log, clicking View might bring it up in Kopete's log viewer, already centered on that first conversation, and automatically jumping to the right place if I select another IRC search hit.

Likewise, if I select a PDF, I need it to open at the page that actually has that text. Otherwise, I just know that the 500-page PDF contains the phrase "secure network infrastructure". Which, honestly, I probably already knew.

I don't need a file search tool. "find" and "grep" do that. I need an information search tool, that brings up what I ask for, ready to use.

Please don't take this as criticism. I'm not trying to insult Kat -- nor to demand things. I would help if I could. I'm just hoping you can make it fit my needs, and maybe give us all a really great tool that will make KDE even better :)


By Lee at Thu, 2005/12/15 - 6:00am

> Please don't take this as criticism. I'm not trying to insult Kat

I like criticism, especially when it is constructive, and your post points out some of the most frequently criticized features (or the lack of them) of the current version of Kat.

The new version, on which I'm currently working, addresses most of them. In particular, it will feature the Google-like "two lines preview" that shows 2 (or more) lines of the text where the searched words are highlighted.
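
To make the idea concrete, here is a minimal sketch of such a preview. It is not the code from the upcoming Kat version: the preview function, its parameters and the use of asterisks as highlight markers are assumptions for illustration, and it matches plain substrings rather than whole words.

```python
import re


def preview(text, words, max_lines=2, mark="*"):
    """Return up to max_lines lines containing any search word, highlighted."""
    if not words:
        return []
    # Case-insensitive substring match on any of the search words.
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    hits = []
    for line in text.splitlines():
        if pattern.search(line):
            # Wrap each occurrence in the marker, keeping its original case.
            highlighted = pattern.sub(
                lambda m: f"{mark}{m.group(0)}{mark}", line.strip()
            )
            hits.append(highlighted)
            if len(hits) == max_lines:
                break
    return hits
```

A graphical front end would replace the asterisks with bold or colored text; the selection logic stays the same.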

To open a PDF right at the page where the words have been found, well, it depends on both Kat and the PDF viewer. We can issue a command like "open that file at page x" but then it is up to the PDF viewer to actually show that page.

For this kind of thing we will need collaboration from the authors of the applications.

> I don't need a file search tool. "find" and "grep" do that.

Well, if you really think that, continue to use them, but, as I have said a thousand times on other forums, they won't work for the vast majority of file formats (like PDF, PS, XLS and the like) because those formats don't contain clear text.

Then, if the documents are saved in an encoding which is not the one you use on your command line (probably UTF-8), you will not find anything at all.

Moreover, if you have gigabytes of documents to search, "find" and "grep" will take hours to give you what you need.

So, please don't say that "find" and "grep" are equivalent to Kat. They aren't.
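
The speed difference comes from indexing once and then looking words up, instead of rescanning every file on each search. The following is a generic inverted-index sketch, not Kat's implementation. It assumes the text has already been extracted from each file and decoded by format-specific filters, which are not shown; build_index and search are illustrative names.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of documents containing it.

    'documents' is a dict of name -> already extracted plain text.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, word):
    """Look a word up directly; no document is re-read at query time."""
    return sorted(index.get(word.lower(), set()))
```

The cost of reading gigabytes of documents is paid when the index is built or updated; a query then touches only the index entry for the word.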

Bye


By Roberto Cappuccio at Fri, 2005/12/16 - 6:00am

You mention that with the current interface you can only search for a single word, but you plan on adding more advanced capabilities like AND, NOT, et cetera, later on. How about using the same sort of syntax for this as Google has? It has the advantage of being relatively simple, and probably the most well known (or should I rather say, the least obscure) -- out of the people out there who know any kind of search syntax at all and aren't programmers, this is probably your best bet. I've already added support to amaroK, as well :)

Basically, words are ANDed by default unless you put an OR between them yourself, to exclude something you put a - before it, use ""s to match exact phrases, and you can search in specific fields/attributes with field:word. GMail has basically the same thing, but you can use parentheses to group things as well.

For example, bats OR "flying mice" -baseball site:wikipedia.org
would search Wikipedia for either bats or "flying mice", but not for baseball bats.
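
A small sketch of how the rules above could be parsed. This is neither amaroK's nor Kat's parser: the parse and matches functions and the returned structure are assumptions for illustration. It covers the default AND, OR, the - prefix, quoted phrases and field:word; parentheses and negated fields are not handled.

```python
import re

# One token: optional '-', optional 'field:', then a quoted phrase or a word.
TOKEN = re.compile(r'(-?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse(query):
    """Split a query into (required OR-groups, excluded terms, field filters)."""
    required = []   # groups are ANDed; terms inside a group are ORed
    excluded = []
    fields = {}
    join_next = False
    for neg, field_name, phrase, word in TOKEN.findall(query):
        if word == "OR" and not neg and not field_name:
            join_next = True   # the next term joins the previous group
            continue
        term = (phrase or word).lower()
        if not term:
            continue
        if field_name:
            fields[field_name.lower()] = term
        elif neg:
            excluded.append(term)
        elif join_next and required:
            required[-1].append(term)
        else:
            required.append([term])
        join_next = False
    return required, excluded, fields


def matches(text, query):
    """Check plain text against the query; field filters are ignored here."""
    required, excluded, _fields = parse(query)
    text = text.lower()
    return all(
        any(term in text for term in group) for group in required
    ) and not any(term in text for term in excluded)
```

For the example query above, parse returns one required group holding bats and the phrase flying mice, the excluded term baseball, and a site filter of wikipedia.org.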


By Gábor Lehel at Thu, 2005/12/15 - 6:00am

This is exactly what I'm planning to implement. Thanks for pointing it out.

Bye


By Roberto Cappuccio at Fri, 2005/12/16 - 6:00am

Take a look at the amaroK playlist filter. It already supports Google syntax and may be useful for your needs.


By Davide Ferrari at Fri, 2005/12/16 - 6:00am

Wow, that's pretty cool! I never knew amarok had that feature.


By ciasa at Mon, 2005/12/26 - 6:00am