[KDE Dot News]
  The Road to KDE 4: Strigi and File Information Extraction
KDE Public Relations and Marketing Posted by Troy Unrau on Wednesday 11/Apr/2007, @09:54
from the you've-got-to-dig-a-little-deeper dept.
After a short delay due to a heavy dose of Real Life(tm), I return to bring you more on the technologies behind KDE 4. This week I am featuring Strigi, an information extraction subsystem that is being fully deployed for KDE 4.0. KDE has previously had the ability to extract information about files of various types, and has used it in a variety of functional contexts, such as the Properties Dialog. Strigi promises many improvements over the existing versions. Read on for more...

Strigi is a library that sits at a lower level than KDE. It is written in C++, and is designed to present a series of generic calls that a program can use to find more information about a given file or files. It is in no way tied to KDE except that the development version lives in KDE's SVN repository. It also has search capabilities, which are not really the focus of this article.

The Strigi libraries are used to get information from within files, such as the dimensions of an image, the length of an audio clip, embedded thumbnails, the number of lines in a log, or source code licensing info, or simply to search a text file for a given string. Strigi has a further advantage: it can work seamlessly inside compressed files, archives, and so forth. In fact, it ships with two utility programs, deepgrep and deepfind. These command line programs let you search for information within binary file formats as easily as you would use grep or find on plain text files. KDE is inheriting the same libraries, so we also get the ability to pull information out of files that are buried within binary formats, such as .tgz files.

There is a prototype kio_jstreams, powered by Strigi, that treats archives like local folders, allowing you to visit /home/user/tarball.tar.gz/icons/ for example. This works great when you are using solely KDE integrated applications, but there are currently problems when mixing with other programs. For example, if you're browsing with Konqueror, click on a file within a tarball, and want to open it in the GIMP, passing that sort of path would break the GIMP, since to a non-KDE program it does not point at a real file. So for the time being, this mode of operation is available as an experimental ioslave only, and will remain so until these sorts of problems are solved. (The other problem is making a tgz or odp file behave as both a file and a directory simultaneously.)
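Going back to deepgrep for a moment, the idea of searching inside archives can be sketched in a few lines of Python. This is only an illustration of the concept using Python's standard tarfile module, not how Strigi itself is implemented. The names deep_grep and make_tar_gz are made up for this example, it only understands tar archives (optionally compressed), and for simplicity it reads each entry fully into memory.

```python
import io
import tarfile


def deep_grep(needle, data, label):
    """Yield the labels of entries whose bytes contain needle.

    If data is a tar archive (plain or compressed), descend into it,
    including archives nested inside archives. Otherwise treat data as
    an ordinary file and do a plain substring search.
    """
    try:
        archive = tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
    except (tarfile.TarError, EOFError, OSError):
        # Not something tarfile can open: search it as a plain file.
        if needle in data:
            yield label
        return
    with archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, devices
            inner = archive.extractfile(member).read()
            yield from deep_grep(needle, inner, f"{label}/{member.name}")


def make_tar_gz(files):
    """Build an in-memory .tar.gz from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_tar_gz({"notes.txt": b"strigi digs deep"})
    outer = make_tar_gz({"icons/readme.txt": b"nothing here",
                         "nested.tar.gz": inner})
    for hit in deep_grep(b"strigi", outer, "tarball.tar.gz"):
        print(hit)
```

The hits are reported as paths that run straight through the archive boundaries, much like the archive-as-folder paths described above.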

Strigi can return data in several useful ways once a query has been performed. For example, Jos notes: "The program xmlindexer is useful for extracting data from files in a very efficient manner. Because it outputs xml, it is easy to use from any program. Other search projects such as Beagle and Tracker would greatly benefit from using xmlindexer." The xmlindexer program is a standalone binary, so programs can easily call it externally without having to link to Qt or use C++. That said, there are many ways to directly use the Strigi libraries...
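As a rough sketch of what "calling it externally" could look like, here is a small Python wrapper. Several things here are assumptions rather than documented facts: that xmlindexer is on the PATH and takes the path to index as its only argument, and that its output consists of file elements with a uri attribute containing value elements with a name attribute. The field names in the sample are invented. Check the real output of your xmlindexer before relying on any of this.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_xmlindexer(path):
    """Run xmlindexer on path and return its raw XML output as bytes.

    Assumption: the binary is on PATH and accepts the path to index
    as its only argument.
    """
    result = subprocess.run(["xmlindexer", path],
                            capture_output=True, check=True)
    return result.stdout


def collect_values(xml_bytes):
    """Group extracted fields by file.

    Assumption: the XML contains <file uri="..."> elements, each
    holding <value name="...">text</value> children.
    """
    root = ET.fromstring(xml_bytes)
    files = {}
    for file_el in root.iter("file"):
        fields = files.setdefault(file_el.get("uri"), {})
        for value_el in file_el.iter("value"):
            text = (value_el.text or "").strip()
            fields.setdefault(value_el.get("name"), []).append(text)
    return files


if __name__ == "__main__":
    # Hypothetical sample in the assumed layout, with made-up field names.
    sample = (b"<metadata><file uri='a.png'>"
              b"<value name='example.width'>16</value>"
              b"<value name='example.height'>32</value>"
              b"</file></metadata>")
    print(collect_values(sample))
```

Because the only contact point is a process call and an XML parser, the same approach would work from any language, which is the point Jos is making.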

The KDE libraries have had methods of extracting information (such as metadata via KFileMetaInfo) from files before, but in many cases they were either slow or of limited functionality. With Strigi, we have seen a several-fold increase in speed for extracting data from PNG files. I am not aware of any other speed tests actually being performed, but the general impression is that it is much faster at retrieving file data than most of the previously existing methods.
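To make "extracting data from PNG files" a little more concrete, here is a minimal Python sketch that pulls the image dimensions out of a PNG by reading only its first 24 bytes. It follows the PNG file layout (an 8-byte signature followed by the IHDR chunk) and is not Strigi's actual analyzer; the function name png_dimensions is made up for this example. It does hint at why this kind of extraction can be cheap: the interesting data sits right at the start of the stream, so nothing needs to be decoded.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(stream):
    """Return (width, height) read from a PNG header, or None if not a PNG.

    Layout of the first 24 bytes: 8-byte signature, 4-byte chunk length,
    4-byte chunk type ("IHDR"), then width and height as big-endian
    32-bit unsigned integers.
    """
    header = stream.read(24)
    if len(header) < 24:
        return None
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


if __name__ == "__main__":
    # Build just enough of a PNG header for the demo (not a complete image).
    fake = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
    print(png_dimensions(io.BytesIO(fake)))
```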

So in KDE, there are not really any good screenshots to show Strigi in action, as it's really just a library. That's not to say that its effects will be invisible though, as things like the File Properties dialogs are already taking advantage of the Strigi backend to pull the data that was previously provided by KFileMetaInfo. Strigi is also planned to be used (and in some cases is already in use) for thumbnails and other metadata displayed in the file browsers, and preliminary results show massive speed improvements. But so far, this has had little effect on the actual KDE experience for the end user, at least in a visual sense. However, as more KDE subsystems become aware of Strigi, we should start to see more unique and useful applications of all the features that Strigi supports.

For example: One of the biggest beneficiaries of the Strigi work is NEPOMUK. According to Jos: "Nepomuk is a big European research project on enhancing computer applications to make them semantic and connected. Nepomuk-KDE is the work on a KDE implementation of the standards and ideas that come out of that project. I work together with the people of Nepomuk and especially Sebastian Trueg of Nepomuk-KDE to make sure our work fits together. At the moment Sebastian is writing [an] additional index implementation for Strigi that is better able to work with semantic data." This project uses a lot of metadata and other file contents (like the text of IRC logs, for example) to provide an easy-to-use search system for the desktop. NEPOMUK will undergo a name change before its final implementation is set.

So while Strigi does the actual digging through the data, other applications such as Dolphin/Konqueror, the File Properties Dialog or NEPOMUK are the ones that will see the fruits of this work. At the moment, however, work is mostly focused on porting the previously existing KFilePlugins to use the new backend classes. For status reports on this effort, check out the Porting KFilePlugins Progress page on the KDE wiki.

To learn more about Strigi, visit the website or join #strigi on irc.kde.org.



The Fine Print: The following comments are owned by whomever posted them.
Thank you
by Lans on Wednesday 11/Apr/2007, @10:44
Thank you Troy for another great article about interesting technology behind KDE4. It has become a habit to read this series every week, and I was very happy to see this new article about Strigi today.

Once again, thank you, and keep up the good work.
[ Reply To This | View ]
No mix of files and dirs please.
by Debian User on Wednesday 11/Apr/2007, @11:03
Hi there,

please don't mix files and directories. You will create tremendous confusion.

I do see that I should be able to use tar://path/to/tarfile and file://path/to/tarfile, and it sure would be nice if there were a way to find their relation by means of a double-click or an open action in Dolphin of KDE4.

If done correctly, going up in the file browser Dolphin would switch to the file:// protocol again.

Unsolved, forever, is the nesting of IO-Slaves, isn't it? What if I want to use the tar IO-slave over ssh? fish://path/to/tarfile can't be browsed with tar://, can it? The chaining of IO-Slaves would be nice.

Yours,
Kay
[ Reply To This | View ]
Comparison
by Diederik van der Boor on Wednesday 11/Apr/2007, @11:15
See the following comparison how efficient Strigi is compared to Beagle:
http://www.kdedevelopers.org/node/2639
[ Reply To This | View ]
KIO-FUSE accesses KIO slaves in non-KDE apps
by Bill on Wednesday 11/Apr/2007, @12:21
> For example, if you're browsing with Konq, and click on a
> file within a tarball, and you want to open it in the Gimp,
> well passing that sort of directory would obviously break the
> Gimp.

You can easily view all KIO slaves in non-KDE apps (such as GIMP, Firefox, OpenOffice, even commandline utilities) through KIO-FUSE. It works by mounting remote locations (or tar archives, in your example) into the root filesystem hierarchy:

http://kde.ground.cz/tiki-index.php?page=KIO+Fuse+Gateway
[ Reply To This | View ]
Fear of KDE 4
by Gummi Bear on Wednesday 11/Apr/2007, @12:30
With so many changes for KDE 4 I wonder how long will it take to make KDE 4 stable/usable.
I stopped using Konqueror because it crashes a lot when dealing with embedded multimedia. For a couple of months now, Kaffeine has crashed when I open the playlist while a video is running... I hope we get a stable KDE 4 before jumping to an even cooler KDE 5.
[ Reply To This | View ]
new nepomuk name
by somecoward on Wednesday 11/Apr/2007, @12:38
i, for one, think the name Kumopen would be great. and it starts with a k \o/
[ Reply To This | View ]
support Aaron
by funnyfanny on Wednesday 11/Apr/2007, @14:18
i support Aaron on including Nepomuk in Kde 4.0 already - please see the thread on kde-core-devel

http://lists.kde.org/?t=117613635500003&r=1&w=2

it does not need to freeze the api, but please include it as mandatory, no matter if it works on windows or not.
[ Reply To This | View ]
Try it out in Kubuntu !
by KubuntuUserExMandrake on Wednesday 11/Apr/2007, @19:39
If you are using Kubuntu, go ahead install it and give it a shot. I just did. It looks very, very promising (and already useful)

Great work!
[ Reply To This | View ]
kde innovation
by cies breijs on Thursday 12/Apr/2007, @02:45
this is a real nice kde innovation. this application nicely bridges the command line and the desktop.


i sincerely hope this can become a standard for all unix desktops as desktop search is getting more and more important.

hoooray for kde.
KDE®, "K Desktop Environment", "KDE Dot News", "got the dot?" and the KDE Logo® are trademarks or registered trademarks of KDE e.V. in the European Union, the United States and other countries. All other trademarks and copyrights on this page are owned by their respective owners. Comments are owned by the poster. The rest: Copyright © 2000-2008 KDE e.V. for The KDE Project. For further information or comments on this site, please contact the Webmaster.