The Semantic Desktop Wants You

As explained on Userbase, Nepomuk is a promising new technology which aims to make the user's data easier to find, not only through the now common search and indexing, but also by making use of more complex relationships between documents, contacts and all sorts of data. While the full potential of Nepomuk is still shrouded in mystery for most of us, for developers it is an exciting area of work where academic research and Free Software development come together. Nepomuk is looking for new developers. Read on to find out how you can help!

The KDE team working on Nepomuk aims to bring the Semantic Desktop to KDE 4, allowing applications to share and respond intelligently to metadata about files, contacts, web pages and more. Let us make this short: Nepomuk is an important project for the future KDE desktop. Its goal is to get all the information available on the system to the user. You are receiving an email - Nepomuk should show you information about related projects, persons or tasks. You look at images of a person - Nepomuk should have links to other images of that person, to unanswered emails, or to events where you met that person. You open the video player - Nepomuk should suggest watching the next episode in the series you are currently following.

These are but a few examples of what Nepomuk should provide. It could all become reality, but that requires more development power.

Now is the perfect time to enter the world of the semantic desktop. We finally have a decent database back end with Virtuoso support in Soprano (instructions for testing it here). This not only addresses critical issues like memory footprint and scalability, it also provides us with a range of new features. We can, for example, embed full text queries directly in SPARQL expressions, use aggregate functions such as COUNT or MAX, nest SPARQL queries, and update data using the SPARQL update syntax. All in all, handling the available data gets far more powerful and convenient.

At the same time the integration with Akonadi takes a leap. Akonadi already pushes all its contact, email, and event data into Nepomuk for us to consume and to enrich. This opens up many possibilities, ranging from tagging files, tasks and web pages with a particular contact to associating files with an event or linking a set of emails to a particular project.

Apart from that, the Nepomuk playground is already full of examples and prototypes that are just waiting to be enhanced, refactored, and reused.

All this information needs to be displayed and enriched via Plasma applets, application plugins, new frameworks, and most importantly your creativity. There are both simple and complex things to be done, from creating a search GUI to improving the data storage system or determining relationships between files.

If you are interested in the semantic desktop, in gathering, presenting and using this information, check out the Nepomuk project page to get an idea of what you could help with. Of course it does not stop there. You can implement your own ideas or work on the core components like improving file indexing and extraction of information. The team would love new blood and any kind of input is appreciated!

Join the Nepomuk mailing list and meet us on IRC (freenode - #nepomuk-kde). We have the opportunity to be one step ahead of everybody else this time.


Things like tagging files/emails to a particular project could really change how I work. At present I have a whole load of data files that I often use in different projects simultaneously, but I don't want to make copies for each one (they're big files). So they live in one folder under what I consider to be the main project, but that makes working with them in another project inconvenient. It would be great to be able to tag them and then find them easily, or have meta-folders containing all the folders/files tagged to a particular project.

With Amarok and Digikam I don't need to care where my files actually are in the directory structure, because there are much more intelligent ways of organising them by metadata (artist, genre, date, subjects, locations). Nepomuk, I think, has the potential to bring this kind of useful organisation to other types of files (and much more than that).

So I really hope the help comes in to let Nepomuk reach its potential.

Isn't this something Dolphin should already be capable of?
It has had tagging for some time.

... not to mention the recent innovation of symlinking ;)

But seriously, this is something I think we could all use. Not least because a resource that is marked as related to a particular project may not be a file at all, but a contact, an email or a remote URL.

We discussed this with Sebastian at the Akonadi meeting in Berlin last weekend. Our first thought is to use the tmo:Task class from the Nepomuk project as the definition of a 'project', since you can decompose any project into tasks and from there into subtasks; conversely, a project is just a scaled-up task.

Have a look at tmo:Task and see if you think it's missing anything.
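
To make the "a project is just a scaled-up task" idea concrete, here is a minimal sketch in Python. It is not the tmo:Task ontology: the `Task` class, its `resources` and `subtasks` fields, and the example names are all made up for illustration. It only shows that a recursive task structure lets a project collect files, contacts and URLs attached at any depth.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task; not the real tmo:Task schema."""
    name: str
    resources: list = field(default_factory=list)  # files, contacts, URLs...
    subtasks: list = field(default_factory=list)   # nested Task objects

    def all_resources(self):
        """Collect resources of this task and all subtasks, depth first."""
        found = list(self.resources)
        for sub in self.subtasks:
            found.extend(sub.all_resources())
        return found


# A "project" is simply the task at the top of the tree.
project = Task(
    "Thesis",
    ["mailto:supervisor@example.org"],
    [Task("Experiments", ["/data/run1.dat"],
          [Task("Calibration", ["http://example.org/manual"])])],
)
project_resources = project.all_resources()
```

Note that the resources here are a contact, a file and a remote URL, which is the case plain symlinks cannot cover.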

I did think about saying "I know I could make symlinks to different folders", but I didn't. So of course, you're right :-)

Equally, I could symlink all my music files into folders by genre or year. What I meant to get at is that basically I'm lazy and tagging is marginally easier, and of course tags can be more easily preserved if I move individual files to another machine.

From a very quick look at tmo:Task it seems to have the right kind of ideas. I'll have a bit more of a look and try to think more about what I'd really want, ideally.

It's so complicated now after a quick scan of Task that one should almost just make the semantic connection between obelisk, project, and elevating tag importance from a user POV. Let alone symlinks. A tag is worth spending the energy to see as a project in itself, differentiated from the historic purpose of an obelisk. We call such differentiation a project.

Files are easily elevated as projects that are effectively types of interaction. Tags are the way to get there, monikers that make sense ironically on the ball are hard to push in semantics because some demozen has a right to stand there with a cigarette in his drink and semantics off his tongue.

How do we see obelisk(s) (sem.) as opposed to sphynx (sem.) and pyramids (non-sem.)? We avoid such words as a matter for structural linguistics as professionals to show us the way through symlinks to manage our projections a little better on-disk as opposed to on-schedule. tmo:Task looks like it's proving nothing (in this description) from this professional demozen's point of view. As a reference it might prove its worth over time (not quite as old as symlinks) but not now.

Here is an idea (that I also suggested to N900 people):

* Provide more humane file handling
* Allow a user to locate files not by filesystem location but by tags, type and other metadata. Most important are the tags
* Allow applications to locate files/data by tags

Sample scenarios:
* Visit kde-look and download a wallpaper. Save it wherever you like and tag it as "wallpaper". After that, whenever you visit desktop settings you could be presented with all images in your disk(s) that are tagged as "wallpaper". Perhaps sort them by reverse date or have a view where most recent "appropriately tagged files" are listed and immediately find the last wallpaper.
* Download an audio file (e.g. mp3) and tag it as "music". After that amarok can automatically include all files that are tagged as "music"
* Download a couple of PDF files: one is tagged as PHD, another as KDE-tech and another as Request-form. Whenever you attempt to open a PDF file (from okular), you can look up all "PHD" files.

How to do it (as far as I can think of it):
* Be persuaded that file tags are essential in file handling.
* Introduce the concept of file type. This is required in many places. This does not have to be just "PDF" etc. It may be more convenient to have Document/PDF (à la mime-type).
* Introduce namespaces for tags. There can be a "System" namespace to serve some system activities, with fixed tags (System:wallpaper, System:KDM_Theme, System:Plasma_Applet, etc...).
* Provide a two-way file handling (open/save/etc) dialog: Imagine the well-known "open file" dialog with two tabs: One for "file-based" file handling and one for "tag-based" or "semantic" file handling. Thus we will be able to locate files either the old way or using tags (and other criteria)
* Extend core applications/settings to auto-lookup files by tags. E.g. the wallpaper handling.
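
The points above (namespaced tags, a file type à la mime-type, and lookup by tag) can be sketched as a toy in-memory index. This is only an illustration under assumptions: `TagIndex`, `normalize`, the default "User" namespace and the example paths are all hypothetical, and nothing here uses the actual Nepomuk API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def normalize(tag):
    """Return 'Namespace:name'; tags without a namespace go to 'User' (an assumption)."""
    if ":" in tag:
        namespace, name = tag.split(":", 1)
    else:
        namespace, name = "User", tag
    return f"{namespace}:{name.lower()}"


@dataclass
class TagIndex:
    """Toy index mapping namespaced tags to file paths."""
    _by_tag: dict = field(default_factory=lambda: defaultdict(set))
    _types: dict = field(default_factory=dict)

    def add(self, path, file_type, *tags):
        """Register a file with its type (e.g. 'Document/PDF') and tags."""
        self._types[path] = file_type
        for tag in tags:
            self._by_tag[normalize(tag)].add(path)

    def find(self, tag, type_prefix=""):
        """Return sorted paths carrying `tag`, optionally limited by type prefix."""
        paths = self._by_tag.get(normalize(tag), set())
        return sorted(p for p in paths if self._types[p].startswith(type_prefix))


index = TagIndex()
index.add("/home/me/Downloads/sunset.png", "Image/PNG", "System:wallpaper")
index.add("/home/me/papers/thesis.pdf", "Document/PDF", "PHD")
index.add("/home/me/papers/notes.txt", "Document/Text", "phd")

wallpapers = index.find("System:wallpaper")               # what desktop settings would list
phd_pdfs = index.find("PHD", type_prefix="Document/PDF")  # what okular would offer
```

A "tag-based" tab in the file dialog would amount to calling something like `find` with the tags and type the user picked, instead of walking directories.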

by trueg

Let me start by pointing you to one of my blog entries: The smart file dialog targets exactly that idea.
I agree completely that this is the way to go. We also need a file browser that allows browsing documents this way. Of course the filter system is already nice, but it is not really semantic. We can only express direct key/value pairs, not traverse the information graph that is Nepomuk.
And I agree with some of the commentators that the saving dialog should allow choosing the destination in addition to annotations.
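
To illustrate the difference between a key/value filter and traversing the information graph, here is a small Python sketch over a toy list of triples. The data, the predicate names (`depicts`, `attended`, `type`) and the helper functions are invented for the example; they are not Nepomuk ontology terms or API calls.

```python
# Toy triple store: (subject, predicate, object). All names are hypothetical.
TRIPLES = [
    ("photo1.jpg", "type", "image"),
    ("photo2.jpg", "type", "image"),
    ("photo3.jpg", "type", "image"),
    ("photo1.jpg", "depicts", "alice"),
    ("photo2.jpg", "depicts", "bob"),
    ("photo3.jpg", "depicts", "carol"),
    ("mail42", "from", "alice"),
    ("alice", "attended", "akonadi-meeting"),
    ("bob", "attended", "akonadi-meeting"),
]


def filter_by(predicate, value):
    """Key/value filter: subjects that directly carry predicate=value."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == value)


def objects(subject, predicate):
    """All objects reachable from `subject` over one `predicate` edge."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def images_of_attendees(event):
    """Graph traversal: image -> depicts -> person -> attended -> event."""
    attendees = set(filter_by("attended", event))
    return sorted(
        image for image in filter_by("type", "image")
        if objects(image, "depicts") & attendees
    )


direct = filter_by("depicts", "alice")
traversed = images_of_attendees("akonadi-meeting")
```

The first query only looks at a property of the file itself; the second follows two relations away from the file, which is what a flat filter cannot express.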

One thing is important: as far as tags go we need to be on the same page. Tags in Nepomuk are merely simple strings which are used to annotate resources. In this case it would be more interesting to have ontology knowledge, i.e. have a wallpaper type which can be understood by many applications and desktops.

I noticed the same question was raised for digikam's metadata. It seems people have developed tag hierarchies as simple inheritance trees, given no other way to structure metadata and query it for entire sub-trees of tags. Some of these will be replaced by existing ontologies that describe the same structures. However, there will be other useful user-created structures that are not trivially replaced by existing ontologies. What will the KDE project do to support these? Will it be possible to define ontologies at runtime in the same way that a .ui file can be loaded to generate a UI? Could these be created semi-automatically from users' tag trees? What about ontology transformations between these ad-hoc ontologies and canonical ones as they are developed?

If we only offer a limited set of ontologies, I predict users will begin to project structure onto simple Nepomuk tags using '/' to preserve their investment in tag hierarchies.
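
As a rough sketch of what that projection would look like, the following Python treats plain string tags containing '/' as paths and queries a whole sub-tree by prefix. The function names and sample tags are hypothetical; this is the workaround users might fall back on, not a proposal for how Nepomuk should store hierarchies.

```python
def in_subtree(tag, root):
    """True if `tag` equals `root` or lies below it in a '/'-separated hierarchy."""
    root = root.rstrip("/")
    return tag == root or tag.startswith(root + "/")


def query_subtree(tagged, root):
    """Return sorted items that have at least one tag inside the sub-tree of `root`."""
    return sorted(
        item for item, tags in tagged.items()
        if any(in_subtree(tag, root) for tag in tags)
    )


# Flat string tags that encode a hierarchy by convention only.
tagged = {
    "a.jpg": {"Places/Europe/Berlin"},
    "b.jpg": {"Places/Asia/Tokyo"},
    "c.jpg": {"People/Family"},
    "d.jpg": {"Places/Europeana"},
}

europe = query_subtree(tagged, "Places/Europe")
places = query_subtree(tagged, "Places")
```

The check appends '/' before comparing prefixes, so "Places/Europeana" is not mistaken for a child of "Places/Europe". The structure lives only in a naming convention, which is exactly the information an ontology would make explicit.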