Semantic Desktop and KDE 4: State and Plans of NEPOMUK-KDE

Liquidat has posted a nice overview of the technology known as NEPOMUK, a part of KDE 4. An excerpt reads: "Nepomuk-KDE is the basis for the semantic technologies we will see in KDE 4. Sebastian Trüg, the main developer behind Nepomuk-KDE, provided me with some up2date information about the current state and future plans".


Comments

by Martin (not verified)

To me, the hard problem here always was how to share and transfer the metadata between users and machines. For instance, suppose I am using more than one machine; would the Nepomuk information created on one machine even make sense on the other? Would it be possible to sync the databases between machines? Is it possible to distill out a meaningful, privacy-filtered subset of the data and send that to a friend? What happens when I reorganise all of my data using non-Nepomuk-aware tools?

I see in the article that there are some plans to address these issues, but it seems they have not been solved so far. Any thoughts?

Just a KDE user here wanting to say that is a good question

I hadn't thought about it, but the problem you bring up should be thought about. Though I don't know for sure, I assume that any non-Nepomuk-enabled desktop won't support tags made with a Nepomuk desktop. Perhaps if Nepomuk is brought to all of the free desktops, then at least we can be assured that it would work on all the free operating systems.

In a way, KDE is the first large-scale test. If it works out, perhaps it'll be adopted elsewhere (GNOME, Xfce,... perhaps even a proprietary OS like OSX?).

Actually, as far as this goes, I was somewhat certain that OSX already had a framework in place for the support of arbitrary metadata. I'm not sure on details, but a friend of mine was talking about it at length one night.

by Geoff Hutchison (not verified)

For Mac OS X, Apple's Spotlight system is very similar, but not exactly the same. You can write metadata importer plugins which can add, index, and search arbitrary metadata.

However, programs like Google Desktop for Mac OS X show that it's possible in some ways to integrate other systems with Apple's Spotlight metadata system.

http://developer.apple.com/macosx/spotlight.html
http://developer.apple.com/documentation/Carbon/Conceptual/MetadataIntro...
http://developer.apple.com/documentation/Carbon/Conceptual/MDImporters/i...

by cies breijs (not verified)

i'm not a semantic web (SW) expert, but i had to study it for school (uni). for all i know, the SW technologies are built for exactly what you both ask for: they are distributed and sharing oriented.

SW basically allows computers to understand what something means, and how that relates to other information. it's all about describing (annotating) data in a standard, computer readable way.

by ThePope (not verified)

That sounds like pure and plain xml.

RDF/XML is one format for it. Internally, it's probably going to look more like the Notation3 (N3) format -- just a list of "triples": lines like "uri1 relationship uri2". For instance, you might declare relationships like "http://x/y photographer_of http://a/z", or "googleearth://postcode location_of ipinfo://yourserver", or "http://hongkonggenerics.com manufacturer_of companyservers://missioncriticalserver1".

As an ancestor post said, this is pretty much perfect for exporting/importing/otherwise sharing info. You can easily create queries based on this data, like "? photographer_of ?", to get a list of all photographers, or "? photographer_of http://companyserver/publicphotos/*" to get a list of all photos published by your company. Then, you just need to provide that list to others in some way. Depending on how it's implemented, it might also be possible to mark certain namespaces as private, but make the rest available, so that anything referring to objects such as "myborrowedmp3collection://*" or "topsecretprojects://*" or just "smb://" gets filtered, but everything else is made available. Likewise, and probably more safely, the opposite could be true, with only public namespaces made available.
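A minimal sketch of the two ideas above (wildcard queries over triples, and filtering out private namespaces before sharing), in Python. This is not how Nepomuk stores or queries data; the function names `match` and `export_public`, the sample triples, and the list of private prefixes are all made up for illustration.

```python
from fnmatch import fnmatchcase

# Illustrative (subject, predicate, object) triples; the URIs are placeholders.
TRIPLES = [
    ("http://x/y", "photographer_of", "http://companyserver/publicphotos/1.jpg"),
    ("http://x/w", "photographer_of", "http://a/z"),
    ("http://x/y", "owner_of", "topsecretprojects://plan"),
]

# Assumed list of namespaces that must never leave the machine.
PRIVATE_PREFIXES = ("topsecretprojects://", "myborrowedmp3collection://", "smb://")


def match(triples, s="*", p="*", o="*"):
    """Return the triples whose three parts match the given shell-style patterns."""
    return [
        t for t in triples
        if fnmatchcase(t[0], s) and fnmatchcase(t[1], p) and fnmatchcase(t[2], o)
    ]


def export_public(triples):
    """Drop every triple whose subject or object lives in a private namespace."""
    return [
        t for t in triples
        if not any(term.startswith(PRIVATE_PREFIXES) for term in (t[0], t[2]))
    ]


# "? photographer_of http://companyserver/publicphotos/*"
company_photos = match(TRIPLES, p="photographer_of",
                       o="http://companyserver/publicphotos/*")
shareable = export_public(TRIPLES)
```

The safer "only public namespaces" variant mentioned above would flip `export_public` into an allow-list: keep a triple only if both its subject and object start with an approved prefix.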

Interestingly, let's say you have a KDE io plugin that understands URIs with unique hashes, and dereferences those to the appropriate files: something like "md5://number". If tags on such URIs were published on some shared site (say nepomuk_repository.kde.org), then every KDE user with that file could automatically gain all the (non-filtered, public) tags of information that any other participating KDE user contributes. So, some KDE user in Taiwan might set a song attribute such as "amarok://performed_by amarok://artist/Sarah McLachlan", and everyone else's desktop would suddenly know this.
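To make the shared-repository idea a bit more concrete, here is a rough Python sketch. Everything in it is hypothetical: the `shared` dictionary stands in for a remote site, and `content_uri`, `publish` and `lookup` are invented names, not part of any KDE or Nepomuk API. MD5 is used only because the example above says "md5://"; a real system would probably want a stronger hash.

```python
import hashlib

# Stand-in for a shared repository: maps a content URI to a set of
# (predicate, object) pairs contributed by any user.
shared = {}


def content_uri(data: bytes) -> str:
    """Build an "md5://<hex digest>" URI from the file's bytes."""
    return "md5://" + hashlib.md5(data).hexdigest()


def publish(data: bytes, predicate: str, obj: str) -> None:
    """One user contributes a statement about whatever file has these bytes."""
    shared.setdefault(content_uri(data), set()).add((predicate, obj))


def lookup(data: bytes) -> set:
    """Any other user holding the same bytes finds the same statements."""
    return shared.get(content_uri(data), set())


# A user in Taiwan tags a song; a user elsewhere with an identical file sees it.
song = b"...the same song bytes on both machines..."
publish(song, "amarok://performed_by", "amarok://artist/Sarah McLachlan")
seen_elsewhere = lookup(song)
```

The catch is that the two files have to be byte-for-byte identical: a different encoding of the same song, or even a rewritten ID3 tag, produces a different URI.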

For general queries, let's assume Wikipedia will take up the (already very functional) Semantic MediaWiki Extension at some point. Then, it'll be possible for your desktop to ask Wikipedia for all sorts of complicated information, like "countries with a population of more than 1,000,000, but less than three internet providers", or, for a more basic Unix utility, "languages that include the characters X, Y, Z, but not A". Or, for a person in need of medical help, they might consult a national medical database, along with a blog site, asking for "doctors within coordinates A,B and C,D who specialise in earache and who no one called a sadist". Within an organisation, lots of useful queries, like "people working on project X, who work over lunch" would be possible.

No one's saying the file format (be it XML/RDF, N3, CSV, or something else) is revolutionary (although, in the relative simplicity of N3/RDF, they do make some advances, I suppose). The trick is in taking all these information sources, combining them into a huge database of triples that performs well, and designing the right queries, the right interfaces, the right amount of sharing, and the right security features, so that your desktop "knows" more than it used to, and can work with other systems that know more than they used to, without being bogged down by the terabytes of new data we're soon going to be using for this.

Of course, this all depends on your own/others' ability to organise information, but it's all coming together, from other projects online. This WILL take off, and it will almost certainly be the REAL Web 2.0, that people actually notice, like they noticed Web 1.0. KDE *must* be part of that, and I'm very glad to see it's going to be there.

I DO hope KDE's/NEPOMUK's not going to be limited to simple things like tagging and searching files though, much as I want to see KDE have those features. At the very least, I'm hoping to see what GNOME's (now abandoned, for some insane reason) hint-based system did: let applications actually share knowledge in real time, like "user is working with a document that has subject X" and "Oh, I have files related to subject X". It's unclear whether NEPOMUK will actually allow the kind of things described above. The technology certainly does, though, and Nepomuk is claiming to advance it, as I understand things.

Nepomuk cooperates with freedesktop.org; however, I don't know how well this cooperation works, what standards will be defined there, or whether those standards will address these issues.

by Frederik (not verified)

On the Mandriva Club there is also an interview with Nepomuk-KDE developer Sebastian Trüg: http://club.mandriva.com/xwiki/bin/view/Main/TruegInterview

by Ben (not verified)

On Liquidat's blog there is quite a lot of support for using xattributes, yet it seems like this option won't be used. Can anyone explain why?

by superstoned (not verified)

- not all filesystems support it
- you'll need a database anyway to be able to search through it

I dunno what the performance is, either...

by Ben (not verified)

- not all filesystems support it.

Most of the well used ones support it.

- you'll need a database anyway to be able to search through it

Well yes, but xattributes is about making sure the tags move with the file. Not searching.

by ac (not verified)

metadata you can't search is completely useless.... that's the point of nepomuk, isn't it? finding and connecting stuff through metadata. therefore you need a central index. an index scattered through the whole filesystem is useless. that's why everyone is working on something like strigi...

by Mark Williamson (not verified)

Sure, but you could have some kind of minimal metadata attached to the file to differentiate it from other files, and the full metadata in the index. Otherwise how would you cope with:

echo Hello > ~/myfile

# I go to Dolphin and tag myfile

cp ~/myfile ~/myfile.bak
mv ~/myotherfile ~/myfile
mv ~/myfile.bak ~/myfile

Unless you have some way of disambiguating files *other* than their name, you're going to have issues confusing the metadata of these two different files.

Now, there are various ways you could handle this. xattrs would be the obvious one to me, but I guess you could also have others.
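The sequence above can be replayed in a small Python sketch to show where a purely name-based index goes wrong. The `index` dictionary is a hypothetical stand-in for a metadata store keyed by path (it is not Nepomuk's storage), and a temporary directory plays the role of the home directory.

```python
import os
import shutil
import tempfile

# Hypothetical metadata index keyed only by file path.
index = {}

with tempfile.TemporaryDirectory() as home:
    myfile = os.path.join(home, "myfile")
    other = os.path.join(home, "myotherfile")
    backup = os.path.join(home, "myfile.bak")

    with open(myfile, "w") as f:
        f.write("Hello\n")
    with open(other, "w") as f:
        f.write("Something else\n")

    index[myfile] = {"greeting"}   # tag myfile, as in Dolphin
    shutil.copy(myfile, backup)    # cp ~/myfile ~/myfile.bak
    os.replace(other, myfile)      # mv ~/myotherfile ~/myfile

    # The index still lists the tag under ".../myfile", but that name now
    # holds the other file's content; the real copy is untagged.
    with open(myfile) as f:
        content_under_tag = f.read()
    tags_on_backup = index.get(backup, set())
```

At this point `content_under_tag` is the text of the other file and `tags_on_backup` is empty, which is exactly the confusion described: the tag followed the name, not the data.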

by ac (not verified)

that only works if you want to find the metadata of a file. what about the other way around? i want to find every file i got per mail.

it's quite simple: if you move the file you break the index. the index needs to be updated every time a file is moved.

by Ben (not verified)

>that only works if you want to find the metadata of a file. what about the other way around? i want to find every file i got per mail.

That's why you have the same data stored in the file and in the database. Having the metadata in every file means that all applications automatically keep the metadata intact without modification. Having the metadata in a database allows fast searching.

>it's quite simple: if you move the file you break the index. the index needs to be updated every time a file is moved.

But the database is broken every time you move a file, even if no metadata is stored in the file.

The database has to include the location of every file: when you search for files based on metadata, you want Strigi to tell you where to find them. So the location has to be in the database, and it has to be updated every time a file is moved.

by George (not verified)

Why have "files" in the first place? Why "copy/move them around"? They are just sequences of bytes. Why should I have a file manager? Isn't that what we want to replace? The only reason is different physical computers on a network. But we could imagine even that becoming irrelevant at some not-so-distant point in the future.

by Ben (not verified)

You have the meta-data in the file and in a database.

It goes in a file to ensure that the file keeps the correct metadata even after mv, cp, dd, or being e-mailed, etc.

It goes in a database to be searched.

by Thomas (not verified)

> or being E-mailed ?

then the metadata needs to be stored inside the _file_. The filesystem is of no help here... or do you want to email your filesystem?

by ac (not verified)

but this doesn't solve the whole problem. if you move a file you still have to change the index, otherwise you could only find the old location of the file. so you don't gain much.

so if you have to change the index every time a file moves anyway, there is no real gain from storing anything with the file.

also, you don't need a filename or an id to track files. look at modern version control systems like monotone or git. the identity of a file isn't an id, or a name. it's the content - so use a hash. that would automatically solve all copy problems.

the only remaining problem would be tools that alter the file somehow. that should be solved by nepomuk integration into all applications. for legacy apps you could store the location of a file too. so if you overwrite a file, the index should automatically "transfer" the metadata, if not told otherwise through the nepomuk api.

so with this in place, the only scenario that could break the data-metadata relationship would be legacy applications (apps without nepomuk support) which create "copies" of files with new content (like converting images).
but that's a case you can't do anything about.
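The content-as-identity idea above can be tried out in a few lines of Python. This is only a sketch under assumptions: `tags_by_content`, `content_id`, `tag` and `tags_of` are invented names, SHA-256 is an arbitrary choice of hash, and nothing here reflects how Nepomuk or Strigi actually work.

```python
import hashlib
import os
import shutil
import tempfile

# Hypothetical index keyed by content hash instead of file name.
tags_by_content = {}


def content_id(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tag(path, label):
    tags_by_content.setdefault(content_id(path), set()).add(label)


def tags_of(path):
    return tags_by_content.get(content_id(path), set())


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "photo.raw")
    with open(original, "wb") as f:
        f.write(b"pixels")
    tag(original, "holiday")

    copy = os.path.join(d, "copy.raw")
    moved = os.path.join(d, "moved.raw")
    shutil.copy(original, copy)   # like cp
    os.replace(original, moved)   # like mv

    tags_after_copy = tags_of(copy)
    tags_after_move = tags_of(moved)

    # A legacy tool alters the file: the hash changes and the tags are gone.
    with open(moved, "ab") as f:
        f.write(b" edited")
    tags_after_edit = tags_of(moved)
```

Copies and moves keep their tags without anyone updating the index, while the edited file loses them, which is the remaining case described above. Note that every lookup has to read the whole file, so a real index would still want to cache hashes per location.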

by Richard Moore (not verified)

You also have to remember that most apps move the old file to a backup file (e.g. xx~) and write a new file. Unless the app knows to copy the metadata, it will be lost at this point.

by accumulator (not verified)

The filesystem _is_ a database. A metadata supporting filesystem can maintain its own indexes. Why put the file and metadata relationship on such a high level if you don't need to? There might be considerable overhead space-wise, but with 1TB harddisks getting mainstream soon this should not be a big problem.

If you put this indexing responsibility on filesystem level you get automatic, default nepomuk support for low level commands like cp and mv.

If you want this information to 'cross over' non-metadata filesystems you can use higher level tools. I could see a project like BasKet fit such a role for example.

Regarding hashes to bind relationships, I think this is not so useful on a filesystem. Reading the full content of a couple of ISO files or a large mp3 collection just to get the hashes seems a little inefficient to me. And a hash still isn't as uniquely identifying as a URI.

by Sebastian Trüg (not verified)

I just started a FAQ page for Nepomuk-KDE. The first question I answer there is the xattributes one.
You can find the FAQ at: http://nepomuk-kde.semanticdesktop.org/xwiki/bin/view/Main/FAQ

by Troy Unrau (not verified)

Thanks :)

by Ben (not verified)

Thank you, please could you use this for the second question:

"Will my file lose the metadata associated with it if I use generic rather than Nepomuk-specific tools to move files around (FTP, mv, cp, a non-KDE file manager, Firefox upload, etc.)?"

by kwilliam (not verified)

I'll be curious to see how it turns out. WinFS was supposed to have a metadata based filesystem, but WinFS is vaporware at this point. If KDE beats Microsoft to the relational/metadata/integrated-search desktop, I think a lot of businesses might suddenly become interested. I haven't gotten a chance to try Nepomuk, but I really like Strigi - it's freaking fast (compared to Beagle, which I tried previously) and it doesn't have security holes like Google Desktop Search (which I haven't tried, because of the constant "A new zero-day hole has been found in Google Desktop!" stories on Slashdot).

by Andre (not verified)

The file system argument is a good one: Isn't it all about filesystems? I think distributors need to think more about filesystems. For our home partition a crypto file system should be standard. I don't know whether user space solutions make much sense.

by furester (not verified)

I posted an Italian translation of this article:
http://xenos.altervista.org/blogs/index.php?blog=3&title=desktop_semanti...