JAN
8
2003

A New Document Management System

newdocms is a proposal for a new and radically different way of managing your documents in KDE. It is a move away from the now over 30-year-old hierarchical file system towards a meta-data-based document retrieval system. A 0.1 preview has now been released along with a description and screenshots (typical, newdocms, save, open, results -- the GUI isn't all that pretty at this point). Although not yet ready for production use, newdocms has the potential to really change the way users interact with their computer. Herein lies a challenge to the Open Source community in general, and KDE in particular, to take a step ahead of the competition in a truly innovative way. Should newdocms be a standard part of a future KDE release?

Comments

Maybe a solution with redundancy checking on a "graft" or "stow" like basis could help. Some overhead would of course be a consequence.

-Mitch


By Mitch at Fri, 2003/01/10 - 6:00am

This is actually a very user friendly concept but only works if there is a known "base system" with a known set of installed libs.

One approach is described in http://klik.berlios.de/architecture/

It would indeed be nice if an action could be attached to a KDE folder to launch an app instead of opening the folder.


By Jo Edwards at Thu, 2004/05/13 - 5:00am

There are good points on both sides here because there are solid pros and cons. People are by nature inert and will stick with what they know long beyond rational explanation. Some people have very few documents too. Newdocms has advantages and flexibilities that look very useful and it addresses nearly all concerns. But as good as it looks your bleeding edge early adopter tech types will still want their docs in an HFS and with clear names... so I see a limited adoption of this, however good the idea is. Many people have mentioned standard attributes too...

Here is how I believe this could best be implemented. It would take a little more work, but it would cause less discomfort while providing the best of all worlds and perhaps adding some value. First, use the HFS. Make the metadata an extended attribute of the file name. It is possible to reference this with paths, though it would be nice to have a low-level daemon update the database on moves and deletes. Data could be entered in the file save dialog. To enforce the metadata concept, a .metarules file could tell the file save dialog which fields the directory owner requires a string in, or provide dropdown lists.
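
As a rough illustration of the .metarules idea above, here is a minimal Python sketch. The JSON layout, the field names and the functions load_rules and check_metadata are all assumptions made for this example; nothing like this is specified by newdocms or KDE.

```python
import json


def load_rules(text):
    """Parse the contents of a (hypothetical) .metarules file.

    Assumed format: a JSON object mapping a field name to a rule, where a
    rule may contain "required": true and/or "choices": [...].
    """
    return json.loads(text)


def check_metadata(rules, metadata):
    """Return a list of problems a save dialog could show to the user."""
    problems = []
    for field, rule in rules.items():
        value = metadata.get(field, "")
        # The directory owner insists on a non-empty string here.
        if rule.get("required") and not value:
            problems.append(f"{field}: required")
        # A dropdown list: the value must be one of the offered choices.
        choices = rule.get("choices")
        if value and choices and value not in choices:
            problems.append(f"{field}: must be one of {choices}")
    return problems
```

A save dialog could call check_metadata before writing the file and refuse to save, or just warn, while the returned list is not empty.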

Indexing could still occur from a central database as is currently indicated or it could add a parameter for where in the file tree this must exist to narrow the search. The metadata could be stored in a hidden folder such as ~/.filemeta and communications systems put in place for servers to register their databases to the user who could also select whether to include server and local in a search. Much of this would happen below KDE so it would be usable with very little modification by GNOME and others.

This is a more ambitious and complicated solution than proposed but it would work. newdocms purists could configure save dialogs to automatically construct a name for a file. Intransigent HFS users could still realize a benefit by setting required field flags in data directories to remind them to create document search terms. People who want nothing to do with it need never look at it; interoperability is preserved and other desktops can adopt it easily. It is possible to have "File Open" and "Meta Open" dialogs or a directory/meta open dialog. Everybody wins. I can now use either or both whenever I want with any application because I get it automatically with the KDE standard actions.


By Eric Laffoon at Wed, 2003/01/08 - 6:00am

Sorry, couldn't test it for now (so you may stop reading here).

I agree with the idea of a better organisation of stored files and (regarding my mom) know the problem of overfull desktops and unorganized harddisks (but "Search" is so powerful).

Adding special information to documents is a good (and important) idea for a better organisation. The problem I see is that everyone has their own way of storing data - and this doesn't seem to be solved with this approach.

Again my mom's desktop: ever tried to find a special file there? No chance. If you ask where it is, you get the answer: search for it (F3)... but what to look for?
Not only a "wrong" interpretation of HFS trees but also a wrong interpretation of keywords is possible. A reviewer would perhaps interpret "Author" as the original author he writes about rather than himself, perhaps because every document on his computer has been written by him.

Next problem: how do you know whether every document has a valid author? That the metadata is not NULL? That the metadata has been updated to match the document content?

There are many things that depend not only on the open/save dialogs but go much further into the application. A word processor, for example, could itself answer the question of whether you have an article, a simple note, a telefax or slides.
Music (mp3, ogg) has its own system that should be read when saving files from your favorite browser.

So why that?
Because I believe that it is not only the HFS that doesn't provide a good possibility of organisation but also the user, that doesn't want to spend time on it (see full desktop mentioned by David).
"Work's over so just save anywhere and try to reach the bus."
I prefer the mentioned cross-referenced filesystem: the user interface should provide a folder-like system where all data appears everywhere it belongs. Let's say you have the common KDE open-file dialog. On the left you would find something like what you find now, but you can choose between attributes: "Text", "Sound", "Video". Choosing Text you get something like: by MIME type, by type (Article, Slide), by author,...
Then I can browse: "I" have written a "TEXT" file "2 years ago" containing "SLIDES", just like a document tree I am used to, but it should be able to reorganise itself so that I can also find it under: written for the "UNIVERSITY" about "COMPUTER GRAPHICS".

Creating new folders would work the same way as in today's system: choose your subfolder in your favorite browser and do a "create new".

One other important thing: The system must be transparent for any application: Gnome, KDE, emacs, vi(!). I just want to "cd" in it. I still want to "scp" my files to where I need them and copy them back - without the need of KDE or something similar.

But this would imply a new virtual filesystem provided by a special system wide library. A filesystem capable of collecting, storing and offering meta information for special document paths in user-home-directories.

There really is the need of a better organisation and your ideas could (and should) be part of it - but this is just a small step towards it. A good step, but as you see in other comments, too, a really complicated one. A long way to go but no reason to give up.

Greetz,
godot


By godot at Thu, 2003/01/09 - 6:00am

I am pleased that someone has beaten me to this :)

First off, I'll make a disclaimer - I'm a star trek fan.

Second, in that show, and yes I'm aware it's fiction, they never access information via filenames. They run a search on given parameters, narrowing it down logically (voice activated, blah!, that's after the concept is sound). If the information is personal, they access it something along the lines of 'Computer, access personal recording of trip to Risa' and it would start playing the first recording it found that matched those criteria. If they wanted a later recording, something along the lines of 'Forward two days' would jump it forward.

This is how I dream a computer will work one day - everything like Google except cooler. If it's a personal document, it'll be tagged as such, if it's a private document, it'd be tagged as such, and if it was public access (like a web-page?) the same would hold true.

I've been doing some studying and conceptualisation of this privately for the last little while (sorry guys on IRC, I miss you!) but have determined that it would be terribly hard to properly integrate this into an interface as it exists today.

First off, the meta-tagging would have to be mostly automatic (which can be done with some smart routines in the save dialog). Second, the user would not be able to be aware of the underlying HFS at *all* for it to work smoothly. A quick-access designation could be assigned to commonly used documents that would bring it up faster than doing the normal meta data by adding 'among recent documents' or somesuch -- anyway - lots of work to do - so little coding skills :P sigh. working on that.

I'm working on an RFC for now - if anyone's interested in more info/assisting/brainstorming - jiilik(a)bytebenders,com (figure it out :P)

oh, and I definitely want this to be open source if I can pull it off.

Troy Unrau
used to be troy@kde.org - too much spam there now


By Troy Unrau at Thu, 2003/01/09 - 6:00am

How could you be getting all that spam? Didn't coolo install Spam Assassin? Seems to work fine here.


By Navindra Umanee at Thu, 2003/01/09 - 6:00am

All the spam I'm getting is a) in Korean and b) a result of a security hole that my ISP refuses to properly fix (an email address that, when sent to, ends up at the whole subscriber base).

So the spam isn't channelling through the kde.org address - it's the address being forwarded to that's the problem.

I have a new POP address - just waiting for .forward to get fixed.

Troy Unrau
jiilik(a)bytebenders,com


By Troy Unrau at Fri, 2003/01/10 - 6:00am

I have been working on something like this using Java and a web interface. It allows meta-data to be added to databases/file systems. I am just using HTTP as the network protocol, which just happens to work with a web browser :-) I have no idea how I would integrate something like this into the file system viewers.


By james at Fri, 2003/01/10 - 6:00am

Putting all files in the ~/docs folder using cryptic file names is an easy solution for the data management system but not for all users. I think that a more sophisticated approach should be developed, one that supports a third way:
First way: Classic HFS.
Second way: newdocms in the proposed form
Third way: A mixed mode. Metadata and other info are attached to documents outside of the ~/docs folder, with filename and path. The additional data are stored in the document itself if possible (e.g. HTML, XML, TIFF, JPEG), or in an additional .db file that is normally invisible in KDE applications. To maximize coherence, the "main" file and the .db file can be stored in an RPM-Archive or a common folder. For example, a metadata-enriched "picture.raw" would be moved into a folder "picture.raw", together with "picture.raw.db".
A daemon monitors consistency-leaks caused by pure HFS programs.
This or another daemon should also offer a virtual FS that allows browsing files by categories and attributes. This feature would be useful for pure HFS programs and KDE apps alike.
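
The "picture.raw" example above could look roughly like the following Python sketch. It only covers the common-folder variant; the function name bundle_with_metadata and the choice of JSON for the .db file are assumptions made for illustration, not part of any proposal in this thread.

```python
import json
import tempfile
from pathlib import Path


def bundle_with_metadata(path, metadata):
    """Turn 'picture.raw' into a folder 'picture.raw' that holds the
    original file plus a sidecar 'picture.raw.db' with its metadata."""
    path = Path(path)
    # The folder takes over the file's name, so move the file aside first.
    parked = path.with_name(path.name + ".tmp")
    path.rename(parked)
    folder = path
    folder.mkdir()
    main = folder / path.name
    parked.rename(main)
    # Assumption: the sidecar is plain JSON; a real .db could be anything.
    sidecar = folder / (path.name + ".db")
    sidecar.write_text(json.dumps(metadata))
    return main, sidecar


def demo():
    """Bundle a throwaway file and list what ends up in the new folder."""
    with tempfile.TemporaryDirectory() as scratch:
        src = Path(scratch) / "picture.raw"
        src.write_bytes(b"raw image bytes")
        main, _sidecar = bundle_with_metadata(src, {"category": "photo"})
        return sorted(p.name for p in main.parent.iterdir())
```

A pure HFS program still sees an ordinary directory with two files in it, which is the point of the mixed mode; keeping the pair consistent after moves or renames would be the job of the daemon mentioned above.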


By Georg at Fri, 2003/01/10 - 6:00am

Your FS becomes a database, like BeOS tried some years ago.

[-]
Snapshot of reiser4 source code can be found at
http://www.namesys.com/snapshots/.

It is set of patches against current Linus BK tree.

Reiser4 is the next version of ReiserFS file system. It was re-written
from the scratch. It supports:

- full data journalling with "wandered logs" ("shadows" in DB
parlance);

- extent-based files;

- delayed allocation of disk space and on-line optimization of disk
layout across file boundaries;

- plugins: infrastructure for easy extension of file system and utils
functionality;

- and a lot more, see http://www.namesys.com/v4/v4.html

Snapshot contains reiser4 proper (fs_reiser4.diff), set of patches
(described in READ.ME) with necessary changes to the core kernel, and
utils package (in particular, mkfs.reiser4).

It is still crashable. Do not put critical data on it.

Nikita.
[-]

And here some notes about the speed:
http://www.namesys.com/v4/fast_reiser4.html

Regards,
Dieter

BTW Hans Reiser did a demonstration for Apple on it.


By Dieter Nützel at Sat, 2003/01/11 - 6:00am

Maybe instead of having to enter a category every time, you could have a list of categories you have already set up and select one from the box, then a subcategory... etc. I'd like to see something new like this, but it'd have to be good. Thanks

David


By David Findlay at Sat, 2003/01/11 - 6:00am

The management of metadata is a typical design goal for ReiserFS, because this FS is already designed like a database. But it's a disadvantage to design such a system for a specific FS.

Doc Funfrock


By Doc Funfrock at Sat, 2003/01/11 - 6:00am

... to this idea, but I love your implementation. Given your inclusion of the ability to save w/ no extra information other than the MIME type, and the fact that the metadata is all stored in a database, I don't see it being as big a problem as I originally feared.
My suggestions are few:
a) Make it work with any common database backend (e.g., MySQL, Oracle, etc.).
b) Make it work with NFS.
c) Make sure to copyright your work under GNU... if necessary to protect the idea and ensure it remains open source, patent any new technologies you've created for it.


By Mike Forbes at Sun, 2003/01/12 - 6:00am

One thing that really annoys me is having to search for things that you would otherwise not have to. For example, file-open boxes that don't give you a place to simply type in the path/filename of whatever you want to open (the quickest option if you know it), but actually having to look through directories and clicking several times to get to it.

I would never opt for a filename-less system because of this. Even if I don't remember what a file is called, or where I put it, all I have to do is ask myself a question: if I was going to save it now, where would I put it? There is something to be said in knowing the way that you think, and forming habits where our memory isn't sufficient.

However, perhaps a middle-ground solution would be an 'index' tickbox on the save dialog box, where you can enter searchable information about the doc into a central database that you can search later should the filename elude you. Perhaps it could even suggest a filename based on that information for those who struggle?

We seem far too obsessed with making computers require less thought to use, in general, not just data retrieval. Maybe I'm just being old fashioned, and maybe (most likely) it's the way forward for computers. But is it the way forward for mankind?


By Alex at Wed, 2003/01/15 - 6:00am

A system like newdocms is only as useful as the number of (properly tagged) documents it contains. Let's say I'm a technical writer with a big pile of documentation... the first thing I'd have to do is add all my existing documents to the database. Tagging all those files by hand would be daunting enough to keep many people from even starting.

However, a lot of files essentially describe themselves to a point. An otherwise-untagged PNG or JPEG file could be assumed to be a picture, for example. When I come back looking for that picture, I can just look for pictures and not KWord files. Some digital cameras add a little extra info, perhaps dating it, and that info could be useful (say I'm looking for a picture I took on January 3 but I added it to the database on March 7 -- I know I took the picture some time in early January, and I don't care when I put it in the database). MP3s have lots of handy tags as well.

Text files contain even more metadata that could be assimilated. If I write a paper using troff -ms macros, there are tags like .AU (author), .AI (institution), .TL (title), and some specifying the document type (report etc.). Many XML doc types, and even some binary formats, have at least as much info. All of that could be used when adding a file to the database; newdocms could simply take advantage of it and present the user with a suggested set of tags (which the user could approve or change).
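
To make the troff example concrete, here is a minimal Python sketch that pulls suggested tags out of an -ms document. It assumes the usual -ms layout in which the text belonging to .TL, .AU and .AI sits on the lines that follow the macro; the function name extract_ms_tags and the tag names are made up for this example, and nothing here is taken from newdocms itself.

```python
# Map the -ms macros mentioned above to (hypothetical) tag names.
MS_FIELDS = {".TL": "title", ".AU": "author", ".AI": "institution"}


def extract_ms_tags(text):
    """Collect the text that follows .TL, .AU and .AI in an -ms document.

    Each tag maps to a list, because a paper may have several authors,
    each introduced by its own .AU macro.
    """
    groups = {}
    current = None  # lines collected for the macro we are inside, if any
    for line in text.splitlines():
        if line.startswith("."):
            # Any macro line ends the previous field.
            name = MS_FIELDS.get(line.split()[0])
            if name is None:
                current = None
            else:
                current = []
                groups.setdefault(name, []).append(current)
            continue
        if current is not None and line.strip():
            current.append(line.strip())
    return {name: [" ".join(parts) for parts in entries]
            for name, entries in groups.items()}
```

The result is only a suggestion: a save or import dialog could show these values pre-filled and let the user approve or change them.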

Being able to analyze incoming documents would go a long way toward general acceptance of something like newdocms. An adaptive system, that learns how each user likes to categorize things over time, could be difficult to implement but would be even more difficult to walk away from. :-)

-- Dirt Road


By Dirt Road at Thu, 2003/01/16 - 6:00am

My idea for tackling the organisational problem of files is to replace the TREE relationship with a FOREST relationship. That way, using "directories" as metadata, you can easily find (by "descending" in the forest) the document you are searching for.
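
One possible reading of "descending in the forest" is sketched below in Python: every document carries a set of "directories" (tags), and a path is just a set of tags, so the order in which you descend does not matter. This is an interpretation for illustration only; the names descend and next_choices are hypothetical and the linked page may describe something different.

```python
def descend(docs, path):
    """Return the documents that carry every "directory" named in path.

    docs maps a document name to the set of "directories" it belongs to.
    """
    wanted = set(path)
    return sorted(name for name, tags in docs.items() if wanted <= tags)


def next_choices(docs, path):
    """List the "directories" that can still narrow the current selection."""
    wanted = set(path)
    remaining = set()
    for tags in docs.values():
        if wanted <= tags:
            remaining |= tags - wanted
    return sorted(remaining)
```

In a file dialog, next_choices would fill the list of sub-"directories" shown at each step, and descend would fill the list of matching documents.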

http://www.prism.uvsq.fr/~dedu/docs/kb/ provides more information, if you find it useful.


By Eugen Dedu at Thu, 2003/01/16 - 6:00am
