KOffice ODF Sprint Report

The two days of the KOffice ODF sprint were very productive. Most time was spent on group discussions, and designing specific parts of KOffice in smaller groups. Of course, code was written as well, and for an overview of what happened, read on!

The first real KOffice ODF day kicked off at 9am with a presentation by David Faure on the technical side of the OpenDocument standard. He explained how the format works and how to read and write it. He also talked about how the standard came into existence and the process for improving it, and urged developers with valuable input to join the OASIS committee responsible for ODF. Philip Rodrigues, who spent the weekend working on documentation, condensed all this information onto the KOffice wiki: not only a basic overview of the ODF architecture, but also much deeper technical information for those who need it.
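For readers who have never looked inside an ODF file, here is the basic layout. Every ODF document is a ZIP package. Its first entry is a `mimetype` file, stored uncompressed so the document type can be recognized without unpacking anything. The document body lives in `content.xml`, and `META-INF/manifest.xml` lists what the package contains.

The Python sketch below writes a one-paragraph text document and reads it back. Keep in mind:

- It only illustrates the package layout. It is not KOffice code.
- The function names are hypothetical.
- A real document would usually also carry `styles.xml` and `meta.xml`.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ODT_MIME = "application/vnd.oasis.opendocument.text"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The manifest lists the package root (with its media type) and each file in it.
MANIFEST_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:media-type="{ODT_MIME}"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def write_minimal_odt(path, paragraph):
    """Hypothetical helper: write a one-paragraph text document as an ODF package."""
    content_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<office:document-content"
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        f' xmlns:text="{TEXT_NS}" office:version="1.2">'
        f"<office:body><office:text><text:p>{escape(paragraph)}</text:p></office:text></office:body>"
        "</office:document-content>"
    )
    with zipfile.ZipFile(path, "w") as zf:
        # "mimetype" must be the first entry and stored uncompressed, so the
        # document type can be read directly from the start of the file.
        zf.writestr("mimetype", ODT_MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)


def read_odf(path):
    """Return basic facts about an ODF package plus the text of its paragraphs."""
    with zipfile.ZipFile(path) as zf:
        first = zf.infolist()[0]
        root = ET.fromstring(zf.read("content.xml"))
        return {
            "first_entry": first.filename,
            "mimetype_stored": first.compress_type == zipfile.ZIP_STORED,
            "mimetype": zf.read("mimetype").decode("ascii"),
            "paragraphs": ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")],
        }


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "hello.odt")
    write_minimal_odt(demo_path, "Hello from the ODF sprint")
    print(read_odf(demo_path))
```

Reading works the same way in reverse: open the ZIP, check `mimetype`, then parse the XML parts.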

Then it was time for lunch. It took some time to get the developers on their feet, as David's talk had sparked a lively discussion; in the end, we brought the food to them instead. During and after lunch the discussions continued, with more detailed talks about ODF, its underlying XML structure and its implementation in KOffice. Okular wants to use KOffice technology for displaying ODF files, so Tobias' input here helps ensure that the KOffice architecture supports that use case.

Brainstorming

After the ODF discussions, Inge Wallin started a brainstorm about his vision for the future of ODF: every KDE application should be able to work with it. The developers talked about this for hours, as it is new and uncharted territory. What use cases are there for ODF on the common desktop? How could the rich ODF functionality in KOffice be consolidated into a library that other applications can use? Till Adam (of KMail and Akonadi fame) started daydreaming about vCards, and about loading and saving emails as ODF.

Anne-Marie would love to have this available in KVocTrain. KVocTrain works with files containing vocabulary data, and it has an internal editor for creating these files. But they use a separate file format, so you need to use the built-in editor to create data. An ODF library would make it relatively easy to switch to an ODF-based file format, so you could use, for example, KSpread to enter the data, or import it from all kinds of existing datasets. This clearly shows the flexibility and power such a technology would bring to KDE. Code reuse and co-operation on standards are important cornerstones of KDE, and this would take them to a new level.

However, there are many pitfalls to overcome before this can be implemented. The first step will be to let applications that can output documents (by saving or printing) create ODF documents. The KOffice hackers will create a library that makes writing ODF documents as easy as possible. Just as Qt currently makes it possible to output practically any content to SVG or PDF, KODFlib (the name is still undecided) will make it possible for applications to output ODF files. Thomas Zander summarized the results of the meeting in an email to the KOffice mailing list, which makes for an interesting (though technical) read.
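To make the idea more concrete, here is a hypothetical sketch of the kind of helper such a library could offer. An application hands over plain rows of data, say vocabulary entries with a score column, and gets back the `content.xml` of an ODF spreadsheet.

What is invented and what is not:

- The function name and signature are made up for illustration. The real library was still being designed, and would be C++/Qt rather than Python.
- The element and attribute names (`table:table`, `table:table-cell`, `office:value-type`) are the ones ODF actually uses.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs, as defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Serialize with the conventional office:/table:/text: prefixes.
for _prefix, _uri in (("office", OFFICE_NS), ("table", TABLE_NS), ("text", TEXT_NS)):
    ET.register_namespace(_prefix, _uri)


def q(ns, name):
    """Build an ElementTree qualified name of the form '{uri}name'."""
    return f"{{{ns}}}{name}"


def spreadsheet_content_xml(table_name, rows):
    """Hypothetical helper: serialize rows of values as ODF spreadsheet content.xml."""
    rows = [list(r) for r in rows]
    root = ET.Element(q(OFFICE_NS, "document-content"), {q(OFFICE_NS, "version"): "1.2"})
    body = ET.SubElement(root, q(OFFICE_NS, "body"))
    sheet = ET.SubElement(body, q(OFFICE_NS, "spreadsheet"))
    table = ET.SubElement(sheet, q(TABLE_NS, "table"), {q(TABLE_NS, "name"): table_name})
    # The schema expects column definitions before the rows; one repeated
    # column entry covers the widest row.
    width = max([len(r) for r in rows] + [1])
    ET.SubElement(table, q(TABLE_NS, "table-column"),
                  {q(TABLE_NS, "number-columns-repeated"): str(width)})
    for row in rows:
        tr = ET.SubElement(table, q(TABLE_NS, "table-row"))
        for value in row:
            cell = ET.SubElement(tr, q(TABLE_NS, "table-cell"))
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                # Numbers keep a machine-readable value, so a spreadsheet can compute with them.
                cell.set(q(OFFICE_NS, "value-type"), "float")
                cell.set(q(OFFICE_NS, "value"), str(value))
            else:
                cell.set(q(OFFICE_NS, "value-type"), "string")
            # The displayed text of each cell goes into a text:p paragraph.
            ET.SubElement(cell, q(TEXT_NS, "p")).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

A complete `.ods` file would wrap this XML in a ZIP package. The package also needs a `mimetype` entry (`application/vnd.oasis.opendocument.spreadsheet`) and a manifest. The point of a shared library is that each application only supplies its rows, and never has to deal with the XML or the packaging.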

In time, this "KODF" technology might make KDE the preferred platform for everyone who values open communication and the sharing of knowledge. Governments and companies alike could enjoy the advantages of this integration in the Free Desktop. This raises an important question for the developer and user communities: the KOffice people, and specifically those designing this library, are looking for use cases for this technology. Having them would let the developers build a more generic solution, and thus advance faster. So think about where you could use it, and how you would like to use it. If you have a concrete example, send it to the KOffice developers, either on the KOffice developer mailing list or in the comments to this article. It will be appreciated!

Architecture

Next, a small group of core developers gathered in a separate room. A chat with David made clear what they had been working on: the details of loading data into Flake shapes. Loading data might sound simple (it does to me), but David explained how hard it can be. When an application loads a document, objects such as parts of other documents (a picture, a spreadsheet, a vector graphic) can be embedded in it, so the application has to identify the right Flake shape for each of those embedded objects.

In KOffice 1.x, the chosen solution was the one used across the office industry: essentially, load the other application inside the current one, so you have all the capabilities needed for the object. In KDE, we have the KParts technology for this. Both applications need to support this, of course: the embedding application has to load the embedded one, which in turn has to take care not to interfere with its host. It works, but it is complex in terms of both technology and user interface, and very heavy on resources. The new Flake technology lets you use just what you need: the basic display, loading and saving code, and the primary manipulation tools. Flakes integrate with the application at a much lower level, allowing for less overhead and more flexibility; you embed only an object, not a full document. Now, as ODF supports having an object inside another object, a Flake must be able to load a Flake. And sometimes the same data can be loaded by two Flakes, so you have to identify the right one. Everything had to be carefully designed to ensure there are no clashes. This technology is also needed for drag'n'drop and copy-and-paste, as those are essentially about loading objects.
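The selection problem David described can be sketched as a registry of shape factories:

- Each factory says whether it can load a given element, and with what priority.
- The registry picks the best match.
- The registry hands nested elements back to itself, which is how a shape can contain a shape.

All names here (`Element`, `ShapeFactory`, `ShapeRegistry`, `supports`, `priority`) are hypothetical, and the code is Python for brevity. This is a sketch of the idea under those assumptions, not the actual Flake API, which is C++.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Element:
    """Stand-in for a parsed ODF element: a tag, its attributes and child elements."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


@dataclass
class Shape:
    """A loaded shape; nested shapes are kept as children."""
    kind: str
    children: List["Shape"] = field(default_factory=list)


@dataclass
class ShapeFactory:
    """Hypothetical factory: which elements it can load, and how strongly it wants them."""
    kind: str
    priority: int
    supports: Callable[[Element], bool]

    def load(self, element: Element, registry: "ShapeRegistry") -> Shape:
        # A shape may embed other shapes, so child elements go back through
        # the registry instead of being hard-wired to one loader.
        loaded = (registry.load(child) for child in element.children)
        return Shape(self.kind, [s for s in loaded if s is not None])


class ShapeRegistry:
    def __init__(self) -> None:
        self._factories: List[ShapeFactory] = []

    def add(self, factory: ShapeFactory) -> None:
        self._factories.append(factory)

    def load(self, element: Element) -> Optional[Shape]:
        # Several factories may accept the same element; the highest priority
        # wins, and ties go to the factory registered first.
        candidates = [f for f in self._factories if f.supports(element)]
        if not candidates:
            return None  # unknown content: skipped in this sketch
        best = max(candidates, key=lambda f: f.priority)
        return best.load(element, self)
```

In this sketch, a frame holding an SVG image would be claimed by a higher-priority vector factory, while a PNG falls through to a generic picture factory. The same lookup could serve drag'n'drop and copy-and-paste, since pasted data is just another element to load.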

While these developers were discussing loading in Flake, the other hackers were hacking away, discussing architectural issues or working out details. It proved very difficult to get them out for dinner, but we managed to convince them it would be healthy to stop working for a few hours. As it was raining heavily, we were lucky that dinner was planned at a Turkish restaurant almost next door to the office.

Implementing the new ideas

After dinner, the developers continued implementing their ideas. The office was full of hackers until after 2am, and the next day you could already hear keyboards being abused at 9am, so I'm pretty sure most of them will need to catch up on sleep. But they clearly wanted to get as much out of this meeting as possible. Being in the same room proved very useful: a lot of new technology was being integrated into KOffice, which raised many questions, and having the author of the library code you need sitting right next to you is very efficient.

Alfredo and Martin got the KFormula flake loading and rendering, and Jan hopes to get Karbon to the same level before he leaves. The discussion about loading ODF also paid dividends on Sunday, when the hackers slowly got the infrastructure to pass test case after test case. Tobias went to work on an Okular tutorial for writing plugins, to ease the process of writing an ODF reader in the future. Pierre Ducroquet and Sebastian Sauer were working on the basics of KWord, and were finally able to get it to display background colors and some other test cases from the OpenDocument test suite.

So people were both talking and hacking, designing and implementing new things all day. Overall, it was clear what the greatest benefit of this meeting was: the design work and other results of the discussions. Though there was plenty of hacking, most of the time was spent talking to each other and fleshing out the details of the KOffice infrastructure. This meeting will help ensure the architecture is sane, powerful and usable. And of course, we had fun. There was good food (thank you, KDAB!) and some beer as well. It's always nice to meet your fellow hackers and see which faces belong to the nicknames you see so often.

We want to thank the sponsors for their support, as it is really a big boost to KOffice!


Klarälvdalens Datakonsult AB (KDAB), who provide consultancy, training and development for Qt and KDE-based applications, provided us with their office in Berlin. Several KDAB people joined us, and they not only paid for the good food but also knew where to find it!


Further, Trolltech (the guys behind the Qt toolkit) are sponsoring the travel costs of the ODF Meeting through our own KDE e.V., so we owe much to them as well.

Links to articles, blogs, pictures and other information are being aggregated in the Galleries on the KOffice ODF Sprint site. Have a look for a more detailed and personal view of the meeting!


Comments

by superstoned (not verified)

for those who want a who's who, check the KOffice ODF sprint site (sorry for the typo in the link above)...

by djouallah mimoune (not verified)

eh that you jos, i hope you enjoyed the food :-)

by Marcel (not verified)

Hi,

the ODF-lib could be very handy when creating apps that support freelancers and small companies. They could create nice-looking invoices directly from a log of work hours, which could then not only be printed and archived as PDF but also loaded and edited in KOffice if some details need to be changed. The templates this application uses to create the invoices could also be created in KOffice using placeholders.

M

by ben (not verified)

I've always wondered why we need another document format. Where is HTML not sufficient?

text documents, presentations, spreadsheets, all of that is already done today using HTML. Separation of style & content is also possible, javascript can be used to add the more advanced things etc.

So, why not html/css/js?

by Christian A. Reiter (not verified)

Hm - I would say: look at the difference Konqueror, Mozilla, Opera and IE produce out of ONE html page ;-)

by bluestorm (not verified)

I'm not sure that's the right explanation here:
Why aren't web pages displayed the same way on every web engine? Maybe because the specs are complex.
Is ODF simpler? I can't help thinking it is not.

I don't know ODF. Perhaps it makes it easier to describe the structure of the page, or is richer when it comes to the specific needs of a particular app (XHTML is display-oriented, not work-oriented), and it may have a lot of other advantages, but I don't see it as an answer to the "multiple apps, multiple outputs" problem.
Of course this problem can be solved too, and developer discussions (among KOffice developers, and with OpenOffice/AbiWord/whatever devs) are certainly the right way to go. Let's cooperate!

by Chris (not verified)

HTML was originally designed as a method of serving text and graphics over the internet. Since then, much more thought has gone into the best way to do this. Among other things, the concept of separating the actual content from how it's displayed has been well developed. Since at this point HTML is a bit of a patchwork, and because developing it further would make backwards compatibility difficult to achieve, XHTML has been developed as a replacement. XHTML does an excellent job at separating content from display.

Why would we want this? If you are using your mobile phone or PDA, would you want websites displayed in the same way they are when you're on your widescreen monitor at home? Probably not. This is why the separation between content and display is so important, because with it you can display the same content but in a form that is most appropriate for the given user/environment.

If KOffice uses a flexible format like XHTML, it can display the same content, but in a different way depending on the O/S, display manager, resolution, user preferences, etc.

Finally, if ensuring uniformity across platforms is really what's important, you need to choose a format that has that as its goal. PDF is by far the most common, although there are others.

by Morty (not verified)

Using PDF as a format for a document you work on is a no go, since it's essentially a read-only format. Even if you can edit it, PDF was never meant or designed to be editable. You have to realise, PDF is your printout. Even if it's stored digitally, it's the finished product.

by Thomas Zander (not verified)

HTML is purely about presentation, ODF is not.

Just one post above there is a request for saving ODF from an invoicing application. The result would probably be a spreadsheet. HTML does not have a spreadsheet, it just has a table with text. How can you load that table into the application again? Things like doing =SUM(A1, A10) are lost, you only see the result without any meta information.

End result is that while HTML might be useful to *show* something to a client, it is useless to have as a fileformat to exchange structured data.

by Richard Dale (not verified)

Although, I agree that HTML is about presentation it doesn't mean that ODF is the best way to describe arbitrary semantic data. For instance, I'm not sure about this comment:

"KVocTrain works with files which contain vocabulary data, and it has an internal editor to create these data files. But it is a seperate fileformat, so you need use the built-in editor for creating data. Now an ODF library would make it relatively easy to use an ODF-based fileformat for this data, so you can use, for example, KSpread to enter the data"

I would have thought some RDF based format would be better for a KVocTrain database, and I don't really see the advantage of using a Spreadsheet to enter language vocabularies.

by Thomas Zander (not verified)

The idea that you can use a spreadsheet to open the native fileformat is something that the author of this article added; I don't see that as a really useful thing either.
The point of using an ODF (spreadsheet) file format here is mostly that it's better to reuse a known (XML-based) file format than to invent a new one, especially if there is a library that lets you build on the shoulders of others.

Also; your reference to language vocabularies makes me realize you misunderstood; the idea was for the fileformat for saving user data (how well the user performs etc), not for saving the datafiles.

by Richard Dale (not verified)

OK, I see - obviously RDF is a well known file format and using it for something like a vocabulary is what it is intended to be used for.

Saving stuff like that on a local hard disc seems a bit limiting, as the data is much more useful if combined with other people's scores on the web. Then you can ask questions about which foreign language features are hardest to learn for a particular group of native speakers.

If you want to save the user data on the web, I think it would make sense to use something RDF based like rss or atom rather than a document format like ODF.

by Aaron Seigo (not verified)

the most interesting aspect to this plan is to use a common file format for storing data that all applications can use. odf seems to be a set of formats that are well described via an openly crafted specification and which can handle the needs of all these apps.

encapsulating data unnecessarily in file formats specific to a handful of use cases means that we can't easily move data around between applications and instead have to have filters and other conversion mechanisms.

taking your suggestion for RDF (or a derivative), we'd lose this data portability. and for what benefit? perhaps one might wish to export to RDF, and if there is one odt->rdf or ods->rdf filter written then all apps can use it, not just the one app that is specially coded to save its data in RDF. replace 'RDF' with any other file format and think of all the apps that save data.

what will people use this for? honestly, i don't think we can foretell every possible use. that's one of the neat things about technology: once created, others can and usually do find interesting and creative uses for it. but this is only possible if the technology is an enabler rather than a straitjacket.

i think it is time we moved beyond the data balkanization that happens due to the rather unnecessary proliferation of file formats.

the other aspect of this is that it makes data storage a simpler problem for people writing applications. instead of creating a file format, which is usually just a means to an end rather than a key function of the application, or coding up support for an existing file format, having a common library will make it easy to deal with saving and loading data programmatically. by sharing that library amongst apps, it will make it easier for the developer of application 'a' to work on application 'b', since the code for saving and loading files will be minimal and near identical in both apps. instead of having to worry both about the internal data structures that are being saved out and the format to which they are being saved, one need only care about the former.

having looked at and worked on a number of kde applications that deal with files and have their own home brew document classes, i am pretty happy about that possibility.

by Richard Dale (not verified)

"taking your suggestion for RDF (or a derivative), we'd lose this data portability. and for what benefit?"

No, quite the opposite. RDF allows you to combine semantic data (such as test scores from using edu apps), with other semantic data that can be found on the web. This would allow you to combine the kde edu app test scores with other data, such as school sports results perhaps, as long as both were RDF and used common vocabularies such as Dublin Core and FOAF.

I agree it isn't always obvious when data should be held as an ODF document and when it is better stored in, say, an RDF format. And you might describe ODF docs with RDF metadata to combine both.

by Aaron Seigo (not verified)

yes, you can associate RDF metadata with a document. on the other hand, i don't see RDF being able to cover all the data needs of applications. in the case of test scores, afaik RDF doesn't give access to "sum this column, divide by the count" like a spreadsheet does.

so given that RDF doesn't cover all the needs of applications, it can never become the "native" file format for many applications. ODF can be. and that means, quite obviously, that ODF grants greater data portability.

as you note, it comes down to both apps using the same formats and vocabularies. well, that's what ODF is: an agreed upon format that has broader reach. you're talking about theoretically having apps store things in RDF, the koffice people are talking about a practical way of having many apps store things in a common (set of) format(s): ODF.

that doesn't mean RDF is bad, it means that ODF is simply a more general purpose solution and therefore a better selection for primary data storage.

nepomuk/strigi bring RDF to the desktop, but i don't think that is an appropriate primary storage system. awesome possibilities for augmenting the primary storage, yes =)

by Cornelius Schumacher (not verified)

I love the idea of having a commonly available library to operate on ODF formats, using them as a storage format or for interacting with applications using ODF, or whatever else. There are countless exciting use cases. I would for example like to be able to edit a template document for something like an invoice in KWord/OpenOffice Writer/... and then fill in the actual data from a specialized application that knows the business logic behind the specific invoice (maybe a combination of, say, KMyMoney, KAddressBook and Kraft).

On the other hand I can only support Richard's suggestion to use RDF. This goes beyond the notion of a file format for storing data. It includes relating data, interacting with web resources, sharing information and many more of the things users want to do in Web 2.0 times. We shouldn't miss out on this opportunity just because we usually think along the traditional lines of storage file formats.

by Thomas Zander (not verified)

I agree with you that using an existing fileformat is good; the choice you state is RDF. I don't have a lot of experience with it.

I guess what I am saying is that we should aim for integration of one file format in as many different applications and usages as possible, for a couple of reasons. First is obviously code reuse, which Aaron touched on and which makes things easier for the programmers. The developer might like to load the file in a spreadsheet for debugging purposes.

A bigger reason is that when ODF becomes omnipresent the tools for handling and debugging them get more focus and developers and users alike will see these documents as more than simple blobs of incomprehensible data.

I guess it's similar to the concept of 'everything is a file' under Unix. Using one interface for as many different things as possible might not give you the best solution for each individual case, but on the whole the network of possibilities grows by orders of magnitude.

So, while you may be right that RDF is better (I honestly don't know) for this individual use case, it does not invalidate the idea of a separate library for all apps to use and grow upon. The example elsewhere in this thread that an OCR scanning app saves its text as well as its pixmap data into an ODF container seems like a really good idea to me.
And if we start to have a lot of ODF support all through the industry, then kvoctrain will benefit more from ODF in the long run than from using RDF which will ultimately be less well supported.

by Bruce (not verified)

Just for the record, ODF 1.2 will allow you to use RDF within ODF, layering additional semantics on top of existing document content. So no need to choose between them; they can be quite complementary.

by superstoned (not verified)

"The idea that you can use a spreadsheet to open the native fileformat is something that the author of this article added; I don't see that as a really useful thing either."

If I recall this right, Annma mentioned this herself ;-)
I try not to make up things, I might only try to make 'em sound a bit more sexy...

Anyway, I might have misunderstood this as well, then, I also thought it was for saving datafiles. Maybe Annma can explain what she meant?

by Davide Ferrari (not verified)

Ask yourself this: do you have the concept of a "page" in HTML+CSS?

by Diederik van de... (not verified)

Good! :-)

Next challenge: footnotes, tab spacings, TOCs, field references, WordArt, vertically aligned elements, macros, slide animations, ...

The parent has a point: XHTML/CSS is not mature enough to be used as page format. It needs to be extended severely to compete with Word documents. Features have to be rushed into XHTML while it's not ready for that. Right now the web is a presentation format, not a document format with features like "tab spacings" or "field references".

by Diederik van de... (not verified)

It might not be XHTML, but note that ODF text documents are relatively easily converted to XHTML using XSL! ODF uses similar concepts for formatting tags and markup.

by bluestorm (not verified)

Actually, such tools already exist, thanks to the support of the OpenDocument Fellowship: http://opendocumentfellowship.org/projects/odftools

by Mark (not verified)

Kdissert would be a candidate profiting from just using the odf-format. It then would be possible to create mindmaps just out of an existing document structure. It then could become also another form of viewing a document.

by Morty (not verified)

The other way is already possible: exporting a mindmap to an ODF file. From a workflow point of view that's more important, I think. You brainstorm and create the document structure in a mindmap, then fill in the details in a regular document.

by Thomas Zander (not verified)

Yes, I agree.
It actually has been a feature request of mine for quite some time to be able to start kdissert right from inside KWord and show the document structure there.
Moving the chapter-nodes around or annotating them would then alter the document inside KWord.
KDissert would be great for coupling a KWord doc with a KPresenter presentation as well; where loading just shows all the chapter nodes and exporting to KPresenter allows you to save all the graphics but also create pretty graphics based on textual structure information like lists etc.

Anyway; we just need someone to actually do it, which tends to be easier said than done ;)

by Mario Fux (not verified)

This might be a stupid question but did you ask the author of Semantik [1] (aka KDissert) already?

[1] http://freehackers.org/~tnagy/kdissert.html

by Thomas Zander (not verified)

I did :)

But I have to admit that that was well over a year ago...

by Mark (not verified)

Me too :-)

But even more than a year ago, and at that time it was said to be impossible. I made that comment because the current discussions might show that it is not impossible after all.

by ita_est (not verified)

Some kind of Koffice integration will be provided, but not until KDE4.

The Semantik code base is too young at the moment, and KWord 2.0 is not ready either.

by ita_est (not verified)

Merging the two documents into one will lead to conflicts, as the two views do not represent the same thing:

* when the mindmap is changed, the text formatting in the text representation will be silently broken (picture positions, paragraphs, line breaks ..)
* when the text representation changes then the map structure will be broken (and parsing is hard(tm)).

by Wade Olson (not verified)

"After the ODF discussions, Inge Wallin started a brainstorm about the vision he has for the future of ODF: every KDE application should be able to work with it."

The topic and the resulting implementation ideas - exactly what I was hoping would be discussed.

When you see Export->Text/HTML/PDF ubiquitously, I'd love for ODF to be added to that list. Users should expect to see that option throughout the desktop and apps, developers should expect to have a unified way of doing this (just like help menus, etc) and who better to implement a standard set of ODF libraries than the KOffice team?

Very cool.

by Dr. Nobody (not verified)

Sounds like a real creativity festival. Congrats (and thanks) to the KOffice developers! And to the sponsors who helped make it possible.

by Oscar (not verified)

One thing that worries me a bit is that while ODF is quite good at being platform independent, macros aren't. From what I know, I can't open a macro-using ODF spreadsheet from OpenOffice in KOffice and get the functionality I expect. Or can I?
I'd love to use KOffice more but since I have to exchange documents with OOffice users, from what I understand I'd better stick with that. Or am I wrong?

If I'm not wrong, is there any thought going into the idea of having platform-independent macros?

Oscar

by Thomas Zander (not verified)

Macros are application-level helpers, not document-level as MS designed them in 1995 and as most people now perceive them.
So they are surely tied to the application and actually never saved into a document which makes your question a bit odd ;)

The reason for this separation is viruses. It is a basic mistake to allow macros to be shipped in the same container as data.
Note that MS also is separating those in newer office versions; so it took them a bit longer, but they agree with us now :)
http://security.blogs.techtarget.com/2007/05/11/microsoft-moice-will-sec...

by Oscar (not verified)

Thanks for the answer and an explanation.
I'm still a bit confused though. From what I can see I'll still be able to send my macro-infested MS-word-doc to someone and have them behave they way I expect.

Is there, or will there be, some way to do the same with ODF? I think I can do it with OpenOffice, but not between OpenOffice and KOffice.

Perhaps I'm making even less sense now but being able to send a macro-using-document is something that I want to do from time to time. Typically spreadsheets.

by superstoned (not verified)

you can send the old, macro-infested files between the old MS Office versions and (I guess) to newer MS Office versions, and thanks to the huge amount of work on VB scripting, also to OpenOffice. KOffice won't be able to open them: no VB script support. And ODF doesn't have embedded macros (afaik), so that's a no-go as well.

by Oscar (not verified)

I don't think I'm being clear enough but that might be because I don't really know what I really want.

I'm not asking for (and probably don't want) VB-support in KOffice. What I'm looking for is making ODF similar to MS-doc in the way that I can include a macro and it will work on the machine that opens that document-file.

So when I make my spreadsheet with some clever macros in it that format the data in some way, I can send it to my colleague and he can fill it with data that will be formatted according to the rules I set up, regardless of which program or platform we use to open this ODF spreadsheet.

Sorry if I'm too stupid to understand the answers given.

by superstoned (not verified)

I think you might be able to send the macro separately, but not in the document, as that's a big security risk (and that's why MS doesn't allow that anymore). But I might be mistaken; maybe a KOffice dev can explain more about this.

by Thomas Zander (not verified)

Superstoned is right; in ODF we force a strict separation of document and application (and a macro is an application).

You have a huge set of formulas and other features available to you in a spreadsheet document, but you will not have features like downloading or printing or even for loops.

It would be good to know what kind of things you need to do in your macro so we can find a good (and secure) solution for them. If you can give exact and specific examples of actions taken, that would be nice to know :)

by Sergio T. (not verified)

Maybe we can distinguish between "document level" macros and "system level" macros. The first kind cannot manipulate files and directories, nor connect arbitrarily to any outside source (DBs, net resources, other files, websites..), while the second can. In the first case you can ship the macro embedded in the document and, as far as I see it, the language can be standardized (for example, in a spreadsheet it should only perform operations on cells, create new sheets and so on..).
As a matter of security, i think that (simply) a document, even with embedded macros, can be forced to start with disabled macros.

What do you think about it?

by chris (not verified)

Wouldn't it be nice to save a scanned image in Kooka (with OCR) to ODF? That would be amazing.

by superstoned (not verified)

yeah, that's an idea. And the Krita ppl are also working on a new raster image standard, so you could have both the OCR-ed text AND the picture in one document, I think. PDF supports something like that, I believe, but that's not editable of course.

by Cyrille Berger (not verified)

No need for OpenRaster to have both OCR and the picture in one document (and honestly that would be the worst usage of it); you can embed a PNG in an ODT, and that would be perfect for the job. OpenRaster is only useful if you have multiple layers (either image or effect layers), and after scanning you only have one layer.

by Daniel G (not verified)

There are formats better suited for scanned images than PNG; the DjVu file format does a good job at this.

And there's a use for OCR + image: you may want to *search* text, especially in multi-page documents, but keep the typography (which you may not have the right to distribute), layout, pictures... and keep in mind that OCR is not perfect.

by Cyrille Berger (not verified)

Sure there is a use for OCR + image, that's what you get with ODT :) And I do believe that as soon as OCR is involved, a text-centric format is much more useful than a graphic format.

by Inge Wallin (not verified)

This is *exactly* the type of things I was looking for when I proposed the ODF library: applications that I would never have thought of myself. No matter how smart we (the koffice devs) are, we can never think of all the use cases ourselves. If we create the library, the applications will come!

by ubuntuuser (not verified)

Like some have already said here, I think it is very important to support a real standard. Many Windows users already use OpenOffice. It must be certain that every KOffice document can be read by OpenOffice and every OpenOffice document by KOffice. Differences are dangerous for the stability of open source standards. Start cooperating with Sun and the AbiWord developers to create a document format that you can use with all these programs and on every platform!

by Andre (not verified)

odf is the format!