An Analysis of KDE Speed

Our recent poll on the upcoming KDE 2.2 suggests that the area of greatest concern for KDE users is speed: at this time, out of 3,463 votes, over 24% consider speed the most important thing for developers to address. Waldo Bastian, who developed the kdeinit speed hack among other things, has written a paper entitled "Making C++ ready for the desktop", in which he analyzes the various startup phases of a C++ program. Noting that one component of linking, namely library relocations, is currently slow, he offers some suggestions for optimizations. An interesting read.


I'm so glad Waldo is working on KDE and not GNOME...

By Christian A Str... at Tue, 2001/05/08 - 5:00am

Hmm... I surely am very glad to know Waldo, and I'm honoured to call myself his friend. Yet, it should be a matter of pride for us to avoid the kind of remarks you made. Waldo *did publish* his study, thus hoping *all* the Linux community will profit/help/contribute. If an inventive Gnome developer takes Waldo's notes, has a stroke of genius and finds a marvelous solution to our speed problem, I'm sure we *all* will profit. The same goes for everything else in Free Code, and that's why we like it so much, even if we don't always realize that this is the real reason.

I would like to thank Waldo for his constant attention to KDE's performance. He is one of the main reasons KDE 2 performs at least as well as KDE 1, despite being at least 3 times more complex.

By Inorog at Wed, 2001/05/09 - 5:00am

"Yet, it should be a matter of pride for us to avoid the kind of remarks you made. Waldo *did publish* his study, thus hoping *all* the Linux community will profit/help/contribute."

It was a joke... relax ;)

By Christian A Str... at Wed, 2001/05/09 - 5:00am

I see. I used to use KDE all the time. I don't at the moment, although it's not for any advocacy reasons. Comments like yours don't make me want to come running back (nor will similar comments from Gnome advocates make me proud).

By Ian A. Marsman at Wed, 2001/05/09 - 5:00am

I can't see how you could take it that way.

First off, I AM glad that Waldo is working on KDE: GNOME doesn't use C++, so had he been working on GNOME, the article (if there had been one) would have been about C and not C++. And since KDE is a big part of my life, I am happy that a person with this much insight and intelligence is working on making it better. This was not meant as a KDE-vs-GNOME thing; I mentioned GNOME only because they use C and not C++.

Relax. Don't see wars where there aren't any.

By Christian A Str... at Wed, 2001/05/09 - 5:00am

What about Gnome-- and Gtk--?
They're C++ bindings, right?

By dc at Wed, 2001/05/09 - 5:00am

I didn't take this the wrong way at all; it sounded to me more like an "I'm glad you're on the team" comment...

By EKH at Wed, 2001/05/09 - 5:00am

Actually, if you read the article, there are a number of things that would help Gnome as much as they would help KDE. For one, the dirty pages that are created when libraries are relocated take up valuable memory and increase the amount of I/O required to load something. I/O is never fast, so it's something everyone wants to avoid if possible. A more optimized relocator in the dynamic linker would help not only KDE but also Gnome, as the memory footprint could very well shrink for both projects.

If you have 213 dirty pages per application (4 KB each on x86), every task (except one) is about 800k bigger than it could have been. Not only that, it also means you have to load another 800k from the disk, which adds to the start-up time if you were unlucky enough to have the content expire from the cache (on small or low-memory systems, that's likely). If you assume that the average consumer drive can read 5-10 MB/sec off the platters, that's already a tenth of a second wasted right there. Having more free memory pages for cache never hurt anyone.

By Chad Kitching at Sun, 2001/05/13 - 5:00am
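Chad's arithmetic can be reproduced in a few lines of Python. This is a rough sketch assuming 4 KiB pages (the usual x86 size) and reading "MB/sec" as MiB per second; the 213 pages and the 5-10 MB/sec drive speeds are the figures quoted in the comment, and the function name is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def dirty_page_cost(dirty_pages, drive_mib_per_sec):
    """Return (extra KiB per process, seconds needed to read them from disk)."""
    extra_bytes = dirty_pages * PAGE_SIZE
    seconds = extra_bytes / (drive_mib_per_sec * 1024 * 1024)
    return extra_bytes // 1024, seconds


for speed in (5, 10):
    kib, secs = dirty_page_cost(213, speed)
    print(f"{kib} KiB per process, {secs:.3f} s at {speed} MiB/s")
```

This prints 852 KiB per process, with about 0.166 s at 5 MiB/s and 0.083 s at 10 MiB/s, so "about 800k" and "a tenth of a second" are the right order of magnitude.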

Shouldn't that be "I'm so glad Waldo is working for KDE not Winblow$" ?
People, Micro$oft is our enemy, not Gnome!

By dc at Wed, 2001/05/09 - 5:00am

I like KDE, GNOME and Windows. M$ is not the enemy, there is no enemy.

By AC at Thu, 2001/05/10 - 5:00am

I'm so glad you make KDE users look like complete dick heads, and not GNOME users.

By ac at Fri, 2001/05/11 - 5:00am

Many have said this on LinuxToday already: This is one heck of a paper!

By KDE User at Tue, 2001/05/08 - 5:00am

Interesting, especially the part about classes. How is this done in Windows? Does it have a better C++, or fewer classes?

By Pyretic at Wed, 2001/05/09 - 5:00am

Windoze is written in C. C has no classes, so it doesn't have the same overhead.

By Uhmmmm at Wed, 2001/05/09 - 5:00am

No, C has the same problem, just less of it. Just remember that kmail has about 60 000 relocations, and the (fun) GTK game freeciv has a mere 1172.

But Windows does fix it, so even, e.g., Qt/Professional, with nearly the same number of relocations, will start fast on Windows. But this is only because Windows's dynamic linker isn't very versatile, while Unix's is very versatile (and therefore slower).

By Charles Samuels at Wed, 2001/05/09 - 5:00am

You can only choose between a fast program and a small program. C++ code is mostly small if it follows a good style of code reuse, but that means calling more functions, which takes more time. That doesn't mean you can't write fast code in C++.

Christian Parpart.

By Christian Parpart at Wed, 2001/05/09 - 5:00am

One question I have is:
Isn't it possible to reduce the number of static functions in Qt and KDE?
To me it looks as if static functions are "bad" functions.

By Schwann at Wed, 2001/05/09 - 5:00am

AFAIK, Windows DLLs have a "preferred address". If the dynamic linker finds that the preferred address is unallocated and there's enough room to take up the DLL at that address, it is loaded there, and no relocation is needed. Otherwise, it loads the DLL at a different address and relocates it.

I think that this is the best thing to do: When a library is linked, it has to be decided in some way what the preferred address is. This must take into account the shared objects that the library depends on.

By Johannes Sixt at Wed, 2001/05/09 - 5:00am
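A toy model of the preferred-address scheme Johannes describes: if a DLL lands at its preferred base, nothing is patched; otherwise the loader adds the difference between the actual and the preferred base to every absolute address listed in the image's fixup table. This sketch assumes 32-bit little-endian addresses and 4 KiB pages, ignores fixups that straddle a page boundary, and uses a hypothetical function name; it is not the actual Windows loader. It also reports which pages were written to, since a patched page becomes a private copy, which is the sharing problem raised in the reply below.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def apply_base_relocations(image, fixups, preferred_base, actual_base):
    """Patch 32-bit absolute addresses in `image` (a bytearray) in place.

    `fixups` lists the offsets of every absolute address in the image.
    Returns the set of page numbers that were written to, i.e. the pages
    that can no longer be shared with other processes.
    """
    delta = actual_base - preferred_base
    if delta == 0:
        return set()  # loaded at the preferred address: nothing to patch
    dirty_pages = set()
    for offset in fixups:
        (address,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (address + delta) & 0xFFFFFFFF)
        dirty_pages.add(offset // PAGE_SIZE)
    return dirty_pages
```

The more widely the fixups are spread through an image, the more of its pages stop being shareable once it has to be moved.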

DLLs do have a preferred address. However, as soon as a DLL has to be relocated, it can't be shared anymore, because Windows has no concept of Position Independent Code (PIC): all function calls inside the DLL have to be changed when it is relocated.

Linux (and all other Unices I know of) does support PIC: it is essential to the ELF binary format, AFAIK. This means relocation isn't expensive: the pages of the library won't have to be touched, making them shareable.

The thing both the Windows and Unix dynlinkers have to do is to resolve the calls made by an application to a library (or from one library to another): the app calls a fixed address in a jump table (the PLT on ELF systems), and the dynlinker fills in that entry so the call reaches the dynamically linked function.

The problem with KDE is the sheer number of function calls exposes the inefficiency of the dynlinker.

Loading a library at a preferred address may speed up this process. It may break ELF, though.

By Erik Hensema at Wed, 2001/05/09 - 5:00am
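Erik's description of call resolution can be modelled in a few lines. In this toy sketch (the class and its methods are hypothetical; the real ld.so uses symbol hash tables and architecture-specific PLT stubs) each imported function has a GOT-like slot that initially routes the call to a resolver. The first call searches the loaded libraries in order and patches the slot, and later calls go straight through. `bind_now=True` does every lookup up front, which is what LD_BIND_NOW asks the real dynamic linker to do.

```python
class ToyLinker:
    """Toy model of lazy (PLT/GOT-style) versus eager symbol binding."""

    def __init__(self, libraries, imports, bind_now=False):
        self.libraries = libraries  # list of {symbol name: callable}, searched in order
        self.lookups = 0            # number of per-library symbol lookups performed
        # Every slot starts out unresolved, i.e. "pointing at the resolver".
        self.got = {name: None for name in imports}
        if bind_now:
            for name in imports:
                self._resolve(name)

    def _resolve(self, name):
        for lib in self.libraries:
            self.lookups += 1
            if name in lib:
                self.got[name] = lib[name]  # patch the slot once
                return lib[name]
        raise LookupError(f"undefined symbol: {name}")

    def call(self, name, *args):
        target = self.got[name]
        if target is None:  # first call goes through the resolver
            target = self._resolve(name)
        return target(*args)
```

With lazy binding the lookups are spread over the first call of each function; with eager binding they are all paid at load time, which is only a win if that work is done once and then shared.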

I don't understand. Are all calls in PIC code, even those inside a single DLL, done indirectly through the PLT? Are there no relative call instructions in assembler? And Windows simply modifies the DLL code, replacing all addresses? So the text segment is writable? Ouch!

By Hi at Thu, 2001/05/10 - 5:00am

If there's a PLT, why can't the dynlinker cache it and load the shared library at the same address the next time it's needed?
I could even imagine saving this cache to disk.

By remi at Sat, 2001/05/12 - 5:00am
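One way to picture remi's suggestion: store the resolved bindings on disk together with a fingerprint of every library they depend on, and reuse them only if none of those libraries has changed. Everything here is a hypothetical sketch (file layout, function names, the JSON format, and what a "binding" value means); it does not describe any existing dynamic linker, and a real implementation would also have to guarantee that each library is mapped at the same address again.

```python
import json
import os


def library_key(paths):
    """Fingerprint each library by path, size and modification time."""
    key = []
    for path in paths:
        st = os.stat(path)
        key.append([path, st.st_size, st.st_mtime_ns])
    return key


def save_bindings(cache_file, paths, bindings):
    """Store resolved bindings together with the fingerprint they depend on."""
    with open(cache_file, "w") as f:
        json.dump({"key": library_key(paths), "bindings": bindings}, f)


def load_cached_bindings(cache_file, paths):
    """Return the cached bindings, or None if the cache cannot be trusted."""
    try:
        key = library_key(paths)
        with open(cache_file) as f:
            cached = json.load(f)
    except (OSError, ValueError):
        return None  # no cache yet, unreadable cache, or a library is missing
    if cached.get("key") != key:
        return None  # a library was replaced: the cached addresses may be stale
    return cached.get("bindings")
```

The invalidation check is the important part: any upgrade of a library silently changes symbol offsets, so a stale cache must never be used.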

Excellent paper. I had always wondered about the performance of dynamic linking with C++, and library caching in general.

While the "kdeinit speed hack" is called a hack, it actually sounds like the right way to do it. What better way to keep the libraries loaded? It's possible that certain libraries could even be preloaded, like the filemanager components (IMO, the only application that really requires instantaneous loading).

On a side note, the only other real speed problem in KDE would be Konqueror's re-rendering of content. It takes a long time to load large pages, like a huge message board. However, the re-rendering is a killer when you've finished reading a post and click "back". The same goes for pages with many images (like Konq's thumbnail view). Perhaps these final page renderings should be cached somehow. Does anyone know how Netscape and IE go back and forth so fast between already-visited views?


By Justin at Wed, 2001/05/09 - 5:00am

I completely agree with what you said. The filemanager/browser is one of the applications that really should start as fast as possible. Going back and forward in the history is also somewhat slow at present. Dirk Mueller is doing some nice optimizations in KHTML at the moment, so I have the feeling this will be solved somehow. For me the main problem is drawing the page on my screen. On my fast hardware (not under load at the time) I can very often actually see the page (Slashdot, for example) being 'swept' onto my screen from top to bottom. I really don't have a clue what is causing this behaviour (hardware? Qt? KHTML?) :(

Well, keep up the good work !


By Jelmer Feenstra at Wed, 2001/05/09 - 5:00am

I wasn't sure if you were serious here or not...
Surely you are aware that konqueror has a cache just like other browsers? It has even been enhanced in 2.2 with auto-synching and offline-viewing mode.
You set it on the web-browsing proxy page. (Don't ask me! It's probably on the proxy page because a cache IS a proxy to all intents and purposes.)

And the slow rendering you are talking about has nothing to do with rendering. Konq is acknowledged as having one of the fastest renderers in the world, beating even IE. The delay you are talking about is just your slow net connection downloading the hundreds of entries in a post list; once downloaded, it renders almost instantly because it is mainly text.

God, I wish people would read their manuals. There are so many badly set up Linux systems out there, and all of them blame KDE rather than their own lack of interest in setting things up properly. Now that would give more of a speed increase than optimising the whole dynamic loader!

By Andrew Kar at Wed, 2001/05/09 - 5:00am

Whoa there. You completely missed what I meant.

While Konq may render a page faster than other browsers (and it certainly does), and it caches the content, it does not cache the "render".

I was recently visiting a large SuSE forum that took almost a minute to load. This is tolerable, but it immediately became a problem when I read a comment and then clicked "back". I had to wait another minute as Konq re-rendered the forum. My solution? Create a split view and drag links to comments into the other view. A better solution? Konqueror should have a method for rapidly rendering previously viewed content, perhaps by storing the final rendered canvas in memory for the current session.

And don't worry, I read my manuals. I'm a programmer, after all. I might even want to contribute to solve this problem, but I'm hesitant since Dirk didn't accept one of my other patches.


By Justin at Thu, 2001/05/10 - 5:00am

Depending on the color depth this could take a bit of memory.

How about a fast compression of the previous render stored in memory or disk?

By John at Thu, 2001/05/10 - 5:00am
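A sketch of the in-memory back/forward cache Justin and John describe: keep the last few rendered pages as compressed pixel buffers keyed by URL, and evict the least recently used one when the cache is full. The class, its parameters and the raw RGBA buffer layout are hypothetical, zlib level 1 is chosen for speed over ratio, and this is not how Konqueror or KHTML actually store pages. For scale, one uncompressed 1024x768 screenful at 32 bits per pixel is 3 MiB, which is why John's compression idea matters.

```python
import zlib
from collections import OrderedDict


class RenderCache:
    """Hypothetical LRU cache of compressed, already-rendered pages."""

    def __init__(self, max_entries=8, level=1):
        self.max_entries = max_entries
        self.level = level  # zlib level 1: fastest compression
        self._entries = OrderedDict()  # url -> (width, height, compressed pixels)

    def put(self, url, pixels, width, height):
        self._entries[url] = (width, height, zlib.compress(pixels, self.level))
        self._entries.move_to_end(url)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used page

    def get(self, url):
        entry = self._entries.get(url)
        if entry is None:
            return None  # miss: fall back to laying out and painting the page again
        self._entries.move_to_end(url)
        width, height, data = entry
        return width, height, zlib.decompress(data)
```

On "back", a hit returns the finished bitmap immediately; only a miss pays for the full re-render.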

Why the fsck can Windows do this instantaneously?

IE loads in < 1 second. Rerenders pages instantaneously, and is viewable offline.

It just pisses me off that Konq can't do it yet. I love working in KDE, but once you have used KDE for a few days and fire up Windows again (because my online banking does not work under Konq), you realise just how slow KDE is in general. I am able to load IE, go to Google, perform a search and be redirected to the first result before Konq has even started.

I wait patiently, or not so :) for KDE to become the only OS I use :)

By Wanting-not-to-... at Thu, 2001/05/10 - 5:00am

The reason Windows GUI's are, in general, faster is that most graphics cards implement GDI calls in the hardware. Try disabling hardware acceleration in your graphics settings and you'll see exactly what I mean.

By Vladimir Annenkov at Thu, 2001/05/10 - 5:00am

XFree86 uses hardware acceleration as well (where possible), so this should not be the reason.

By Schwann at Fri, 2001/05/11 - 5:00am

PalmOS devices have been doing something like this for a while due to their limited memory and CPU horsepower. I believe iSilo (judging by its speed) does this. The open source Plucker for PalmOS documents a file format that is essentially a prerendered web page, and that saves both CPU and memory. Prerendering does not have to mean saving the entire bitmap. A compressed format that saves prerendered bitmaps, preparsed text and previously made formatting decisions would save both space and time. It can be used in place of the raw HTML and JPEG files.

By Chris W at Thu, 2001/05/10 - 5:00am

It can be done. It's not a Windows-specific feature either, as some replies to your post have suggested. Have you tried Opera? Its back/forward cache performance is lightning fast, way ahead of anything else on Linux. If you play with its settings, you will find a "cache rendered images" setting as well. So yes, something much like what you suggested can be, and has been, done on Linux.

By Chris W at Thu, 2001/05/10 - 5:00am

A simple strace on konsole shows more than 200 failed opens (ENOENT). In other words, it fails to find its dynamic dependencies 200+ times. Each miss turns into open/access/stat calls, which obviously slows down startup. I just tried adjusting the links to the SOs so that all required dependencies are found on the first try. Believe me, it speeds things up to a good extent. Can't anything be done to hardcode the paths in binaries at compile time? Also of libc and libqt seems to be a good idea for "KDE only" users.

-- Parag.

By parag at Sat, 2001/05/19 - 5:00am
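parag's strace numbers follow from how the loader looks for each shared object: it tries the candidate directories in order, and every miss is an open() that fails with ENOENT. The toy model below counts those misses for a given search order; the directory names and sonames in the usage are made up, the function names are hypothetical, and the real glibc loader may probe more candidate paths per directory than this. Putting the directory that actually holds the libraries first, or recording it in the binary at link time (for example with GNU ld's `-rpath` option), is what cuts the failures down.

```python
import os


def probe(soname, search_dirs, exists=os.path.exists):
    """Try each directory in order; return (path found or None, failed opens)."""
    failed = 0
    for directory in search_dirs:
        candidate = os.path.join(directory, soname)
        if exists(candidate):
            return candidate, failed
        failed += 1  # each miss models one open() failing with ENOENT
    return None, failed


def total_failures(sonames, search_dirs, exists=os.path.exists):
    """Sum the failed opens over all libraries a program needs."""
    return sum(probe(name, search_dirs, exists)[1] for name in sonames)
```

For example, with the KDE and Qt libraries living in the third and fourth directories of the search order, two libraries already cost five failed opens; moving those directories to the front reduces that to one.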

I've been unable to turn up any links on the kdeinit speed hack (via Google) -- does anyone have information on this? It sounds like something I'd like to check out...

By kendroid at Wed, 2001/05/09 - 5:00am


It's described in this paper. Did you read it?


By Jason Katz-Brown at Wed, 2001/05/09 - 5:00am

I read it and I can follow the link as well as you can. But there is nothing called "kdeinit" in this directory. So, the question stands: Where do you download kdeinit?

By Jim Philips at Thu, 2001/05/10 - 5:00am

You already have kdeinit!

If you use KDE, you already use kdeinit...

I was just noting that you can read what kdeinit does by reading the .cpp files in kdelibs/kinit/

Good luck


By Jason at Thu, 2001/05/10 - 5:00am

Basically, "kdeinit" is a daemon, started with LD_BIND_NOW, which makes the dynamic linker resolve all of its symbols at once at startup (rather than as each one is first needed). Then we dlopen the program, fork() kdeinit, and enter the program.

With LD_BIND_NOW, the relocations of all the KDE libraries (that's a lot of relocations!) are done only once, and they won't be done again after that fork().

By Charles Samuels at Wed, 2001/05/09 - 5:00am
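The approach Charles outlines can be sketched with Python's `ctypes` and `os.fork`: load the shared libraries once with `RTLD_NOW` (the `dlopen` flag corresponding to LD_BIND_NOW), then fork for each launch so the child starts with every symbol already resolved. This is a Unix-only illustration with hypothetical function names; kdeinit itself is C++ code in kdelibs/kinit/ and runs the application's own code, which this sketch replaces with a plain Python callable.

```python
import ctypes
import os


def preload(libraries):
    """Load libraries once, resolving every symbol now rather than on first call."""
    mode = os.RTLD_NOW | os.RTLD_GLOBAL
    return [ctypes.CDLL(name, mode=mode) for name in libraries]


def launch(entry, *args):
    """Fork the preloaded parent, run entry(*args) in the child, return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: inherits the parent's already-relocated pages copy-on-write.
        status = 1
        try:
            status = entry(*args)
        finally:
            os._exit(status if isinstance(status, int) else 1)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)
```

For example, `preload([None])` opens the running program itself (a NULL name to `dlopen`), whose global scope includes the C library, so a launched child can call `abs` through it without any further symbol lookup work in the parent.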

By Konrad Wojas at Wed, 2001/05/09 - 5:00am

The kdeinit speed "hack" is mostly internal to KDE. However, you can use it with the 'kdeinit_wrapper' binary.

Run 'kdeinit_wrapper <program>', for instance: 'kdeinit_wrapper konqueror --profile webbrowsing'. Yes, this is on the command line.

By David B. Harris at Sat, 2001/05/12 - 5:00am

A question springs to mind ...
Why would the per-process memory footprint drop by 800k?
Does this mean that under normal conditions, some libraries get loaded twice?

By Nobody in particular at Wed, 2001/05/09 - 5:00am