An Analysis of KDE Speed

Our recent poll (courtesy KDE.com) on the upcoming KDE 2.2 suggests that the area of greatest concern for KDE users is speed -- at this time, out of 3,463 votes, over 24% consider speed the most important issue for developers to address. Waldo Bastian, who developed the kdeinit speed hack among other things, has written a paper entitled "Making C++ ready for the desktop", in which he analyzes the various startup phases of a C++ program. Noting that one component of linking -- namely, library relocations -- is currently slow, he offers some suggestions for optimizations. An interesting read.

Comments

by Christian A Str... (not verified)

I'm so glad Waldo is working on KDE and not GNOME...

by Inorog (not verified)

Hmm... I surely am very glad to know Waldo and I'm honoured to call myself his friend. Yet, it should be a matter of pride for us to avoid the kind of remarks you did. Waldo *did publish* his study, thus hoping *all* the Linux community will profit/help/contribute. If an inventive Gnome developer takes Waldo's notes, has a stroke of genius and finds a marvelous solution to our speed problem, I'm sure we *all* will profit. And it goes like this for every other thing in Free Code. And that's why we like it so much, even if we don't always realize this actual reason.

I would like to thank Waldo for his constant preoccupation with KDE's performance. He is one of the major contributors to KDE-2's performance being at least equal to KDE-1's, despite KDE-2 being at least 3 times more complex.

by Christian A Str... (not verified)

"Yet, it should be a matter of pride for us to avoid the kind of remarks you did. Waldo *did publish* his study, thus hoping *all* the Linux community will profit/help/contribute."

It was a joke.. relax ;)

by Ian A. Marsman (not verified)

I see. I used to use KDE all the time. I don't at the moment, although it's not for any advocacy reasons. Comments like yours don't make me want to come running back (nor will similar comments from Gnome advocates make me proud).

by Christian A Str... (not verified)

I can't see how you could take this that way?

First off, I AM glad that Waldo is working on KDE, as GNOME is not using C++, and had he been working on GNOME the article (if one had been written) would have had to do with C and not C++. Since KDE is a big part of my life, I am happy that a person with this much insight and intelligence is working on making it better. This was not meant as a KDE-GNOME thing; I mentioned GNOME 'cause they use C and not C++...

Relax.. Don't see wars where there aren't any..

by dc (not verified)

What about Gnome-- and Gtk--?
They're C++ bindings, right?

by EKH (not verified)

I didn't take this the wrong way at all - sounded to me more like an "I'm glad you're on the team" comment...

by Chad Kitching (not verified)

Actually, if you read the article, there are a number of things that would help Gnome as much as it would help KDE. For one, the dirty pages that are created when libraries are relocated take up valuable memory, and increase the amount of I/O required to load something. I/O tends to never be fast, so it's something everyone wants to avoid if possible. A more optimized relocator in the ld.so would not only help KDE, but also Gnome as the memory footprint could very well shrink for both projects.

If you have 213 dirty pages per application, that means that every task (except one) is about 800k bigger than it could've been. Not only that, that also means that you have to load another 800k of code from the disk, which adds to the start-up time if you were unlucky enough to have the content expire from the cache (and on small or low-memory systems, it's likely the case). If you assume that the average consumer drive can do 5-10MB/sec, reading off the drive platters, that's already a tenth of a second wasted right there. Having more free memory pages for cache never hurt anyone.
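
The arithmetic above can be checked with a short sketch. It assumes a page size of 4 KiB (typical for x86 Linux; the comment does not state one) and treats "MB" as 1,000,000 bytes. The function names are made up for illustration.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, typical on x86 Linux


def dirty_bytes(dirty_pages, page_size=PAGE_SIZE_BYTES):
    """Memory that cannot be shared because relocation wrote to the pages."""
    return dirty_pages * page_size


def read_seconds(num_bytes, mb_per_sec):
    """Time to read num_bytes from disk at a sustained rate given in MB/s."""
    return num_bytes / (mb_per_sec * 1_000_000)


if __name__ == "__main__":
    size = dirty_bytes(213)
    print(f"{size / 1024:.0f} KiB of dirty pages per process")
    for rate in (5, 10):
        print(f"at {rate} MB/s: {read_seconds(size, rate):.3f} s")
```

Under these assumptions 213 pages come to 852 KiB, and reading them back takes roughly 0.09 to 0.17 seconds, which is in line with the "about 800k" and "a tenth of a second" figures.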

by dc (not verified)

Shouldn't that be "I'm so glad Waldo is working for KDE not Winblow$" ?
People, Micro$oft is our enemy, not Gnome!

by AC (not verified)

I like KDE, GNOME and Windows. M$ is not the enemy, there is no enemy.

by ac (not verified)

I'm so glad you make KDE users look like complete dick heads, and not GNOME users.

by KDE User (not verified)

Many have said this on LinuxToday already: This is one heck of a paper!

by Pyretic (not verified)

Interesting, esp. the part about classes. How is this done in Windows? Does it have a better C++, or fewer classes?

by Uhmmmm (not verified)

Windoze is written in C. C has no classes, so it doesn't have the same overhead.

by Charles Samuels (not verified)

No, C has the same problem, just less of it. Just remember that kmail has about 60 000 relocations, and the (fun) GTK game freeciv has a mere 1172.

But Windows does fix it, so even (e.g.,) Qt/Professional with nearly the same amount of relocations will start fast on windows. But this is only because Windows's dynamic linker isn't very versatile, while Unix's is very versatile (and therefore, slower).

by Christian Parpart (not verified)

You have to choose between a fast program and a small program. C++ code is mostly small if it has a good style of code reuse. That means you have to call more functions, and more time is spent in those calls.
-- But that doesn't mean that you can't write fast code in C++.

Greetings,
Christian Parpart.

by Schwann (not verified)

One question I have is:
Isn't it possible to reduce the number of static functions in Qt and KDE?
To me it looks as if static functions are "bad" functions.

by Johannes Sixt (not verified)

AFAIK, Windows DLLs have a "preferred address". If the dynamic linker finds that the preferred address is unallocated and there's enough room to take up the DLL at that address, it is loaded there, and no relocation is needed. Otherwise, it loads the DLL at a different address and relocates it.

I think that this is the best thing to do: When a library is linked, it has to be decided in some way what the preferred address is. This must take into account the shared objects that the library depends on.
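
The preferred-address idea described above can be sketched as a toy model. This is not how any real loader is implemented; `place_library`, the address ranges and the fallback policy are all assumptions made for illustration.

```python
def place_library(size, preferred, allocated):
    """Return (base_address, needs_relocation) for a library of `size` bytes.

    `allocated` is a list of (start, end) half-open ranges already in use.
    """
    def is_free(start):
        end = start + size
        return all(end <= a_start or start >= a_end
                   for a_start, a_end in allocated)

    if is_free(preferred):
        return preferred, False      # loaded where it was linked: no fix-ups

    # Toy fallback policy: place it just above the highest range in use.
    base = max(a_end for _, a_end in allocated)
    return base, True                # loaded elsewhere: must be relocated
```

For example, `place_library(0x1000, 0x10000, [])` keeps the preferred address, while an overlapping range such as `[(0x10000, 0x12000)]` forces the library to a different base and marks it as needing relocation.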

by Erik Hensema (not verified)

DLLs do have a preferred address. However, as soon as a DLL has to be relocated, it can't be shared anymore, because Windows has no concept of Position Independent Code (PIC): all function calls inside the DLL have to be changed when it is relocated.

Linux (and all other Unices I know of) does support PIC: it is essential to the ELF binary format, AFAIK. This means relocation isn't expensive: the pages of the library won't have to be touched, making them shareable.

The thing both the Windows and Unix dynlinkers have to do is to resolve the calls made by an application to a library (or lib -> lib): the app makes a call to a fixed address, inside the relocation table. The dynlinker generates a call on this fixed address to the dynamically linked function.

The problem with KDE is that the sheer number of function calls exposes the inefficiency of the dynlinker.

Loading a library at a preferred address may speed up this process. It may break ELF though.
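
The table-of-fixed-addresses idea can be illustrated with a toy model. This is only a sketch of the concept, not of how ld.so works internally: `LazyCallTable` and its methods are hypothetical, and a dictionary lookup stands in for the real symbol search. `bind_now` mimics the effect of the LD_BIND_NOW setting mentioned elsewhere in this thread.

```python
class LazyCallTable:
    """Toy model of a call table whose slots are resolved on first use."""

    def __init__(self, symbols):
        self._symbols = symbols      # name -> real function (the "library")
        self._slots = {}             # name -> already resolved function
        self.lookups = 0             # number of symbol lookups performed

    def _resolve(self, name):
        self.lookups += 1            # stands in for the costly symbol search
        return self._symbols[name]

    def bind_now(self):
        """Resolve every unresolved slot up front, like LD_BIND_NOW."""
        for name in self._symbols:
            if name not in self._slots:
                self._slots[name] = self._resolve(name)

    def call(self, name, *args):
        if name not in self._slots:  # first call: fill in the slot
            self._slots[name] = self._resolve(name)
        return self._slots[name](*args)
```

Each symbol is looked up at most once, either on its first call or all at once in `bind_now`; with tens of thousands of symbols, the total lookup cost is what dominates.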

by Hi (not verified)

I didn't understand - all calls in PIC code, even inside a single DLL, are done indirectly using a PLT table? Are there no relative call instructions in assembler!? But Windows simply modifies DLL code, replacing all addresses? - a text segment is writable? Uff!

by remi (not verified)

If there's a PLT, why can't the dynlinker cache the PLT and load the shared library to the same address the next time it's needed?
I could even think of saving this cache to disk.

by Justin (not verified)

Excellent paper. I had always wondered about the performance of dynamic linking with C++, and library caching in general.

While the "kdeinit speed hack" is called a hack, it actually sounds like the right way to do it. What better way to keep the libraries loaded? It's possible that certain libraries could even be preloaded, like the filemanager components (IMO, the only application that really requires instantaneous loading).

On a side note, the only real other speed problem in KDE would be Konqueror's re-rendering of content. It takes a long time to load large pages, like for instance a huge message board. However, the re-rendering is killer when you've finished reading a post and you're clicking "back". Same goes for pages with many images (like Konq's thumbnail render). Perhaps these final page renderings should be cached somehow. Does anyone know how Netscape and IE go back and forth so fast between already-visited views?

-Justin

by Jelmer Feenstra (not verified)

I completely agree with what you said. The filemanager/browser is one of the applications that really should start as fast as possible. Also going back and forward in the history is at present somewhat slow. Dirk Mueller is doing some nice optimizations concerning khtml at the moment, so I have the feeling this will be solved/fixed somehow. For me the main problem is drawing the page on my screen. On my fast hardware (not being under load at that moment) I can very often actually see the page (slashdot for example) being 'swept' onto my screen from top to bottom. I really don't have a clue what is causing this behaviour (Hardware? Qt? Khtml?) :(

Well, keep up the good work !

Jelmer

by Andrew Kar (not verified)

I wasn't sure if you were serious here or not...
Surely you are aware that konqueror has a cache just like other browsers? It has even been enhanced in 2.2 with auto-synching and offline-viewing mode.
You set it on the web-browsing proxy page. (Don't ask me! It's probably on the proxy page because a cache IS a proxy to all intents.)

And the slow rendering you are talking about has nothing to do with rendering. Konq is acknowledged as having one of the fastest renderers in the world, beating even IE. The delay you are talking about is just your slow net connection downloading the hundreds of entries in a post-list. Once downloaded, it renders that almost instantly because it is mainly text.

God, I wish people would read their manuals. There are so many badly set up Linux systems out there, and all blame KDE rather than their own lack of interest in setting up properly. Now that would give more speed increase than optimising the whole dynamic loader!

by Justin (not verified)

Whoa there. You completely missed what I meant.

While Konq may render a page faster than other browsers (and it certainly does), and it caches the content, it does not cache the "render".

I was recently visiting a large SuSE forum that took almost a minute to load. This is tolerable, but it immediately became a problem when I read a comment and then clicked "back". I had to wait another minute as Konq re-rendered the forum. My solution? Create a split view and drag links to comments into the other view. A better solution? Konqueror should have a method for rapidly rendering previously viewed content. Perhaps storing the final rendered canvas into memory for the current session.

And don't worry, I read my manuals. I'm a programmer, after all. I might even want to contribute to solve this problem, but I'm hesitant since Dirk didn't accept one of my other patches.

-Justin
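
The "store the final rendered canvas in memory for the current session" idea could look roughly like the sketch below. It is not based on Konqueror's code: `RenderCache`, its page limit and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class RenderCache:
    """Keeps the most recently viewed rendered pages, up to a fixed count."""

    def __init__(self, max_pages=10):
        self._pages = OrderedDict()  # url -> rendered page, oldest first
        self._max_pages = max_pages

    def put(self, url, rendered):
        self._pages[url] = rendered
        self._pages.move_to_end(url)
        while len(self._pages) > self._max_pages:
            self._pages.popitem(last=False)   # evict least recently used

    def get(self, url):
        if url not in self._pages:
            return None              # caller has to render the page again
        self._pages.move_to_end(url)
        return self._pages[url]
```

Clicking "back" would then call `get` with the previous URL and only fall back to a full re-render when it returns `None`.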

by John (not verified)

Depending on the color depth this could take a bit of memory.

How about a fast compression of the previous render stored in memory or disk?
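
To put rough numbers on this, here is a small sketch that estimates the size of an uncompressed rendered canvas and compresses it with zlib at its fastest level. The page dimensions are made-up examples, and a real rendered page will compress far less well than the uniform test buffer used here.

```python
import zlib


def canvas_bytes(width, height, bits_per_pixel):
    """Size in bytes of an uncompressed canvas at the given colour depth."""
    return width * height * bits_per_pixel // 8


def compress_canvas(raw, level=1):
    """Compress raw pixel data; level 1 favours speed over ratio."""
    return zlib.compress(raw, level)


if __name__ == "__main__":
    for depth in (8, 16, 32):
        size = canvas_bytes(1024, 8000, depth)
        print(f"{depth}-bit, 1024x8000: {size / 1_000_000:.1f} MB")
    blank = bytes(canvas_bytes(1024, 100, 32))   # all-zero pixels
    print(len(blank), "->", len(compress_canvas(blank)), "bytes")
```

A long forum page of 1024x8000 pixels at 32 bits per pixel needs 32,768,000 bytes uncompressed, so some form of compression or a cap on the number of cached pages would be needed.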

by Wanting-not-to-... (not verified)

Why the fsck can Windows do this instantaneously?

IE loads in < 1 second. Rerenders pages instantaneously, and is viewable offline.

It just pisses me off that Konq can't do it yet. I love working in KDE, but once you have used KDE for a few days and you fire up Windows again (because my online banking does not work under Konq), you realise just how slow KDE is in general. I am able to load IE, go to Google, perform a search and be redirected to the first result on the list before Konq has even started.

I wait patiently, or not so :) for KDE to become the only OS I use :)

by Vladimir Annenkov (not verified)

The reason Windows GUIs are, in general, faster is that most graphics cards implement GDI calls in hardware. Try disabling hardware acceleration in your graphics settings and you'll see exactly what I mean.

by Schwann (not verified)

XFree is using hardware acceleration as well (if possible). So this should not be the reason.

by Chris W (not verified)

PalmOS devices have been doing something like this for a while due to their limited memory and CPU horsepower. I believe iSilo (judging by its speed) does this. The open source Plucker for PalmOS documents a file format that is essentially a prerendered web page, and that saves both CPU and memory. Prerendering does not have to mean saving the entire bitmap. A compressed format that saves prerendered bitmaps, preparsed text and previously made formatting decisions would save both space and time. It can be used in place of the raw html and jpg files.

by Chris W (not verified)

It can be done. It's not a Windows-specific feature either, as some have suggested in reply to your post. Have you tried Opera? Its back/forward cache performance is lightning fast, way ahead of anything else on Linux. If you play with its settings, you will find a "cache rendered images" setting as well. So yes, something much like what you suggested can be and has been done on Linux.

by parag (not verified)

A simple strace on konsole shows more than 200 failed opens (ENOENT), i.e. it fails to find the dynamic dependencies 200+ times. This translates into that many extra open/access/stat calls, which obviously slows down the startup. I just tried adjusting the links to the SOs so that all required dependencies are resolved at the first shot. Believe me, it speeds things up to a good extent. Can't anything be done to hardcode the paths in binaries at compile time? Also, an ld.so.preload of libc and libqt seems to be a good idea for "KDE only" users.

-- Parag.
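
The effect described above can be modelled with a short sketch that counts how many directory probes fail before each library is found. It only models plain directory probing; the real dynamic linker also consults things like its cache file and paths embedded in the binary, which are not modelled here. The function names are hypothetical.

```python
import os


def resolve(libname, search_dirs):
    """Return (path_or_None, failed_probes) for one library name."""
    failed = 0
    for directory in search_dirs:
        candidate = os.path.join(directory, libname)
        if os.path.exists(candidate):
            return candidate, failed
        failed += 1                  # this probe would show up as ENOENT
    return None, failed


def count_failed_probes(libnames, search_dirs):
    """Total number of failed probes over all requested libraries."""
    return sum(resolve(name, search_dirs)[1] for name in libnames)
```

Putting the directory that actually holds the libraries first in `search_dirs` brings the count for those libraries down to zero, which is the same effect as adjusting the links so that each dependency is found on the first attempt.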

by kendroid (not verified)

I've been unable to turn up any links on the kdeinit speed hack (via Google) -- does anyone have information on this? It sounds like something I'd like to check out...

by Jason Katz-Brown (not verified)

kdelibs/kinit/*

It's described in this paper -- did you read it?

Jason

by Jim Philips (not verified)

I read it and I can follow the link as well as you can. But there is nothing called "kdeinit" in this directory. So, the question stands: Where do you download kdeinit?

by Jason (not verified)

You already have kdeinit!

if u use kde, u already use kdeinit....

I was just noting u can read what kdeinit does by reading the .cpp files in kdelibs/kinit/

Kood luck

Jason

by Charles Samuels (not verified)

Basically, "kdeinit" is a daemon, initialized with LD_BIND_NOW, which causes it to be loaded entirely, all at once (rather than as the parts are required), then we dlopen the program, fork() kdeinit, and enter the program.

With LD_BIND_NOW, the relocations of all the kde libraries (that's a lot of relocations!) are done only once. And they won't be done again on that fork()
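
The "do the expensive work once, then fork" pattern can be sketched as follows. This is not kdeinit's code: it is a Unix-only toy in which `expensive_setup` stands in for loading and relocating the libraries, the dlopen step is left out entirely, and all names are hypothetical.

```python
import os


def expensive_setup():
    """Stands in for loading and relocating the KDE libraries (done once)."""
    return {"libs_ready": True, "setup_pid": os.getpid()}


def launch(state, program):
    """Fork a child that reuses the parent's already initialised state."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                              # child: no setup is repeated
        try:
            os.close(read_fd)
            inherited = state["setup_pid"] != os.getpid()
            message = f"{program}:{state['libs_ready']}:{inherited}"
            os.write(write_fd, message.encode())
        finally:
            os._exit(0)                       # never fall back into the parent's code
    os.close(write_fd)                        # parent: collect the child's report
    with os.fdopen(read_fd) as pipe:
        report = pipe.read()
    os.waitpid(pid, 0)
    return report
```

Calling `expensive_setup()` once and then `launch(state, "konsole")` for each program means every child starts with the state already in place, because a forked process inherits the parent's memory.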

by Konrad Wojas (not verified)
by David B. Harris (not verified)

The kdeinit speed "hack" is mostly internal to KDE. However, you can use it with the 'kdeinit_wrapper' binary.

Run 'kdeinit_wrapper ', for instance: 'kdeinit_wrapper konqueror --profile webbrowsing'. Yeah, this is on the command-line.

by Nobody in particular (not verified)

A question springs to mind ...
Why would the per process memory footprint drop by 800k?
Does this mean that under normal conditions, some libraries get loaded twice?