Leon Bottou: Faster KDE Startups?

As a follow-up to Waldo Bastian's analysis of KDE startup times, Leon Bottou has implemented an inspired hack to improve the startup of C++ programs under GNU/Intel systems. "Waldo Bastian's document demonstrates that the current g++ implementation generates lots of expensive run-time relocations. This translates into the slow startup of large C++ applications (KDE, StarOffice, etc.). The attached program 'objprelink.c' is designed to reduce the problem. Expect startup times 30-50% faster."

Update: 08/01 4:52 AM by N: Consult Leon's objprelink page for some great details and up-to-date information on this hack, as well as on the prelinker mentioned by Bero. Thanks to freekde for the tip-off.

If I understand correctly, Leon's hack works around the problem by adding a level of indirection (a stub) for each function in a class's virtual table, and changing the vtable's references to the function to point to the new stub instead, thereby eliminating a whole lot of symbol lookups and expensive relocations at startup.
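To make the idea concrete, here is a toy C++ model of what the stub changes. It is not objprelink itself (which rewrites i386 object files); the symbol table, the method names, and the assumption that each stub binds its target lazily on the first call (the way a PLT entry does) are all illustrative. Filling an ordinary vtable costs one symbol lookup per slot at load time; filling the stubbed one costs none, and lookups happen only for functions that are actually called, at the price of one extra hop per call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Fn = int (*)();

// Hypothetical method bodies exported by some shared library.
int draw_impl() { return 1; }
int resize_impl() { return 2; }
int close_impl() { return 3; }

// Stand-in for the dynamic linker's symbol lookup; counts how often it runs.
struct SymbolTable {
    std::map<std::string, Fn> symbols;
    mutable std::size_t lookups = 0;

    Fn lookup(const std::string& name) const {
        ++lookups;
        return symbols.at(name);
    }
};

// Ordinary vtable: every slot is bound by name as soon as the library loads.
struct EagerVtable {
    std::vector<Fn> slots;

    EagerVtable(const SymbolTable& st, const std::vector<std::string>& names) {
        for (const auto& n : names) slots.push_back(st.lookup(n));  // one lookup per slot
    }
    int call(std::size_t i) const { return slots.at(i)(); }
};

// Stubbed vtable: slots point at local stubs, so filling it needs no lookups.
// Each call goes through the stub (the extra hop), which binds its target lazily.
struct StubbedVtable {
    struct Stub {
        std::string name;
        Fn target;  // null until the first call
    };
    const SymbolTable& st;
    std::vector<Stub> slots;

    StubbedVtable(const SymbolTable& table, const std::vector<std::string>& names)
        : st(table) {
        for (const auto& n : names) slots.push_back(Stub{n, nullptr});
    }
    int call(std::size_t i) {
        Stub& s = slots.at(i);
        if (s.target == nullptr) s.target = st.lookup(s.name);  // first call only
        return s.target();
    }
};
```

With three slots, building `EagerVtable` performs three lookups, building `StubbedVtable` performs none, and calling the same stubbed slot twice adds exactly one lookup.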

Check out Leon's email for the exact juicy details and for the Intel/GCC-specific C source of the program you will need to process object files before linking. One possible downside of this optimization is that virtual function invocations may now be slower due to the extra indirection involved.

And of course, no matter how brilliant the hack, we are still working around faults in the GNU linker. Apparently some work is going on in that area as well, as can be seen in this email from Jakub Jelinek.

Comments

by Asif Ali Rizwaan (not verified)

I am also bothered by KDE's not-so-fast performance. I recompiled KDE packages but didn't see any improvement. But when I recompiled Linux kernel 2.4-2 on RH 7.1, I saw a 40% improvement in KDE and its apps.

This would be really great if it has no undesirable side effects.

But I use Solaris at work, where of course gcc is used. Will this mechanism also work (with the "minor" change of i386 to Sun, or whatever) to speed up KDE on Solaris?

CPH

BTW: for any company trying to compete in the same space as KDE, the rate of advance that comes with the open source model must be frightening!

by Jeremy M. Jancsary (not verified)

Could this eventually be integrated into g++?

Man, this is really great news. Talking about HUGE performance improvements. Imagine applying this to server software written in C++ ...

Of course, it will be great for KDE, too, but that is not even the area where it will matter most, IMHO.

by Sam (not verified)

server software ???

The improvement is in program *startup* (or actually the time it takes for the program to find and move around all internal symbols that it needs), not overall performance.

Another package that I believe will benefit from this is StarOffice and, in some parts, Net^H^HMozilla.

/Sam

by Jeremy M. Jancsary (not verified)

OK, I'll have to explain this ... otherwise I might end up looking like an idiot.

I agree that what I wrote can easily be misunderstood :)

I was talking about CGI applications and the like: apps that have to be started many times.

I suppose a website might be able to handle a lot more traffic if the underlying CGIs start up more quickly (I might be wrong of course).

by Jonathan Brugge (not verified)

I think you can already do something like that with Apache + mod_perl... not sure, though, whether it's only for the Perl compilation or for both compilation and execution.

by Holstein (not verified)

It's for both.

There are several ways to use mod_perl, but most of the time you will use it to cache the compiled form of your Perl script and then simply call it again the next time it is requested. That repeated call is then handled by mod_perl like a function call.

Subject says it all - I'm running on a completely prelinked system these days.

Source available at
prelink-0.1.3-2.src.rpm

You'll also need the corresponding binutils patches, part of
binutils-2.11.90.0.8-5.src.rpm

by Karl Garrison (not verified)

The pub is missing from the above link:

ftp://rawhide.redhat.com/pub/redhat/linux/rawhide/SRPMS/SRPMS

I'm trying it out now. :-)

-Karl

by Navindra Umanee (not verified)

Thanks, might have been my fault.

-N.

by Karl Garrison (not verified)

Does KDE have to be rebuilt to see the effects of this? I installed it, and startup times do seem faster, but it may just be wishful thinking on my part. ;-)

-Karl

No need to recompile; you do need to prelink the applications, though (run prelink --all).

by Timothy R. Butler (not verified)

So you can run this on a binary install of KDE? I might give it a try after all then...

Thanks,
Tim

by Timothy R. Butler (not verified)

I just realized the program you refer to is different from the one in the news article. Does it do the same thing?

Thanks,
Tim

Does anyone know if a Debian version of this hack exists?

by Timothy R. Butler (not verified)

If you follow the link to Leon's original message you will find the source code for the hack.

-Tim

I wouldn't know what to do with it, even if you paid me...

Yes, I'm stupid, but I have an urge for speed.

No, you are lazy... all the instructions are on Leon's webpage, including how to compile the objprelink.c file. And about being stupid: even you can copy and paste the gcc line into a Konsole and copy the resulting objprelink executable into /bin, /usr/bin or /usr/local/bin, wherever you want it.

Stupid question: is glibc 2.2 required, or is 2.1 enough?

It doesn't work on my Debian potato system (glibc 2.1), but I think it's the binutils that aren't new enough.

I updated my binutils to 2.11. Now it works.
It's really fast.

I tried it on Mandrake 7.2 (glibc 2.1.3) with the newest binutils (2.11.0.8) and libelf (0.7). It wouldn't compile (missing declarations: STV_DEFAULT and others). After some playing around to pull these declarations in from binutils, it compiled and runs. prelink with the n option (dry run) seems to work fine, but if I want to prelink for real it bails out with something like "no space for dynamic". Whatever that means...

I think it could be made to work, but it probably has no real priority since everyone is moving to 2.2...

Danny

I tried going to the links but get "unable to login" messages. Do you have to be a registered Red Hat customer? I'm running Mandrake 8.0 with KDE 2.1.1. Is there anywhere else I might find it?

Thanks -

I get a lot of errors about not having enough room to add a .dynamic entry.
This seems to happen because of an empty .bss and/or .sbss in the library.

Any suggestions?

Thanks

Hmm, trying to build prelink-0.1.3-2.src.rpm with rpm --rebuild, I get

cxx.c:200: `STV_DEFAULT' undeclared (first use in this function)

(amongst some other warnings). This is a RH 6.1-based system, although with many updated bits (including libelf 0.7). Any ideas? (I also tried objprelink, which compiles but segfaults.)

I think it's due to glibc 2.1. I could make it compile, but after that it still doesn't work (see my other post).

Danny

Yep, I installed glibc-2.2.3 and it builds fine now... except that nothing will prelink, because ld-linux-2.2.3 won't prelink ("not enough room to add .dynamic entry") :-(

Hey, then it wasn't due to 2.1.3: I managed to compile it on 2.1.3 but also got this .dynamic error (and thought it was due to glibc). If it happens with 2.2 as well, it must be something else. Maybe I'll send a mail to the author this evening.

Danny

by KDE User (not verified)

Already people are reporting great speedups with this hack. Everyone seems in favor of including it in KDE 2.2. What does this mean for KDE Init and for distributions with prelinking already? Is it still worthwhile?

by ced (not verified)

It seems this trick has lots of advantages (speed especially), so why not always compile KDE with the speed improvements from now on?

KDE is better than any other WM, EXCEPT when launching applications (it's so slow!).

If we can improve KDE's speed by up to 50%, then every new release should be tuned like this (I really dunno why all of a sudden KDE is capable of being so fast, and why nobody discovered or put it on focus before).

by Carbon (not verified)

>nobody discovered or put it on focus before

Well, people have been talking about it for a while, actually. I believe there was a dot article about it a while back.

by Craig (not verified)

Texstar has some kde 2.2 beta Mandrake 8.0 rpms built with the new code. You can get them at www.pclinuxonline.com

Craig

by dc (not verified)

Does this work in C too?

by Tschortsch (not verified)

No

by Fredrik Corneliusson (not verified)

While we are talking about speed, has there been any improvement in image rendering/decoding? Last time I checked (2.1.1), Konqueror and Pixie were unusable as thumbnail viewers because of the horrible preview speed. I believe they both use the same libs (Qt or KDE core?) for this. Wouldn't there be a big performance boost for the whole environment (or at least for Konqueror and Pixie) if these were optimised?

by Mosfet (not verified)

I'm not sure if it got into 2.1 or not, but Pixie's thumbnail manager has supported load on demand for quite some time, which is extremely fast when browsing existing thumbnails. I've also just implemented load on demand for mimetype data, so you can enter a directory of more than 2000 thumbnailed images (I took all my photos and made a bunch of copies ;-) and start browsing any thumbnail essentially immediately. It used to take around 5-6 seconds; not bad, but this is even better. It's faster than anything else I've been able to compare it to, both on Linux and Windows. A new version should be released in about a week. If load on demand wasn't implemented in KDE 2.1, I strongly suggest you upgrade. You'll get a new UI and other goodies as well.

by Fredrik Corneliusson (not verified)

Hi mosfet,
I was not commenting on the speed of viewing existing thumbnails, sorry if that was unclear.
It is the speed at which KDE handles pictures: if you, for example, click on a JPEG image in Konqueror, you can see it appearing gradually, whereas in GQview it's displayed immediately. I don't know what makes it so; could it be KIO? But I seem to remember a thread on the mailing list about poor performance in the KDE image libs, no optimised ASM code for instance.
I’ll check out Pixie as soon as possible, will the new release be based on KDE 2.2?

by Mosfet (not verified)

Well, you said thumbnails, so you were pretty unclear ;-) You're seeing it slower in Konq because it's loading and rendering it incrementally. Good for web-based images, bad for local files. Use a different component for viewing images, not the HTML widget (which is what you have it set to ;-).

This was never an issue with Pixie, which never did incremental loading (it'd be nice to add for remote images, tho). I don't think you used it much... As for ASM and other things for loading images, that won't help at all. The main bottleneck in loading images is disk. It could help for things like smoothscaling thumbnails, but 2/3 of the time is spent in disk I/O (I checked), so not much. The "poor performance" of KDE/Qt image loading is mostly people not knowing what they are talking about. For example, Qt and imlib both call libgif in essentially the same way, and the same goes for libjpeg, libpng, etc. for loading data.
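Mosfet's argument is essentially Amdahl's law. Taking his measurement that about 2/3 of the load time is disk I/O at face value, and assuming only the remaining 1/3 (decoding and scaling) is made s times faster, the overall speedup S is bounded:

$$
S(s) = \frac{1}{\tfrac{2}{3} + \tfrac{1}{3s}}, \qquad
S(2) = \frac{1}{\tfrac{4}{6} + \tfrac{1}{6}} = \frac{6}{5} = 1.2, \qquad
\lim_{s \to \infty} S(s) = \frac{3}{2}.
$$

So under that assumption, a decoder twice as fast would make image loading only about 1.2 times faster, and even an infinitely fast one could not do better than 1.5 times.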

by Mosfet (not verified)

BTW, sorry for bad grammar, 6:10am and I haven't slept yet >:) Working on Pixie ;-)

by Mike (not verified)

I can't imagine that looking up even a few tens of thousands of symbols should make any appreciable difference in program startup time for a properly implemented symbol table.

While this is a neat hack, it sounds to me that the problem is not with g++ but with the data structures that the runtime system uses for relocation (linear search?). Probably that should get fixed, and that would speed things up generally, not just in this special case.
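For what it's worth, the dynamic linker does not search a single table linearly: each ELF shared object carries a hash table (the .hash section) indexed by the System V ELF hash function. What adds up is that each symbol is looked up in the loaded objects one after another, in load order, until one defines it, and every candidate in a bucket costs a string comparison on a name that, for C++, is a long mangled name. The sketch below implements the real hash function but a simplified toy table (vectors stand in for the bucket and chain arrays); `HashedSymtab` and `resolve` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// The System V ELF hash function used to index an object's .hash section.
std::uint32_t elf_hash(const std::string& name) {
    std::uint32_t h = 0;
    for (unsigned char c : name) {
        h = (h << 4) + c;
        std::uint32_t g = h & 0xf0000000u;
        if (g != 0) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

// Toy per-object symbol table: vectors stand in for the bucket and chain arrays.
struct HashedSymtab {
    std::vector<std::string> names;
    std::vector<std::vector<std::size_t>> buckets;

    HashedSymtab(std::vector<std::string> syms, std::size_t nbucket)
        : names(std::move(syms)), buckets(nbucket) {
        assert(nbucket > 0);
        for (std::size_t i = 0; i < names.size(); ++i)
            buckets[elf_hash(names[i]) % nbucket].push_back(i);
    }

    // Only symbols in the matching bucket are compared, not the whole table.
    bool defines(const std::string& name, std::size_t& compares) const {
        for (std::size_t i : buckets[elf_hash(name) % buckets.size()]) {
            ++compares;
            if (names[i] == name) return true;
        }
        return false;
    }
};

// Objects are searched in load order until one defines the symbol.
int resolve(const std::vector<HashedSymtab>& scope, const std::string& name,
            std::size_t& compares) {
    for (std::size_t obj = 0; obj < scope.size(); ++obj)
        if (scope[obj].defines(name, compares)) return static_cast<int>(obj);
    return -1;  // undefined symbol
}
```

Each object costs roughly one bucket walk, but a KDE application has many libraries in its search scope, and each lookup may walk several of them before it finds the definition.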

by Kuba (not verified)

Well, the startup time of a non-prelinked binary, at least on my machine (dual 450 MHz PIII, kernel.org Linux 2.4.5), is mostly filled with hard drive seeks. I wonder whether prelinking also happens to streamline some disk accesses, maybe just by not touching pages that don't need to be accessed at startup. I imagine that if the relocation tables are spread around the binary, there will be a decent amount of seeking at startup just to get the right pages in.

Well, my assumption is that Linux does some memory<->disk mapping of binaries' pages; if it doesn't, then I'm obviously wrong.

Would linear search be sooo slow, given that there really aren't that many symbols to look up (I doubt it's tens of thousands)? I imagine that a typical symbol table would be... well, the one in libc-2.2 has about 2k symbols. Maybe that really goes up to tens of thousands for KDE+Qt apps??? :-(

by Ben Ploni (not verified)

KDE apps tend to have 50000+ symbol resolves.

run this (no quotes):
"LD_DEBUG=statistics kwrite"

by Holger Lehmann (not verified)

I ran a couple of tests with amazing results:

holle@chaos:~/.p > LD_DEBUG=statistics /opt/kde/bin/kedit
01109: number of relocations: 14053
holle@chaos:~/.p > LD_DEBUG=statistics /opt/kde2/bin/kedit
01110: number of relocations: 47329
holle@chaos:~/.p > LD_DEBUG=statistics /opt/gnome/bin/gedit
01111: number of relocations: 13878
01113: number of relocations: 1466
01113: number of relocations: 794

So Gnome (version 1.2.1) had about the same number of relocations as KDE1 (1.1.2). The big hit came with KDE2 (2.1.1/2). This is, I think, directly related to the DCOP stuff and all the other things going on in the background. Look at kwrite from KDE2 starting:

holle@chaos:~/.p > LD_DEBUG=statistics /opt/kde2/bin/kwrite
01151: number of relocations: 51023
01152: number of relocations: 1466
01152: number of relocations: 46994
DCOPServer up and running.

Now that is a lot ...

I think we need to streamline the API a little bit. Make more use of inline functions and try to get rid of function duplicates i.e. two functions doing mainly the same thing.
Maybe we can come up with a late binding feature like Python has, where a function's code gets bound at the very moment it is used and not earlier (and, in Python's case, again and again...).

- Holger
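For ordinary function calls, ELF already does something like this: calls go through the PLT, and unless LD_BIND_NOW is set, the target is resolved only on the first call. Vtable entries, however, are data pointers that must hold real addresses as soon as the library is loaded, which is why they cost relocations at startup. A minimal C++ sketch of the lazy-binding idea, using a hypothetical call slot that starts out pointing at a resolver and patches itself on first use:

```cpp
#include <cassert>

// Hypothetical target function that would live in a shared library.
int real_scale(int x) { return x * 2; }

int resolve_scale(int x);  // resolver stub, defined below

// The call slot initially points at the resolver, much like an unresolved PLT entry.
int (*scale_slot)(int) = resolve_scale;
int resolver_runs = 0;

// First call: "look up" the target, patch the slot, then forward the call.
int resolve_scale(int x) {
    assert(scale_slot == resolve_scale);  // only reached while the slot is unbound
    ++resolver_runs;                      // stands in for the expensive symbol lookup
    scale_slot = real_scale;              // later calls skip the resolver entirely
    return real_scale(x);
}

int call_scale(int x) { return scale_slot(x); }
```

Calling `call_scale` twice runs the resolver only once; the second call goes straight to `real_scale`.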

by Rik Hemsley (not verified)

Using inlining for methods is not a good idea for a C++ library. Great for apps, bad for libs. Think of what happens when you try to change the implementation later. I needed to change some kstyle* stuff a while ago and couldn't. Argh.

Rik

by Mosfet (not verified)

Sorry, KStyle has no inline methods... doh! KThemeStyle does, but that is not called by any other applications, only dynamically loaded by the theme engine. Either way, inline methods are very common in libs (look at Qt: grep inline *h | wc --lines gives you 697 occurrences).

by Rik Hemsley (not verified)

And it is KThemeStyle which is the problem. I wrote a global pixmap server for KDE, with the intention of alleviating the overhead generated by KThemeStyle when loading pixmaps on app start.

Then I found that I couldn't re-implement parts of KThemeStyle, so now this has to wait for KDE 3.

Rik

by Mosfet (not verified)

Write a new style plugin based off of KThemeStyle, it's a plugin that provides the theme engine, remember. Those headers are included in the KDE libraries simply so people could derive from them, but no one ever did (people wrote very few styles period).

If you really do have a style you can release it today.

Your change would also almost certainly have required changes to private and protected members and data anyway, so that still doesn't make an argument against inline methods, which are used hundreds of times in both KDE and Qt.
Should we dump private and protected members as well?

by Rik Hemsley (not verified)

We already did dump private members by using the 'pimpl' paradigm, for this exact reason.

Putting code in headers causes BC issues later on. Don't do it.

Rik
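The 'pimpl' (private implementation, or d-pointer) idiom Rik refers to keeps all data members behind one pointer to a class defined only in the library's source file, so the size and layout of the public class never change when the implementation does. A minimal sketch with a hypothetical `PixmapCache` class, with the header and source parts shown in one file; KDE code of this era typically used a raw `d` pointer rather than `std::unique_ptr`:

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// ---- public header part (hypothetical pixmapcache.h) ----
class PixmapCache {
public:
    PixmapCache();
    ~PixmapCache();  // out of line: Private is incomplete in the header
    void insert(const std::string& key);
    int size() const;

private:
    class Private;               // defined only inside the library
    std::unique_ptr<Private> d;  // sole data member, so sizeof(PixmapCache) stays fixed
};

// ---- library source part (hypothetical pixmapcache.cpp) ----
class PixmapCache::Private {
public:
    std::set<std::string> keys;  // members here can change without breaking BC
};

PixmapCache::PixmapCache() : d(new Private) {}
PixmapCache::~PixmapCache() = default;

void PixmapCache::insert(const std::string& key) { d->keys.insert(key); }
int PixmapCache::size() const { return static_cast<int>(d->keys.size()); }
```

Because the methods are defined out of line, the library can change their bodies freely; an inline method in the header would instead be compiled into each application, which is the binary-compatibility concern raised above.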

by Mosfet (not verified)

A) This does not prevent you from making a new KStyle or KThemeStyle, as you claimed. I made very sure you can do anything you want with the plugin mechanism and saying BC prevents you from doing anything is just incorrect.

B) KDE headers currently include 1,797 incidents of inline methods. They are a very good way to optimize code and are the equivalent of #define macros in C. Dropping them isn't what I'd recommend to any developer, unless you like unneeded method call overhead.

Just correcting some falsehoods...
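The comparison with #define holds for speed: like a macro, an inline function's body ends up compiled into the caller (when the compiler chooses to inline it), which is also why changing it later means recompiling applications. The difference is that an inline function is a real, type-checked function, while a macro is pasted in as text and may evaluate its argument more than once. A small illustration with hypothetical names:

```cpp
#include <cassert>

#define SQUARE_MACRO(x) ((x) * (x))  // pasted as text: x is evaluated twice

inline int square_inline(int x) { return x * x; }  // type-checked; x evaluated once

int counter = 0;
int next_value() { return ++counter; }  // side effect makes double evaluation visible

// Expands to next_value() * next_value(), i.e. 1 * 2.
int macro_result() {
    counter = 0;
    return SQUARE_MACRO(next_value());
}

// next_value() runs once, so this is 1 * 1.
int inline_result() {
    counter = 0;
    return square_inline(next_value());
}
```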