Leon Bottou: Faster KDE Startups?

Friday, 27 July 2001 | Numanee

As a follow-up to Waldo Bastian's analysis of KDE startup times, Leon Bottou has implemented an inspired hack to improve the startup of C++ programs under GNU/Intel systems. "Waldo Bastian's document demonstrates that the current g++ implementation generates lots of expensive run-time relocations. This translates into the slow startup of large C++ applications (KDE, StarOffice, etc.). The attached program "objprelink.c" is designed to reduce the problem. Expect startup times 30-50% faster." Update: 08/01 4:52 AM by N: Consult Leon's objprelink page for some great details and up-to-date information on this hack as well as on the prelinker mentioned by Bero. Thanks to freekde for the tip-off.

If I understand correctly, Leon's hack works around the problem by adding a level of indirection - a stub - to each function in a class's virtual table, and changing references to the function to point to the new stub instead -- thereby eliminating a whole lot of symbol lookups and relocations.
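To make the stub idea concrete, here is a minimal C++ model of it. This is only a sketch of the general technique, not code taken from objprelink.c: every name in it (Fn, realPaint, lookupPaint, paintStub, lookups, the two tables) is hypothetical, and resolving the target lazily on the first call is an assumption about how the lookup gets deferred.

```cpp
#include <cassert>

// Illustrative model only; this is not code from objprelink.c.
// All names here are hypothetical.

using Fn = int (*)(int);

// Stands in for a function exported by a shared library.
int realPaint(int x) { return x + 1; }

// Counts simulated symbol lookups so the effect is observable.
int lookups = 0;

// Stands in for the dynamic linker resolving the symbol by name.
Fn lookupPaint() {
    ++lookups;
    return realPaint;
}

// The stub: table slots point here instead of at realPaint.
// The expensive lookup happens once, on the first call.
int paintStub(int x) {
    static Fn target = nullptr;
    if (target == nullptr) {
        target = lookupPaint();
    }
    assert(target != nullptr);
    return target(x);
}

// Two "virtual tables" that both reference the function through the stub.
// Filling these slots requires no lookup of realPaint at startup.
Fn baseTable[] = { paintStub };
Fn derivedTable[] = { paintStub };
```

Calling `baseTable[0](1)` and then `derivedTable[0](41)` performs a single simulated lookup in total, while every call pays for one extra jump through the stub, which is the trade-off described in the next paragraph.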

Check out Leon's email for the exact juicy details and for the Intel/GCC-specific C code of the program you will need to process object files before linking. One possible downside of this optimization is that virtual function invocations may now be slower due to the extra indirection involved.

And of course, no matter how brilliant the hack, we are still working around faults in the GNU linker. Apparently some work is going on in that area as well, as can be seen in this email from Jakub Jelinek.

Comments:

Kernel re-Compile will also speed up KDE - Asif Ali Rizwaan - 2001-07-27

I am also bothered by KDE's not-so-fast performance. I recompiled the KDE packages but didn't see any improvement. But when I recompiled Linux Kernel 2.4-2 on RH 7.1, I saw a 40% improvement in KDE + its apps.

Exceptional! But what about non intel platforms ? - CPH - 2001-07-27

This will be really great if it has no undesirable side effects. But I use Solaris at work, and of course gcc is used, so will this mechanism also work (with the "minor" change of i386 to SUN, or whatever) to speed up KDE on Solaris? CPH BTW: For any company trying to compete in the same space as KDE, the rate of advance because of the open source model must be frightening them!

G++? - Jeremy M. Jancsary - 2001-07-27

Could this eventually be integrated into g++? Man, this is really great news. Talking about HUGE performance improvements. Imagine applying this to server software written in C++ ... Of course, it will be great for KDE, too, but that is not even the area where it will matter most, IMHO.

Re: G++? - Sam - 2001-07-27

server software ??? The improvement is in program *startup* (or actually the time it takes for the program to find and move around all internal symbols that it needs), not overall performance. Another package that I believe will benefit from this is Star Office and in some parts Net^H^HMozilla /Sam

Re: G++? - Jeremy M. Jancsary - 2001-07-27

OK, I'll have to explain this ... otherwise I might end up looking like an idiot. I agree that what I wrote can easily be misunderstood :) I was talking about CGI applications etc. Apps that will have to be started lots of times. I suppose a website might be able to handle a lot more traffic if the underlying CGIs start up more quickly (I might be wrong of course).

Re: G++? - Jonathan Brugge - 2001-07-30

I think you can do something like that already with apache + mod_perl...not sure though whether it's only for the perl-compilation or for both the compilation and the execution.

Re: G++? - Holstein - 2001-07-30

It's for both. There are several ways to use mod_perl, but most of the time you will use it to cache the compiled form of your Perl script, and then simply re-call it the next time it is requested. That re-call of your script is then handled by mod_perl like a function call.

prelinking in the linker is working these days. - bero - 2001-07-27

Subject says it all - I'm running on a completely prelinked system these days. Source available at ftp://rawhide.redhat.com/pub/redhat/linux/rawhide/SRPMS/SRPMS/prelink-0.1.3-2.src.rpm You'll also need the corresponding binutils patches, part of ftp://rawhide.redhat.com/pub/redhat/linux/rawhide/SRPMS/SRPMS/binutils-2.11.90.0.8-5.src.rpm

Corrected link - Karl Garrison - 2001-07-30

The pub is missing from the above link: ftp://rawhide.redhat.com/pub/redhat/linux/rawhide/SRPMS/SRPMS I'm trying it out now. :-) -Karl

Re: Corrected link - Navindra Umanee - 2001-07-30

Thanks, might have been my fault. -N.

Re: prelinking in the linker is working these days - Karl Garrison - 2001-07-30

Does KDE have to be rebuilt to see the effects of this? I installed it, and startup times do seem faster, but it may just be wishful thinking on my part. ;-) -Karl

Re: prelinking in the linker is working these days - bero - 2001-07-30

No need to recompile - you need to prelink the applications though (run prelink --all).

Re: prelinking in the linker is working these days - Timothy R. Butler - 2001-07-30

So you can run this on a binary install of KDE? I might give it a try after all then... Thanks, Tim

Re: prelinking in the linker is working these days - Timothy R. Butler - 2001-07-31

I just realized the program you refer to is different than the one in the news article. Does it do the same thing? Thanks, Tim

Re: prelinking in the linker is working these days - Lovechild - 2001-07-31

Anyone know if a debian version of this hack exists ??

Re: prelinking in the linker is working these days - Timothy R. Butler - 2001-07-31

If you follow the link to Leon's original message you will find the source code for the hack. -Tim

Re: prelinking in the linker is working these days - Lovechild - 2001-07-31

I wouldn't know what to do with it, even if you paid me... Yes, I'm stupid - but I have an urge for speed

Re: prelinking in the linker is working these days - lafaard - 2001-08-02

No, you are lazy... all the instructions are on Leon's webpage, including how to compile the objprelink.c file. And about being stupid: even you can copy and paste the gcc line into a konsole and copy the resulting objprelink executable into /bin, /usr/bin/ or /usr/local/bin, wherever you want it.

Re: prelinking in the linker is working these days. - aleXXX - 2001-07-30

stupid question: is glibc 2.2 required or is 2.1 enough

Re: prelinking in the linker is working these days. - Tschortsch - 2001-07-31

It doesn't work on my debian potato system(glibc 2.1). But I think it's binutils, which are not new enough.

Re: prelinking in the linker is working these days. - Tschortsch - 2001-08-02

I updated my binutils to 2.11. Now it works. It's really fast.

Re: prelinking in the linker is working these days - Danny - 2001-07-31

I tried it on mandrake 7.2 glibc 2.1.3 with newest binutils (2.11.0.8) and libelf(0.7). Wouldn't compile (some missing declarations, STV_DEFAULT and others) after some playing to include these declarations from binutils it compiled and runs. prelink with the n option (dryrun) seems to work fine, but if I want to prelink for real it bails out with something like: no space for dynamic. Whatever that means... I think it could be made to work, but prolly has no real priority since everyone goes to 2.2... Danny

Re: prelinking in the linker is working these days. - Greg - 2001-07-31

Tried going to the links - get "unable to login" messages?? Do you have to be a registered Red Hat customer? I'm running Mandrake v8.0 w/ KDE v2.1.1. Is there anywhere else I might find it?? Thanks -

Re: prelinking in the linker is working these days. - Alin Vaida - 2001-07-31

I get a lot of errors about not having enough room to add a .dynamic entry. This seems to happen because of an empty .bss and/or .sbss in the library. Any suggestions? Thanks

prelink-0.1.3-2.src.rpm won't install, objprelink - Adam Hill - 2001-07-30

Hmm, trying to install prelink-0.1.3-2.src.rpm, by doing a rpm --rebuild - I get cxx.c:200: `STV_DEFAULT' undeclared (first use in this function) ( amongst some other warnings. ) This is a RH6.1 based system, although with many updated bits ( including libelf 0.7.) Any ideas? ( I also tried objprelink, which compiles but seg faults. )

Re: prelink-0.1.3-2.src.rpm won't install, objprel - Danny - 2001-07-31

I think its due to glibc 2.1. I could make it compile but after that it still doesn't work (see my other post). Danny

Re: prelink-0.1.3-2.src.rpm won't install, objprel - Adam Hill - 2001-07-31

Yep, installed glibc-2.2.3 and it works fine now... except that nothing will prelink because ld-linux-2.2.3 won't prelink ( "not enough room to add .dynamic entry" ) :-(

Re: prelink-0.1.3-2.src.rpm won't install, objprel - Danny - 2001-08-01

Hey.. then it wasn't due to 2.1.3, since I managed to compile it on 2.1.3 but also get this .dynamic error (and thought it to be due to glibc). If it happens in 2.2 as well it must be something else? Maybe I'll send a mail to the author this evening. Danny

vs KDE Init & Prelinking? - KDE User - 2001-07-30

Already people are reporting great speedups with this hack. Everyone seems in favor of including it in KDE 2.2. What does this mean for KDE Init and for distributions with prelinking already? Is it still worthwhile?

always compile that speed up stuff - ced - 2001-07-30

seems this trick has lots of advantages (speed especially), so why not always compile kde with the speed improvements from now on? KDE is better than any other WM, EXCEPT when launching applications (it's so slow!). If we can improve KDE's speed by up to 50%, then all new releases should be tuned like this (I really dunno why all of a sudden KDE is capable of being so fast and that nobody discovered or put it on focus before)

Re: always compile that speed up stuff - Carbon - 2001-07-31

>nobody discovered or put it on focus before well, people have been talking about it for a while, actually. I believe there was a dot article about it a while back

Mandrake rpms - Craig - 2001-07-30

Texstar has some kde 2.2 beta Mandrake 8.0 rpms built with the new code. You can get them at http://www.pclinuxonline.com/ Craig

Re: - dc - 2001-07-31

Does this work in C too?

Re: - Tschortsch - 2001-07-31

Another Speed Issue - Fredrik Corneliusson - 2001-07-31

While we are talking about speed, has there been any improvement in image rendering/decoding? Last time I checked (2.1.1), Konqueror and Pixie were unusable as thumbnail viewers because of the horrible preview speed. I believe they both use the same libs (Qt or KDE core?) for this. Wouldn't there be a big performance boost for the whole environment if it were to be optimised (or at least for Konqueror and Pixie)?

Huh? Pixie ain't slow. - Mosfet - 2001-07-31

I'm not sure if it got into 2.1 or not, but Pixie's thumbnail manager has supported load on demand for quite some time that's extremely fast when browsing existing thumbnails. I've also just implemented load on demand for mimetype data as well, so you can enter a directory of > 2000 thumbnailed images (I took all my photos and made a bunch of copies ;-) and start browsing any thumbnail essentially immediately. It used to take around 5-6 seconds, not bad but this is even better. It's faster than anything else I've been able to compare it to, both on Linux and Windows. A new version should be released in about a week. If load on demand wasn't implemented in KDE 2.1, I strongly suggest you upgrade. You'll get a new UI and other goodies as well.

Re: Huh? Pixie ain't slow. - Fredrik Corneliusson - 2001-07-31

Hi mosfet, I was not commenting on the speed of viewing the existing thumbnails, sorry if that was unclear. It is the speed at which KDE handles pics: if you, for example, click on a jpeg image in konq you can see it gradually appearing, but in GQview, for example, it's displayed immediately. I don't know what causes it; could it be kio? But I seem to remember a thread on the mailing list concerning poor performance in KDE image libs, no optimised ASM code for instance. I'll check out Pixie as soon as possible; will the new release be based on KDE 2.2?

Re: Huh? Pixie ain't slow. - Mosfet - 2001-07-31

Well, you said thumbnails, so you were pretty unclear ;-) You're seeing it slower in Konq because it's incrementally loading and rendering it. Good for web based images, bad for local files. Use a different component for viewing images, not the HTML widget (which is what you have it set to ;-). This was never an issue with Pixie, which never did incremental loading (it'd be nice to add for remote images, tho). I don't think you used it much... As far as ASM and other things for loading images, that won't help at all. The main bottleneck in loading images is disk. It could help for things like smoothscaling thumbnails, but 2/3 of the time is spent in disk I/O (I checked), so not much. The "poor performance" of KDE/Qt image loading is mostly people not knowing what they are talking about. For example, Qt and imlib both call libgif in essentially the same way, same for libjpeg, libpng, etc... for loading data.

Re: Huh? Pixie ain't slow. - Mosfet - 2001-07-31

BTW, sorry for bad grammar, 6:10am and I haven't slept yet >:) Working on Pixie ;-)

Seems odd to me. - Mike - 2001-07-31

I can't imagine that looking up even a few tens of thousand symbols in a symbol table should make any appreciable difference in program startup time for a properly implemented symbol table. While this is a neat hack, it sounds to me that the problem is not with g++ but with the data structures that the runtime system uses for relocation (linear search?). Probably that should get fixed, and that would speed things up generally, not just in this special case.

Re: Seems odd to me. - Kuba - 2001-07-31

Well, the startup time of non-prelinked binary, at least on my machine, is mostly filled with hard drive seek tests (double 450Mhz PIII, kernel.org linux-2.4.5). I wonder whether prelinking doesn't streamline some disk accesses at the same time by coincidence, maybe just by not referring to the pages which don't need to be accessed at startup. I imagine that if the relocation tables are spread around the binary, there will be a decent amount of seeking at startup, just to get the right pages in. Well, my assumption is that linux does some memory<->disk mapping of binaries' pages, if it doesn't then I'm obviously wrong. Would linear search be sooo slow given that there really aren't that many symbols to look-up (I doubt it's tens of thousands). I imagine that a typical symbol table would be - well, the one in libc-2.2 is about 2k symbols. Maybe that really goes up to tens of thousands for kde+qt apps??? :-(

Re: Seems odd to me. - Ben Ploni - 2001-08-03

KDE apps tend to have 50000+ symbol resolves. run this (no quotes): "LD_DEBUG=statistics kwrite"

Re: Seems odd to me. - Holger Lehmann - 2001-08-04

I ran a couple of tests with amazing results:

holle@chaos:~/.p > LD_DEBUG=statistics /opt/kde/bin/kedit
01109: number of relocations: 14053
holle@chaos:~/.p > LD_DEBUG=statistics /opt/kde2/bin/kedit
01110: number of relocations: 47329
holle@chaos:~/.p > LD_DEBUG=statistics /opt/gnome/bin/gedit
01111: number of relocations: 13878
01113: number of relocations: 1466
01113: number of relocations: 794

So Gnome (version 1.2.1) had about the same number of relocations as KDE1 (1.1.2). The big hit came with KDE2 (2.1.1/2). This is, so I think, directly related to the DCOP stuff and all the other things going on in the background. Look at kwrite from KDE2 starting:

holle@chaos:~/.p > LD_DEBUG=statistics /opt/kde2/bin/kwrite
01151: number of relocations: 51023
01152: number of relocations: 1466
01152: number of relocations: 46994
DCOPServer up and running.

Now that is a lot ... I think we need to streamline the API a little bit. Make more use of inline functions and try to get rid of function duplicates, i.e. two functions doing mainly the same thing. Maybe we can come up with a late binding feature like python has, where the function's code gets bound at the very moment it is used and not earlier (and again and again for python ...) - Holger
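For anyone wanting to compare such runs without adding the numbers by hand, the following small C++ helper sums the counts. It is a sketch that assumes the output has been captured as text and that each relevant line contains the phrase "number of relocations:" followed by the count, as in the listings above; the function name totalRelocations is made up for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sum every "number of relocations: N" line found in captured
// LD_DEBUG=statistics output. Lines without that phrase are ignored.
long totalRelocations(const std::string& log) {
    const std::string key = "number of relocations:";
    std::istringstream in(log);
    std::string line;
    long total = 0;
    while (std::getline(in, line)) {
        const std::string::size_type pos = line.find(key);
        if (pos == std::string::npos) {
            continue;  // e.g. "DCOPServer up and running."
        }
        std::istringstream rest(line.substr(pos + key.size()));
        long count = 0;
        if (rest >> count) {
            assert(count >= 0);  // a relocation count is never negative
            total += count;
        }
    }
    return total;
}
```

Fed the three kwrite lines quoted above, it returns 99483 (51023 + 1466 + 46994), against 47329 for the single line printed for the KDE2 kedit.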

Re: Seems odd to me. - Rik Hemsley - 2001-08-04

Using inlining for methods is not a good idea for a C++ library. Great for apps, bad for libs. Think of what happens when you try to change the implementation later. I needed to change some kstyle* stuff a while ago and couldn't. Argh. Rik

Re: Seems odd to me. - Mosfet - 2001-08-04

Sorry, KStyle has no inline methods... doh! KThemeStyle does, but that is not called by any other applications, only dynamically loaded by the theme engine. Either way, inline methods are very common in libs (look at Qt: grep inline *h | wc --lines gives you 697 occurrences).

Re: Seems odd to me. - Rik Hemsley - 2001-08-04

And it is KThemeStyle which is the problem. I wrote a global pixmap server for KDE, with the intention of alleviating the overhead generated by KThemeStyle when loading pixmaps on app start. Then I found that I couldn't re-implement parts of KThemeStyle, so now this has to wait for KDE 3. Rik

Re: Seems odd to me. - Mosfet - 2001-08-04

Write a new style plugin based off of KThemeStyle, it's a plugin that provides the theme engine, remember. Those headers are included in the KDE libraries simply so people could derive from them, but no one ever did (people wrote very few styles period). If you really do have a style you can release it today. Your change would have also most certainly required private and protected member and data changes anyways, so still doesn't make an argument against inline methods, which are used hundreds of times in both KDE and Qt. Should we dump private and protected members as well?

Re: Seems odd to me. - Rik Hemsley - 2001-08-04

We already did dump private members by using the 'pimpl' paradigm, for this exact reason. Putting code in headers causes BC issues later on. Don't do it. Rik
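For readers who have not met the 'pimpl' paradigm mentioned here, this is roughly what it looks like. The class below is a hypothetical Widget, not an actual KDE class, and it is written in present-day C++ (std::unique_ptr) rather than the style of 2001; treat it as a sketch of the idea only.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- what would live in the installed header (hypothetical widget.h) ---
class Widget {
public:
    Widget();
    ~Widget();
    void setCaption(const std::string& caption);
    std::string caption() const;   // declared only: no code in the header

private:
    class Private;                 // layout unknown to applications
    std::unique_ptr<Private> d;    // the only data member
};

// --- what would live inside the library (hypothetical widget.cpp) ---
class Widget::Private {
public:
    std::string caption;
    // Members can be added here later without changing sizeof(Widget).
};

Widget::Widget() : d(new Private) {}
Widget::~Widget() = default;       // defined where Private is complete

void Widget::setCaption(const std::string& caption) {
    assert(d != nullptr);
    d->caption = caption;
}

std::string Widget::caption() const {
    assert(d != nullptr);
    return d->caption;
}
```

Because applications only ever see a pointer-sized member and out-of-line methods, the library can change both the data and the code behind them without forcing a recompile. An inline method in the header would instead be compiled into every application that calls it, which is the binary compatibility issue being argued about in this thread.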

Re: Seems odd to me. - Mosfet - 2001-08-04

A) This does not prevent you from making a new KStyle or KThemeStyle, as you claimed. I made very sure you can do anything you want with the plugin mechanism and saying BC prevents you from doing anything is just incorrect. B) KDE headers currently include 1,797 incidents of inline methods. They are a very good way to optimize code and are the equivalent of #define macros in C. Dropping them isn't what I'd recommend to any developer, unless if you like unneeded method call overhead. Just correcting some falsehoods...

Re: Seems odd to me. - Rik Hemsley - 2001-08-04

You're right, I could make a new style plugin. I had forgotten that I could change the plugin used by a 'themed style' by altering the .themerc file. I _would_ recommend dropping out inlined methods wherever you can. Only keep them in for code which is on a critical path. If you fill your headers with code, no-one can override that code later. Optimisation rule #1: Don't. Not until you've measured. Rik

Re: Seems odd to me. - Mosfet - 2001-08-04

If you do what you're recommending, you have apps (that may or may not be in a critical loop, how are you to know as a library developer?) in absurd situations like being in loops making method calls for things like "a=b;" and "++i;". This is insane and may be acceptable to you, but certainly isn't to me and, judging by its use, not acceptable to others either. This is exactly why people use inline. Also, you can't rely on the compiler optimizing small methods inline automatically like you would if a C function did the same thing.

Doesn't seem too odd for me - Joe Zbiciak - 2002-09-26

One thing to keep in mind is that each relocation also results in a dirty page. If the relocations are spread throughout the application, then you're going to generate a lot of page faults very quickly, and make many trips into and out of the kernel. If any of these pages would be sharable, they get COW'd (Copy on Write). Ouch. A secondary benefit of prelinking is that apps which use the same libraries are more likely to share pages. This reduces system memory footprint and regains one of the key advantages of shared libraries.

? - me - 2001-08-06

Does anybody have a solution for the "no space for .dynamic" error?