An Analysis of KDE Speed

Our recent poll (courtesy KDE.com) on the upcoming KDE 2.2 suggests that the area of
greatest concern for KDE users is speed -- at this time, out of 3,463 votes, over 24% consider speed as most important for developers to address. Waldo Bastian, who developed the kdeinit speed hack among other things, has written a paper entitled "Making C++ ready for the desktop", in which he analyzes the various startup phases of a C++ program. Noting that one component of linking -- namely, library relocations -- is currently slow, he offers some suggestions for optimizations. An interesting read.

Comments

by Jeff Brybajer (not verified)

When KDE is compiled using --enable-debug, what exactly does this change as far as speed or memory usage are concerned?

What I have always compared it to is the .dbg files generated in Visual Studio. I know relatively little about them, but they let you pinpoint problems with a debugger and view the call stack. These files are not part of the normal executable, yet they give you the debug symbols without the extra memory usage. Is this a wrong analogy, and what would prevent KDE from compiling debugging information to a separate file?
I believe this would be beneficial if implemented in KDE: people would not have to compile from source to be able to give backtraces when submitting bug reports, and they would not have to deal with the extra memory usage of having the symbols built in.

by Erik (not verified)

What I started to do recently is to compile everything with --enable-debug so that the binaries contain the symbols. Then I install it (to /usr/local/kde) and execute "strip bin/* lib/*". This saves a LOT of disk space. If I need to report a bug with backtrace I only have to reinstall the package, not recompile it.

by stephane PETITHOMME (not verified)

I am not so sure about what I will suggest here; comments are welcome:

- It is possible to change the ld algorithm so that:

-- Any shared library is loaded at boot time, relocated to a different address space, and saved back to disk using this default addressing (maybe replacing the original library?). ld will reserve these address spaces forever, making it impossible for any library not in ld's normal cache to use them.

-- This does not need to be executed at each reboot, as the probability is high that nothing has changed...

-- Any further attempt to use this library is done from the relocated version. There is no need to do the job anymore.

-- Any configuration change (adding a library, ...) is performed as usual... or as during the boot phase, to improve the mechanism.

The previous idea would:
- maximize the usage of shared pages (all libraries are always at the same location)
- slow the boot process a little, especially after a new installation.
- reduce loading time.

Any comments?

Stephane
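
A minimal sketch of the address-reservation step proposed above, in C. It only illustrates handing out fixed, page-aligned, non-overlapping load addresses to a list of libraries; it is not how ld.so works today. The names lib_slot, page_round_up and assign_bases, the 4 KB page size and the example region are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical record for one shared library; not a real ld.so structure. */
struct lib_slot {
    const char *name;   /* library file name */
    uint32_t    size;   /* bytes of address space the library needs */
    uint32_t    base;   /* assigned load address (filled in below) */
};

/* Round n up to the next multiple of the page size. */
uint32_t page_round_up(uint32_t n)
{
    return (n + SKETCH_PAGE_SIZE - 1u) & ~(SKETCH_PAGE_SIZE - 1u);
}

/* Hand out consecutive, page-aligned, non-overlapping address ranges
   inside [region_start, region_end).  Returns how many libraries
   received an address; the rest did not fit. */
size_t assign_bases(struct lib_slot *libs, size_t count,
                    uint32_t region_start, uint32_t region_end)
{
    uint32_t next = page_round_up(region_start);
    size_t i;

    /* Reject a wrapped start address or an empty region. */
    if (next < region_start || next > region_end)
        return 0;

    for (i = 0; i < count; i++) {
        uint32_t span = page_round_up(libs[i].size);
        if (span > region_end - next)   /* region exhausted */
            break;
        libs[i].base = next;            /* this address is now reserved */
        next += span;
    }
    return i;
}
```

Because every process would see a given library at the same address, the relocated pages could be saved once and shared, which is the point of the proposal. The table would have to be recomputed whenever a library is added or grows.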

by Waldo Bastian (not verified)

Yes, something like that would work.

Cheers,
Waldo

by Johan Veenstra (not verified)

I see a great deal of potential in this

Give the libraries their own place in memory so:

- nothing has to be relocated.
- libraries can be precached.

Doesn't solve the 800kb of duplicated vtables per kde program, but heh, it's a start.

regards,

Johan Veenstra

by george moudry (not verified)

The relocation problem exists on Windows also.
But to avoid relocations on every app start, Microsoft released a tool which pre-relocates each DLL (the equivalent of a .so) to a different address, so they don't collide. Here's a link to MS Systems Journal: bugslayer
((Hmmm sorry URL is funny - I only see Plain encoding in the submit form??))

Maybe a similar utility could work for Linux, and maybe it already exists? Then SuSE and other distros would run it before packaging their product, and we could save the half-second.
Hope this is correct,
/george

by Jan Hanstede (not verified)

I'm not really into the technical stuff of KDE so this may not make sense, but
isn't it an idea to get all the GUI and rendering stuff hardware accelerated? I believe Enlightenment is doing this, and I suppose it would improve the speed dramatically.

by Johann Lermer (not verified)

Hmm. In Waldo's tables, linking is not the only performance killer; the application-specific initialization is one too, isn't it? I mean, how can a simple application like KEdit consume 0.48 seconds for its initialisation? Or is KEdit more complicated than I always thought?

by Nils Holland (not verified)

Well, ok, probably the speed of KDE can be improved. However, from my own experience, I can say that I have used KDE 2.1.1 on everything from a K6-2 with 300 MHz and 96 MB of RAM to an AMD Athlon at 1000 MHz and 512 MB of RAM. On all of these machines it ran fast enough for my taste.

In the KDE poll, I voted for improvements in KOffice. Why? Well, konqueror has already replaced Netscape as my default browser since I find it really great. What I'm still missing is a KOffice that is able to replace StarOffice as my default office suite. That's what I believe is one of the most important things in the future of KDE. Of course, this does not mean that speed is not an issue that should be looked at, but I'd rather wait some time for a desktop environment with some really great features than to have some fast desktop that lacks some very useful things.

Bottom line is: It's probably a bad idea to work only on improving speed or only on improving KOffice. A broad range of improvements in all areas is probably what will make people happy in the end.

Greetings,
Nils

by Robbin Bonthond (not verified)

Oh yeah, KDE needs to be a lot faster!

I mean, emacs is so much faster on my 386 DX40 with 8 MB of memory (though I am thinking of buying a color monitor....)

If you do not have the money to buy a new machine to match KDE's performance, then run a window manager that was designed to run on slow hardware, like Blackbox!

Or stop whining and start helping the KDE people so it will run on your ancient hardware!

Sorry about this, but it is getting a bit annoying to see people complaining about something they get for free (and can help to improve).

by oliv (not verified)

People are not complaining here. They are answering a poll from KDE.

by fsa (not verified)

i totally agree

by Robbin Bonthond (not verified)

Perhaps it is possible to make a program that benchmarks several parts of KDE. It could also have the ability to post the results to a database on the web, along with information about the hardware in the machine that KDE is running on, so we can all see how a certain KDE version behaves on different configurations. At least it would help to get a better idea of the "speed of KDE".

by Jelmer Feenstra (not verified)

Perhaps you shouldn't post the same 4 times in a row.

Jelmer

by Charles McCormick (not verified)

Could you post that a few more times ?

by Nothingman (not verified)

I would like to "debloat" KDE by disabling all unnecessary daemons, but I can't find any documentation that explains the services KDE needs.
For example, what is the kwrited daemon?

After using Konqueror, there are many processes left in memory: kdeinit:kio_http ......
Why are these processes still in memory even after I close Konqueror?

--
Nothingman

by Chris W (not verified)

I'm with you. How about disabling the sound stuff for systems without sound? How about disabling kxmlrpcd? What the heck are khotkeys, kded or kwrited? It would be nice to have a UI to disable some of that herd of daemons that KDE starts up. Even some non-programmer documentation would be helpful.

Chris

by not me (not verified)

"How about disabling the sound stuff for systems without sound?"

It is disabled. If artsd is not running, the "sound stuff" is disabled. If artsd doesn't find any sound hardware, it exits.

"What the heck are khotkeys, kded or kwrited?"

KHotKeys recognizes hotkeys, of course! kded probably has something to do with Drag-and-Drop compatibility. I don't know what kwrited is. If you want a performance increase, though, you're looking in the wrong place. Daemons like these use only a little piddling amount of memory (which gets swapped out to disk when they're idle) and next to no CPU time (on the order of seconds of CPU time per week they are running - unless they are used). Try killing them and then see if you can tell the difference in speed. I guarantee you won't be able to tell (unless you have 8 MB of RAM or something).

by Chris W (not verified)

It is not so much performance that I am thinking of, but complexity. In any case, kxmlrpcd should be disabled, yet there is no obvious way to do so. Why should the ability to execute remote procedure calls on your machine be automatically enabled in a desktop environment? It's a security hazard and a generally unnecessary feature.

by AC (not verified)

I would like not to use arts because I have a SB Live!, but that's impossible with KDE.

by fsa (not verified)

uhh? that makes no sense

arts works with all sound cards.

it doesn't access the hardware directly, but rather through your snd driver :p

by P.Braun (not verified)

Exactly!
What are all these daemons and processes good for, if it makes no difference whether they are killed or not? The number of processes is already dangerously close to what can be seen in windoze, with its stability. Related to this is the memory footprint in RAM.
So what is needed is:
1. Less memory usage.
2. Better control of which processes shall be started.
3. The possibility to use KDE2 applications from other environments, i.e. GNOME.
4. Improvements to KOffice, especially the MS filters.

by fsa (not verified)

kwrited has nothing to do with KDE. It is a kernel daemon.

by Chris W (not verified)

Do a "ps ax" some time. Here is what I got:

17924 ? S 0:00 kdeinit: kwrited

The kernel would not start its modules with kdeinit, would it?

by Ivan (not verified)

Hi, I had the same problem. I don't know what it's for, but I found out how to turn autostart off.
Run: nano /opt/kde/share/config/kwritedrc

[General]
Autostart=true

change it to

[General]
Autostart=false

by Hi (not verified)

The same problem probably appears when loading StarOffice and many other apps. So a system-level solution would be much better.

Excellent paper. As I have experienced it, this
is a problem that spans many applications. If
a shared object is located in the 'next available' vm
address then we are begging for trouble because
each application has unique offsets and library
load orders. Staroffice is a prime example. It
takes way too long and beats the disk to death
just to get the logo up...

Compaq Tru-64 (aka DEC OSF/1) solved this problem
by loading shared objects at fixed VM addresses
and storing the address maps in a 'so_locations'
file. The linker had access to this file at
link time and the dynamic loader used it to
prime the mmap. The end result was that reloc
was already done. There is also the option
to reloc on the fly if there is an address space
conflict at the expense of lots of loader work.
This was done because 8 byte address fixups are
twice as bad as 4 byte ones.

Sorry, I don't have access to an alpha at this
time and I don't know enough about the dirty
details of the GNU tool chain to say more at this
time other than I have not seen anything
indicating such. If it is not there, it should
be. Anyone with KDE on Tru-64 to prove/disprove
my assumptions?

Windows does this as well, kind of: DLLs have a preferred address on Windows.

by Joeri Sebrechts (not verified)

Yeah, but they have a 64-bit address space, so there is less chance of hitting the same address range twice when loading stuff.
32-bit systems give you 4 gigs of address space, so you could predefine all these libs to load in the 2 to 4 gig area, but soon all systems will be carrying 4 gigs of RAM, and then you'll notice you've only filled a hole by digging another one because your apps will want to load where your libraries are set to preload.

by Martin Fick (not verified)

I don't really understand everything that's going on, so excuse me if this sounds absurd. Would it be possible to make ld.so have the same hack that kdeinit has so that every program gets the "magic" speedup!?

by Steve Lawrance (not verified)

As I remember, each Win32 DLL defines a base address that it pops into. Microsoft defines different base addresses for its DLLs such that the standard DLLs don't overlap and thus no address fixups are required. If an application leaves the same default base address in all of its DLLs, they collide, and Windows fixes up the addresses while loading the DLLs, making the process slower, especially in huge programs.

Are Win32 DLL address fixups similar in spirit to address relocations? Is there any way to predefine base addresses in the libraries so that the relocations are not necessary, or am I missing the point?

by Steve Lawrance (not verified)

Every library that is needed by an application gets loaded to an address that
is unique within that process. This address may vary each time the library is
loaded.

    Code that references addresses in the library must be adjusted for the
    address the library is loaded to, this is called relocation.

...

Hmm.. It looks like Win32 base addresses in DLLs might be exactly the same thing. Does the GNU toolchain allow you to specify a predefined base address in shared libraries so that the dynamic linker can simply pop in the shared objects and not worry about relocations until a conflict occurs, as in Win32?
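
The relocation step quoted above, together with the Win32-style shortcut of skipping it when a library lands on its preferred address, can be sketched roughly as follows in C. This is a simplified model, not the actual ld.so or Windows loader code: the image is treated as an array of address-sized words, and relocate_image, reloc_slots, preferred_base and actual_base are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Patch every slot listed in reloc_slots so that the addresses stored in
   the image match the address the library was actually loaded at.
   Returns the number of slots that had to be patched. */
size_t relocate_image(uintptr_t *image, size_t image_words,
                      const size_t *reloc_slots, size_t reloc_count,
                      uintptr_t preferred_base, uintptr_t actual_base)
{
    uintptr_t delta = actual_base - preferred_base;  /* wraps safely (unsigned) */
    size_t patched = 0;
    size_t i;

    /* Loaded at the preferred address: nothing to do, and the pages stay
       unmodified, so they can be shared between processes. */
    if (delta == 0)
        return 0;

    for (i = 0; i < reloc_count; i++) {
        size_t slot = reloc_slots[i];
        if (slot >= image_words)        /* ignore a malformed entry */
            continue;
        image[slot] += delta;           /* one fixup per relocation entry */
        patched++;
    }
    return patched;
}
```

The loop shows why the cost grows with the number of relocation entries, and why a library that always gets its preferred address costs nothing at load time.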

Loading up Notepad, Word, Excel, Access, and PowerPoint in WinNT on this PIII 500 with 256MB of RAM comes up instantly, as they almost did when I used to run NT4 on my dual P1-233 with 64MB (now it runs Linux 2.4.2, KDE 2.1, and has 192MB, and KDE programs take a little while to load).

KDE 1.0 seemed faster, and when timed it was actually about 10-15% faster on NetBSD and FreeBSD than it was on Linux 2.2 - this could have been down to different default kernel and optimization settings, or other factors.

In any case, a large part of the problem seems to derive from ld.so being designed before graphics, GUIs, multimedia and point-and-click existed.

There's a lot more launching of apps going on in a GUI than in a command-line environment: libs and memory are used differently.

How do other or commercial Unices (like Irix, Solaris, AIX etc.) deal with this?

This multiple posting is really becoming a problem. If a Dot admin is reading this, PLEASE put some BIG RED TEXT next to the "add" button noting that even if your browser times out on the connection and doesn't display the confirmation page, the post was STILL POSTED! It's happened to me more than once.

As a better solution, you might look into why the Dot responds only sporadically to requests for pages. I have a 75 ms ping time to the Dot, but pages sometimes take minutes to request. Sometimes my browser gives up and times out. Once the page starts to download, though, it downloads very fast. Something is not quite right.

I think that slashcode prevents posting messages with identical text. Doing that here as well looks much better to me than writing a warning to the user.
The connection will also time out if a comment could not be posted successfully!

by Charles Samuels (not verified)

Well, sure, compilation takes forever: kdelibs is somewhere in the hundreds of thousands of lines of code, and gcc has always been very inefficient when it comes to compiling C++. But then again, people normally only compile things once, and C++ is faster than C for these things at run-time anyway.

by John Reiser (not verified)

My project http://www.BitWagon.com/elfvector.html speeds up application start by providing transfer vectors. PLT relocation goes from 1 per symbol to 1 per library. The symbols covered cannot be overridden, but it is easy to exclude an arbitrary list (such as malloc,calloc,realloc,free,memalign) by regular expression matching on the symbol name. You can even retrofit elfvector into existing shared libraries and existing apps; no recompilation and no relinking are required. What is required is maintenance: the ordered list of symbols is now very important, and it must be versioned, etc.
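
To illustrate the transfer-vector idea in general terms, here is a small C sketch. It is not elfvector's actual mechanism, which operates on ELF files; it only shows why one address per library is enough once calls go through an ordered table, and why the order of that table becomes part of the interface. All names (vec_fn, vec_index, lib_vector, bind_library, call_via_vector) are invented for the example.

```c
typedef int (*vec_fn)(int);

/* The position of each entry is the contract between library and caller.
   Reordering or removing entries breaks existing applications, which is
   why the list must be maintained and versioned. */
enum vec_index { VEC_DOUBLE = 0, VEC_NEGATE = 1, VEC_COUNT = 2 };

int lib_double(int x) { return 2 * x; }
int lib_negate(int x) { return -x; }

/* The library's transfer vector: one table, ordered by vec_index. */
const vec_fn lib_vector[VEC_COUNT] = { lib_double, lib_negate };

/* "Binding" the library: the caller learns a single address, that of the
   vector, instead of resolving every function symbol separately. */
const vec_fn *bind_library(void)
{
    return lib_vector;
}

/* Every call goes through the vector by index. */
int call_via_vector(const vec_fn *vec, enum vec_index which, int arg)
{
    return vec[which](arg);
}
```

A caller would bind once with bind_library() and then use call_via_vector(vec, VEC_DOUBLE, 21). Functions that must stay overridable, such as malloc in the comment above, would simply be left out of the table.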

The instructions generated for calling a virtual function could be changed to delay relocation of the vtable until the first call through it, by adding one more level of indirection. After the first call, the ongoing cost would be 2 cycles per call.

More generally, a shared library linked as an ET_EXEC file loaded at a fixed address could be handled by _dl_map_object_from_fd() in glibc-2.2/elf/dl-load.c. Currently anything that is not ET_DYN is rejected, but there is essentially no difference in file format between ET_EXEC and ET_DYN. The only real problem is detecting address conflicts at runtime, in order to diagnose errors. The runtime linker could use a bitmap of 1 bit per page (a 32-bit address space with 4 KB pages requires 128 KB). To handle conflicts between dynamic linking and malloc(), it would be nice to have a binary interface to /proc/self/maps, perhaps implemented via ioctl(). sbrk() should be deprecated in favor of a user-space page manager which provides services to malloc, mmap, dlopen, etc.
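
The page bitmap mentioned above could look something like the following C sketch. It assumes a 32-bit address space and 4 KB pages, which gives 2^20 pages and therefore 2^20 bits, or 128 KB. The names page_bitmap, bitmap_clear and bitmap_claim are hypothetical; nothing like this exists in glibc as far as this sketch is concerned.

```c
#include <stdint.h>
#include <string.h>

#define BM_PAGE_SHIFT 12u                            /* 4 KB pages */
#define BM_PAGE_COUNT (1u << (32u - BM_PAGE_SHIFT))  /* 2^20 pages in 4 GB */
#define BM_BYTES      (BM_PAGE_COUNT / 8u)           /* 131072 bytes = 128 KB */

/* One bit per page: 1 means some loaded object already owns the page. */
struct page_bitmap {
    uint8_t bits[BM_BYTES];
};

void bitmap_clear(struct page_bitmap *bm)
{
    memset(bm->bits, 0, sizeof bm->bits);
}

/* Try to claim the address range [addr, addr + len).
   Returns 0 and marks the pages on success.
   Returns -1 on a conflict (or a wrapping range) and changes nothing. */
int bitmap_claim(struct page_bitmap *bm, uint32_t addr, uint32_t len)
{
    uint32_t first, last, p;

    if (len == 0)
        return 0;
    if (len - 1u > UINT32_MAX - addr)   /* range would wrap past 4 GB */
        return -1;

    first = addr >> BM_PAGE_SHIFT;
    last  = (addr + (len - 1u)) >> BM_PAGE_SHIFT;

    /* First pass: detect a conflict before modifying anything. */
    for (p = first; p <= last; p++)
        if (bm->bits[p >> 3] & (1u << (p & 7u)))
            return -1;

    /* Second pass: mark every page in the range as taken. */
    for (p = first; p <= last; p++)
        bm->bits[p >> 3] |= (uint8_t)(1u << (p & 7u));

    return 0;
}
```

A loader using this would call bitmap_claim() before mapping a fixed-address library and report an error, or fall back to ordinary relocation, when it returns -1.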

And while we're here, there is a missing piece in , namely
ElfW(Ehdr) *l_elfhdr;
in the "public protocol" section of struct link_map:
-----
struct link_map
{
  /* These first few members are part of the protocol with the debugger.
     This is the same format used in SVR4. */

  ElfW(Addr) l_addr;                /* Base address shared object is loaded at. */
  char *l_name;                     /* Absolute file name object was found in. */
  ElfW(Dyn) *l_ld;                  /* Dynamic section of the shared object. */
  struct link_map *l_next, *l_prev; /* Chain of loaded objects. */
  ElfW(Ehdr) *l_elfhdr;             /* Public, so everyone can find it */
-----

Yes, this partially duplicates
const ElfW(Phdr) *l_phdr; /* Pointer to program header table in core. */
ElfW(Half) l_phnum; /* Number of program header entries. */
later in the structure, but these are "private" to the implementation.

[Neither Netscape nor Opera could select anything other than Plain Text for Encoding this reply!]

by not me (not verified)

[Neither Netscape nor Opera could select anything other than Plain Text for Encoding this reply!]

Yes, HTML posting was disabled some time ago after someone posted some destructive Javascript. I can't understand why they haven't re-enabled it yet, though, especially since the malicious poster used the name and title fields for his javascript and not the comment field. The comment field itself is already protected from malicious javascript, it has always been that way. Please give us HTML posting back! There is no reason not to!

by not me (not verified)

Oh yeah, I wanted to comment on your post as well. Looks very cool, but very technical. What must a user do to take advantage of this now? Can KDE use this even if other components of the system don't and still be compatible and portable? What kind of speed increase and size decrease can be expected for libraries the size of KDE?

Perhaps, if possible, a better place for the solution would be in the linker, because *ALL* apps would benefit, not only the ones processed with the speed-increase program you describe.

If it can't be resolved there (I don't know as much about resolving symbols as you do), your solution would be nice...