An Analysis of KDE Speed

Our recent poll (courtesy of KDE.com) on the upcoming KDE 2.2 suggests that the area of
greatest concern for KDE users is speed -- at this time, out of 3,463 votes, over 24% consider speed the most important area for developers to address. Waldo Bastian, who developed the kdeinit speed hack among other things, has written a paper entitled "Making C++ ready for the desktop", in which he analyzes the various startup phases of a C++ program. Noting that one component of linking -- namely, library relocations -- is currently slow, he offers some suggestions for optimizations. An interesting read.

Comments

by Koochicoo (not verified)

1) start machine
2) start all necessary apps (could take time)
3) leave apps and machine running until next kernel upgrade (every year or so for me).

Laptops have battery backup and deep sleep, so you never have to turn them off ;-)

4) Buy earplugs for those who happen to have the computer in the bedroom.

:-)

by Alex Wulms (not verified)

Yeah, sure. And increase my electricity bill by a factor of 10.

And run the risk of finding my house burned down if a short circuit happens in the power supply while I'm not at home...

by Yonik Seeley (not verified)

Could vtable relocations also be made lazy through the use of virtual memory somehow? Mark all pages containing vtables as inaccessible, and when access is attempted, do the relocations then. I guess this might have to be in the kernel.

-Yonik

this script opens a new Konqueror window with a URL really fast if there are already Konqueror windows open:

#!/bin/sh
# reuse a running Konqueror via DCOP; otherwise start a new instance
if ps auxwwwwww | grep konqueror | grep -v kio_http | grep -v grep >/dev/null; then
dcop konqueror KonquerorIface openBrowserWindow "$@"
else
konqueror "$@"
fi

by C. D. Thompson-Walsh (not verified)

In my experience, KDE tends to be algorithmically pretty fast... What slows it down is the page faults when it has to swap stuff in. I find KDE more than fast enough on a K6-233 with 64+ MB of RAM... But when it does slow down, it is because of RAM utilisation. (And 32 MB is painful.)
Reducing the memory footprint of KDE, then, would actually result in a massive speed boost on a lot of lower-end machines (which are what we're worrying about in the first place...)

by chris (not verified)

I find KDE REALLY slow on my 1400 MHz, 512 MB RAM machine. It never swaps, but it is still sooo slow!!!

You see, slowness is relative...

chris

by Juha Manninen (not verified)

Speaking of lower-end machines, what would you say about porting Qt + KDE without the X Window System to desktop PCs, too? That would be a huge improvement in speed and memory usage.
Right now Qt/Embedded is meant only for PDAs and the like.

Juha Manninen

by Geert Jansen (not verified)

Having shared libraries load at fixed addresses seems like a great solution to me. Windows and Tru64 apparently do this too.

It could be implemented in a rather straightforward manner in ldconfig. After searching the library path for shared libs, it should assign a unique address to each one. This value can be stored in either a special ELF section or in a system wide file (maybe a system wide file is better). At runtime, ld.so uses this information to load the library at the appropriate address. If it sees that it must load libraries that are not in this "base address cache", it relocates the shared libraries in the usual way.

I never looked at the source of ldconfig or ld.so but the above change seems easy enough.

Apologies if I'm far off the mark -- I'm not versed in the details of how the dynamic linker works.

I think at first glance, it would seem that the problem would be solved by giving every shared library a unique base address. This would presumably mean that libraries are slurped into preferred spots in the address space, and symbols end up at predictable addresses.

But it is not so much the base address which matters, but the offsets of each symbol in the library. Symbols change in size and may be ordered differently between versions of the library. Knowing a base address would no longer help.

I would guess that this is why shared libraries contain huge hash tables of strings against symbol offsets: other than a string name, the program has no reliable way of naming a symbol it is interested in.

So, it seems that this approach would require more than caching the base address: it would also require that libraries and programs be linked symbolically at build time, linked again at install time with known base addresses in mind, then linked again at runtime if symbols move underfoot.

I think the install-time link would be prohibitively slow, especially for people who keep a constant churn of upgrades on their systems.


by Ian Chiew (not verified)

What does 'speed' mean?

It seems that some have taken it to mean application startup time, and others the general performance of KDE.

by Richard Dale (not verified)

A good way to speed up application startup time is not to load all the code at startup, but only when it is needed. So you could speed up an app by splitting it into a small core along with a number of dynamically loaded KParts. The KParts can be loaded lazily on demand, so you don't always need to load code for the areas of the UI that the user only very occasionally touches -- e.g. preferences or help.

-- Richard

From what I can understand of Waldo's paper, it seems that the startup time problem would then have two parts. There's the time it takes for the base KDE libraries to load, and there is the incremental cost of introducing more functionality in the form of libraries.

It appears that both are particularly bad.

I agree that KParts and the general practice of breaking up applications into large blocks of demand-loaded functionality is a great way to tackle the second problem.

That, though, still leaves the first problem. Even small applications like KEdit (which use a minimal subset of the KDE framework) seem to take many times longer (and a great deal more processor time) than their GNOME or Windows counterparts to start up.

by Bojan (not verified)

When I took my computer home, where I have no Internet connection, all Qt-based applications took about a minute to start (otherwise they take about a second). So it seems to me that each Qt application looks for something on the network. Maybe this is the reason for the slow startup of Qt-based applications, or am I just wrong?

by John Reiser (not verified)

Here's how to get rid of nearly all relocations, by assigning fixed virtual addresses to shared libraries. You can run it and measure its performance today. Assign the addresses from the space below 0x08048000; leave a gap between adjacent libraries for expansion, etc.

# demo: main() calls foo(); foo() printf()s a message
gcc -c main.c foo.c # compile only

# Create a "shared library" at a fixed address using foo.o
# elf_i386.x is a copy of /usr/lib/ldscripts/elf_i386.x,
# EXCEPT that `0x08048000' has been changed, such as to `0x07ff0000'.
gcc -Wl,-E -nostartfiles -o foo.so -T elf_i386.x foo.o

cat >dyn4exec.c <<'EOF'
/* dyn4exec: flip an executable's ELF type from ET_EXEC to ET_DYN
 * so the linker will accept it as a shared object. */
#include <elf.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char const *const *const argv)
{
    int const fd = open(argv[1], O_RDWR, 0);
    Elf32_Ehdr *const ehdr = (Elf32_Ehdr *)mmap(0, 4096,
        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ehdr->e_type != ET_EXEC) {
        return 1;
    }
    ehdr->e_type = ET_DYN;
    return 0;
}
EOF

gcc -o dyn4exec dyn4exec.c
./dyn4exec foo.so # sets Elf32_Ehdr.e_type = ET_DYN

gcc -o main main.o foo.so # link main() to new shared lib

export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH
./main # test run

by Andrea (not verified)

I found this article while digging through Mozilla's performance newsgroup. It may be of interest:

From the FreeBSD mailing lists:

jdp 2001/05/05 16:21:05 PDT

Modified files:
libexec/rtld-elf rtld.c rtld.h
libexec/rtld-elf/alpha reloc.c
libexec/rtld-elf/i386 reloc.c
Log:
Performance improvements for the ELF dynamic linker. These particularly help programs which load many shared libraries with a lot of relocations. Large C++ programs such as are found in KDE are a prime example.

While relocating a shared object, maintain a vector of symbols which have already been looked up, directly indexed by symbol number.
Typically, symbols which are referenced by a relocation entry are referenced by many of them. This is the same optimization I made to the a.out dynamic linker in 1995 (rtld.c revision 1.30).

Also, compare the first character of a sought-after symbol with its symbol table entry before calling strcmp().

On a PII/400 these changes reduce the start-up time of a typical KDE program from 833 msec (elapsed) to 370 msec.

MFC after: 5 days

Revision Changes Path
1.52 +22 -6 src/libexec/rtld-elf/rtld.c
1.22 +11 -2 src/libexec/rtld-elf/rtld.h
1.12 +12 -5 src/libexec/rtld-elf/alpha/reloc.c
1.7 +10 -5 src/libexec/rtld-elf/i386/reloc.c