MAR
24
2004

KDE Developer's Corner: Common Programming Mistakes

I'm very happy to announce a new document for inspiring KDE hackers, entitled "Common Programming Mistakes". The document aims to combine the experience of many of the top KDE developers about the Qt and KDE frameworks dos and don'ts. The way they were usually passed on to the next generation was by letting the youngsters make the mistakes and then yell at them in public. We will go over things, which are not necessarily bugs, but which make the code either slower or less readable. The document will be expanding as we see the need for it.

Comments

Some of those tidbits are like gold! I'm especially annoyed at not realising the QValueList iterator one. I should have realised that at some point.


By Max Howell at Wed, 2004/03/24 - 6:00am

I agree, these are pure gold! There are a lot of these "mistakes" in kstars...looks like I have some work to do :)


By LMCBoy at Wed, 2004/03/24 - 6:00am

As an old FORTRAN programmer, I know that you shouldn't do this:

QCString someCString = ...;
for( int i = 0; i < someCString.length(); ++i ) {
//Do something
}

You should do:

QCString someCString = ...;
int temp = someCString.length()
for( int i = 0; i < temp; ++i ) {
//Do something
}

But, my question is: doesn't the optimizing compiler take care of this?

And my pet programing mistake. What is wrong with the example??

Answer: The index of a loop should be: "unsigned int" (as should: "temp" in my example!). And when you get the warning about comparison between signed and unsigned integers, you should fix it.

--
JRT


By James Richard Tyrer at Wed, 2004/03/24 - 6:00am

Why does it matter if you make them unsigned? Do they not still take up the same memory space? Isn't there no use for the extra positive ints? Just curious.


By Jay at Wed, 2004/03/24 - 6:00am

I'm not sure, but I guess you'll run into trouble when the loop gets very large, larger then 2^(len(int)-1) iterations, because the last bit is used for positive/negative for normal ints, and is just another bit for unsigned ints.


By André Somers at Wed, 2004/03/24 - 6:00am

I guess that writing code correctly is part of the difference between being a software engineer and being a hacker.

:-)

But seriously, it will help you find bugs, unless you simply ignore the compiler warnings.

--
JRT


By James Richard Tyrer at Wed, 2004/03/24 - 6:00am

The thing with using uints, is that while using a Uint will allow higher values than an int, we're still talking 32 bit numbers here. If you really are worried about large loop indices, then use a long at least. Mind, even then, there is always potential to overflow the type.

Honestly, as a programmer, you should always be aware of the potential iterations of any given loop, and ultimately, you should put checks in place in very large loops to ensure that you can't roll over.


By Dawnrider at Wed, 2004/03/24 - 6:00am

longs aren't longer than ints on x86; they are 32 bits. C's types are dumb. IMHO there should be int8, int16, int32, int64, and int128. That way there would be no confusion about the length of types, and ints wouldn't magically change size and break all your code when you recompile for Alpha or x86-64 or 286 or Z80.


By Spy Hunter at Wed, 2004/03/24 - 6:00am

Include C99's stdint.h, and you get types like uint8_t, uint16_t, etc...


By AC at Wed, 2004/03/24 - 6:00am

The point is that length() returns an unsigned int that could be >= 2^31 which will be interpreted as a negative number.

Funny thing is that since the loop comparison involves a signed and un unsigned the signed will be cast to unsigned before comparison and the loop will go on as expected. There might be other problems with an "negative" index, but not in the example.


By Robin at Fri, 2004/03/26 - 6:00am

How do you think compiler should optimise that ?
This would be rather strange, since that method can be:

for( ;a.bla();a.bleble() )
{
a.do_something_nasty();
}

in example above a.bla() can be testing for something, and return true or false.
So there is no way compiler should even approach optimisation.
I know that code above suffers from - can be neverending story. But it's just example :-D


By gregj at Wed, 2004/03/24 - 6:00am

The compiler can easily check whether the string is modified inside the loop. If it's not modified then the value of string.length() will be calculated only once.

And in your example: If bla(), bleble() and do_something_nasty() are all const (i.e. they don't modify a) then a.bla() can (and will) be calculated before the loop. I only wonder what happens if one of those methods change a mutable member variable of a.


By Ingo Klöcker at Wed, 2004/03/24 - 6:00am

Because of exactly this compilers must not rely on 'const' as method attributes for optimization purpose, and that's exactly why blah() will be called each time.


By Anonymous at Wed, 2004/03/24 - 6:00am

Two things:

1. The compiler doesn't see the implementation of QCString::length() and therefore can't check that
2. QCString::length() doesn't do:

int QCString::length() const {
return const_cast( this )->mLen++;
}


By Marc Mutz at Wed, 2004/03/24 - 6:00am

This is an issue that has been discussed a million times already in several fora. The best description of the problem and its subtleties can be found at the venerable Guru of the Week site:
Constant Optimization?
It is a little dated if you are using a globally optimizing compiler along with monolithic applications, but for the rest of us mere mortals, it is accurate. For KDE, assuming the use of dynamic libraries, it is accurate.

Hmm, I can't select anything other than "Plain Text" for encoding with Konq from HEAD.


By Ravi at Wed, 2004/03/24 - 6:00am

> Hmm, I can't select anything other than "Plain Text" for encoding with Konq from HEAD.

That's because dot.kde.org only offers "Plain Text" to mortals.


By Waldo Bastian at Wed, 2004/03/24 - 6:00am

> How do you think compiler should optimise that ?

One of the things an optimizing compiler does is move invariant (constant) expressions outside of the loop.

--
JRT


By James Richard Tyrer at Wed, 2004/03/24 - 6:00am

Yes, but how is the compiler to know that the expression is invariant? If
.length() is an inline it may be able to work it out from the fact that someCString
is never updated, but if it isn't it's hard to see how this would work. Or does C++
have some way of telling the compiler that length() has no side-effects?


By ac at Wed, 2004/03/24 - 6:00am

Yes of course. QCString::length () is defined as
uint length () const


By Eggert Ehmke at Wed, 2004/03/24 - 6:00am

Well, this means that lenght() does not modify (unmutable) members of the object it is applied to. It does not mean that lenght() has no side effects (it can modify global variables, for instance).


By glettieri at Wed, 2004/03/24 - 6:00am

just put const: it indicates exactly that (at least for that invocant, recipient or object [this] :-), length() will not change anything that is not marked mutable... ;-)

size_t QCString::length() const {
...
}


By Humberto Massa at Wed, 2004/03/24 - 6:00am

Yes, but that's not enough for the compiler to treat it as invariant. You'd need to say
that it didn't touch any global state either.


By ac at Wed, 2004/03/24 - 6:00am

Just add 'return random();' to that member function, and you cannot rely on that returning the same number every time. The function still won't modify the object, but it illustrates why const doesn't allow the compiler to optimize away calling length each time through.


By CK at Thu, 2004/03/25 - 6:00am

True.

Help is at hand, however. gcc allows you to specify that a function called with the same arguments always returns the same values, and thus can be optimised out of a loop, by use of __attribute__((const)). This is a gcc extension specifically for cases like the above.


By Jon at Thu, 2004/03/25 - 6:00am

I disagree about using the temp variable. There are two obvious reasons to use it (a) avoid function call and computation overhead of length() call--or whatever you use to limit the loop--by calling once and saving/caching, and (b) in concurrent environments length() could change during the loop, so you're screwed.

well, most libraries (stl most definitely) inline the call to length()/size(), and the function itself just returns an attribute of the class, so there is no advantage over the caching method you state.

for (b), you'd probably want to use a mutex or something like that to shield in that situation anyway.

So basically what that method does is create an unnecessary obfuscation and more code to read.

unsigned int may seem more correct, however (a) most places in the code don't need this kind of protection (you know you won't go that high), (b) in situations where it will, there's probably a smarter constraint (don't forget that architectures with different word sizes define unsigned int differently!), and (c) no way I'm going to type more than I need.


By matus telgarsky at Wed, 2004/03/24 - 6:00am

> well, most libraries (stl most definitely) inline the call to
> length()/size(), and the function itself just returns an attribute of the
> class, so there is no advantage over the caching method you state.

The problem is that QCString is a pretty stupid class (it's a normal C-string with reference counting).

http://doc.trolltech.com/3.3/qcstring.html#length says:
"Returns the length of the string, excluding the '\0'-terminator. Equivalent to calling strlen(data())."

So QCString::length() is not O(1) but O(n) where n is the length of the string and thus highly inefficient.


By Ingo Klöcker at Wed, 2004/03/24 - 6:00am

well, that sucks. I'm not a QT guy, so my comment was uninformed in this context...


By matus telgarsky at Wed, 2004/03/24 - 6:00am

It appears that the problem is ... .

I suppose that this isn't the best place to state that C syntax sucks. But it does. One of the main offenders is the 'for' loop construct. This can be used for many kinds of loops, but code efficiency suffers. If we used an indexed loop, then the problem wouldn't exist because it is required that the parameters for an indexed loop ARE constants.

QCString someCString = "...";
START(N=0, someCString.length())
//Do something
LOOP: END

Much better. And, you don't have to define: "N" because it isn't really a variable (but it is unsigned).

--
JRT


By James Richard Tyrer at Thu, 2004/03/25 - 6:00am

> So basically what that method does is create an unnecessary obfuscation and
> more code to read.

I think that my point is that you no longer have to do this because the compiler will do it for your.

> unsigned int may seem more correct

Yes that is why you do it, because it is correct. Integers that can never be less than 0 should be unsigned.

> no way I'm going to type more than I need.

Yes, it is much better to hunt for bugs than to take the extra effort to write the code correctly.

:-) And I guess that you, therefore, would never want to try PL/Fiv.

--
JRT


By James Richard Tyrer at Wed, 2004/03/24 - 6:00am

Hi,
I was careful to say "may _seem_ more correct", because it's naive to assume that unsigned int is a good idea on all architectures, and especially in all contexts. Usually there is a better choice if you are worry about overflow, like ptrdiff_t, size_t, class::type, etc.

oh and while I'm being anal, isn't using != rather than < much prettier in many loops? I use that styling very often, and definitely whenever dealing with sequential incrementation of integers..


By matus telgarsky at Wed, 2004/03/24 - 6:00am

It's not only about the overflow. Suppose you want to do something like
textfield.setlength(i);
where i is a loop index

Now Textfield::setlength should really be defined as
setlength(unsigned int l)
because you don't want to have to handle the case of a negative length for a textfield.

If you have warnings turned on the call to textfield.setlength(i) will give you a warning. And you really want to avoid warnings because too many of them will start to hide those that you really should see because they indicate a possible bug.


By Anonymous at Wed, 2004/03/24 - 6:00am

yes yes, agreed, I write that way all the time for the reason you stated. But the fragment in question was just part of a routine, and it's possibly we could know we don't need the extra amount of int afforded to us this way.

but yeah for any API type stuff, whenever I take something vaguely like an offset I use unsigned so that the compiler can help me.


By matus telgarsky at Fri, 2004/03/26 - 6:00am

Am i right to propose this oslution :

QCString someCString = ...;
for( unsigned int i = 0; someCString.data()[i] != 0; ++i )
{
// Do something
}

This way i values are visited sequentially,
and (if someCString has a \0 ending char) then
string length (string end) is tested with a O(1) complexity.

Constraint : the string size can stay unmodified,
or can increase, but should not decrease.

To always respect the constraint : someCString could be const (if possible)


By Paul Balez at Tue, 2005/08/09 - 5:00am

This is a good thing to see happening, and I hope you can perhaps turn this into a series.

Although I'm not familiar with the Qt/KDE aspects (yet!), one thing I will say is that people 'just hacking' (and people who should know better :)) ignore compiler warnings all too often. "Oh, well, it compiles and the application runs...". "Oh it's OK, it'll get caught by the compiler.". Those warnings are there for a purpose, and more often than not, are the sign of redundant (and/or inefficient) code - something that you really do not want in a project the size and scope of KDE. The worst case scenario is that your code is compiled with a different compiler version, or on a different platform, and the compiler throws a wobbler. A very real problem is that in a project the size of KDE the problems tend to multiply the more that people do this, and remember that people may actually be reusing your code and depending on it. Please, if possible, try and work for a handful of minutes to get them darn warnings down to zero, if possible. You'll be satisfied with a cleanly compiling bit of code, and it will save you (and others) a ton of time and effort in the future.

Something that also came home to me in the 'QString is Null' example (a bit of a mild example in this context) was that people do not use the existing methods available to them enough. If you reuse existing methods and code available as much as possible then you spend less time getting something done, you know where any problems occur, if and when they do, you know how to solve them much quicker, more developers can help you because they are using the same methods and the same code as you.... There is also a responsibility on you to try and expose as much useful code to others as you can, and make sure that everyone knows about it :). Because of the size of a project like KDE, if you do these things, the advantages, speed and efficiency just multiply exponentially. I'm sure we all know this anyway.

None of this is code-specific in any way, but then again, good, productive programming never is - it is unseen a lot of the time unfortunately.

Just a couple of pennys' worth.


By David at Wed, 2004/03/24 - 6:00am

As something useful for open source developers; compile releases with "-O -Wall -Werror", and make sure it compiles (obviously, -O could be -O2 or -Os, or whatever). This turns all warnings into errors, and makes sure you can fix them. Where a warning is something silly (some lints require you to cast the result of printf() to void), it forces you to put something into the code that makes your intent quite clear; -Wall only turns on warnings that make your code unclear, and thus helps with making sure it's correct. -W also helps with finding errors, since it warns where your code is doing something where the "natural" interpretation doesn't fit the spec.

If you're developing code, it's well worth turning on as many warnings as possible ("info:/gcc/Warning Options" for KDE users who've got the info documentation installed), and using -Werror to make sure that your release gives no warnings on your compiler. It won't catch every bug, but it'll force you to look at code that's not 100% clear and perfect.

FWIW, when I'm finishing code, I use "-O2 -Wall -Werror -W -Wfloat-equal -Wundef -Wshadow -Wpointer-arith -Wcast-qual -Wwrite-strings -Wconversion -Wredundant-decls". It's a lot of warnings, but even though I'm the only user of my code, it cuts down on the trouble I have if ever I want to go back to an old project and play with it. Plus, if ever I think my code is worth releasing (usually it's not), I've got it into a nice state.


By Simon Farnsworth at Fri, 2004/03/26 - 6:00am

A nice article IMHO, two things came to my mind while reading it:

- Regarding the NULL pointer issue, in particular setting 'ptr' to 0 after calling delete on it (or rather, what it points to): it might be a good idea to mention QGuardedPtr in that context (IIRC QGuardedPtr does just that, deleting something and setting the pointer to zero in the dtor. Just the name is a little suboptimal).

- Regarding iterators: a very common mistake I spotted is that iterators are acquired from temporary lists. Usually, this happens if some function returns a list (e.g. a QValueList) by value and you fail to make a local copy. So, assuming you have a function 'QValueList numbers();', you shouldn't do

QValueList::ConstIterator end = numbers().end();

but

QValueList list = numbers();
QValueList::ConstIterator it = list.end();

or something like that, I hope the point I'm trying to make became apparent.

- Frerich


By Frerich Raabe at Thu, 2004/03/25 - 6:00am

A nice article IMHO, two things came to my mind while reading it:

- Regarding the NULL pointer issue, in particular setting 'ptr' to 0 after calling delete on it (or rather, what it points to): it might be a good idea to mention QGuardedPtr in that context (IIRC QGuardedPtr does just that, deleting something and setting the pointer to zero in the dtor. Just the name is a little suboptimal).

- Regarding iterators: a very common mistake I spotted is that iterators are acquired from temporary lists. Usually, this happens if some function returns a list (e.g. a QValueList) by value and you fail to make a local copy. So, assuming you have a function 'QValueList numbers();', you shouldn't do

QValueList::ConstIterator end = numbers().end();

but

QValueList list = numbers();
QValueList::ConstIterator it = list.end();

or something like that, I hope the point I'm trying to make became apparent (the problem is that in the former version, the iterator gets invalidated at the end of the line since the list it was acquired for got destroyed, as it's a temporary).

- Frerich


By Frerich Raabe at Thu, 2004/03/25 - 6:00am