KDE Developer's Corner: Common Programming Mistakes

I'm very happy to announce a new document for inspiring KDE hackers, entitled "Common Programming Mistakes". The document aims to combine the experience of many of the top KDE developers about the Qt and KDE frameworks dos and don'ts. The way they were usually passed on to the next generation was by letting the youngsters make the mistakes and then yell at them in public. We will go over things, which are not necessarily bugs, but which make the code either slower or less readable. The document will be expanding as we see the need for it.

Dot Categories: 

Comments

by Max Howell (not verified)

Some of those tidbits are like gold! I'm especially annoyed at not realising the QValueList iterator one. I should have realised that at some point.

by LMCBoy (not verified)

I agree, these are pure gold! There are a lot of these "mistakes" in kstars...looks like I have some work to do :)

by James Richard Tyrer (not verified)

As an old FORTRAN programmer, I know that you shouldn't do this:

QCString someCString = ...;
for( int i = 0; i < someCString.length(); ++i ) {
//Do something
}

You should do:

QCString someCString = ...;
int temp = someCString.length()
for( int i = 0; i < temp; ++i ) {
//Do something
}

But, my question is: doesn't the optimizing compiler take care of this?

And my pet programing mistake. What is wrong with the example??

Answer: The index of a loop should be: "unsigned int" (as should: "temp" in my example!). And when you get the warning about comparison between signed and unsigned integers, you should fix it.

--
JRT

by Jay (not verified)

Why does it matter if you make them unsigned? Do they not still take up the same memory space? Isn't there no use for the extra positive ints? Just curious.

by André Somers (not verified)

I'm not sure, but I guess you'll run into trouble when the loop gets very large, larger then 2^(len(int)-1) iterations, because the last bit is used for positive/negative for normal ints, and is just another bit for unsigned ints.

by James Richard Tyrer (not verified)

I guess that writing code correctly is part of the difference between being a software engineer and being a hacker.

:-)

But seriously, it will help you find bugs, unless you simply ignore the compiler warnings.

--
JRT

by Dawnrider (not verified)

The thing with using uints, is that while using a Uint will allow higher values than an int, we're still talking 32 bit numbers here. If you really are worried about large loop indices, then use a long at least. Mind, even then, there is always potential to overflow the type.

Honestly, as a programmer, you should always be aware of the potential iterations of any given loop, and ultimately, you should put checks in place in very large loops to ensure that you can't roll over.

by Spy Hunter (not verified)

longs aren't longer than ints on x86; they are 32 bits. C's types are dumb. IMHO there should be int8, int16, int32, int64, and int128. That way there would be no confusion about the length of types, and ints wouldn't magically change size and break all your code when you recompile for Alpha or x86-64 or 286 or Z80.

by AC (not verified)

Include C99's stdint.h, and you get types like uint8_t, uint16_t, etc...

by Robin (not verified)

The point is that length() returns an unsigned int that could be >= 2^31 which will be interpreted as a negative number.

Funny thing is that since the loop comparison involves a signed and un unsigned the signed will be cast to unsigned before comparison and the loop will go on as expected. There might be other problems with an "negative" index, but not in the example.

by gregj (not verified)

How do you think compiler should optimise that ?
This would be rather strange, since that method can be:

for( ;a.bla();a.bleble() )
{
a.do_something_nasty();
}

in example above a.bla() can be testing for something, and return true or false.
So there is no way compiler should even approach optimisation.
I know that code above suffers from - can be neverending story. But it's just example :-D

by Ingo Klöcker (not verified)

The compiler can easily check whether the string is modified inside the loop. If it's not modified then the value of string.length() will be calculated only once.

And in your example: If bla(), bleble() and do_something_nasty() are all const (i.e. they don't modify a) then a.bla() can (and will) be calculated before the loop. I only wonder what happens if one of those methods change a mutable member variable of a.

by Anonymous (not verified)

Because of exactly this compilers must not rely on 'const' as method attributes for optimization purpose, and that's exactly why blah() will be called each time.

by Marc Mutz (not verified)

Two things:

1. The compiler doesn't see the implementation of QCString::length() and therefore can't check that
2. QCString::length() doesn't do:

int QCString::length() const {
return const_cast( this )->mLen++;
}

by Ravi (not verified)

This is an issue that has been discussed a million times already in several fora. The best description of the problem and its subtleties can be found at the venerable Guru of the Week site:
Constant Optimization?
It is a little dated if you are using a globally optimizing compiler along with monolithic applications, but for the rest of us mere mortals, it is accurate. For KDE, assuming the use of dynamic libraries, it is accurate.

Hmm, I can't select anything other than "Plain Text" for encoding with Konq from HEAD.

by Waldo Bastian (not verified)

> Hmm, I can't select anything other than "Plain Text" for encoding with Konq from HEAD.

That's because dot.kde.org only offers "Plain Text" to mortals.

by James Richard Tyrer (not verified)

> How do you think compiler should optimise that ?

One of the things an optimizing compiler does is move invariant (constant) expressions outside of the loop.

--
JRT

by ac (not verified)

Yes, but how is the compiler to know that the expression is invariant? If
.length() is an inline it may be able to work it out from the fact that someCString
is never updated, but if it isn't it's hard to see how this would work. Or does C++
have some way of telling the compiler that length() has no side-effects?

by Eggert Ehmke (not verified)

Yes of course. QCString::length () is defined as
uint length () const

by glettieri (not verified)

Well, this means that lenght() does not modify (unmutable) members of the object it is applied to. It does not mean that lenght() has no side effects (it can modify global variables, for instance).

by Humberto Massa (not verified)

just put const: it indicates exactly that (at least for that invocant, recipient or object [this] :-), length() will not change anything that is not marked mutable... ;-)

size_t QCString::length() const {
...
}

Yes, but that's not enough for the compiler to treat it as invariant. You'd need to say
that it didn't touch any global state either.

Just add 'return random();' to that member function, and you cannot rely on that returning the same number every time. The function still won't modify the object, but it illustrates why const doesn't allow the compiler to optimize away calling length each time through.

True.

Help is at hand, however. gcc allows you to specify that a function called with the same arguments always returns the same values, and thus can be optimised out of a loop, by use of __attribute__((const)). This is a gcc extension specifically for cases like the above.

by matus telgarsky (not verified)

I disagree about using the temp variable. There are two obvious reasons to use it (a) avoid function call and computation overhead of length() call--or whatever you use to limit the loop--by calling once and saving/caching, and (b) in concurrent environments length() could change during the loop, so you're screwed.

well, most libraries (stl most definitely) inline the call to length()/size(), and the function itself just returns an attribute of the class, so there is no advantage over the caching method you state.

for (b), you'd probably want to use a mutex or something like that to shield in that situation anyway.

So basically what that method does is create an unnecessary obfuscation and more code to read.

unsigned int may seem more correct, however (a) most places in the code don't need this kind of protection (you know you won't go that high), (b) in situations where it will, there's probably a smarter constraint (don't forget that architectures with different word sizes define unsigned int differently!), and (c) no way I'm going to type more than I need.

by Ingo Klöcker (not verified)

> well, most libraries (stl most definitely) inline the call to
> length()/size(), and the function itself just returns an attribute of the
> class, so there is no advantage over the caching method you state.

The problem is that QCString is a pretty stupid class (it's a normal C-string with reference counting).

http://doc.trolltech.com/3.3/qcstring.html#length says:
"Returns the length of the string, excluding the '\0'-terminator. Equivalent to calling strlen(data())."

So QCString::length() is not O(1) but O(n) where n is the length of the string and thus highly inefficient.

by matus telgarsky (not verified)

well, that sucks. I'm not a QT guy, so my comment was uninformed in this context...

by James Richard Tyrer (not verified)

It appears that the problem is ... .

I suppose that this isn't the best place to state that C syntax sucks. But it does. One of the main offenders is the 'for' loop construct. This can be used for many kinds of loops, but code efficiency suffers. If we used an indexed loop, then the problem wouldn't exist because it is required that the parameters for an indexed loop ARE constants.

QCString someCString = "...";
START(N=0, someCString.length())
//Do something
LOOP: END

Much better. And, you don't have to define: "N" because it isn't really a variable (but it is unsigned).

--
JRT

by James Richard Tyrer (not verified)

> So basically what that method does is create an unnecessary obfuscation and
> more code to read.

I think that my point is that you no longer have to do this because the compiler will do it for your.

> unsigned int may seem more correct

Yes that is why you do it, because it is correct. Integers that can never be less than 0 should be unsigned.

> no way I'm going to type more than I need.

Yes, it is much better to hunt for bugs than to take the extra effort to write the code correctly.

:-) And I guess that you, therefore, would never want to try PL/Fiv.

--
JRT

by matus telgarsky (not verified)

Hi,
I was careful to say "may _seem_ more correct", because it's naive to assume that unsigned int is a good idea on all architectures, and especially in all contexts. Usually there is a better choice if you are worry about overflow, like ptrdiff_t, size_t, class::type, etc.

oh and while I'm being anal, isn't using != rather than < much prettier in many loops? I use that styling very often, and definitely whenever dealing with sequential incrementation of integers..

by Anonymous (not verified)

It's not only about the overflow. Suppose you want to do something like
textfield.setlength(i);
where i is a loop index

Now Textfield::setlength should really be defined as
setlength(unsigned int l)
because you don't want to have to handle the case of a negative length for a textfield.

If you have warnings turned on the call to textfield.setlength(i) will give you a warning. And you really want to avoid warnings because too many of them will start to hide those that you really should see because they indicate a possible bug.

by matus telgarsky (not verified)

yes yes, agreed, I write that way all the time for the reason you stated. But the fragment in question was just part of a routine, and it's possibly we could know we don't need the extra amount of int afforded to us this way.

but yeah for any API type stuff, whenever I take something vaguely like an offset I use unsigned so that the compiler can help me.

by Paul Balez (not verified)

Am i right to propose this oslution :

QCString someCString = ...;
for( unsigned int i = 0; someCString.data()[i] != 0; ++i )
{
// Do something
}

This way i values are visited sequentially,
and (if someCString has a \0 ending char) then
string length (string end) is tested with a O(1) complexity.

Constraint : the string size can stay unmodified,
or can increase, but should not decrease.

To always respect the constraint : someCString could be const (if possible)

by David (not verified)

This is a good thing to see happening, and I hope you can perhaps turn this into a series.

Although I'm not familiar with the Qt/KDE aspects (yet!), one thing I will say is that people 'just hacking' (and people who should know better :)) ignore compiler warnings all too often. "Oh, well, it compiles and the application runs...". "Oh it's OK, it'll get caught by the compiler.". Those warnings are there for a purpose, and more often than not, are the sign of redundant (and/or inefficient) code - something that you really do not want in a project the size and scope of KDE. The worst case scenario is that your code is compiled with a different compiler version, or on a different platform, and the compiler throws a wobbler. A very real problem is that in a project the size of KDE the problems tend to multiply the more that people do this, and remember that people may actually be reusing your code and depending on it. Please, if possible, try and work for a handful of minutes to get them darn warnings down to zero, if possible. You'll be satisfied with a cleanly compiling bit of code, and it will save you (and others) a ton of time and effort in the future.

Something that also came home to me in the 'QString is Null' example (a bit of a mild example in this context) was that people do not use the existing methods available to them enough. If you reuse existing methods and code available as much as possible then you spend less time getting something done, you know where any problems occur, if and when they do, you know how to solve them much quicker, more developers can help you because they are using the same methods and the same code as you.... There is also a responsibility on you to try and expose as much useful code to others as you can, and make sure that everyone knows about it :). Because of the size of a project like KDE, if you do these things, the advantages, speed and efficiency just multiply exponentially. I'm sure we all know this anyway.

None of this is code-specific in any way, but then again, good, productive programming never is - it is unseen a lot of the time unfortunately.

Just a couple of pennys' worth.

by Simon Farnsworth (not verified)

As something useful for open source developers; compile releases with "-O -Wall -Werror", and make sure it compiles (obviously, -O could be -O2 or -Os, or whatever). This turns all warnings into errors, and makes sure you can fix them. Where a warning is something silly (some lints require you to cast the result of printf() to void), it forces you to put something into the code that makes your intent quite clear; -Wall only turns on warnings that make your code unclear, and thus helps with making sure it's correct. -W also helps with finding errors, since it warns where your code is doing something where the "natural" interpretation doesn't fit the spec.

If you're developing code, it's well worth turning on as many warnings as possible ("info:/gcc/Warning Options" for KDE users who've got the info documentation installed), and using -Werror to make sure that your release gives no warnings on your compiler. It won't catch every bug, but it'll force you to look at code that's not 100% clear and perfect.

FWIW, when I'm finishing code, I use "-O2 -Wall -Werror -W -Wfloat-equal -Wundef -Wshadow -Wpointer-arith -Wcast-qual -Wwrite-strings -Wconversion -Wredundant-decls". It's a lot of warnings, but even though I'm the only user of my code, it cuts down on the trouble I have if ever I want to go back to an old project and play with it. Plus, if ever I think my code is worth releasing (usually it's not), I've got it into a nice state.

by Frerich Raabe (not verified)

A nice article IMHO, two things came to my mind while reading it:

- Regarding the NULL pointer issue, in particular setting 'ptr' to 0 after calling delete on it (or rather, what it points to): it might be a good idea to mention QGuardedPtr in that context (IIRC QGuardedPtr does just that, deleting something and setting the pointer to zero in the dtor. Just the name is a little suboptimal).

- Regarding iterators: a very common mistake I spotted is that iterators are acquired from temporary lists. Usually, this happens if some function returns a list (e.g. a QValueList) by value and you fail to make a local copy. So, assuming you have a function 'QValueList numbers();', you shouldn't do

QValueList::ConstIterator end = numbers().end();

but

QValueList list = numbers();
QValueList::ConstIterator it = list.end();

or something like that, I hope the point I'm trying to make became apparent.

- Frerich

by Frerich Raabe (not verified)

A nice article IMHO, two things came to my mind while reading it:

- Regarding the NULL pointer issue, in particular setting 'ptr' to 0 after calling delete on it (or rather, what it points to): it might be a good idea to mention QGuardedPtr in that context (IIRC QGuardedPtr does just that, deleting something and setting the pointer to zero in the dtor. Just the name is a little suboptimal).

- Regarding iterators: a very common mistake I spotted is that iterators are acquired from temporary lists. Usually, this happens if some function returns a list (e.g. a QValueList) by value and you fail to make a local copy. So, assuming you have a function 'QValueList numbers();', you shouldn't do

QValueList::ConstIterator end = numbers().end();

but

QValueList list = numbers();
QValueList::ConstIterator it = list.end();

or something like that, I hope the point I'm trying to make became apparent (the problem is that in the former version, the iterator gets invalidated at the end of the line since the list it was acquired for got destroyed, as it's a temporary).

- Frerich