Automated KDE Test Reports

With the help of several other people I have begun writing automated test scripts for KDE. Located in kdenonbeta/kdetestscripts, the current tests cover icon checking, memory leaks, outdated header names, slow code, and much more. The majority of the problems found by the scripts can be fixed with one-line changes to the code. If you are someone who has wanted to play around in KDE's code but doesn't know where to start, this just might be the spot for you.

Find out more by checking out the current report*, which contains links on how to get the code and create diffs, along with all the currently found problems and hints/links on how to fix them. If you feel up to the task, fix a few of them and send the patch to the application's author, or pass it on to a developer in #kde-devel on irc.kde.org. Once your patch is reviewed you get to see your change committed into KDE. :-)

Feel free to come up with more ideas for scripts or even write up some of the scripts in the TODO list. There can easily be ten times as many test scripts as there are now.

*If your application isn't listed right now or isn't in KDE's CVS, you can still get the scripts from CVS and run them on your own code. Also, I will be adding the rest of KDE within the week.

Comments

by cm (not verified)

I think this can be a very useful tool.

But when the real problems get fixed it will become more and more difficult to spot the remaining problems amongst the false positives and (for whatever reason) unfixable problems. It would be good to have a way to mark an entry as a false positive to prevent it from popping up again in future reports. Being able to add a comment as to why something isn't an error or why it can't be fixed would help others as well and prevent wasted effort.

In addition, it's also very satisfying to see a test succeed without any remaining errors which would be impossible if there were any false positives around...

And with a report with zero warnings as the "normal" case it would be much easier to see if a bad practice has crept back into the code again.

But I think that's not so easy to implement given the nature of the test scripts (shell scripts that search the source code using grep and friends).

by Anonymous (not verified)

Detecting something like

// KDE-Skip-Test: "test-name"
// This is ok here because ...

which causes a script to ignore the next failed test with a given name should not be too much of a problem. Optionally an integer parameter could specify that the test will skip the next n lines only.
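
For instance, with the optional integer parameter the marker might look like this (just the syntax proposed above with a made-up check name; nothing like this is implemented yet):

// KDE-Skip-Test: "some-check-name" 3
// This is ok here because the next three lines are generated code.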

by Benjamin Meyer (not verified)

There are definitely some false positives. As I have been finding them I have been modifying the scripts (which are quite crude) to just ignore those files, or enhancing the scripts to detect those cases. At least the good part is that it is fairly easy to tell when a reported problem is a false positive, as most of the checks are for trivial things anyway.

For example, in some of the string == "" cases that are reported, string isn't a QString, but really a std::string. So I need to modify the script to find where the variable is declared and check that it is a QString.

Maybe I should add a big red box warning users of false positives?

-Benjamin Meyer

by Marc Mutz (not verified)

> For example, in some of the string == "" cases that are reported, string isn't
> a QString, but really a std::string. So I need to modify the script to find
> where the variable is declared and check that it is a QString.

Well, you should use std::string::empty(), then. No need to filter them out.
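
In other words, a tiny sketch (made-up function, not from any KDE code):

#include <string>

// Test a std::string for emptiness directly instead of comparing against ""
// (the comparison against "" is what the script flags).
bool isUnnamed( const std::string &name ) {
    return name.empty();    // preferred over: name == ""
}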

Another tip: you should exclude assignment, equality and comparison operators from the check for "this->", since I use it for readability there:

const Foo & operator=( const Foo & that ) {
if ( this->d == that.d )
return *this;
// ...
}

BTW: Many checks for sane C++ (like checking that operator=() returns *this) can be done by the compiler. See GCC's -Weffc++ (which checks for various items from the "Effective C++" book), -Wold-style-cast (which checks for old C-style casts used instead of the C++ casts), etc. They will emit a lot of hits in Qt and other C++ libraries, but a script that greps out the Qt and libstdc++ hits would be very welcome and probably quite easy to write.
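
To give a rough idea of what those warnings catch, here are two made-up snippets that should trigger them (not taken from KDE code):

// g++ -Weffc++ -Wold-style-cast -c example.cpp
struct Leaky {
    char *buffer;    // -Weffc++: class has pointer data members but does not
                     // declare a copy constructor or an assignment operator
    Leaky() : buffer( new char[16] ) {}
    ~Leaky() { delete [] buffer; }
};

double half( int n ) {
    return (double)n / 2;    // -Wold-style-cast: prefer static_cast<double>( n )
}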

by Benjamin Meyer (not verified)

Neat! Didn't know about std::string::empty() before. Learn something every day.

by Anonymous (not verified)

Can we get stats for these tests (possibly included in the CVS-Digest)?

Would be interesting to see:

overall # of failed tests
# of failed tests / module
# of failed tests / test type (e.g. 13 missing icons)

and a trend over time for these numbers.

by Derek Kite (not verified)

Hmmm. This would fit into a category called 'State of the Repository'.

Does everything build? (probably the most important test). How about the html regression tests that coolo and others are laboriously maintaining? Then of course icefox's tests. Maybe a valgrind run or two.

This would be interesting, and possibly even helpful. Experience brings two issues to mind. A while ago I attempted to comment weekly on whether the repository would build. I use gentoo's cvs ebuilds, so I could see what failed, what worked. Neato! Except building an inherently unstable codebase a day or two before needing a working desktop to work on the Digest proved foolhardy.

The second issue is more important, and alluded to earlier. Machine tests invariably show false positives or have other issues. The results of machine tests are a dataset that a developer has to go through and, based on knowledge and experience, either fix or ignore. More importantly, highlighting machine test results could set inappropriate priorities in the development effort. I'm not disparaging icefox and his tests, or even his web page. He is an active developer, and has done much of the work in fixing the issues his tests have highlighted. He is experienced enough to know the ins and outs of this stuff. I don't feel it is my place to do this.

The real reason I don't want to do it is laziness.

Derek

by Anonymous (not verified)

Why is the input field so wide? Forces the user to either scroll or resize browser window to full screen when typing a reply to a post.

by Max Howell (not verified)

The scripts look cool, but I had to open one up to find out how to use them. Could usage information go in the README? Ta! Also, 80-character-width non-HTML README files are more pleasant! Thanks!

by Evan "JabberWok... (not verified)

lynx -dump README.html |less
or
lynx -dump README.html >README.txt

I agree - for developer tools, the appropriate documentation format is generally text, only using something more complicated when necessary. That said, html to text is an easy step, whereas the reverse is not.

by cm (not verified)

The current README doubles as introductory text on the report page. Maybe the two uses should be separated, but usage information for the scripts definitely doesn't belong on the report page.

by Benjamin Meyer (not verified)

Maybe I could have the first few lines be HTML comments, something like an <!-- usage: ... --> block at the top?

Maybe I could add a command line option -h to all the scripts?

by Per (not verified)

This reminds me of a test procedure we used where I used to work:
You write a test program that #includes the files you want to test. You then call the functions you want to test and compare the results against the expected ones. Possibly there is also some cleaning up to do afterwards, like deleting created files or freeing memory. Bug reports can turn into new test cases in this file, and the tests can then be re-executed to check that fixing new bugs doesn't reopen old ones.
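
A minimal sketch of such a test program (the header and function names are made up for illustration):

#include <assert.h>
#include <stdio.h>
#include "mymath.h"    // hypothetical header declaring add()

int main() {
    assert( add( 2, 2 ) == 4 );     // the normal case
    assert( add( -1, 1 ) == 0 );    // a case added after a bug report
    // clean up here if the tests created files or allocated resources
    printf( "all tests passed\n" );
    return 0;    // a non-zero exit status would mean failure
}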

by Good idea! (not verified)

I have never heard of this before. Is it used in the industry a lot? It would be interesting to hear opinions on this from other developers as I have never seen anything like this done on a large scale...

(my inexperience is showing)

by cm (not verified)

Google for "test-driven development" and "unit tests" to learn more.

There are frameworks for the creation of such tests for many areas:

For java there's junit: http://www.junit.org/index.htm
For web applications there's HttpUnit: http://httpunit.sourceforge.net/
For PHP code there's PHPUnit2: http://pear.php.net/package/PHPUnit2

I don't know any frameworks for C++ but that doesn't mean anything. I'm sure they exist, I just don't know them... maybe some experienced C++ developer can comment on them?

by Marc Mutz (not verified)

There's CppUnit, but I don't think anyone in KDE uses it.
autoconf/automake has special support for running regression tests. Grep Makefile.am for TESTS= to see how they are used (they're run by "make check").
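
For completeness, a CppUnit test case roughly looks like this (a generic sketch, not something that exists in KDE's CVS):

#include <string>
#include <cppunit/TestFixture.h>
#include <cppunit/extensions/HelperMacros.h>
#include <cppunit/ui/text/TestRunner.h>

class StringTest : public CppUnit::TestFixture {
    CPPUNIT_TEST_SUITE( StringTest );
    CPPUNIT_TEST( testEmpty );
    CPPUNIT_TEST_SUITE_END();
public:
    void testEmpty() {
        std::string s;
        CPPUNIT_ASSERT( s.empty() );
        s = "KDE";
        CPPUNIT_ASSERT_EQUAL( std::string( "KDE" ), s );
    }
};

int main() {
    CppUnit::TextUi::TestRunner runner;
    runner.addTest( StringTest::suite() );
    return runner.run() ? 0 : 1;    // a non-zero exit fails "make check"
}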

Unit tests don't quite catch on in KDE outside libs, simply because they are low-level testing tools that can't test GUIs, and because most of KDE is very hard to separate from the rest in order to actually unit test it. E.g. it's almost impossible to test KMail modules, since virtually everything in there references an object named "KMKernel". For others it's generally KApplication, which is very bad to have in your unit tests since it slows down "make check" tremendously compared to QApplication.

I was quite disappointed that Automated Testing in KDE now equals running crude source-level checkers. Not even the compiler's own mechanisms are used (see my other post). When I read the title of the article, I thought this was either about creating KDExecutor[1] scripts for various applications (inspired by David Faure's KDX talk at aKademy), integrating them into "make check" using xvfb, or simply collecting use cases that interested users can run locally (either by hand or by recording and playing back KDX scripts).

OTOH, unit and regression tests are best written by the developers, since ideally, the test should come before the code anyway :)

Marc

[1] I should mention that I work for KDAB, the company that creates and sells KDExecutor. However, I'm not involved in its development.

by cm (not verified)

> I was quite disappointed that Automated Testing in KDE now equals running
> crude source-level checkers. Not even the compiler's own mechanisms are used
> (see my other post).

Well, but still, Benjamin's tool and especially the way the test results are published encouraged some people to create their very first patch (for myself it's been the first patch in a really long time...). I think this is a good way to get new contributors. To see the number of errors reported by the "crude checkers" slowly go to zero can be a satisfying (first) goal for someone who doesn't know enough C++ yet to step forth and fix a bug or even take over maintainership of a piece of KDE's code. It's like an even lighter form of a Junior Job (JJ).

The use of compiler checks could be integrated without much pain, I guess.

The use of KDX is different, though. As you said, this is about GUI checks, whose results probably cannot be presented as easily on a web page (but maybe I'm wrong here, I haven't tried KDX yet).

I think both types of tests are useful. It's not like Benjamin is saying: "This is the way KDE's automatic tests are done." He just created an automatic test report. It does not need to remain the only one.

But back to unit tests: Do you think unit tests and the "make check" feature should be used in KDE on a much broader basis? You say it's hard outside kdelibs, do you think it's feasible, desirable, or even worth the effort?

by Marc Mutz (not verified)

> To see the number of errors reported by the "crude checkers" slowly go to
> zero can be a satisfying (first) goal for someone who doesn't know enough C++
> yet to step forth and fix a bug or even take over maintainership of a piece
> of KDE's code. It's like an even lighter form of a Junior Job (JJ).

I'm not so sure. It takes a bit of thought to check for side effects when changing code. Also, the number of "errors" will never drop to zero, since in some cases, half of them are false positives, e.g. most

foo( QString str );

are actually in dcop interfaces where you apparently can't use const-&.
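
For comparison, this is the difference the check is after (a made-up method name, for illustration only):

void setTitle( const QString &title );    // what the script suggests for normal C++/Qt code
void setTitle( QString title );           // by value; flagged, but apparently required in DCOP interfaces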

I also fear that people will step in and blindly sacrifice readability for immeasurable performance gains or to propose signalling constructs like
// KDE_NEXT_LINE_IS_FALSE_POSITIVE_FOR( check1, check2 )
just to keep the stats looking good. I've seen similar things happen with PGP keysigning parties, when the MSD metric was regularly computed and the top 50 and top 1000 posted.

> The use of KDX is different, though. As you said, this is about GUI checks,
> whose results probably cannot be presented as easily on a web page (but maybe
> I'm wrong here, I haven't tried KDX yet).

KDX has HTML export for test results. But my key point is that the "tests" this article speaks about are not automated. They can't be run by users, or at least it would be nonsensical. Automated tests, like unit tests or GUI test scripts, OTOH, can be run by a "normal" user, in the case of KDX even without having the source code, and then the reports come flowing back in.

Everyone can run "make check" and use Konsole's save history to prepare a bug report. Everyone can d/l KDX and the set of KDX scripts for the KDE app he wants to contribute something to, and run them.

This hurdle is even lower than reporting a bug by yourself: for that you need to prepare/find a way to reproduce it, and maybe you're not good with English and explaining the procedure is hard for you, or you use a localised desktop and need to re-translate what you see into what you think the original string is. With automated tests, OTOH, you just open a bug report "KDX script foo-bar-baz fails". All the rest of the information is filled in for you (KDE version, OS, compiler version, hw architecture).

As for this being a way to drag in new developers: I concur.

> But back to unit tests: Do you think unit tests and the "make check" feature
> should be used in KDE on a much broader basis?

Yes, definitely. In kdelibs, "make check" just builds a lot of test programs that really are demos, not tests. And that's only to be expected. How do you want to unit-test kio_http? First, you need an HTTP server running locally, with a defined set of pages... See?

> You say it's hard outside
> kdelibs, do you think it's feasible, desirable, or even worth the effort?

No, I said they're useless outside of libs. Libs in general. You can perfectly cover QTextCodec with unit tests. Very easily. You can't check anything that is a subclass of QWidget with unit tests, though.
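
As a rough illustration of such a self-contained test (Qt 3 era API assumed; this is not an existing KDE test):

#include <qtextcodec.h>
#include <assert.h>

int main() {
    QTextCodec *latin1 = QTextCodec::codecForName( "ISO-8859-1" );
    assert( latin1 != 0 );

    // "K\344se" is "Käse" encoded in Latin-1; decoding it must yield 4 characters.
    QString decoded = latin1->toUnicode( "K\344se" );
    assert( decoded.length() == 4 );
    return 0;    // no QApplication needed, so "make check" stays fast
}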

And since so small a part of KDE is actually unit-testable, I think it's more important to get non-programmers doing GUI testing and collecting use cases. And KDX scripts are a nice way to _write down_ use cases, complete with the ability to play them back anywhere you want, by whoever feels inclined to give something back. You can let them run automatically with "make check" integration, or you can play them on a live application and actually see the mouse move around.

Don't underestimate the power of users playing around with stuff like that. _That_'s what I call automated testing. From a developer's POV: let the computer (unit tests) or the QA dept of interested users (KDX scripts) do the boring stuff, since for them it's interesting, and they want to challenge you, while you want to protect yourself.

Ah, I should stop here. Already written a novel... I hope I got my point across :)

by Benjamin Meyer (not verified)

"I'm not so sure. It takes a bit of thought to check for side effects when changing code. Also, the number of "errors" will never drop to zero, since in some cases, half of them are false positives, e.g. most

foo( QString str );

are actually in dcop interfaces where you apparently can't use const-&."

As false positives are found, I have been tweaking the scripts to not show them. For example, with DCOP interfaces I simply didn't know you can't have const &, so I didn't hide them.

-Benjamin Meyer

by Benjamin Meyer (not verified)

"I was quite disappointed that Automated Testing in KDE now equals running crude source-level checkers."

These scripts I am working on are crude source-level checkers, that is true*, but within KDE there are already a bunch of test applications, especially within kdelibs. There is a lot of talk about KDExecutor and other ways of testing. One of the big goals (if you want to call it that) of the test scripts is to make some noise, bring up discussion, and get people involved in making test tools/apps/scripts for KDE. I am not saying by any means that this is the way to go. It can only be a good thing if others get interested and write scripts, testing frameworks, or anything else. Also, because of the small, simple nature of the bugs the scripts find, they make for easy fixing by developers interested in learning more about KDE.

*What is surprising is just how _much_ these crude scripts find on the first pass.

-Benjamin Meyer

by Old Idea (not verified)

Unit tests are used extensively in industry. Most large companies have a suite of unit tests that are run nightly or whenever new code is checked in. In fact, there is even a position for people who develop unit tests: test engineer.