[KDE Dot News]
 faq
 flatforty
 contribute
 subscribe
 configure
 search
 rdf

 main
 parent
 thread


Re: gqview does this
by raindog on Monday 31/Dec/2001, @20:32
Thanks, looking forward to trying it out. It'll be nice to see an implementation of my algorithm that doesn't take 18 hours :) (The gqview guys used their own algorithm, it's fast enough but the results are a little different...)
  Related Links
 ·   Articles on Graphics and Art
 ·   Also by raindog
 ·   Contact author

Thread Threshold:

The Fine Print: The following comments are owned by whomever posted them.
( Reply )

Howdy! :)
by Mosfet on Monday 31/Dec/2001, @21:24
Hiya, nice to see you here! I found the algorithm in findimagedupes more accurate. It's really rather clever. The algorithm I used is virtually identical to yours. As a matter of fact, the reason I started porting effects from ImageMagick was to get the necessary blur, normalize, and equalize effects as fast as possible ;-) You'd think it would be slower than averaging blocks of pixels like GQView, but actually it's not. In PixiePlus it's way fast :)

Findimagedupes performs quite well with a small amount of images, but really suffers from a small hash table size. For 47 images PixiePlus takes 11 seconds while findimagedupes takes 20. That's really not bad since it's compiled C++ vs. interpreted code. But when you increase the number to 5,049 images PixiePlus only takes 23 minutes while findimagedupes takes 1hr 16min.

PixiePlus also uses a persitant database, but it's not compatible with findimagedupe's. It's a binary database that also includes timestamps so you know if an image is modified and needs to get a new fingerprint. I could write an import and export to findimagedupes if you think it would be useful. I certainly can see value in a commandline utitlity for this as well as Pixie :)

Here's a comparison of the different methods I wrote for the Pixie docs. I also did a quicky screenshot of the comparison results which uses a nifty tree view (double click on the image to view, right click for a menu). It's at http://www.mosfet.org/compare.png. Of course there is also a 2 step progress dialog for generating/loading fingerprints and comparing ;-)

--------------------------

Notes on simularity comparison algoritm and performance

Much attention was spent on making finding similiar images perform as
fast as possible while still being accurate. Compared with other Unix
offerings PixiePlus performs quite well in this aspect.

There are two main algorithms for finding similiar images. One is used
by GQView and ShowImg. It is based on averaging blocks of color
channels and comparing the result. This is the obvious algorithm and
the one I was originally going to adopt. In GQView it performs with
acceptable speed, in ShowImg it does not. I'm not sure why this is
since the code is pretty much directly copied from GQView. Either way,
it tends to miss some matches found in the other available algorithm.

The other algorithm isn't based on colors at all but identifying the
general patterns in the image. This is the method used by the Perl
utility findimagedupes and is the method I adopted for Pixie. What it
does is sample an image to a standard size, apply a couple effects to
get rid of abnormalities, scale it down again, then convert it into a
string of bits suitable for use as a thumbprint. While it sounds like
it would take much longer than the above method, in reality it
doesn't, and it finds similiar images the other algorithm misses.

Like findimagedupes and unlike GQView and ShowImg a persistent
database of fingerprints is used so you don't have to regenerate
fingerprints or calculate image blocks each time you compare images. A
binary database is used in order to avoid parsing ASCII
representations of the binary thumbprint. Unlike findimagedupes the
database also contains timestamps so it can tell if an image has been
modified. A large hash table is used and should perform well for
folders up to 6,000 images. After that expect performance degragation.
Findimagedupes especially suffers from this. On my test machine,
(AMDK6 450MHZ/128M), initially comparing 47 large images takes 11 sec
on Pixie and 20 sec using findimagedupes. This is really good speed
for findimagedupes considering it's in Perl. But when the number of
images is increased to 5,049 findimagedupes slows down to 1hr 16min
while PixiePlus only takes 23min 14 sec (the fastest of the bunch).
[ Reply To This | View ]
  • Re: Howdy! :)
    by raindog on Tuesday 01/Jan/2002, @07:53
    Oh, I don't mind incompatibility if it means PixiePlus can compare 5000+ images in 23 minutes. Mine was originally meant as a lazy reference implementation and it turned out to be good enough for my mostly small image collections. Maybe with QT3 the local database file won't even matter anymore if its built in database functionality is worth anything.

    But I may try to do a C++ command line version that uses your database format; someone emailed me C++ compare code using my format (built it but never benchmarked it) so it'd really be a matter of teaching it about your database format and tying in Magick++ support to build the DB. I've had complaints about my kludgy attempts at providing an interface for GUI programs but now it won't be necessary. Thanks again. Neat looking result interface by the way.
    [ Reply To This | View ]

 
The Fine Print: The previous comments are owned by whomever posted them.
( Reply )

  "Feature freeze: You're allowed to add new bugs, but no new features." -- Richard J. Moore
KDE®, "K Desktop Environment", "KDE Dot News", "got the dot?" and the KDE Logo® are trademarks or registered trademarks of KDE e.V. in the European Union, the United States and other countries. All other trademarks and copyrights on this page are owned by their respective owners. Comments are owned by the poster. The rest: Copyright © 2000-2008 KDE e.V. for The KDE Project. For further information or comments on this site, please contact the Webmaster.
[ home | post article | flat forty | subscribe | search | rdf ]