Skanning with Kooka

Thursday, 21 December 2000 | Dre

Torsten Rahn is ebullient about a KDE program for scanning using SANE called Kooka. "This is a real nice productivity-app proving that it's easy to create extremely useful apps for KDE 2 with relatively little work." Details below.

Kooka

Kooka is a raster image scan program for the KDE2 system.

   PLEASE READ THE FILE "WARNING" FIRST !
   Using kooka may damage your hardware !

It uses the SANE libraries and the KScan library, which is a KDE 2 module providing scanner access.

KScan and Kooka are under construction. Don't expect everything to work fine. If you want to help, send patches to freitag@suse.de.

Features

Kooka's main features are:

SANE

Scanner support using SANE. Kooka does not support all the features that SANE and its backends offer; it uses only a small subset of the available options.
Kooka offers a GUI to change the most important scanner options like resolution, mode, threshold etc. These options are generated on the fly, depending on the scanner's capabilities.
Kooka offers a preview function, and the user can select the scan area interactively.

Image storage

Kooka provides an assistant to save your acquired images.
Filenames are generated automatically to support multiple scans.
Kooka manages the scanned images in a tree view where the user can delete and export images.
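Kooka's actual naming scheme isn't described here. As a minimal sketch of the idea, assuming a hypothetical `scan_NNNN.pnm` pattern, a script can pick the first unused name in a directory so that repeated scans never overwrite each other:

```sh
#!/bin/sh
# Minimal sketch of automatic, non-clobbering file names for repeated scans.
# The "PREFIX_NNNN.EXT" pattern is a hypothetical choice, not Kooka's own scheme.

# next_scan_name DIR [PREFIX] [EXT]
# Prints the first DIR/PREFIX_NNNN.EXT that does not exist yet.
next_scan_name() {
    dir=$1
    prefix=${2:-scan}
    ext=${3:-pnm}
    n=1
    while :; do
        # zero-pad the counter to four digits so names sort in scan order
        candidate=$(printf '%s/%s_%04d.%s' "$dir" "$prefix" "$n" "$ext")
        if [ ! -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
}
```

For example, `scanimage > "$(next_scan_name "$HOME/scans")"` would write each new scan (PNM is scanimage's default output format) to a fresh file in a hypothetical `~/scans` directory. Checking for and creating the file are separate steps here, so two scans started at the same moment could still race for the same name.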

Image Viewing

Scanned images can be viewed by clicking them in the tree view.
The viewer has a zoom function.

OCR

Kooka supports Joerg Schulenburg's gocr, an open source program for optical character recognition. Kooka starts the OCR program and displays its output. Best results are obtained with black-and-white images scanned at about 150 dpi.

Kooka is being maintained by Klaas Freitag and can be found in kdenonbeta together with KScan.

Comments:

Looks nice - Ash - 2000-12-21

Cool, now if only I didn't have a parallel scanner :(

Re: Looks nice - Andrea Cascio - 2000-12-21

But some parallel scanners are supported by SANE, so they should work with Kooka. Have a look at http://www.buzzard.org.uk/jonathan/scanners.html

Re: Skanning with Kooka - Tom Philpot - 2000-12-21

Looks great! Will it be integrated into the KParts framework? How cool would it be to be in KWord and have a button to scan in a document and have it OCR'd and then opened in KWord for correction and editing? All without leaving KWord! Keep up the great work! I look forward to hearing more about this project... Now, maybe if I could just find the $ to get a scanner :)

Re: Skanning with Kooka - KDEer - 2000-12-22

How about a system in kio? scanner://1 would result in a directory with two files (for scanner #1):

- a file containing the scan as a graphic
- a file containing the scan as ASCII text

That way you could open it with ANY KDE app! Also, how about OCR support in other places? Say, download an image online, run it through a KDE program, and get text.

Re: Skanning with Kooka - Klaas Freitag - 2000-12-23

There have been discussions about that already. I don't like the idea of having just one image in a directory scanner:/1, because one image is not enough. Should it always be the last scanned image, or should it be scanned on the fly? You mostly need more than one try to get a cool scan result. That's why I favour a scan application over a kind of filesystem. Kooka saves your scan results automatically into a 'special' directory. Future releases will be able to save descriptions like date, caption etc. alongside the images. I would like the idea of scanner:/1 leading you to that directory, where you can manage your image pool. And that's hopefully fairly easy, because it should only need a symlink to Kooka's image save dir, or a bookmark ;-)

Re: Skanning with Kooka - Christian Naeger - 2000-12-21

Hi. Looks really cool, I like it. Two hours ago I finished my university project on handwriting OCR. As far as I can see on the web page, gocr, the OCR engine, does not use neural networks but a more classic approach. I would like to dig into the source, but I just started my thesis -- so little time :-( Chris

Re: Skanning with Kooka - Bernhard Vornefeld - 2000-12-27

This description sounds very promising indeed. A special point of interest for me is image storage. Kooka seems to be prepared to handle extensive scan jobs. (By the way: are automatic document feeders supported?) Automatic generation of file names can be a good aid, but another option would make it even better: semi-automatic embedding of metadata. There are a few approaches to handling metadata available right now:

- IPTC codes, embedded directly into some types of image files (e.g. JPEG, TIFF), see http://www.cepic.org/iptc.htm. IPTC is very popular right now in professional image management systems.
- (PHOTO-)RDF, which can also be embedded, see http://www.w3.org/TR/photo-rdf/

It would be a real relief for document storage and retrieval if metadata handling were available in KDE applications. Kooka could mark the start, with pixie, Konqueror etc. to follow.

Re: Skanning with Kooka - Klaas Freitag - 2000-12-27

Thank you very much for the links. Yes, I tried to design Kooka to handle mass scanning with appropriate scanners. SANE offers drivers for Fujitsu and Bell and Howell by now; maybe that will improve. ADF support depends on SANE: if a SANE driver supports ADF, Kooka should too. There is already some code for it (see massscandialog.cpp), but it is not yet finished and tested, because I do not have an ADF (yet). Automatic file name generation is just a starting point. The interface to the object that stores the images was designed to be as 'slim' as possible, to allow implementing storage objects as required: handling XML, database connections or whatever. Barcode and/or form recognition for metadata generation should also be possible.

Re: Skanning with Kooka - Hannes Kruger - 2000-12-29

I have been looking for this for a while. Thanks. Any chance that you may include an interface to the HANDWRITTEN recognition system from NIST?

Re: Skanning with Kooka - Klaas Freitag - 2000-12-29

Hannes, I don't know what NIST is, but handwriting recognition sounds very interesting to me. Is there any open source software for it already?

Re: Skanning with Kooka - Christian Naeger - 2001-01-01

The NIST Public Domain OCR System Release 2.1 is available at http://www.itl.nist.gov/iaui/894.03/doc/doc.html - it recognizes handprinted characters only (in German: Druckbuchstaben). Perhaps it could be included in the recognition engine. Chris

Re: Skanning with Kooka - Jay Austad - 2001-04-24

This is just what I'm waiting for. I'm writing a little KDE app that requires scanner support. Will Kooka work as a KParts plugin type of thing? I need to be able to call it to import the image into my app. Does Kooka have auto-selection of the media being scanned? (If you insert a 4x5 picture, will it auto-select the picture at the edges, or do you have to do that manually?)

Tif scanning in Kooka - Chris - 2004-11-17

I was just wondering why the ability to save as a TIFF file hasn't been included in Kooka. You can scan in binary (b&w) mode, but what format can the result be saved in as a b&w image file? The scanimage command has an option to output TIFF. A great job has been done on the app, with a lot of progress.

G4 compressed TIFF files are small and can easily be 600dpi. I used to work for a company supporting professional printing packages that work with G4 TIFFs because of their size and quality. A 600dpi uncompressed binary TIFF can be 4.2M and a greyscale TIFF is 34M, so when a G4 compressed TIFF is around 90K, that is a huge difference in size. G3 compression reduces it to about 200K. A 600dpi PNM file is also 34M (binary is 4.1M); a greyscale JPEG is 3.2M and a binary one about 1.3M. That is a long way from 90K, so unless there are license restrictions, it seems silly to me not to be able to save as a G4 compressed TIFF from xscanimage or Kooka, hence the script.

I found a way for this script to work with whichever SANE scan device you may have. Though perhaps not the most beautiful, it works, and I adapted it to output G4 compressed PDF too. Here are the 2 scripts, which I saved to my /usr/bin directory.
---------------------------------------------
tifscan.sh

#!/bin/sh
usage() {
    echo "Usage: tifscan {nameofimage.tif}"
}

# test to see if a filename has been entered
if [ $# -lt 1 ] ; then usage ; exit 1 ; fi
name=$1

# read the output of the help command to get the scanner device name
scanner=`scanimage --help | tail --lines=1`
echo Now scanning your A4 document on $scanner

# scan the A4 binary (b&w) file uncompressed at 300dpi to a temporary file
scanimage -d $scanner --mode binary --resolution 300 --quick-format A4 --format tiff > "temp-$name"

# use the tiff utility to convert the temporary binary tif to a G4 compressed tif,
# then delete the temporary file
tiffcp -c g4 "temp-$name" "$name"
rm -f "temp-$name"

# display the resulting G4 tiff file
kfax "$name"
----------------------------------------------
pdfscan.sh

#!/bin/sh
usage() {
    echo "Usage: pdfscan {nameofimage.pdf}"
}

# test to see if a filename has been entered
if [ $# -lt 1 ] ; then usage ; exit 1 ; fi
name=$1

# read the output of the help command to get the scanner device name
scanner=`scanimage --help | tail --lines=1`
echo Now scanning your A4 document on $scanner

# scan the A4 binary (b&w) file uncompressed at 300dpi to a temporary file
scanimage -d $scanner --mode binary --resolution 300 --quick-format A4 --format tiff > "temp-$name.tif"

# compress to G4 first; tiff2pdf passes G4-compressed data through into the PDF unchanged
tiffcp -c g4 "temp-$name.tif" "temp-g4-$name.tif"
tiff2pdf -p A4 -o "$name" "temp-g4-$name.tif"
rm -f "temp-$name.tif" "temp-g4-$name.tif"

# display the pdf
kghostview "$name"
----------------------------------------------
Chris
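The uncompressed sizes quoted in this post follow directly from the page geometry. As a rough sketch, assuming an A4 page of 210 x 297 mm and ignoring file headers, the raw size is width x height x bits per pixel / 8. At 600dpi that comes to about 4.3 million bytes (4.1 MiB) for a binary page and about 34.8 million bytes for greyscale, in line with the 4.1-4.2M and 34M figures above:

```sh
#!/bin/sh
# Rough uncompressed size of a scanned A4 page (210 x 297 mm), ignoring
# file headers. Integer shell arithmetic truncates fractional pixels.

# a4_raw_bytes DPI BITS_PER_PIXEL
a4_raw_bytes() {
    dpi=$1
    bpp=$2
    # 1 inch = 25.4 mm, so pixels = mm * dpi / 25.4 (scaled by 10 to stay integer)
    width=$((210 * dpi * 10 / 254))
    height=$((297 * dpi * 10 / 254))
    # round each row up to whole bytes (bilevel rows are byte-padded)
    row_bytes=$(((width * bpp + 7) / 8))
    echo $((row_bytes * height))
}

printf 'binary    600dpi: %s bytes\n' "$(a4_raw_bytes 600 1)"
printf 'greyscale 600dpi: %s bytes\n' "$(a4_raw_bytes 600 8)"
```

This only gives the raw baseline; the step from roughly 4 MB down to the ~90K Chris mentions is entirely the work of G4 compression.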

Re: Tif scanning in Kooka - Christopher Booth - 2004-12-06

Problem: the script didn't work with the 2.4 kernel because of the way USB detects the scanner. Under 2.4, three devices are listed for my one scanner:

epson:/dev/usb/scanner0 epson:/dev/usbscanner0 epson:/dev/usbscanner

whereas under 2.6 only one is listed.

Resolution: swap the line which says

scanner=`scanimage --help | tail --lines=1`

with

scanner=`scanimage --help | sed -e 's/ /\n/g' | tail --lines=1`

or, even better,

scanner=`scanimage -f "%d%n" | tail --lines=1`

which should work on a 2.4 or 2.6 kernel, and hopefully others.

Regards, Chris
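Both fixes boil down to choosing one device name from a list that may hold several entries for the same scanner. As a sketch, assuming SANE device names contain no whitespace and arrive on stdin (either one per line, as `scanimage -f "%d%n"` prints them, or space-separated as in the `--help` device list), a small function can take the last name, with a hypothetical `SCANNER` environment variable as a manual override:

```sh
#!/bin/sh
# pick_device: read SANE device names on stdin and print the one to use.
# If the (hypothetical) SCANNER variable is set, it wins. Otherwise the last
# whitespace-separated field of the last non-empty line is used, which copes
# with both one-name-per-line and space-separated lists.
# Exits with status 1 if no device name was found.
pick_device() {
    if [ -n "$SCANNER" ]; then
        printf '%s\n' "$SCANNER"
        return 0
    fi
    awk 'NF { last = $NF } END { if (last != "") print last; else exit 1 }'
}

# Typical use (needs a configured SANE installation):
#   scanner=$(scanimage -f "%d%n" | pick_device) || exit 1
```

Failing loudly when nothing is found avoids calling `scanimage -d` with an empty device name.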

Re: Tif scanning in Kooka - Dr.V.shivakumar Sharma - 2005-11-28

Respected Sir, I am Dr.V.Shivakumar Sharma, writing to you from Mysore City, Karnataka State, India. I saw your site and found it very interesting and valuable. Please furnish me some details for my personal use:
1. I need software to compress my research PDF results; they are occupying a lot of space.
2. I am also facing a lot of problems keeping my pictures in TIFF format; they too are occupying a huge amount of space.
Can you please let me know the solution to the above problems, so as to reduce the size of the PDFs and the TIFFs, and help me, Sir. Hope to have a healthy and longstanding business relationship with professionals like you, Sir. Dr.V.shivakumar Sharma Direct: +919845120010

Re: Tif scanning in Kooka - Danny Staple - 2006-03-22

Okay - I have built some improvements on this script, turning it into a small bit of Perl.

--------------------------------
#!/usr/bin/perl
# pdfscan, adapted from post by Christopher Booth, 2004
# Adapted by Danny Staple, 2006
use Term::ReadKey;

sub usage() {
    print <<USAGETEXT;
Usage: pdfscan {nameofimage.pdf}

pdfscan will use the default scanner (in a single scanner set up) and scan to a PDF file.
Warning - these are big memory operations!
USAGETEXT
    exit();
}

sub user_has_more() {
    print "More to scan (y/n)?\n";
    my $key;
    do {
        ReadMode 'cbreak';
        $key = ReadKey(0);
        ReadMode 'normal';
        if ($key eq 'y' or $key eq 'Y') {
            return 1;
        }
    } while ($key ne 'n' and $key ne 'N');
    return 0;
}

my $outputname = $ARGV[0] or usage();

# Read output of help command to get scanner device name
my $scannerdevice = `scanimage --help | tail --lines=1`;
chomp($scannerdevice);
print "Now scanning your A4 document on $scannerdevice\n";

# scan the A4 file uncompressed at 300dpi
# --quick-format A4 - Note we are creating a temp file until we can find a way
# to get tiff2pdf to take standard input
my $count = 0;
my $cpargs = "";
do {
    print "scanning....\n";
    $count++;
    my $imagedata = `scanimage -d $scannerdevice --mode Color --resolution 300 --format tiff >temp-$outputname-$count.tiff`;
    $cpargs = "$cpargs temp-$outputname-$count.tiff";
} while (user_has_more() == 1);

print "stitching...\n";
`tiffcp $cpargs temp-$outputname-all.tiff`;
# Look at multiple page scans - either from a preset parameter, or an interactive prompt, using the tiffcp command
# Convert to a B&W tiff as well, and pass through ocr. Filter out non-dictionary

print "outputting\n";
# convert to pdf with jpeg compression - pass in our image stream
print `tiff2pdf temp-$outputname-all.tiff -j -p A4 -o $outputname`;
for my $i (1..$count) {
    unlink("$temp-$outputname-$i.tiff");
}
unlink("temp-$outputname-all.tiff") or print("Failed to remove output file\n");

# display pdf
#kghostview $name
--------------------------------------------

I am sure it could still be done in sh, and there are comments with stuff I may do later. If there is no objection from Chris, I may pop this onto berlios as an open source project. I have plans for this - meanwhile, it is now the core of my document scanning. Danny http://orionrobots.co.uk

Re: Tif scanning in Kooka - Danny Staple - 2006-03-22

How annoying - the posting system removed all my indenting. And there was me thinking posters just neglected it... There is a bug there - the $temp on the unlink in the loop should actually just be temp, no dollar sign. Danny
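The cleanup step is easy to get subtly wrong, because the removal loop must rebuild exactly the name and extension that were used when the pages were scanned. As a minimal sketch in plain sh, assuming the temporary files follow the `temp-NAME-N.tiff` and `temp-NAME-all.tiff` pattern used during scanning (the function name is hypothetical):

```sh
#!/bin/sh
# remove_temp_pages NAME COUNT
# Deletes temp-NAME-1.tiff .. temp-NAME-COUNT.tiff plus the stitched
# temp-NAME-all.tiff, using the same names the scan loop created.
# The final output file (NAME itself) is left alone.
remove_temp_pages() {
    name=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        rm -f "temp-$name-$i.tiff"
        i=$((i + 1))
    done
    rm -f "temp-$name-all.tiff"
}
```

Deriving both the creation and the deletion names from one pattern like this is what keeps a stray `$` or a `.tif`/`.tiff` mismatch from silently leaving temporary files behind.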