Skanning with Kooka
Thursday, 21 December 2000 | Dre
Torsten Rahn is ebullient about a KDE program for scanning using SANE called Kooka. "This is a real nice productivity-app proving that it's easy to create extremely useful apps for KDE 2 with relatively little work." Details below.
Kooka
Kooka is a raster image scan program for the KDE2 system.
PLEASE READ THE FILE "WARNING" FIRST ! Using kooka may damage your hardware !
It uses the SANE libraries and the KScan library, which is a KDE 2 module providing scanner access.
KScan and Kooka are under construction. Don't expect everything to work fine. If you want to help, send patches to freitag@suse.de.
Screenshots
The best way to describe an app is often with screenshots:
Features
Kooka's main features are:
- SANE
  - Scanner support using SANE. Kooka does not support all features that SANE and its backends offer. It uses a small subset of the available options.
  - Kooka offers a GUI to change the most important scanner options like resolution, mode, threshold etc. These options are generated on the fly, depending on the scanner capabilities.
  - Kooka offers a preview function, and the user can select the scan area interactively.
- Image storage
  - Kooka provides an assistant to save your acquired images.
  - Filenames are generated automatically to support multiple scans.
  - Kooka manages the scanned images in a tree view where the user can delete and export images.
- Image viewing
  - Scanned images can be viewed by clicking them in the tree view.
  - The viewer has a zoom function.
- OCR: Kooka supports Joerg Schulenburg's gocr, an open source program for optical character recognition. Kooka starts the OCR program and displays its output. Best results come from black-and-white images scanned at about 150 dpi.
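To illustrate what automatic filename generation for multiple scans can look like, here is a minimal sh sketch. It is not taken from Kooka's source; the function name next_scan_name and the prefix_NNNN.ext naming pattern are assumptions made up for this example.

```sh
#!/bin/sh
# Hypothetical helper, not Kooka's actual scheme: print the first unused
# name of the form <dir>/<prefix>_NNNN.<ext>.
next_scan_name() {
    dir=$1 prefix=$2 ext=$3 n=1
    while :; do
        name=$(printf '%s/%s_%04d.%s' "$dir" "$prefix" "$n" "$ext")
        # Stop at the first name that does not exist yet.
        [ -e "$name" ] || break
        n=$((n + 1))
    done
    printf '%s\n' "$name"
}
```

Calling next_scan_name with a directory, a prefix and an extension returns the next free numbered name, so repeated scans never overwrite each other.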
Kooka is being maintained by Klaas Freitag and can be found in kdenonbeta together with KScan.
Comments:
Looks nice - Ash - 2000-12-21
Cool, now if only I didn't have a parallel scanner :(
Re: Looks nice - Andrea Cascio - 2000-12-21
But some parallel scanners are supported by SANE, so they should work with Kooka.
Have a look at http://www.buzzard.org.uk/jonathan/scanners.html
Re: Skanning with Kooka - Tom Philpot - 2000-12-21
Looks great! Will it be integrated into the KParts framework? How cool would it be to be in KWord and have a button to scan in a document and have it OCR'd and then opened in KWord for correction and editing? All without leaving KWord! Keep up the great work! I look forward to hearing more about this project... Now, maybe if I could just find the $ to get a scanner :)
Re: Skanning with Kooka - KDEer - 2000-12-22
How about a system in kio?
scanner://1 would result in a directory with two files (for scanner #1):
- A file containing the scanned file as a graphic
- A file containing the scanned file as ASCII text
That way you could open it with ANY KDE app!

Also, how about OCR support in other places? Say, download an image online, run it through a KDE prog, and get text.
Re: Skanning with Kooka - Klaas Freitag - 2000-12-23
There were discussions about that already. I don't like the idea of having just one image in a directory scanner:/1, because one image is not enough. Should it always be the last scanned image, or should it be scanned on the fly? You mostly need more than one try to get a cool scan result. That's why I favour a scan application over a kind of filesystem.

Kooka saves your scan results automatically into a 'special' directory. Future releases will be able to save descriptions like date, caption etc. beside the images.

I would like the idea that scanner:/1 leads you to that directory, where you are able to manage your image pool. And that's hopefully fairly easy, because it should only be a symlink to Kooka's image save dir or a bookmark ;-)
Re: Skanning with Kooka - Christian Naeger - 2000-12-21
Hi. Looks really cool, I like it. 2 hours ago, I just finished my university project on Handwriting OCR. As far as I have seen on the web page, gocr, the OCR engine, does not use neural networks but a more classic approach. I would like to dig into the source but I just started my thesis -- so little time :-( Chris
Re: Skanning with Kooka - Bernhard Vornefeld - 2000-12-27
This description sounds very promising indeed. A special point of interest for me lies in image storage. Kooka seems to be prepared to handle extensive scan jobs. (By the way: are automatic document feeders supported?)

Automatic generation of file names can be a good aid, but another option would make it even better: semi-automatic embedding of metadata. There are a few approaches to handling metadata available right now:
- IPTC codes, embedded directly into some types of image files (e.g. jpeg, tiff), see http://www.cepic.org/iptc.htm. IPTC is very popular right now in professional image management systems.
- (PHOTO-)RDF, which can also be embedded, see http://www.w3.org/TR/photo-rdf/

It would really be a relief for document storage and retrieval if the handling of metadata were available in KDE applications. Kooka could mark the start, with pixie, konqueror etc. to follow.
Re: Skanning with Kooka - Klaas Freitag - 2000-12-27
Thank you very much for the links.

Yes, I tried to design kooka to handle mass scanning with appropriate scanners. SANE offers drivers for Fujitsu and Bell and Howell by now, maybe that improves?

ADF support depends on SANE. If a SANE driver supports ADF, kooka should as well. There is already something coded (see massscandialog.cpp), but it is not yet finished and tested, because I do not have an ADF (yet).

Automatic file name generation is just a starting point. The interface to the object which stores the images was designed to be as 'slim' as possible, to allow the implementation of storage objects as required: handling XML, database connections or whatever. Barcode and/or form recognition for metadata generation should also be possible.
Re: Skanning with Kooka - Hannes Kruger - 2000-12-29
I have been looking for this for a while. Thanks. Any chance that you may include an interface to the HANDWRITTEN recognition system from NIST?
Re: Skanning with Kooka - Klaas Freitag - 2000-12-29
Hannes, I don't know what NIST is, but handwriting recognition sounds very interesting to me. Is there any open source software existing already?
Re: Skanning with Kooka - Christian Naeger - 2001-01-01
The NIST Public Domain OCR System Release 2.1 is at: http://www.itl.nist.gov/iaui/894.03/doc/doc.html
It recognizes handprint characters only (in German: Druckbuchstaben). Perhaps it can be included in the recognition engine.
Chris
Re: Skanning with Kooka - Jay Austad - 2001-04-24
This is just what I'm waiting for. I'm writing a little KDE app that requires scanner support. Will Kooka work as a KParts plugin type thing? I need to be able to call it to import the image into my app. Does Kooka have auto selection of the media being scanned? (If you insert a 4x5 picture, will it autoselect the picture at the edges, or do you have to do that manually?)
Tif scanning in Kooka - Chris - 2004-11-17
I was just wondering why the ability to save as a tif file hasn't been included in kooka? You can scan in binary/b&w mode, but in what format can the result be saved as a b&w image file? The scanimage command has the option to output as tif. A great job has been done on the app, with a lot of progress. G4 compressed tif files are small and can be at 600dpi quite easily. I used to work for a company supporting professional printing packages that work with G4 tifs because of their size and quality. A 600dpi uncompressed binary tif can be 4.2M and a greyscale tif 34M, so when a G4 compressed tif is around 90K, it is a huge difference in size. G3 compression reduces it to about 200K. A 600dpi greyscale pnm file is also 34M and a binary one 4.1M; a greyscale Jpeg is 3.2M and a binary one about 1.3M. That is a long way from 90K, so unless there are license restrictions, it seems silly to me not to be able to save as a G4 compressed tif from xscanimage or kooka, hence the script. I found a way for this script to work with whichever sane scan device you may have; though perhaps not the most beautiful, it works, and I adapted it to output to G4 compressed pdf too. These are 2 scripts which I saved to my /usr/bin directory.
---------------------------------------------
tifscan.sh

#!/bin/sh
usage() {
    echo "Usage: tifscan {nameofimage.tif}"
}

#test to see if a filename has been entered
if [ $# -lt 1 ] ; then usage ; exit 1 ; fi
name=$1

#Read output of help command to get scanner device name
scanner=`scanimage --help | tail --lines=1 `
echo Now scanning your A4 document on $scanner

#scan the A4 binary(b&w) file uncompressed at 300dpi to temporary file
scanimage -d $scanner --mode binary --resolution 300 --quick-format A4 --format tiff >temp-$name

#Use tif utility to convert the temporary binary tif to a G4 compressed tif and then delete the temporary file
tiffcp -c g4 temp-$name $name
rm -f temp-$name

# display resulting G4 tiff file
kfax $name
----------------------------------------------
pdfscan.sh

#!/bin/sh
usage() {
    echo "Usage: pdfscan {nameofimage.pdf}"
}

#test to see if a filename has been entered
if [ $# -lt 1 ] ; then usage ; exit 1 ; fi
name=$1

#Read output of help command to get scanner device name
scanner=`scanimage --help | tail --lines=1 `
echo Now scanning your A4 document on $scanner

#scan the A4 binary(b&w) file uncompressed at 300dpi to temporary file
scanimage -d $scanner --mode binary --resolution 300 --quick-format A4 --format tiff >temp-$name.tif

#convert to pdf with G4 compression
tiff2pdf temp-$name.tif -p A4 -q G4 -o $name

#display pdf
kghostview $name
----------------------------------------------
Chris
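The uncompressed sizes quoted in this post can be checked with a little arithmetic. The sketch below assumes an A4 page of 210 x 297 mm and rows padded to whole bytes; the function name a4_bytes is made up for this example, and integer division rounds the pixel counts down.

```sh
#!/bin/sh
# Rough estimate, not a measurement: uncompressed size in bytes of an
# A4 page (210 x 297 mm) at a given resolution (dpi) and bit depth.
a4_bytes() {
    dpi=$1 bits=$2
    width=$((210 * dpi * 10 / 254))     # pixels across (25.4 mm per inch)
    height=$((297 * dpi * 10 / 254))    # pixels down
    row=$(( (width * bits + 7) / 8 ))   # bytes per row, padded to a whole byte
    echo $((row * height))
}

a4_bytes 600 1   # 1-bit black and white
a4_bytes 600 8   # 8-bit greyscale
```

At 600 dpi this prints 4349300 and 34794400, i.e. roughly 4.3 and 34.8 million bytes before any file headers, which is in line with the figures of about 4.2M and 34M given above.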
Re: Tif scanning in Kooka - Christopher Booth - 2004-12-06
Problem: The script didn't work with the 2.4 kernel, because of the way USB detects the scanner. Under the 2.4 kernel, 3 devices are listed for my 1 scanner:

epson:/dev/usb/scanner0 epson:/dev/usbscanner0 epson:/dev/usbscanner

whereas under 2.6 only one scanner is listed.

Resolution: Swap the line which says:

scanner=`scanimage --help | tail --lines=1 `

with

scanner=`scanimage --help | sed -e 's/ /\n/g' | tail --lines=1 `

or even better

scanner=`scanimage -f %d | sed -e 's/0/\n/g' | tail --lines=1 `

which should work on a 2.4 or 2.6 kernel, plus hopefully others.

Regards
Chris
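A variation on the same idea, as a sketch only: if the installed scanimage understands %n (newline) in the -f format string, which is an assumption about the SANE version in use, each device name arrives on its own line and no sed splitting is needed. The helper name last_device is made up for this example.

```sh
#!/bin/sh
# Print the last non-empty line read from stdin, i.e. the last device listed.
last_device() {
    awk 'NF { last = $0 } END { if (last != "") print last }'
}

# Intended use (needs SANE and a connected scanner; %n support is assumed):
#   scanner=$(scanimage -f '%d%n' | last_device)
```

Like the tail-based versions above, this picks the last device reported, so it keeps the behaviour of the original script when several device names show up for one scanner.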
Re: Tif scanning in Kooka - Dr.V.shivakumar Sharma - 2005-11-28
Respected Sir, I am Dr.V.Shivakumar Sharma writing to you from India, Karnataka State, Mysore City. I saw your site and found it highly interesting, with valuable information. Please furnish me some details for my personal usage: 1. I need software to compress my researched pdf results; they are occupying a lot of space. 2. I am facing a lot of problems keeping my pictures in the tiff format; they are also occupying a huge amount of space. Can you please let me know the solution for the above problems, so as to reduce the size of the pdfs and the tiffs, and help me, Sir. Hope to have a healthy and longstanding business relationship with professionals like you, Sir. Dr.V.shivakumar Sharma Direct: +919845120010
Re: Tif scanning in Kooka - Danny Staple - 2006-03-22
Okay - I have built some improvements on this script, turning it into a small bit of perl.
--------------------------------
#!/bin/perl
# pdfscan, adapted from post by Christopher Booth, 2004
# Adapted by Danny Staple, 2006
use Term::ReadKey;

sub usage()
{
    print <<USAGETEXT
Usage: pdfscan {nameofimage.pdf}\n
pdfscan will use the default scanner (in a single scanner set up) and scan to a PDF file.
Warning - these are big memory operations!
USAGETEXT
;
    exit();
}

sub user_has_more()
{
    print "More to scan (y/n)?\n";
    my $key;
    do
    {
        ReadMode 'cbreak';
        $key = ReadKey(0);
        ReadMode 'normal';
        if($key eq 'y' or $key eq 'Y')
        {
            return 1;
        }
    } while ($key ne 'n' and $key ne 'N');
    return 0;
}

my $outputname = $ARGV[0] or usage();

#Read output of help command to get scanner device name
my $scannerdevice =`scanimage --help | tail --lines=1 `;
chomp($scannerdevice);
print "Now scanning your A4 document on $scannerdevice\n";

#scan the A4 file uncompressed at 300dpi
#--quick-format A4 - Note we are creating a temp file until we can find a way
#to get tiff2pdf to take standard input
my $count = 0;
my $cpargs="";
do
{
    print "scanning....\n";
    $count ++;
    my $imagedata = `scanimage -d $scannerdevice --mode Color --resolution 300 --format tiff >temp-$outputname-$count.tiff`;
    $cpargs = "$cpargs temp-$outputname-$count.tiff"
} while(user_has_more() == 1);

print "stitching...\n";
`tiffcp $cpargs temp-$outputname-all.tiff`;

#Look at multiple page scans - either from a preset parameter, or an interactive prompt, using the tiffcp command
#Convert to a B&W tiff as well, and pass through ocr. Filter out non-dictionary
print "outputting\n";

#convert to pdf with jpeg compression - pass in our image stream
print `tiff2pdf temp-$outputname-all.tiff -j -p A4 -o $outputname`;
for my $i (1..$count)
{
    unlink("$temp-$outputname-$i.tif");
}
unlink("temp-$outputname-all.tiff") or print ("Failed to remove output file\n");

#display pdf
#kghostview $name
--------------------------------------------
I am sure it could still be done in sh, and there are comments with stuff I may do later. If there is no objection from Chris, I may pop this onto berlios as an open source project. I have plans for this - meanwhile, it is now the core of my document scanning.
Danny
http://orionrobots.co.uk
Re: Tif scanning in Kooka - Danny Staple - 2006-03-22
How annoying - the posting system removed all my indenting. And there was me thinking posters just neglected it... There is a bug there - the $temp on the unlink in the loop should actually just be temp, no dollar sign. Danny
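On the remark that the multi-page loop "could still be done in sh", here is one possible sketch. It is not a tested replacement for the Perl script: the function names page_files and scan_to_pdf are made up, the scanimage and tiff2pdf options are the ones used in the posts above, the default scanner is assumed (no -d), and filenames containing spaces are not handled.

```sh
#!/bin/sh
# Print the temporary page names for pages 1..count, one per line.
page_files() {
    out=$1 count=$2 i=1
    while [ "$i" -le "$count" ]; do
        printf 'temp-%s-%d.tiff\n' "$out" "$i"
        i=$((i + 1))
    done
}

# Scan pages until the user answers anything but "y", then build the PDF.
# Requires scanimage (SANE) plus tiffcp and tiff2pdf (libtiff tools).
scan_to_pdf() {
    out=$1 count=0
    while :; do
        count=$((count + 1))
        echo "scanning...."
        scanimage --mode Color --resolution 300 --format tiff \
            > "temp-$out-$count.tiff" || return 1
        printf 'More to scan (y/n)? '
        read -r answer
        [ "$answer" = y ] || break
    done
    echo "stitching..."
    # Unquoted on purpose: each page name becomes its own argument.
    tiffcp $(page_files "$out" "$count") "temp-$out-all.tiff" &&
        tiff2pdf -j -p A4 -o "$out" "temp-$out-all.tiff"
    # Remove the per-page files and the stitched file.
    rm -f $(page_files "$out" "$count") "temp-$out-all.tiff"
}
```

Running scan_to_pdf mydoc.pdf would scan pages interactively and write mydoc.pdf. Generating the temporary names in one place (page_files) means the names used for stitching and for cleanup cannot drift apart.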