May 11, 2002

Kooka Scanner Suite Now With Website

The Kooka team is proud to announce the launch of the official Kooka website. Kooka is a scanner management suite for KDE with support for Optical Character Recognition (OCR). The Kooka web site offers extensive documentation on Kooka and the KScan library, future project plans, screenshots, and much more.

Kooka supports

Scanning

  • Scanner support through the SANE library.
  • Provides a user-friendly interface for important scanner options such as resolution, mode, and threshold.
  • Supports preview and full final scans.
  • Supports interactive scan area selection.

Image Storage

  • The save assistant helps you to find the correct image format for your purpose and creates a filename automatically.
  • Images are stored in the default gallery -- no need to find a place to save for every test scan.

The Image Gallery

  • A treeview-organised workplace where your images are stored.
  • Create and remove folders to organise your image collections.
  • Drag and drop with other KDE programs.

OCR

  • Kooka supports GOCR, an open source Optical Character Recognition program.

KDE Scan Service

  • The KScan Library provides a user-friendly scan interface for all other KDE applications.
  • Currently supported by both KOffice and KView.

Comments

How good is OCR? I remember trying it 4-5 years ago when I had my old scanner, and back then every OCR program I tried wouldn't work properly, as it always missed words on the pages that I scanned in. They were also terribly slow.

What's the error-percentage today?

Ps. Keep up with the good work! Did you know that Kooka is one of the few programs that Mandrake have screenshots of for each distribution release?


By Christian A Str... at Sat, 2002/05/11 - 5:00am

OCR is surprisingly good considering the pain it is to program ... I notice lots of people help with fun high-profile projects and fewer with low-profile, hard-slogging type projects. GOCR, SANE, GIMP, Pango etc. prolly share some programmers, but this thing needs corporate backing, I think ....


By NameSuggesterEngine at Sat, 2002/05/11 - 5:00am

In case no one notices .... SANE now has a backend that supports Gphoto2 devices as well (still under development). Thus with one masterful API we get Kooka, camera, scanner, and OCR support .... all exportable as a service to any KDE app that wants to use it.

This may not be "Enterprise Ready" just yet, but the idea is quite frankly awesome and is bound to be good for development and catching bugs. The projects are essentially leveraging each other's contributors.

It would be nice if KDE, Gnome and GNUstep could export services to one another.


By Positive Spin Dr. at Sat, 2002/05/11 - 5:00am

How good is "surprisingly good considering"? If you scan in an 8.5*11 sheet of paper with clear print on it, is it likely to get the whole thing with no errors? 1 error? 10 errors? More?


By not me at Sun, 2002/05/12 - 5:00am

Looking at the screenshots on the website, the rate seems to be more on the order of 2-3 errors per paragraph. That was for some German text, so perhaps English will be a little better with the lack of "funny" marks on the characters. I could probably tolerate that kind of error rate, but I've never used OCR before so I don't know what to expect.


By somewun at Sun, 2002/05/12 - 5:00am

I'm French and I tried gocr on its own: it works fairly well with single-column text (the recognition will be perfect once gocr can recognize the c-cedilla and the o-e ligature), but the result is catastrophic with multi-column text.


By Capit Igloo at Sun, 2002/05/12 - 5:00am

What OCR software does Google use for their Catalog site (http://catalogs.google.com/)? It must be good if they can do 2,7000 catalogs. It is probably completely automated I would assume.


By Rosis at Sun, 2002/05/12 - 5:00am

> It is probably completely automated I would assume.

Don't count on it.
The company I work for is (among other things) a scanning bureau. We do document archiving, scanning, and OCR of everything from courier consignment notes to multipage forms to books, and if you want any sort of OCR, there MUST be a manual repair process, where an operator compares the original with the OCR'd version and fixes the mistakes.

Usually the OCR engine has a configurable confidence threshold, so any document or field that doesn't meet that confidence level gets put in the repair queue.
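
As a rough illustration of that routing step, here is a minimal Python sketch. It is not any particular engine's API: the `Field` record, the `route` function, the sample data, and the 0.90 threshold are all made up for the example. Fields whose reported confidence falls below the threshold go to the repair queue; the rest pass straight through.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    text: str
    confidence: float  # 0.0-1.0, as reported by a hypothetical OCR engine


def route(fields, threshold=0.90):
    """Split fields into (accepted, repair_queue) by confidence."""
    accepted, repair_queue = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            # Below the threshold: an operator has to check this one by hand.
            repair_queue.append(field)
    return accepted, repair_queue


# Invented sample data: one clean field, one the engine was unsure about.
fields = [
    Field("consignee", "ACME Freight", 0.97),
    Field("postcode", "3O2l", 0.62),
]
accepted, repair_queue = route(fields)
```

Raising the threshold sends more fields to the operators and lets fewer errors through unchecked; lowering it does the opposite.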

Part of the problem is that there are some letters and numbers, that if printed in the wrong font, even a 100% accurate OCR engine (which doesn't exist) could not possibly guarantee to get right. Just look at the word Ill (That's capital 'i' and two lowercase 'L's) in a sans serif font to see what I mean.

In a big job, we could easily have a room full of operators keying continually to keep up with the repairing needed on the output of one (high speed) scanner, and that's with forms that have clearly marked boxes for text, and dropout colours.


By Stuart Herring at Sun, 2002/05/12 - 5:00am

The 'Ill' problem can be solved automatically to some degree by simply using a dictionary. If you have a word that looks like '|||', it's not hard to figure out that 'Ill' is the only word that actually exists. If there are alternatives, choose the one with the higher frequency. The process can be further improved by analyzing the sentence grammar.
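
A minimal Python sketch of the dictionary idea, under stated assumptions: the confusion table, the toy word list, and its frequencies are invented for illustration, and a real recogniser would work from its own per-character alternatives rather than a fixed table.

```python
from itertools import product

# Characters a recogniser might confuse with one another (illustrative only).
CONFUSABLE = {"|": "Il1", "I": "Il1", "l": "Il1", "1": "Il1"}

# Toy dictionary with made-up relative word frequencies.
WORD_FREQ = {"Ill": 5, "ill": 40, "All": 60}


def candidates(raw):
    """Yield every spelling obtained by swapping confusable characters."""
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    for combo in product(*options):
        yield "".join(combo)


def best_word(raw):
    """Return the most frequent dictionary word among the candidates, or None."""
    known = {word for word in candidates(raw) if word in WORD_FREQ}
    if not known:
        return None
    return max(known, key=WORD_FREQ.get)


result = best_word("|||")
```

Here `best_word("|||")` tries all 27 combinations of 'I', 'l' and '1' and keeps only those found in the dictionary, which leaves 'Ill'. Input with no dictionary match returns `None`, which is where a human operator or a grammar check would have to take over.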

Different fonts are a minor problem compared to handwriting recognition. I have an application on my Psion PDA where I can write the word 'Reykjavik' very sloppily and still get it recognized. That's because there are no other alternatives. I can look at the list of alternative spellings and see that 'Reukjavik', 'Heykjavik', 'Reukiavik' and other non-words have been ruled out.

There's no such thing as perfect OCR, but you can improve the process a lot by analyzing word by word instead of character by character.


By Gaute Hvoslef K... at Sun, 2002/05/12 - 5:00am

Does anyone with Kooka and Vuescan experience have any opinions on how they stack up against each other? When I got my slide scanner, Vuescan was so much more sophisticated than anything Free that I gladly shelled out for it. Is Kooka anywhere near Vuescan?


By Anonymous at Mon, 2002/05/13 - 5:00am

Faster; not as accurate


By uncertainty_pri... at Tue, 2002/05/14 - 5:00am

Sorry, but I'm not a fan of Kooka's (current) UI. Like most Linux scanner/CD-writer GUIs, Kooka asks for a (SCSI) device. Why must I choose a device? I _only_ want to choose the _name_ of my scanner/CD writer. BTW, isn't this something for the KDE Control Center (hardware devices)? The same goes for default resolutions, gamma, etc. of the device.
When the program starts, most of the window has no function. This is the place where the scanned images appear later.
The Kooka gallery has no standard KDE path dialog.
The sliders are hard to use: try to change the resolution from 100 dpi to 200 dpi. I only get 196 or 201 dpi, so I have to use the text edit widget. BTW, the KDE standard is text edit widget first, then slider (pedant) ;-)

I like UIs that help to solve tasks. I have a scanner, so I want to scan and
- print (photo copy)
- fax
- mail
- paste as (image|text (OCR))

Do I really need a gallery function when I could use my scanner with e.g. pixie via a kio_slave|libkscan|kpart?
I would like to open kprinter and kprintfax and find a special scanner extension there for photocopy and fax. IMHO a special scan application is useful for special tasks like OCR. The lib is good, but the UI must be improved here as well; the app does not look like a tool integrated in my DE.

Bye

Thorsten


By Thorsten Schnebeck at Tue, 2002/05/14 - 5:00am

I find the preview window too small to make any accurate selections. You should be able to preview in the main window (or is it just not obvious how to do this?).


By John at Wed, 2002/05/15 - 5:00am

I'd like to comment on some of the remarks, just to explain why things are somehow not so straightforward with scanning...

Why do you have to select a scan device? Well, you can easily have two SCSI scanners of the same vendor and model attached, and they only differ in their device file. If you are not so keen on scanning and have just one, you should select 'Always use that scanner and do not ask again' ;-)

You are right, after starting the program, the viewer appears to be empty. But that changes immediately after you have scanned the first image. I dislike scan programs that do not show the scan results but save them silently. And that is what the gallery is for: you do not need to enter a filename for every single test scan. Just scan on and let Kooka save your tries automatically, and if your scan result is what you want (for me usually after ten tries ;-) you simply drag it to any Konqueror around...

It is true that Kooka does not provide a fax or mail function (yet). But my proposal would be that the specific applications for that should accept drops of images and send them, or use the scan service provided by libkscan. The same goes for pixie and friends: they do not need Kooka, they need the scan service if they want to receive images from the scanner ;)


By Klaas Freitag at Thu, 2002/05/16 - 5:00am

http://dwhs.info And what relationship does the Google catalog have with KDE? I mean that port of Google sucks. Like it is not enough to see Google in everyday use.


By Luka Horvatic at Thu, 2007/02/01 - 6:00am