Saturday, February 15, 2014

Optical Character Recognition with Node.js

Today, I was prototyping an OCR tool to use as a web-based API. My first intention was to develop a desktop version in Python and serve it through a Flask application, but using Node proved to be a lot easier.
Node.js has a library that works with Tesseract, which proved to be quite handy.
I first installed the library using npm:

npm install nodecr

Next, in a simple node application, I processed a user uploaded image:

ncr.process(filePath, function(error, text){ ....

The process call parses the image, and the callback function receives either an error or the recognized text.
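
As a rough sketch of how that callback might be handled, the snippet below wraps the call in a small helper. The helper name `recognize` and the `stubEngine` object are my own hypothetical names, not part of the library; the stub only stands in for the OCR engine so the example runs without Tesseract installed. The one thing taken from the code above is the call shape, process(filePath, function(error, text)).

```javascript
// Sketch only: `recognize` and `stubEngine` are hypothetical names.
// `engine` is any object exposing process(filePath, callback), which is the
// call shape used in this post.
function recognize(engine, filePath, done) {
  engine.process(filePath, function (error, text) {
    if (error) {
      // Hand the failure to the caller instead of throwing inside the callback.
      return done(error);
    }
    // Strip leading and trailing whitespace from the recognized text.
    done(null, String(text).trim());
  });
}

// Stand-in engine so the sketch runs without Tesseract installed.
var stubEngine = {
  process: function (filePath, callback) {
    if (!filePath) {
      return callback(new Error('no file path given'));
    }
    callback(null, '  Hello OCR \n');
  }
};

recognize(stubEngine, 'upload.png', function (error, text) {
  // With the stub above this prints "Hello OCR".
  console.log(error ? 'OCR failed: ' + error.message : text);
});
```

With the real library installed, the idea would be to pass require('nodecr') in place of stubEngine. Keeping the engine as a parameter also makes the upload handler easy to test without running OCR.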

I have published it as a generic application at https://github.com/SumitBisht/node-ocr and hope you will find it helpful.
Note that this is a very basic form of OCR: images need to be cleaned up before they are passed to it, which is something I am still working on.
