
Node.js: Extract text from image using Tesseract.


In this article, we will see how to extract text from images using Tesseract.

Let's start with a use case.

Suppose you have 300 screenshots on your phone, each containing an email address that you need, for example to grow your network or for email marketing.
Copying the email from every image into a CSV or Excel file by hand would take a lot of time.
So let's see how to automate it.


First, install the pre-built Tesseract OCR (an optical character recognition engine) binary package for your OS.
I have tested it on Windows 10.
For Windows 10, you can install it from here.
For other operating systems, you may check this link.
Once you have installed Tesseract with the Windows setup, you will probably also need to add its install folder,
'C:\Program Files\Tesseract-OCR', to the PATH variable so that it can be run from any location.
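Before running any OCR, it can help to confirm that the PATH change took effect. The following is a minimal sketch, assuming the executable is named `tesseract` and responds to `--version`; the helper name `isTesseractAvailable` is made up for this example.

```javascript
const { spawnSync } = require('child_process');

// Returns true when `command --version` runs and exits with code 0.
// `command` defaults to 'tesseract'; pass a full path if it is not on PATH.
function isTesseractAvailable(command = 'tesseract') {
  const result = spawnSync(command, ['--version'], { encoding: 'utf8' });
  // result.error is set (for example ENOENT) when the executable cannot be found.
  return !result.error && result.status === 0;
}

if (isTesseractAvailable()) {
  console.log('Tesseract found on PATH.');
} else {
  console.log('Tesseract not found; check the PATH variable.');
}
```

If the check fails right after editing PATH, opening a new terminal window is usually needed so the updated variable is picked up.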

Then install the textract library from npm (npm install textract). The code below also uses the jsonexport library (npm install jsonexport).

To read the paths of these 300 images, select all the images and rename them to a common name.
For example, if we rename them to 'image', the files become image(1) to image(300),
so we can build each image path dynamically from the loop index.
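The naming scheme described here can be sketched as a small helper. The function name `buildImageNames` and the '.jpg' extension default are assumptions for illustration; adjust them to match how your files are actually named.

```javascript
// Builds the file names produced by the bulk rename: image(1).jpg ... image(N).jpg.
function buildImageNames(count, prefix = 'image', extension = '.jpg') {
  const names = [];
  for (let i = 1; i <= count; i++) {
    names.push(`${prefix}(${i})${extension}`);
  }
  return names;
}

console.log(buildImageNames(3));
// [ 'image(1).jpg', 'image(2).jpg', 'image(3).jpg' ]
```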

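Node.js code:

const textract = require('textract');
const jsonexport = require('jsonexport');
const fs = require('fs');

const TOTAL_IMAGES = 300;
const emailList = [];

for (let i = 1; i <= TOTAL_IMAGES; i++) {
  const name = 'image(' + i + ').jpg';
  textract.fromFileWithPath(name, function (error, text) {
    if (error) console.log('Could not read ' + name, error);
    console.log(text); // extracted text (undefined if extraction failed)

    // The split logic depends on the image layout; here we take whatever follows "Email".
    const afterLabel = text ? text.split('Email')[1] : undefined;
    const email = afterLabel ? afterLabel.trim() : '';
    // Push even on failure so the count below still reaches TOTAL_IMAGES.
    emailList.push({ Email: email });

    // Callbacks finish in any order, so write the CSV once every image has reported.
    if (emailList.length === TOTAL_IMAGES) {
      jsonexport(emailList, function (err, csv) {
        if (err) return console.log(err);
        fs.writeFile('EmailList.csv', csv, function (err) {
          if (err) throw err;
          console.log('Congrats! Email list created for ' + TOTAL_IMAGES + ' images');
        });
      });
    }
  });
}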
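Splitting on the word "Email" only works when every screenshot has the same layout. As an alternative (not what the script above does), a regular expression can pull anything that looks like an email address out of the extracted text. This is a rough sketch: the pattern is deliberately simplified, and the name `extractEmails` is made up for this example.

```javascript
// Simplified pattern: local part, "@", then a domain containing at least one dot.
// It will not cover every valid address, and OCR mistakes can still break a match.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Returns every email-like substring found in the text (empty array if none).
function extractEmails(text) {
  if (typeof text !== 'string') return [];
  return text.match(EMAIL_PATTERN) || [];
}

const sample = 'Name: Jane Doe\nEmail: jane.doe@example.com\nPhone: 555-0100';
console.log(extractEmails(sample)); // [ 'jane.doe@example.com' ]
```

Because it returns an array, the same helper also handles screenshots that contain no email or more than one.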

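The script starts one extraction per image; each callback appends its result to the list, and the CSV is written once all 300 results have arrived.
The jsonexport library converts the email list to CSV format, and fs.writeFile then writes it to the file EmailList.csv.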
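Because the callbacks can finish in any order, the rows in the CSV may not follow the image numbering. One possible variation, sketched below, wraps the callback API in Promises so the results keep the order of the input paths. The names `extractTextAsync`, `collectEmails` and `fakeExtract` are hypothetical, and `fakeExtract` is only a stand-in for real OCR so the sketch runs without Tesseract.

```javascript
// Wraps a Node-style function (path, callback(error, text)) in a Promise.
function extractTextAsync(extractFn, path) {
  return new Promise((resolve, reject) => {
    extractFn(path, (error, text) => (error ? reject(error) : resolve(text)));
  });
}

// Resolves to one row per image, in the same order as `paths`.
// A failed image yields an empty Email instead of aborting the whole run.
async function collectEmails(paths, extractFn, pickEmail) {
  return Promise.all(paths.map(async (path) => {
    try {
      const text = await extractTextAsync(extractFn, path);
      return { File: path, Email: pickEmail(text) || '' };
    } catch (error) {
      return { File: path, Email: '' };
    }
  }));
}

// Stand-in extractor for demonstration only (hypothetical, no real OCR).
function fakeExtract(path, callback) {
  setTimeout(() => callback(null, 'Email: user@' + path + '.example'), 0);
}

collectEmails(['a', 'b'], fakeExtract, (text) => text.split('Email: ')[1])
  .then((rows) => console.log(rows));
```

In real use, the extractor argument would be something like `(path, cb) => textract.fromFileWithPath(path, cb)`, and the resulting rows could be passed to jsonexport as before.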

I hope you liked this article. If you have any doubts, please let me know in the comment section.

Subscribe to this blog for more such articles.
You can also follow me on Twitter or LinkedIn for the latest updates.










Comments

  1. Actually, there is no need to rename the files or use a counting loop.
    You can use this to read the files from a folder.
    const testFolder = './tests/';
    const fs = require('fs');

    fs.readdirSync(testFolder).forEach(file => {
    console.log(file);
    });
