Node.js: Downloading an XML file from a given URL and reading its data elements.



In this article, we will see how to download an XML file from a given URL
and then access its elements.
This is useful in many cases, for example, when scraping data from a site that publishes its data in one or more XML files.
The article is short, and most of it is self-explanatory code.
Also, there are a lot of freelance jobs in data scraping these days, as data is the new oil, so this might be helpful.

So let's begin with the code:

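var fs = require('fs');
var https = require('https'); // The URL is https, so we need this module instead of http.
var xml2js = require('xml2js'); // Required for XML parsing (install it with: npm install xml2js).

var file_name = 'data.xml'; // This will be the name of the file we generate.
var DOWNLOAD_DIR = __dirname + '/';
var file_path = DOWNLOAD_DIR + file_name; // Used for both writing and reading.


// This function reads data from the URL and writes it into a new file
// with the given name in the given directory. The callback runs once
// the file has been completely written, or as soon as an error occurs.

function download(callback) {
  var file_url = 'https://www.w3schools.com/xml/note.xml';
  https.get(file_url, function (response) {
    if (response.statusCode !== 200) {
      response.resume(); // Discard the body so the connection is released.
      return callback(new Error('Download failed with status ' + response.statusCode));
    }
    var file = fs.createWriteStream(file_path, { flags: 'w' });
    response.pipe(file);
    file.on('finish', function () { callback(null); });
    file.on('error', callback);
  }).on('error', callback);
}


// This function reads data from the XML file and parses it into a
// JavaScript object so that we can access its elements.

function read() {
  var fileData = fs.readFileSync(file_path, 'utf8');
  var parser = new xml2js.Parser();
  parser.parseString(fileData, function (err, result) {
    if (err) {
      return console.error('Could not parse the XML:', err.message);
    }
    console.log(result); // Here you get the data as a JavaScript object.
  });
}


// Download first, and read only after the file has been fully written.

download(function (err) {
  if (err) {
    return console.error('Could not download the file:', err.message);
  }
  read();
});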
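
Once the parsed result is available, individual elements can be read as ordinary object properties. The sketch below assumes the default xml2js options, where the root element becomes a key and every child element is an array of values. The sample object is an assumed stand-in for what parsing note.xml would produce, and noteFields is a hypothetical helper name, not part of xml2js.

```javascript
// Assumed sample of the object shape xml2js produces with default options.
// In the article's code, this would be the `result` argument of the callback.
const result = {
  note: {
    to: ['Tove'],
    from: ['Jani'],
    heading: ['Reminder'],
    body: ["Don't forget me this weekend!"]
  }
};

// Hypothetical helper: take the first value of each child element.
function noteFields(parsed) {
  const note = parsed.note;
  return {
    to: note.to[0],
    from: note.from[0],
    heading: note.heading[0],
    body: note.body[0]
  };
}

const fields = noteFields(result);
console.log(fields.to + ' <- ' + fields.from + ': ' + fields.body);
```

Each child is an array because an XML element may appear more than once under the same parent, so indexing with [0] picks the first occurrence.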

Note:
1) In Node.js, __dirname is always the directory in which the currently executing script resides.
So if you use __dirname in /A1/A2/script.js, its value will be /A1/A2.

2) The pipe() function reads data from a readable stream as it becomes available and writes it to a destination writable stream. In our code, the variable file is the writable stream and response is the readable stream. This is also a frequently asked interview question for Node.js developer roles.
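
To try pipe() without a network connection, here is a minimal sketch that pipes an in-memory readable stream into a file and waits for the 'finish' event before reading the file back. The names saveStream, source, and target are hypothetical, and the temporary file name pipe-demo.xml is an assumption made only for this illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');

// Pipe a readable stream into a file and resolve once everything is written.
function saveStream(readable, targetPath) {
  return new Promise(function (resolve, reject) {
    const file = fs.createWriteStream(targetPath, { flags: 'w' });
    readable.on('error', reject);
    file.on('error', reject);
    // 'finish' fires after the last chunk has been handed to the file system.
    file.on('finish', function () { resolve(targetPath); });
    readable.pipe(file);
  });
}

// Stand-in for the HTTPS response: a readable stream built from string chunks.
const source = Readable.from(['<note>', '<to>Tove</to>', '</note>']);
const target = path.join(os.tmpdir(), 'pipe-demo.xml');

// Read the file back only after the write has finished.
const saved = saveStream(source, target).then(function (writtenPath) {
  const text = fs.readFileSync(writtenPath, 'utf8');
  console.log(text);
  return text;
});
```

In the download function, response plays the role of source here. Waiting for 'finish' matters because pipe() returns immediately, before the data has actually been written.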


