With web scraping, we can automatically extract data from websites!
It is used a lot in data science to acquire data from public websites that don't have an API.
In this article, I'm going to create a simple Wikipedia scraper in NodeJS.
Let's dive in!
The first step is to fetch the HTML code from the webpage. We can then extract the data we want from this HTML.
Let's start by extracting some interesting data from Wikipedia using NodeJS.
If you haven't done so already, install the LTS version of NodeJS.
If you are on a Mac, you can use Homebrew: brew install node.
For starters, I create a file called scraper.js with the following code:
console.log('hello world')
Then I use the following command to run the program:
node scraper.js
It's a bit embarrassing, but I like Kevin Bacon.
So, because we need something to scrape, let's scrape some data from his Wikipedia article.
To fetch the data, we are going to use the axios npm package:
npm i axios
In your scraper.js file, replace the console.log(...) with this snippet:
const axios = require('axios')

axios
  .get('https://en.wikipedia.org/wiki/Kevin_Bacon')
  .then(({ data: html }) => {
    console.log(html)
  })
Then run it with node:
node scraper.js
This will output a huge chunk of HTML. Let's extract some useful data from it!
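Before moving on to parsing, note that the fetch step above can also be wrapped in a small reusable function with async/await. This is only a minimal sketch, not part of the original scraper: fetchHtml and its get parameter are hypothetical names, and get defaults to axios.get so that a stand-in can be passed in when you want to try the function without network access.

```javascript
// Sketch: fetch a page and return its HTML as a string.
// `fetchHtml` and the `get` parameter are hypothetical names for this example.
// `get` defaults to axios.get; axios is only loaded when the default is used.
async function fetchHtml(url, get = (u) => require('axios').get(u)) {
  // axios resolves to a response object whose body is in `data`
  // and, by default, rejects for non-2xx status codes.
  const { data, status } = await get(url)

  // Assumption: an HTML page arrives as a non-empty string.
  if (typeof data !== 'string' || data.length === 0) {
    throw new Error(`Expected HTML from ${url} (status ${status})`)
  }
  return data
}

// Usage (needs axios installed and network access):
// fetchHtml('https://en.wikipedia.org/wiki/Kevin_Bacon')
//   .then(html => console.log(html.length))
//   .catch(err => console.error(err.message))
```

Passing the HTTP call in as a parameter is a design choice for this sketch only; calling axios directly, as in the snippet above, works just as well for a one-off script.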
To parse the HTML, we are going to use the jsdom package.
First, install jsdom using npm:
npm i jsdom
Next, edit scraper.js to add jsdom:
const axios = require('axios')
const { JSDOM } = require('jsdom')

axios
  .get('https://en.wikipedia.org/wiki/Kevin_Bacon')
  .then(({ data: html }) => {
    // Parse the HTML string into a DOM we can query
    const { document } = new JSDOM(html).window
    // querySelector returns the first match, or null if nothing matches
    const nickname = document.querySelector('.nickname')
    if (nickname) console.log(nickname.textContent)
  })
  .catch(e => {
    // Log request or parsing errors
    console.log(e)
  })
The magic happens in document.querySelector('.nickname'). This function searches through the HTML and returns the first element that has the class nickname, or null if there is none, which is why the snippet checks nickname before printing its text.
With the Chrome DevTools, you can inspect webpages and find the class names of the data you want to scrape.
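Once you have found more class names or ids in DevTools, the same querySelector call can be repeated for each of them. Here is a rough sketch under a few assumptions: extractFields is a hypothetical helper, and the #firstHeading selector is an assumption about Wikipedia's markup that you should verify in DevTools yourself. Only .nickname comes from the scraper above.

```javascript
// Sketch: look up several selectors on a parsed page and collect their text.
// `extractFields` is a hypothetical helper name for this example.
// `document` is the object obtained from `new JSDOM(html).window`.
function extractFields(document, selectors) {
  const result = {}
  for (const [name, selector] of Object.entries(selectors)) {
    const element = document.querySelector(selector)
    // querySelector returns null when nothing matches, so store null in that case
    result[name] = element ? element.textContent.trim() : null
  }
  return result
}

// Usage inside the .then(...) callback of the scraper:
// console.log(extractFields(document, {
//   nickname: '.nickname',   // from the article
//   title: '#firstHeading'   // assumed selector, check it in DevTools
// }))
```

Fields whose selector matches nothing come back as null instead of throwing, which makes it easier to notice when a page's markup has changed.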
Alright, we've built a simple web scraper getting Kevin Bacon's nickname from Wikipedia.
And now we will know instantly when his nickname changes, because that is super important... right?
Happy coding!