Documentation

Use the ScraperBox API to scrape web pages.

Getting Started

To scrape a web page, send a GET or POST request to the following endpoint:

https://scraperbox.com/api/scrape
    ?token=YOUR_API_TOKEN
    &url=github.com

This returns the HTML content of github.com.
Replace YOUR_API_TOKEN with your own API token.
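The same request can be built in Python with only the standard library. This is a minimal sketch: it assumes token and url are passed as query parameters exactly as in the URL above, and build_scrape_url and scrape are hypothetical helper names, not part of an official ScraperBox client.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://scraperbox.com/api/scrape"


def build_scrape_url(token: str, url: str) -> str:
    """Build the GET URL for the scrape endpoint (hypothetical helper)."""
    # urlencode percent-encodes each value, so a target URL containing
    # '?', '&' or '=' is passed through as a single parameter.
    query = urlencode({"token": token, "url": url})
    return f"{API_ENDPOINT}?{query}"


def scrape(token: str, url: str, timeout: float = 60.0) -> str:
    """Fetch the HTML of `url` through the API and return it as text."""
    with urlopen(build_scrape_url(token, url), timeout=timeout) as response:
        # Assumption: the returned HTML is UTF-8 encoded.
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Prints the same URL as the example above, on a single line.
    print(build_scrape_url("YOUR_API_TOKEN", "github.com"))
```

Encoding the target URL matters once it has its own query string; otherwise its & would be read as a separate API parameter.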

Additional Parameters

url (string, default: '')
    The URL of the web page you want to scrape.

javascript_enabled (boolean, default: false)
    If true, JavaScript pages are rendered. We use undetectable real Chrome browsers for this.

proxy_location (string, default: '')
    The two-letter country code of the location you want your request to be sent from. For example, use US to send a request from the United States.

residential_proxy (boolean, default: false)
    If true, premium residential proxies are used. In combination with Chrome browsers, these are truly undetectable.

keep_headers (boolean, default: false)
    If true, any headers sent with your request are kept.

sticky_session_id (string, default: '')
    A unique sticky session ID that keeps sending your requests through the same proxy. This only works with premium residential proxies.
    Example: mysession1
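When combining several of these parameters, it can help to validate them before sending the request. The sketch below is a hypothetical helper, not an official client: it assumes boolean parameters are sent as the lowercase string true (as in javascript_enabled=true elsewhere on this page), omits parameters left at their defaults, and enforces the documented rule that sticky_session_id only works with premium residential proxies.

```python
from __future__ import annotations

from urllib.parse import urlencode

API_ENDPOINT = "https://scraperbox.com/api/scrape"


def build_params(
    token: str,
    url: str,
    javascript_enabled: bool = False,
    proxy_location: str = "",
    residential_proxy: bool = False,
    keep_headers: bool = False,
    sticky_session_id: str = "",
) -> dict[str, str]:
    """Collect query parameters, skipping those left at their defaults."""
    if sticky_session_id and not residential_proxy:
        # Documented constraint: sticky sessions need residential proxies.
        raise ValueError("sticky_session_id requires residential_proxy=True")

    params = {"token": token, "url": url}
    # Assumption: enabled booleans are encoded as the string "true".
    flags = {
        "javascript_enabled": javascript_enabled,
        "residential_proxy": residential_proxy,
        "keep_headers": keep_headers,
    }
    for name, enabled in flags.items():
        if enabled:
            params[name] = "true"
    if proxy_location:
        params["proxy_location"] = proxy_location.upper()
    if sticky_session_id:
        params["sticky_session_id"] = sticky_session_id
    return params


if __name__ == "__main__":
    params = build_params(
        "YOUR_API_TOKEN",
        "github.com",
        residential_proxy=True,
        sticky_session_id="mysession1",
    )
    print(f"{API_ENDPOINT}?{urlencode(params)}")
```

Failing early on the sticky-session combination avoids sending a request that, according to the table above, would not keep the same proxy anyway.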

Javascript Pages

To scrape a website using a real Chrome browser, add javascript_enabled=true to your request. For example, this is how you can scrape Amazon with JavaScript rendering enabled:

https://scraperbox.com/api/scrape
    ?token=YOUR_API_TOKEN
    &url=amazon.com
    &javascript_enabled=true
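Rendering a page in a real browser usually takes longer than a plain fetch, so a client may want a more generous timeout for these requests. A minimal standard-library sketch; the 120-second timeout is an arbitrary assumption rather than a documented limit, and build_js_request and scrape_rendered are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://scraperbox.com/api/scrape"


def build_js_request(token: str, url: str) -> Request:
    """Build a GET request that asks the API to render JavaScript."""
    query = urlencode({"token": token, "url": url, "javascript_enabled": "true"})
    return Request(f"{API_ENDPOINT}?{query}", method="GET")


def scrape_rendered(token: str, url: str, timeout: float = 120.0) -> str:
    """Return the rendered HTML; the default timeout is an assumption."""
    with urlopen(build_js_request(token, url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    req = build_js_request("YOUR_API_TOKEN", "amazon.com")
    print(req.get_method(), req.full_url)
```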

Proxy locations

To scrape from a specific country, add proxy_location to your request. We currently support the following locations:

Country          Code
Germany          DE
Finland          FI
France           FR
United Kingdom   GB
Hungary          HU
Ireland          IE
Israel           IL
Italy            IT
Netherlands      NL
Poland           PL
Serbia           RS
Russia           RU
United States    US
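Because only these codes are supported, a client can reject an unknown code locally instead of spending a request on it. The sketch below hard-codes the list from this page (so it goes stale if the list changes) and upper-cases the input before checking it; normalize_location is a hypothetical helper.

```python
SUPPORTED_LOCATIONS = {
    "DE": "Germany",
    "FI": "Finland",
    "FR": "France",
    "GB": "United Kingdom",
    "HU": "Hungary",
    "IE": "Ireland",
    "IL": "Israel",
    "IT": "Italy",
    "NL": "Netherlands",
    "PL": "Poland",
    "RS": "Serbia",
    "RU": "Russia",
    "US": "United States",
}


def normalize_location(code: str) -> str:
    """Return the upper-case code, or raise if it is not in the list above."""
    normalized = code.strip().upper()
    if normalized not in SUPPORTED_LOCATIONS:
        supported = ", ".join(sorted(SUPPORTED_LOCATIONS))
        raise ValueError(f"unsupported proxy_location {code!r}; expected one of {supported}")
    return normalized


if __name__ == "__main__":
    print(normalize_location("gb"), SUPPORTED_LOCATIONS["GB"])
```

Note that the United Kingdom uses the code GB, so a value such as UK is rejected.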

Error Messages

If a request fails, the API returns a 4xx or 5xx status code together with a JSON error message that explains what went wrong:

{
    "errors": {
        "general": [
            "The request timed out"
        ]
    }
}
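With Python's standard library, urlopen raises HTTPError for 4xx and 5xx responses, and the error body can still be read from the exception. A minimal sketch that surfaces the messages from the JSON format shown above; it assumes every key under errors maps to a list of strings the way general does, and error_messages and fetch_or_raise are hypothetical helper names.

```python
from __future__ import annotations

import json
from urllib.error import HTTPError
from urllib.request import urlopen


def error_messages(body: str) -> list[str]:
    """Flatten the "errors" object of an error response into a list."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Assumption: a non-JSON body is reported as-is.
        return [body.strip()] if body.strip() else []
    errors = payload.get("errors", {}) if isinstance(payload, dict) else {}
    if not isinstance(errors, dict):
        return []
    messages = []
    # The documented example uses a "general" key; other keys are
    # assumed to hold lists of strings in the same way.
    for field, items in errors.items():
        if isinstance(items, str):
            items = [items]
        for item in items:
            messages.append(f"{field}: {item}")
    return messages


def fetch_or_raise(api_url: str, timeout: float = 60.0) -> str:
    """Fetch api_url and turn 4xx/5xx responses into a RuntimeError."""
    try:
        with urlopen(api_url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as exc:
        # The HTTPError object doubles as the response, so its body
        # (the JSON error document) can be read here.
        body = exc.read().decode("utf-8", errors="replace")
        details = "; ".join(error_messages(body)) or exc.reason
        raise RuntimeError(f"HTTP {exc.code}: {details}") from exc
```

For the example response above, error_messages returns a single entry, "general: The request timed out".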

Code Examples