HTML API


1. Quickstart

The API endpoint is https://api.scraperbox.com/scrape
You can send a GET or POST request.

Note that you should replace YOUR_API_TOKEN with your API token.
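As a rough sketch of the quickstart, the request URL can be assembled in Python with only the standard library. The `url` query parameter naming the page to scrape is an assumption (the text above only documents `token`), and `https://example.com` is a placeholder target.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_scrape_url(token: str, target_url: str) -> str:
    """Return the full GET URL for a single scrape request."""
    # "url" is an assumed parameter name for the page to scrape.
    query = urlencode({"token": token, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


request_url = build_scrape_url("YOUR_API_TOKEN", "https://example.com")
print(request_url)
```

Passing `request_url` to any HTTP client, for example `urllib.request.urlopen`, would then send the request.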

2. Authentication

Add the token field to the request to authenticate yourself.

https://api.scraperbox.com/scrape?token=YOUR_API_TOKEN

Alternatively, you can send your token in the Authorization header.

Authorization: YOUR_API_TOKEN
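A minimal sketch of header-based authentication, assuming the token is sent as the bare header value exactly as shown above. The `url` parameter name is an assumption and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_authorized_request(token: str, target_url: str) -> Request:
    """Build a GET request that authenticates via the Authorization header."""
    # "url" is an assumed parameter name for the page to scrape.
    query = urlencode({"url": target_url})
    return Request(f"{API_ENDPOINT}?{query}", headers={"Authorization": token})


authorized_request = build_authorized_request("YOUR_API_TOKEN", "https://example.com")
print(authorized_request.full_url)
```

This keeps the token out of the query string, which can help when request URLs end up in logs.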

3. Javascript Rendering

Default value javascript_enabled=false

A lot of websites require Javascript to display their data correctly.

To enable Javascript rendering, set the javascript_enabled field to true.

An API call with Javascript enabled will cost 5 credits.

4. Proxies

Each API request uses our proxy network by default. Geotargeting is not possible with our normal proxy network.

5. Residential Proxies

Default value residential_proxy=false

We also offer residential proxies. A residential proxy routes the API's requests through a real residential internet connection. With this you can counter almost all IP-based bot detection systems.

To enable residential proxies, set the residential_proxy field to true.

An API call with residential proxies enabled will cost 10 credits.

You can also set the proxy_location parameter when using residential proxies.

Currently we support the following values for proxy_location

proxy_location will only work with residential_proxy=true
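The dependency between the two parameters can be enforced on the client side before a request is sent. This is a sketch only: `build_residential_url` is a hypothetical helper, the `url` parameter name is an assumption, and no concrete `proxy_location` value is shown because the supported values are not listed here.

```python
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_residential_url(
    token: str,
    target_url: str,
    residential_proxy: bool = True,
    proxy_location: Optional[str] = None,
) -> str:
    """Build a GET URL, rejecting proxy_location without residential proxies."""
    if proxy_location is not None and not residential_proxy:
        raise ValueError("proxy_location only works with residential_proxy=true")
    params = {
        "token": token,
        "url": target_url,  # assumed parameter name for the target page
        "residential_proxy": "true" if residential_proxy else "false",
    }
    if proxy_location is not None:
        params["proxy_location"] = proxy_location
    return f"{API_ENDPOINT}?{urlencode(params)}"


residential_url = build_residential_url("YOUR_API_TOKEN", "https://example.com")
print(residential_url)
```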

6. Additional Parameters

6.1 Custom Headers

All headers sent to our API with an SB- prefix will be forwarded to the target website.

The SB- prefix tells the API which headers to forward. The API removes the prefix before forwarding the header.

For example, if you only speak Klingon, you can set the SB-Accept-Language header to klingon.
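The prefixing rule can be sketched as a small helper that turns the headers you want the target website to see into the headers you send to the API. `with_sb_prefix` is a hypothetical name used for illustration.

```python
from typing import Dict, Mapping

SB_PREFIX = "SB-"


def with_sb_prefix(headers: Mapping[str, str]) -> Dict[str, str]:
    """Prefix each header name so the API forwards it to the target website."""
    return {f"{SB_PREFIX}{name}": value for name, value in headers.items()}


forwarded = with_sb_prefix({"Accept-Language": "klingon"})
print(forwarded)  # {'SB-Accept-Language': 'klingon'}
```

The resulting dictionary can be passed as the request headers of whichever HTTP client you use.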

6.2 Post Requests

By default the API will send a GET request to the target website. You can send a POST request by doing the following:

1. Set the method=POST parameter.

2. Add the POST body to the post_body parameter.

3. Set the matching content type with the SB-Content-Type header, for example SB-Content-Type: application/x-www-form-urlencoded.

This shows a complete example of a JSON POST request.
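One possible shape for such a request, sketched with the standard library. Several details are assumptions rather than documented behavior: that the API accepts its own parameters as a JSON body, that `post_body` carries the target body as a JSON-encoded string, and that the target page is named by a `url` parameter. The request is built but not sent.

```python
import json
from urllib.request import Request

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_post_request(token: str, target_url: str, body: dict) -> Request:
    """Build a POST to the API that asks it to POST JSON to the target."""
    payload = {
        "token": token,
        "url": target_url,              # assumed parameter name
        "method": "POST",               # step 1
        "post_body": json.dumps(body),  # step 2 (assumed: JSON-encoded string)
    }
    return Request(
        API_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",     # encoding of the call to the API
            "SB-Content-Type": "application/json",  # step 3, forwarded to the target
        },
        method="POST",
    )


post_request = build_post_request(
    "YOUR_API_TOKEN", "https://example.com/form", {"name": "Worf"}
)
print(post_request.data.decode("utf-8"))
```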

6.3 Sticky Sessions

Default value sticky_session_id=null

You can supply a random sticky session id to keep using the same proxy.

The session id can be any alphanumeric string. For example, mysessionid and rand0mstr1ng are both valid session ids.

This feature only works with residential proxies.
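A sketch of reusing one session id across several requests so that they share a proxy. The helper names and the `url` parameter are assumptions, and the id length of 12 is an arbitrary choice.

```python
import secrets
import string
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def new_session_id(length: int = 12) -> str:
    """Generate a random alphanumeric session id."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def build_sticky_url(token: str, target_url: str, session_id: str) -> str:
    """Build a GET URL that pins the request to one residential proxy."""
    params = {
        "token": token,
        "url": target_url,            # assumed parameter name
        "residential_proxy": "true",  # sticky sessions need residential proxies
        "sticky_session_id": session_id,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


session_id = new_session_id()
page_urls = [
    build_sticky_url("YOUR_API_TOKEN", page, session_id)
    for page in ("https://example.com/1", "https://example.com/2")
]
```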

6.4 Resource Blocking

Default value types_to_block=['font', 'image', 'media']

By default the browser will not download fonts, images, or media. This significantly speeds up requests.

The resource types that you can block: font, image, stylesheet, script, media, other

When sending a POST request you can send the array of resource types:

"types_to_block": ["font","image"]

When sending a GET request you can send multiple values like this.

?types_to_block=font&types_to_block=media&types_to_block=script

This feature only works when javascript_enabled=true
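The repeated-parameter form for GET requests is what `urllib.parse.urlencode` produces with `doseq=True`. The sketch below also checks the values against the resource types listed above; `build_blocking_url` is a hypothetical helper and `url` is an assumed parameter name.

```python
from typing import Iterable
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"
BLOCKABLE_TYPES = {"font", "image", "stylesheet", "script", "media", "other"}


def build_blocking_url(token: str, target_url: str, types_to_block: Iterable[str]) -> str:
    """Build a GET URL that blocks the given resource types."""
    types = list(types_to_block)
    unknown = [t for t in types if t not in BLOCKABLE_TYPES]
    if unknown:
        raise ValueError(f"unsupported resource types: {unknown}")
    params = {
        "token": token,
        "url": target_url,             # assumed parameter name
        "javascript_enabled": "true",  # blocking only works with Javascript enabled
        "types_to_block": types,
    }
    # doseq=True expands the list into one key=value pair per item.
    return f"{API_ENDPOINT}?{urlencode(params, doseq=True)}"


blocking_url = build_blocking_url(
    "YOUR_API_TOKEN", "https://example.com", ["font", "media", "script"]
)
print(blocking_url)
```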

6.5 CSS Selectors

Default value css_selectors=[]

You can extract data from the web page with CSS selectors. For example, to extract all h1 and h2 tags you can supply the following.

css_selectors=["body h1", "body h2"]

When this parameter is provided, the API will return a JSON response in the following format.

{
    "css_selectors": [
        // First item in the array matches the first css_selectors value.
        // "body h1" in this case.
        [
            "The main Title",
            "Another important title"
        ],

        // This matches the second css_selectors value.
        // "body h2" in this case.
        [
            "My h2 title",
            "My other h2 title"
        ]
    ]
}

When sending a GET request you can supply multiple css selectors like this.
Be sure to URL encode the values!

css_selectors=h1&css_selectors=h2
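Because results come back in the same order as the selectors, they can be paired up by position. The sketch below URL encodes the selectors for a GET request and then maps a sample response body (copied from the format above, without the comments) back to its selectors. `results_by_selector` is a hypothetical helper.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

selectors = ["body h1", "body h2"]

# doseq=True repeats the key per selector and URL encodes each value.
selector_query = urlencode({"css_selectors": selectors}, doseq=True)
print(selector_query)  # css_selectors=body+h1&css_selectors=body+h2

sample_body = json.dumps(
    {
        "css_selectors": [
            ["The main Title", "Another important title"],
            ["My h2 title", "My other h2 title"],
        ]
    }
)


def results_by_selector(selectors: List[str], body: str) -> Dict[str, List[str]]:
    """Pair each selector with its list of matches, relying on array order."""
    matches = json.loads(body)["css_selectors"]
    return dict(zip(selectors, matches))


results = results_by_selector(selectors, sample_body)
print(results["body h1"])  # ['The main Title', 'Another important title']
```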

7. Response Headers

With each request the following response headers are sent.

Header               Description
X-Request-Cost       The cost of the request.
X-Credits-Remaining  Your remaining credits this period.
X-Final-Url          The final URL of the scraped webpage. This is useful for detecting redirects.
X-Solved-Captcha     If no captcha was solved this will be false. Otherwise it contains the type of captcha that was solved, for example: recaptcha
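A sketch of reading these headers into a small structure. It assumes the two credit headers hold integers, which is not stated above, and the sample values are made up for illustration.

```python
from typing import Mapping, NamedTuple, Optional


class ScrapeInfo(NamedTuple):
    cost: int
    credits_remaining: int
    final_url: str
    solved_captcha: Optional[str]


def read_scrape_info(headers: Mapping[str, str]) -> ScrapeInfo:
    """Extract the documented response headers from a header mapping."""
    solved = headers["X-Solved-Captcha"]
    return ScrapeInfo(
        cost=int(headers["X-Request-Cost"]),                    # assumed integer
        credits_remaining=int(headers["X-Credits-Remaining"]),  # assumed integer
        final_url=headers["X-Final-Url"],
        solved_captcha=None if solved == "false" else solved,
    )


# Illustrative values only, not real API output.
sample_headers = {
    "X-Request-Cost": "5",
    "X-Credits-Remaining": "995",
    "X-Final-Url": "https://example.com/home",
    "X-Solved-Captcha": "false",
}

info = read_scrape_info(sample_headers)
was_redirected = info.final_url != "https://example.com"
print(info, was_redirected)
```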

8. Errors

If the API encounters a 4xx or 5xx status on the target website, the request fails and the API returns a JSON error message.

These failed requests will not cost any credits.

For example, if the API encounters a 403 response code it will return the following.

{
    "errors": {
        "response": `The website returned a 403 status code.
            You can try again with:
            1) javascript_enabled=true
            2) residential_proxy=true
            3) check out our troubleshooting guide:
                https://scraperbox.com/blog/troubleshooting-guide`
    }
}
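One way to tell an error payload apart from scraped HTML is to try parsing the body as JSON and look for the `errors` key. This is a sketch under that assumption; the HTTP status the API itself uses for failures is not stated here, so the check relies on the body alone. `extract_errors` is a hypothetical helper and the sample message is shortened.

```python
import json
from typing import Dict, Optional


def extract_errors(body: str) -> Optional[Dict[str, str]]:
    """Return the "errors" object from an API response body, or None."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return None  # scraped HTML is not JSON, so it is not an error payload
    if isinstance(parsed, dict) and isinstance(parsed.get("errors"), dict):
        return parsed["errors"]
    return None


sample_error_body = json.dumps(
    {"errors": {"response": "The website returned a 403 status code."}}
)

errors = extract_errors(sample_error_body)
if errors is not None:
    for field, message in errors.items():
        print(f"{field}: {message}")
```

A caller could use the returned message to decide whether to retry with javascript_enabled=true or residential_proxy=true, as the error text suggests.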