Scraperbox documentation

Quickstart

To start scraping you need to supply 2 required parameters:

api_key: string

default=""

Your API key. (Create an account to get an API key.)

url: string

default=""

The URL-encoded URL that you want to scrape.
URL encoding example: https://httpbin.org/anything → https%3A%2F%2Fhttpbin.org%2Fanything

Example

The following example scrapes the https://httpbin.org/anything webpage and returns the HTML.

cURL

curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fhttpbin.org%2Fanything" \
  https://api.scraperbox.com/scrape
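
For readers working in Python, the same request URL can be assembled with the standard library. This is a minimal sketch rather than an official client: the helper name build_scrape_url is hypothetical, and only the endpoint and the two parameter names come from the text above. It builds the URL without sending anything.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_scrape_url(api_key: str, target_url: str) -> str:
    """Return the GET URL for a basic scrape request (hypothetical helper)."""
    # urlencode percent-encodes every value, so target_url is passed unencoded.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


print(build_scrape_url("YOUR_API_KEY", "https://httpbin.org/anything"))
```

The resulting string can be passed to any HTTP client as a plain GET request.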

Chrome browser

The following parameters help you control the Chrome browser setup of your API request.

render_js: boolean

default=true

Note: API requests with JavaScript enabled will cost 5 API credits.

By default, API requests use real Chrome browsers that can handle JavaScript web pages. Set render_js to false to make a simple HTTP request without JavaScript support instead.

wait_for_ms: number, between 0 and 35000

default=750

You can specify how long, in milliseconds, the Chrome browser should wait after the page has finished loading. This is useful if, for example, some elements take time to become visible on the page.

wait_for_selector: string

default=""

The browser will wait for the given CSS selector to appear on the page. For example, to wait for a button to appear: wait_for_selector="button.my-button"

wait_for_browser_event: "domcontentloaded" | "load" | "networkidle0" | "networkidle2"

default="domcontentloaded"

The browser will wait until the given condition is met.

domcontentloaded: wait until the HTML DOM is loaded and ready
load: wait until the HTML DOM is loaded and ready and all images and frames have finished loading
networkidle0: wait until there are no active network requests
networkidle2: wait until there are at most 2 active network requests

block_ads: boolean

default=false

If set to true, the browser will block advertisements. This speeds up API requests.

block_resources: boolean

default=true

By default, our API blocks images, videos, fonts and CSS stylesheets. Set block_resources to false to disable this blocking. Disabling it will slow down requests significantly.

browser_width: number, 1 to 3840

default=1920

Set the Chrome browser viewport width.

browser_height: number, 1 to 2160

default=1080

Set the Chrome browser viewport height.

device: "desktop" | "mobile"

default="desktop"

When set to mobile, the Chrome browser acts like a mobile browser. This is useful for scraping the mobile version of web pages.

Note: this works best in combination with setting reasonable browser_width and browser_height values.
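
The ranges and allowed values listed for the browser parameters can be checked on the client before a request is sent, which avoids a round trip that ends in a validation error. The following Python sketch does this; the function name check_browser_params and the wording of its messages are hypothetical, while the bounds are copied from the parameter descriptions above.

```python
WAIT_EVENTS = {"domcontentloaded", "load", "networkidle0", "networkidle2"}
DEVICES = {"desktop", "mobile"}
# (min, max) bounds taken from the parameter descriptions.
NUMERIC_RANGES = {
    "wait_for_ms": (0, 35000),
    "browser_width": (1, 3840),
    "browser_height": (1, 2160),
}


def check_browser_params(params: dict) -> list:
    """Return a list of problems found in params; the list is empty if all is well."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    event = params.get("wait_for_browser_event")
    if event is not None and event not in WAIT_EVENTS:
        problems.append(f"unknown wait_for_browser_event: {event}")
    device = params.get("device")
    if device is not None and device not in DEVICES:
        problems.append(f"unknown device: {device}")
    return problems


print(check_browser_params({"device": "mobile", "browser_width": 5000}))
```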

Browser example

The following code navigates to https://isjsenabled.com and waits for the CSS selector div.enabled > h1 to appear. Note that the wait_for_selector parameter is URL-encoded.

cURL

curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fisjsenabled.com%2F" \
  -d "wait_for_selector=div.enabled%20%3E%20h1" \
  https://api.scraperbox.com/scrape
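
In Python, the selector does not need to be encoded by hand. The sketch below, with a hypothetical helper named build_browser_url, passes quote_via=quote to urlencode so that spaces become %20, matching the encoding used in the cURL example. It only builds the URL and does not send a request.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_browser_url(api_key: str, target_url: str, selector: str) -> str:
    """Return a scrape URL that waits for a CSS selector (hypothetical helper)."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "wait_for_selector": selector,
    }
    # quote_via=quote encodes spaces as %20 instead of the default "+".
    return f"{API_ENDPOINT}?{urlencode(params, quote_via=quote)}"


print(build_browser_url("YOUR_API_KEY", "https://isjsenabled.com/", "div.enabled > h1"))
```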

Proxies

The following parameters help you to control the proxy setup of your API request.

proxy_type: "normal" | "premium"

default="normal"

Note: API requests with premium proxies will cost 25 API credits.

By default, API requests use our normal proxies. Some hard-to-scrape websites are able to detect our normal proxies, though. In that case, you can set proxy_type to premium. This will use our premium residential proxies, which are impossible to detect.

country_code: "us" | "ca" | "nl" | ...

default=null

Note: Only available when proxy_type is premium.

Select the location of the proxies. For example, if you only want proxies from Canada, you can set country_code=ca. We use two-letter ISO 3166-1 alpha-2 country codes.

proxy_session_id: number, 1 to 9999

default=null

If you want to keep the same IP addresses for multiple scrape requests you can supply a proxy_session_id value. This must be a number between 0 and 9999.

Proxy example

The following code uses a premium proxy from Belgium to scrape a web page. It sets the proxy_session_id parameter, so that subsequent requests with the same proxy_session_id will use the same IP address.

cURL

curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fapi.myip.com" \
  -d "proxy_type=premium" \
  -d "country_code=be" \
  -d "proxy_session_id=123" \
  https://api.scraperbox.com/scrape
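
The rule that country_code is only available with premium proxies is easy to enforce before a request is made. The Python sketch below assembles the proxy parameters and rejects that combination early; build_proxy_params is a hypothetical name, and the error messages are illustrative rather than what the API itself returns.

```python
def build_proxy_params(proxy_type="normal", country_code=None, proxy_session_id=None) -> dict:
    """Return the proxy-related query parameters (hypothetical helper)."""
    if proxy_type not in ("normal", "premium"):
        raise ValueError(f"unknown proxy_type: {proxy_type}")
    # country_code is documented as available only with premium proxies.
    if country_code is not None and proxy_type != "premium":
        raise ValueError("country_code requires proxy_type=premium")
    params = {"proxy_type": proxy_type}
    if country_code is not None:
        params["country_code"] = country_code
    if proxy_session_id is not None:
        # Reusing the same id on later requests keeps the same IP.
        params["proxy_session_id"] = proxy_session_id
    return params


print(build_proxy_params("premium", country_code="be", proxy_session_id=123))
```

The returned dictionary can be merged with api_key and url before the query string is encoded.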

Header forwarding

To forward headers, add your own headers to the request with the SB- prefix. For example, to send a custom Accept-Language header, add an SB-Accept-Language: klingon,elvish header to your API request.

cURL

curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fhttpbin.org%2Fheaders" \
  -H "SB-Accept-Language: klingon,elvish" \
  https://api.scraperbox.com/scrape
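
When several headers need forwarding, the prefix can be applied in one step. A minimal Python sketch, assuming the headers are held in a plain dictionary; the function name to_forwarded_headers is hypothetical.

```python
def to_forwarded_headers(headers: dict) -> dict:
    """Prefix every header name with SB- so the API forwards it to the target page."""
    return {f"SB-{name}": value for name, value in headers.items()}


print(to_forwarded_headers({"Accept-Language": "klingon,elvish"}))
```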

Post requests

You can send POST and PUT requests to the url you want to scrape.

method: "GET" | "POST" | "PUT"

default="GET"

Change this value if you want to send a POST or PUT request, for example when you want to scrape search results from a page that expects a POST request.

body: any

default=null

Only applicable if you send a POST or PUT request. This can be JSON data, x-www-form-urlencoded data, or anything else you want.

Note: make sure you set the right SB-Content-Type header. This makes sure our API forwards your post data correctly.

Post request example

Here we send a JSON POST request. Note that the body parameter containing our JSON data is URL-encoded, and that we have set the SB-Content-Type: application/json header. This way the receiving web page understands that we are sending JSON data.

cURL

curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fhttpbin.org%2Fpost" \
  -d "method=POST" \
  -d "body=%7B%20%22search_query%22%3A%20%22what%20is%20the%20meaning%20of%20life%3F%22%20%7D" \
  -H "SB-Content-Type: application/json" \
  https://api.scraperbox.com/scrape
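
The same request can be prepared in Python by serializing the payload with json.dumps and letting urlencode handle the encoding. This is a sketch under the assumption that the API accepts the compact JSON that json.dumps produces, which omits the spaces inside the braces used in the cURL example; build_post_request is a hypothetical helper, and nothing is sent over the network.

```python
import json
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_post_request(api_key: str, target_url: str, payload: dict):
    """Return (url, headers) for a JSON POST forwarded to target_url."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "method": "POST",
        "body": json.dumps(payload),  # serialized here, percent-encoded below
    }
    url = f"{API_ENDPOINT}?{urlencode(params, quote_via=quote)}"
    # The SB- prefix asks the API to forward Content-Type to the target page.
    headers = {"SB-Content-Type": "application/json"}
    return url, headers


url, headers = build_post_request(
    "YOUR_API_KEY",
    "https://httpbin.org/post",
    {"search_query": "what is the meaning of life?"},
)
print(url)
print(headers)
```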

Other parameters

cookies: string

default=""

Add cookies to the request using the key=value; notation. For example, to set 2 cookie values: is_new_customer=1; language=nl;

return_original_status_code: boolean

default=false

Normally, when an API request fails, our API returns a 500 status code. If you want the original status code to be returned, set this to true.

timeout: number, 1 to 90000

default=30000

The amount of time a request can take before it's considered timed out and our API tries again. Note that our API will try to get a webpage multiple times if it fails, so the total time of your request can be greater than the value set here.
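
The key=value; notation expected by the cookies parameter can be produced from a dictionary. A small Python sketch follows; format_cookies is a hypothetical helper, and it assumes the cookie names and values contain no semicolons.

```python
def format_cookies(cookies: dict) -> str:
    """Join cookies into the 'key=value; key=value;' form used by the cookies parameter."""
    # Each pair ends with "; "; the final trailing space is stripped.
    return "".join(f"{name}={value}; " for name, value in cookies.items()).rstrip()


print(format_cookies({"is_new_customer": 1, "language": "nl"}))
```

The resulting string still needs to be URL-encoded like any other query parameter value.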

API response

If successful, the API responds with the original page content and forwards the Content-Type header. This means that if the scraped web page contains HTML, text/html content is returned, and if it contains JSON, application/json content is returned. Images and binary data are supported as well.
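
Because the Content-Type header is forwarded, client code can use it to decide how to interpret the response body. The Python sketch below illustrates one way to do that; decode_body is a hypothetical helper, and it assumes text and JSON bodies are UTF-8 encoded, which the API does not guarantee.

```python
import json


def decode_body(content_type: str, raw: bytes):
    """Interpret a response body according to its forwarded Content-Type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(raw.decode("utf-8"))
    if media_type.startswith("text/"):
        return raw.decode("utf-8", errors="replace")
    # Images and other binary data are returned unchanged.
    return raw


print(decode_body("application/json", b'{"origin": "1.2.3.4"}'))
```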

Validation errors

If you supply incorrect parameters, you can get a JSON validation error. The response status code will always be between 400 and 499.

An example validation error response looks like this:

{
  "errors": {
    "api_key": ["Invalid api key"]
  }
}

Scrape errors

In some cases our API will be unable to retrieve the scraped web page data. In that case a 500 response code is returned, together with the error details as JSON.

{
  "browser_error": "server responded with a 402",
  "browser_status_code": 402,
  "browser_response_body": "<!doctype html><body>...</body></html>"
}
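
The two error shapes described above can be told apart by status code and JSON keys. The following Python sketch turns either one into a readable message; describe_failure is a hypothetical helper, and the status code 400 used in the example call is only an illustration within the documented 400 to 499 range.

```python
import json


def describe_failure(status_code: int, body: str) -> str:
    """Summarize a validation error (4xx) or a scrape error (500) from its JSON body."""
    data = json.loads(body)
    if 400 <= status_code <= 499 and "errors" in data:
        parts = [f"{field}: {', '.join(messages)}" for field, messages in data["errors"].items()]
        return "validation error: " + "; ".join(parts)
    if status_code == 500 and "browser_error" in data:
        target_status = data.get("browser_status_code")
        return f"scrape error: {data['browser_error']} (target status {target_status})"
    return f"unexpected response with status {status_code}"


print(describe_failure(400, '{"errors": {"api_key": ["Invalid api key"]}}'))
```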

Response headers

Our API sends 2 response headers with each successful request.

SB-Cost - The credit cost of the API request
SB-Resolved-Url - The final URL of the scraped web page after all redirects have finished
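
Both headers can be read from any HTTP client's response headers. A minimal Python sketch, assuming the headers are available as a dictionary and that SB-Cost holds a whole number; read_scraperbox_headers is a hypothetical name.

```python
def read_scraperbox_headers(headers: dict) -> tuple:
    """Return (cost, resolved_url); either value is None if its header is missing."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("sb-cost")
    return (int(cost) if cost is not None else None, lowered.get("sb-resolved-url"))


print(read_scraperbox_headers({"SB-Cost": "5", "SB-Resolved-Url": "https://httpbin.org/anything"}))
```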

Credit cost

Only successful requests will cost credits. A request is successful if one of the following status codes is returned: 2xx, 404, 410.

Furthermore, depending on which parameters you supply, a request can cost between 1 and 30 credits.

1 for basic API requests
+5 when render_js is enabled
+25 when proxy_type is set to premium
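
The billing rules can be sketched in Python for budgeting purposes. The list does not spell out how the items combine, so this sketch makes an assumption: a request costs 5 credits with render_js, 25 with premium proxies, 30 with both, and 1 with neither, which is the reading that matches the stated range of 1 to 30. Both function names are hypothetical, and the SB-Cost response header remains the authoritative value.

```python
def is_billable(status_code: int) -> bool:
    """A request costs credits only for 2xx, 404 and 410 responses."""
    return 200 <= status_code <= 299 or status_code in (404, 410)


def estimate_cost(render_js: bool = True, proxy_type: str = "normal") -> int:
    """Estimate the credit cost of a successful request (assumed combination rule)."""
    cost = 0
    if render_js:
        cost += 5
    if proxy_type == "premium":
        cost += 25
    # Assumption: a request with neither option costs the basic 1 credit.
    return cost or 1


print(estimate_cost(render_js=True, proxy_type="premium"))
```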
