
Quickstart

To start scraping, you need to supply two required parameters:

api_key string
default=""
Your API key.
url string
default=""
The URL-encoded URL that you want to scrape. For example:
https%3A%2F%2Fhttpbin.org%2Fanything

Example

The following example scrapes the https://httpbin.org/anything web page and returns the HTML.

cURL
curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fhttpbin.org%2Fanything" \
  https://api.scraperbox.com/scrape
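The same request can be sketched in Python using only the standard library. This sketch assumes the API accepts its parameters as a GET query string, exactly as the cURL `-G` example sends them. `build_scrape_url` and `scrape` are hypothetical helper names, not part of an official client.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_scrape_url(api_key: str, target_url: str, **options: str) -> str:
    """Return the full API URL with every parameter URL-encoded."""
    params = {"api_key": api_key, "url": target_url, **options}
    # urlencode percent-encodes each value, e.g. "https://" becomes "https%3A%2F%2F".
    return f"{API_ENDPOINT}?{urlencode(params)}"


def scrape(api_key: str, target_url: str, **options: str) -> bytes:
    """Call the API and return the raw response body (urlopen raises HTTPError on 4xx/5xx)."""
    # Generous client-side timeout in seconds, because the API may retry internally.
    with urlopen(build_scrape_url(api_key, target_url, **options), timeout=120) as response:
        return response.read()
```

Calling `scrape("YOUR_API_KEY", "https://httpbin.org/anything")` would return the page body as bytes. Letting `urlencode` do the encoding avoids hand-encoding the `url` value.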

Chrome browser

The following parameters help you control the Chrome browser setup of your API request.

render_js boolean
default=true
Note: API requests with JavaScript enabled cost 5 API credits.
By default, API requests use real Chrome browsers that can handle JavaScript web pages. You can set render_js to false, in which case a simple HTTP request is made without JavaScript support.
wait_for_ms number, between 0 and 35000
default=0
How long the Chrome browser should wait, in milliseconds, after the page has finished loading. This is useful, for example, when some elements take time to become visible on the page.
wait_for_selector string
default=""
The browser waits for the given CSS selector to appear on the page. For example, to wait for a button to appear: wait_for_selector="button.my-button"
wait_for_browser_event "domcontentloaded" | "load" | "networkidle0" | "networkidle2"
default="domcontentloaded"
The browser waits until the given condition is met.
  • domcontentloaded: wait until the HTML DOM is loaded and ready
  • load: wait until the HTML DOM is loaded and ready and all images and frames have finished loading
  • networkidle0: wait until there are no active network requests
  • networkidle2: wait until there are at most 2 active network requests
block_ads boolean
default=false
If set to true, the browser blocks advertisements. This speeds up API requests.
block_resources boolean
default=true
By default, our API blocks images, videos, fonts, and CSS stylesheets. Set block_resources to false to disable this blocking. Note that this slows down requests significantly.
browser_width number, 1 to 3840
default=1920
Sets the Chrome browser viewport width in pixels.
browser_height number, 1 to 2160
default=1080
Sets the Chrome browser viewport height in pixels.
device "desktop" | "mobile"
default="desktop"
When set to mobile, the Chrome browser behaves like a mobile browser. This is useful for scraping the mobile version of web pages.
Note: this works best in combination with reasonable browser_width and browser_height values.
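Before sending a request, you can check the browser options against the documented ranges. The sketch below does this in Python. It copies the bounds and allowed values from the parameter list above. Two things are assumptions: booleans are sent as lowercase `"true"`/`"false"` strings, and `browser_params` is a hypothetical helper.

```python
ALLOWED_EVENTS = {"domcontentloaded", "load", "networkidle0", "networkidle2"}
ALLOWED_DEVICES = {"desktop", "mobile"}


def browser_params(
    render_js: bool = True,
    wait_for_ms: int = 0,
    wait_for_selector: str = "",
    wait_for_browser_event: str = "domcontentloaded",
    block_ads: bool = False,
    block_resources: bool = True,
    browser_width: int = 1920,
    browser_height: int = 1080,
    device: str = "desktop",
) -> dict[str, str]:
    """Validate browser options against the documented ranges and return query parameters."""
    if not 0 <= wait_for_ms <= 35000:
        raise ValueError("wait_for_ms must be between 0 and 35000")
    if not 1 <= browser_width <= 3840:
        raise ValueError("browser_width must be between 1 and 3840")
    if not 1 <= browser_height <= 2160:
        raise ValueError("browser_height must be between 1 and 2160")
    if wait_for_browser_event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown wait_for_browser_event: {wait_for_browser_event}")
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"unknown device: {device}")

    params = {
        # Assumption: booleans are encoded as lowercase "true"/"false" strings.
        "render_js": str(render_js).lower(),
        "wait_for_ms": str(wait_for_ms),
        "wait_for_browser_event": wait_for_browser_event,
        "block_ads": str(block_ads).lower(),
        "block_resources": str(block_resources).lower(),
        "browser_width": str(browser_width),
        "browser_height": str(browser_height),
        "device": device,
    }
    # Only send a selector when one was given; the default is an empty string.
    if wait_for_selector:
        params["wait_for_selector"] = wait_for_selector
    return params
```

For example, `browser_params(device="mobile", browser_width=390, browser_height=844)` pairs the mobile device with a phone-sized viewport, as the note above recommends. Merge the returned dict with `api_key` and `url` before URL-encoding.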

Browser example

The following code navigates to https://isjsenabled.com and waits for the CSS selector div.enabled > h1 to appear. Note that the wait_for_selector parameter is URL-encoded.

cURL
curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fisjsenabled.com%2F" \
  -d "wait_for_selector=div.enabled%20%3E%20h1" \
  https://api.scraperbox.com/scrape
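The Python sketch below produces the same encoded query string. Passing `quote_via=quote` to `urlencode` encodes spaces as `%20`, which matches the cURL example. The default, `quote_plus`, would encode them as `+` instead. Whether the API also accepts `+` is not stated here, so this sketch mirrors the documented form.

```python
from urllib.parse import quote, urlencode

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://isjsenabled.com/",
    "wait_for_selector": "div.enabled > h1",
}

# quote encodes " " as %20 and ">" as %3E; urlencode passes safe="", so "/" becomes %2F.
query = urlencode(params, quote_via=quote)
request_url = f"https://api.scraperbox.com/scrape?{query}"
print(request_url)
```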

Proxies

The following parameters help you control the proxy setup of your API request.

proxy_type "normal" | "premium"
default="normal"
Note: API requests with premium proxies cost 25 API credits.
By default, API requests use our normal proxies. However, some hard-to-scrape websites are able to detect our normal proxies. In that case, you can set proxy_type to premium to use our premium residential proxies, which are impossible to detect.
country_code "us" | "ca" | "nl" | ...
default=null
Note: only available when proxy_type is premium.
Selects the location of the proxies. For example, if you only want proxies from Canada, set country_code=ca. Country codes use the two-letter ISO 3166-1 alpha-2 format.
Available country codes
proxy_session_id number, 1 to 9999
default=null
To keep the same IP address across multiple scrape requests, supply a proxy_session_id value. This must be a number between 1 and 9999.
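The proxy rules above can be enforced on the client side. The Python sketch below does so: country_code is only allowed with premium proxies, country codes are two letters, and proxy_session_id falls in the 1 to 9999 range given in the parameter heading. `proxy_params` is a hypothetical helper. Lowercasing the country code follows the lowercase examples ("us", "ca", "nl") and is an assumption.

```python
from typing import Optional


def proxy_params(
    proxy_type: str = "normal",
    country_code: Optional[str] = None,
    proxy_session_id: Optional[int] = None,
) -> dict[str, str]:
    """Build proxy parameters, enforcing the documented constraints."""
    if proxy_type not in {"normal", "premium"}:
        raise ValueError(f"unknown proxy_type: {proxy_type}")
    params = {"proxy_type": proxy_type}

    if country_code is not None:
        # country_code is only available together with premium proxies.
        if proxy_type != "premium":
            raise ValueError("country_code requires proxy_type='premium'")
        # ISO 3166-1 alpha-2 codes are exactly two letters.
        if len(country_code) != 2 or not country_code.isalpha():
            raise ValueError(f"invalid country_code: {country_code}")
        # Assumption: the API expects lowercase codes, as in the examples.
        params["country_code"] = country_code.lower()

    if proxy_session_id is not None:
        if not 1 <= proxy_session_id <= 9999:
            raise ValueError("proxy_session_id must be between 1 and 9999")
        params["proxy_session_id"] = str(proxy_session_id)
    return params
```

Reuse the same `proxy_session_id` value for every request that should come from the same IP address.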

Proxy example

The following code uses a premium proxy from Belgium to scrape a web page. It sets the proxy_session_id parameter so that subsequent requests with the same proxy_session_id use the same IP address.

cURL
curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fapi.myip.com" \
  -d "proxy_type=premium" \
  -d "country_code=be" \
  -d "proxy_session_id=123" \
  https://api.scraperbox.com/scrape

Header forwarding

To forward headers, add them to your API request with the SB- prefix. For example, to send a custom Accept-Language header, add an SB-Accept-Language: klingon,elvish header to your API request.

cURL
curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fhttpbin.org%2Fheaders" \
  -H "SB-Accept-Language: klingon,elvish" \
  https://api.scraperbox.com/scrape
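The Python sketch below adds the SB- prefix to each header name and sends the request. It uses `http.client` rather than `urllib.request` because `http.client` sends header names exactly as given. `urllib.request.Request` re-capitalizes names (for example to `Sb-accept-language`), and it is not stated whether the API matches the prefix case-insensitively. Both helper functions are hypothetical.

```python
import http.client
from urllib.parse import urlencode


def forwarded_headers(headers: dict[str, str]) -> dict[str, str]:
    """Prefix each header name with SB- so the API forwards it to the scraped page."""
    return {f"SB-{name}": value for name, value in headers.items()}


def scrape_with_headers(api_key: str, target_url: str, headers: dict[str, str]) -> tuple[int, bytes]:
    """Send a GET request to the API with forwarded headers and return (status, body)."""
    query = urlencode({"api_key": api_key, "url": target_url})
    conn = http.client.HTTPSConnection("api.scraperbox.com", timeout=120)
    try:
        # http.client writes header names verbatim, so the SB- prefix stays intact.
        conn.request("GET", f"/scrape?{query}", headers=forwarded_headers(headers))
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

With the `https://httpbin.org/headers` URL from the cURL example, the returned body would list the headers httpbin received.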

POST requests

You can send POST and PUT requests to the url you want to scrape.

method "GET" | "POST" | "PUT"
default="GET"
Change this value to send a POST or PUT request, for example to scrape the results of a search page that expects a POST request.
body any
default=null
Only applicable when you send a POST or PUT request. This can be JSON data, x-www-form-urlencoded data, or anything else you want.
Note: make sure you set the correct SB-Content-Type header so that our API forwards your POST data correctly.

POST request example

Here we send a JSON POST request. Note that the body parameter containing the JSON data is URL-encoded, and that the SB-Content-Type: application/json header is set so the receiving web page understands that we are sending JSON data.

cURL
curl -G \
  -d "api_key=YOUR_API_KEY" \
  -d "url=https%3A%2F%2Fhttpbin.org%2Fpost" \
  -d "method=POST" \
  -d "body=%7B%20%22search_query%22%3A%20%22what%20is%20the%20meaning%20of%20life%3F%22%20%7D" \
  -H "SB-Content-Type: application/json" \
  https://api.scraperbox.com/scrape
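In Python, you can build the JSON body with `json.dumps` and let `urlencode` handle the encoding. The sketch below does this. `json.dumps` produces `{"search_query": ...}` without the spaces inside the braces, so the encoded `body` differs slightly from the cURL example but carries the same JSON document. Like the cURL example, it sends the body as a query parameter alongside `method=POST`.

```python
import json
from urllib.parse import quote, urlencode

payload = {"search_query": "what is the meaning of life?"}

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    "method": "POST",
    # The JSON document travels as a URL-encoded query parameter, as with cURL -G.
    "body": json.dumps(payload),
}
# quote_via=quote encodes spaces as %20 rather than "+".
query = urlencode(params, quote_via=quote)
request_url = f"https://api.scraperbox.com/scrape?{query}"

# The SB- prefix tells the API to forward this as Content-Type to the target page.
headers = {"SB-Content-Type": "application/json"}
```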

Other parameters

cookies string
default=""
Adds cookies to the request using the key=value; notation. For example, to set two cookie values: is_new_customer=1; language=nl;
return_original_status_code boolean
default=false
Normally, when an API request fails, our API returns a 500 status code. If you want the original status code to be returned instead, set this to true.
timeout number, 1 to 90000
default=30000
The amount of time, in milliseconds, a request can take before it is considered timed out and our API tries again. Note that our API retries a failed web page fetch multiple times, so the total time of your request can exceed the value set here.
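The cookies value contains spaces, `=` and `;`, so it must be URL-encoded like any other parameter. The Python sketch below builds it in the key=value; notation and encodes it. `cookie_param` is a hypothetical helper. Encoding the boolean as `"true"` and reading the timeout in milliseconds are assumptions based on the parameter list above.

```python
from urllib.parse import quote, urlencode


def cookie_param(cookies: dict[str, str]) -> str:
    """Join cookies using the key=value; notation of the cookies parameter."""
    return " ".join(f"{name}={value};" for name, value in cookies.items())


other_params = {
    "cookies": cookie_param({"is_new_customer": "1", "language": "nl"}),
    # Assumption: booleans are sent as lowercase "true"/"false".
    "return_original_status_code": "true",
    # Assumption: milliseconds, consistent with the 30000 default.
    "timeout": "60000",
}
# "=", ";" and spaces inside the cookies value are percent-encoded here.
query = urlencode(other_params, quote_via=quote)
```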

API response

If successful, the API responds with the original page content and forwards its Content-Type header. This means that if the scraped web page contains HTML, text/html content is returned, but if it contains JSON, for example, application/json content is returned. Images and binary data are also supported.
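Because the Content-Type header is forwarded, a client can use it to decide how to interpret the body. The Python sketch below is one way to do that. `decode_body` is a hypothetical helper, and falling back to UTF-8 when no charset is given is an assumption.

```python
import json


def decode_body(content_type: str, body: bytes):
    """Convert a raw API response body according to the forwarded Content-Type header."""
    media_type, _, raw_params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # Use an explicit charset such as "text/html; charset=iso-8859-1", else assume UTF-8.
    charset = "utf-8"
    for param in raw_params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')

    if media_type == "application/json":
        return json.loads(body.decode(charset))
    if media_type.startswith("text/"):
        return body.decode(charset, errors="replace")
    # Images and other binary data are returned unchanged.
    return body
```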

Validation errors

If you supply incorrect parameters, the API responds with a JSON validation error. This response always has a status code between 400 and 499.

An example validation error response looks like this:

{
  "errors": {
    "api_key": ["Invalid api key"]
  }
}

Scrape errors

In some cases, our API is unable to retrieve the web page data. In that case, a 500 status code is returned together with the error details as JSON.

{
  "browser_error": "server responded with a 402",
  "browser_status_code": 402,
  "browser_response_body": "<!doctype html><body>...</body></html>"
}
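The two error shapes above can be turned into Python exceptions. This sketch assumes the default return_original_status_code=false. It also assumes that every 4xx response carries an `{"errors": {...}}` body, as the validation section states. `check_response`, `ValidationError` and `ScrapeError` are hypothetical names.

```python
import json


class ValidationError(Exception):
    """Raised for 4xx responses carrying an {"errors": {...}} body."""


class ScrapeError(Exception):
    """Raised for 500 responses describing why the page could not be fetched."""

    def __init__(self, details: dict):
        super().__init__(details.get("browser_error", "scrape failed"))
        self.status_code = details.get("browser_status_code")
        self.response_body = details.get("browser_response_body")


def check_response(status: int, body: bytes) -> bytes:
    """Return the body on success; raise a descriptive error otherwise."""
    if 400 <= status <= 499:
        # Each field maps to a list of messages, e.g. {"api_key": ["Invalid api key"]}.
        errors = json.loads(body)["errors"]
        messages = [f"{field}: {msg}" for field, msgs in errors.items() for msg in msgs]
        raise ValidationError("; ".join(messages))
    if status == 500:
        raise ScrapeError(json.loads(body))
    return body
```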

Response headers

Our API sends two response headers with each successful request:

  • SB-Cost - The credit cost of the API request
  • SB-Resolved-Url - The final URL of the scraped web page after all redirects have finished
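The Python sketch below reads both headers. HTTP header names are case-insensitive, so it compares them in lowercase. `response_metadata` is a hypothetical helper. With `http.client` you could pass it `dict(response.getheaders())`.

```python
from typing import Mapping, Optional


def response_metadata(headers: Mapping[str, str]) -> tuple[Optional[int], Optional[str]]:
    """Extract SB-Cost and SB-Resolved-Url from response headers, ignoring name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("sb-cost")
    resolved_url = lowered.get("sb-resolved-url")
    # SB-Cost is a credit count; it is absent on unsuccessful requests.
    return (int(cost) if cost is not None else None), resolved_url
```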

Credit cost

Only successful requests cost credits. A request is successful if one of the following status codes is returned: 2xx, 404, or 410.

Depending on the parameters you supply, a request can cost between 1 and 30 credits:

  • 1 for basic API requests
  • +5 when render_js is enabled
  • +25 when proxy_type is set to premium
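The billing rule above can be expressed as a small predicate. The Python sketch below does this under one assumption: the listed codes (2xx, 404, 410) refer to the status code the API returns to you. `is_billable` is a hypothetical helper. For the actual charge of a request, the SB-Cost response header is the authoritative value.

```python
def is_billable(status_code: int) -> bool:
    """Return True if a response with this status code is charged credits."""
    return 200 <= status_code <= 299 or status_code in (404, 410)


# Example: only the 200, 404 and 410 responses would consume credits.
statuses = [200, 404, 500, 410, 403]
billable = [status for status in statuses if is_billable(status)]
```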