Quickstart
To start scraping you need to supply 2 required parameters: api_key (your API key) and url (the URL-encoded address of the page you want to scrape).
Example
The following example scrapes the https://httpbin.org/anything webpage and returns the HTML.
curl -G \
-d "api_key=YOUR_API_KEY" \
-d "url=https%3A%2F%2Fhttpbin.org%2Fanything" \
https://api.scraperbox.com/scrape
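The same request can be sketched in Python using only the standard library. The helper names build_scrape_url and scrape are illustrative and not part of the API; the endpoint and the two parameters are taken from the curl example above.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.scraperbox.com/scrape"  # same endpoint as the curl example


def build_scrape_url(api_key: str, url: str) -> str:
    """Build the GET URL for a scrape request (hypothetical helper name)."""
    # quote_via=quote percent-encodes ':' and '/' in the target URL,
    # giving the same encoded value that the curl example passes by hand.
    query = urlencode({"api_key": api_key, "url": url}, quote_via=quote)
    return f"{API_ENDPOINT}?{query}"


def scrape(api_key: str, url: str) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(build_scrape_url(api_key, url)) as response:
        return response.read()
```

Calling scrape("YOUR_API_KEY", "https://httpbin.org/anything") would perform the network request; build_scrape_url can be inspected on its own without sending anything.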
Chrome browser
The following parameters help you control the Chrome browser setup of your API request.
render_js (boolean)
default=true
Note: API requests with JavaScript enabled will cost 5 API credits.
By default, API requests use real Chrome browsers that can handle JavaScript web pages. You can set render_js to false, in which case a simple HTTP request is made without JavaScript support.
wait_for_ms (number, between 0 and 35000)
default=0
How long, in milliseconds, the Chrome browser should wait after the page has finished loading. This is useful when, for example, some elements take time to become visible on the page.
wait_for_selector (string)
default=""
The browser will wait for the given CSS selector to appear on the page. For example, to wait for a button to appear: wait_for_selector="button.my-button"
wait_for_browser_event ("domcontentloaded" | "load" | "networkidle0" | "networkidle2")
default="domcontentloaded"
The browser will wait until the given condition is met.
- domcontentloaded: wait until the HTML DOM is loaded and ready
- load: wait until the HTML DOM is loaded and ready and all images and frames have finished loading
- networkidle0: wait until there are no active network requests
- networkidle2: wait until there are at most 2 active network requests
block_ads (boolean)
default=false
If set to true, the browser will block advertisements. This will speed up API requests.
block_resources (boolean)
default=true
By default our API blocks images, videos, fonts and CSS stylesheets. Set block_resources to false to disable this blocking. Disabling it will slow down requests significantly.
browser_width (number, 1 to 3840)
default=1920
Set the Chrome browser viewport width.
browser_height (number, 1 to 2160)
default=1080
Set the Chrome browser viewport height.
device ("desktop" | "mobile")
default="desktop"
When set to mobile, the Chrome browser will act like a mobile browser. This is useful to scrape the mobile version of web pages.
Note: this works best in combination with setting reasonable browser_width and browser_height values.
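The ranges and allowed values listed above can be checked on the client before a request is sent. The following is a minimal sketch; check_browser_params is a hypothetical helper, and the limits are copied from the parameter descriptions rather than from any official client.

```python
ALLOWED_EVENTS = {"domcontentloaded", "load", "networkidle0", "networkidle2"}
ALLOWED_DEVICES = {"desktop", "mobile"}

# Inclusive (min, max) ranges as listed in the parameter descriptions
NUMERIC_RANGES = {
    "wait_for_ms": (0, 35000),
    "browser_width": (1, 3840),
    "browser_height": (1, 2160),
}


def check_browser_params(params: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")
    # Parameters that are absent fall back to the documented defaults
    if params.get("wait_for_browser_event", "domcontentloaded") not in ALLOWED_EVENTS:
        problems.append("wait_for_browser_event has an unsupported value")
    if params.get("device", "desktop") not in ALLOWED_DEVICES:
        problems.append("device must be 'desktop' or 'mobile'")
    return problems
```

Checking locally avoids a round trip for mistakes the API would reject anyway; the API remains the authority on what is valid.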
Browser example
The following code navigates to https://isjsenabled.com and waits for the CSS selector div.enabled > h1 to appear. Note that the wait_for_selector parameter is URL-encoded.
curl -G \
-d "api_key=YOUR_API_KEY" \
-d "url=https%3A%2F%2Fisjsenabled.com%2F" \
-d "wait_for_selector=div.enabled%20%3E%20h1" \
https://api.scraperbox.com/scrape
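The encoded selector in the request above does not have to be written by hand. As a small sketch, Python's urllib.parse.quote produces the same value; encode_param is an illustrative name, not part of the API.

```python
from urllib.parse import quote, unquote


def encode_param(value: str) -> str:
    """Percent-encode a query parameter value (hypothetical helper name)."""
    # safe="" also encodes '/', so the result can be used as-is in a query string.
    # quote() encodes spaces as %20, matching the curl example.
    return quote(value, safe="")


encoded_selector = encode_param("div.enabled > h1")
# unquote() reverses the encoding, which is handy for checking the round trip
decoded_selector = unquote(encoded_selector)
```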
Proxies
The following parameters help you to control the proxy setup of your API request.
proxy_type ("normal" | "premium")
default="normal"
Note: API requests with premium proxies will cost 25 API credits.
By default, API requests use our normal proxies. Some hard-to-scrape websites are able to detect our normal proxies, though. In that case you can set proxy_type to premium. This will use our premium residential proxies, which are impossible to detect.
country_code ("us" | "ca" | "nl" | ...)
default=null
Note: Only available when proxy_type is premium.
Select the location of the proxies. For example, if you only want proxies from Canada you can set country_code=ca. We use the ISO 3166-1 alpha-2 country code format.
proxy_session_id (number, 1 to 9999)
default=null
If you want to keep the same IP address across multiple scrape requests, you can supply a proxy_session_id value. This must be a number between 1 and 9999.
Proxy example
The following code uses a premium proxy from Belgium to scrape a web page. It sets the proxy_session_id parameter, so that subsequent requests with the same proxy_session_id will use the same IP.
curl -G \
-d "api_key=YOUR_API_KEY" \
-d "url=https%3A%2F%2Fapi.myip.com" \
-d "proxy_type=premium" \
-d "country_code=be" \
-d "proxy_session_id=123" \
https://api.scraperbox.com/scrape
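The rule that country_code is only available with premium proxies can be enforced when assembling the parameters. This is a sketch under that stated rule; proxy_params is a hypothetical helper and performs no range check on the session id.

```python
def proxy_params(proxy_type: str = "normal", country_code=None, session_id=None) -> dict:
    """Assemble the proxy-related query parameters (hypothetical helper name)."""
    if proxy_type not in ("normal", "premium"):
        raise ValueError("proxy_type must be 'normal' or 'premium'")
    params = {"proxy_type": proxy_type}
    if country_code is not None:
        # The parameter description states country_code requires premium proxies
        if proxy_type != "premium":
            raise ValueError("country_code is only available when proxy_type is premium")
        params["country_code"] = country_code.lower()
    if session_id is not None:
        # Reusing the same value on later requests keeps the same IP
        params["proxy_session_id"] = session_id
    return params
```

Passing the returned dictionary to every request in a batch, unchanged, is what keeps the session (and therefore the IP) stable across those requests.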
To forward headers, add your own headers to the request with the SB- prefix. For example, to send a custom Accept-Language header, add an SB-Accept-Language: klingon,elvish header to your API request.
curl -G \
-d "api_key=YOUR_API_KEY" \
-d "url=https%3A%2F%2Fhttpbin.org%2Fheaders" \
-H "SB-Accept-Language: klingon,elvish" \
https://api.scraperbox.com/scrape
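When many headers need forwarding, the prefix can be applied programmatically. A minimal sketch, assuming the headers are held in a plain dictionary; forward_headers is an illustrative name.

```python
def forward_headers(headers: dict) -> dict:
    """Add the SB- prefix to every header name (hypothetical helper name)."""
    # Each returned header is sent to the API, which forwards it
    # to the scraped page under its original name.
    return {f"SB-{name}": value for name, value in headers.items()}


api_headers = forward_headers({"Accept-Language": "klingon,elvish"})
```

The resulting dictionary can be passed as the headers of whichever HTTP client is used to call the API.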
You can send POST and PUT requests to the URL you want to scrape.
method ("GET" | "POST" | "PUT")
default="GET"
Change this value if you want to send a POST or PUT request, for example when you want to scrape results from a search page that is submitted via POST.
body
Only applicable if you send a POST or PUT request. The body can be JSON data, x-www-form-urlencoded data, or anything else you want.
Note: make sure you set the right SB-Content-Type header, so that our API forwards your POST data correctly.
Post request example
Here we send a JSON POST request. Note that the body parameter containing our JSON data is URL-encoded, and that the SB-Content-Type: application/json header is set. This way the receiving web page understands that we are sending JSON data.
curl -G \
-d "api_key=YOUR_API_KEY" \
-d "url=https%3A%2F%2Fhttpbin.org%2Fpost" \
-d "method=POST" \
-d "body=%7B%20%22search_query%22%3A%20%22what%20is%20the%20meaning%20of%20life%3F%22%20%7D" \
-H "SB-Content-Type: application/json" \
https://api.scraperbox.com/scrape
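The URL-encoded body in the request above can be generated from a Python dictionary instead of being encoded by hand. This is a sketch only; post_scrape_query and decode_body are hypothetical helpers, and json.dumps produces slightly different whitespace than the hand-written JSON in the curl example.

```python
import json
from urllib.parse import parse_qs, quote, urlencode

# Header that tells the API to forward the body as JSON (from the example above)
JSON_HEADERS = {"SB-Content-Type": "application/json"}


def post_scrape_query(api_key: str, url: str, payload: dict) -> str:
    """Build the query string for a JSON POST scrape (hypothetical helper name)."""
    params = {
        "api_key": api_key,
        "url": url,
        "method": "POST",
        "body": json.dumps(payload),  # serialise first, then URL-encode
    }
    return urlencode(params, quote_via=quote)


def decode_body(query: str) -> dict:
    """Recover the JSON payload from a query string, as a round-trip check."""
    return json.loads(parse_qs(query)["body"][0])
```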
Other parameters
Add cookies to the request using the key=value; notation. For example, to set 2 cookie values: is_new_customer=1; language=nl;
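The key=value; notation can be produced from a dictionary. A minimal sketch; format_cookies is an illustrative name, and because the name of the parameter that carries this string is not shown in this section, the sketch only builds the value.

```python
def format_cookies(cookies: dict) -> str:
    """Render cookies in the key=value; notation (hypothetical helper name)."""
    # Each pair ends with ';' and pairs are separated by a single space
    return " ".join(f"{name}={value};" for name, value in cookies.items())


cookie_string = format_cookies({"is_new_customer": 1, "language": "nl"})
```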
return_original_status_code (boolean)
default=false
Normally, when an API request fails, our API returns a 500 status code. If you want the original status code to be returned, set this to true.
timeout (number, 1 to 90000)
default=30000
The amount of time a request can take before it is considered timed out and our API tries again. Note that our API will try to get a webpage multiple times if it fails, so the total time of your request can be greater than the value set here.
API response
If successful, the API responds with the original page HTML and forwards the Content-Type header. This means that if the scraped web page contains HTML, text/html content is returned. If, for example, the scraped web page contains JSON, application/json content is returned. Images and binary data are also supported.
Validation errors
If you supply incorrect parameters you can get a JSON validation error. Such a response always has a status code between 400 and 499.
An example validation error response looks like this:
{
  "errors": {
    "api_key": ["Invalid api key"]
  }
}
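A client can flatten such a response into readable messages. The following sketch assumes the shape shown in the example above (an "errors" object mapping parameter names to lists of messages); validation_messages is a hypothetical helper.

```python
import json


def validation_messages(status_code: int, body: str) -> list:
    """Return 'parameter: message' strings for a validation error response."""
    # Validation errors are documented as having a 4xx status code
    if not 400 <= status_code <= 499:
        return []
    errors = json.loads(body).get("errors", {})
    return [
        f"{field}: {message}"
        for field, messages in errors.items()
        for message in messages
    ]
```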
Scrape errors
In some cases our API will be unable to retrieve the scraped web page data. In that case a 500 response code is returned, together with the error details as JSON.
{
  "browser_error": "server responded with a 402",
  "browser_status_code": 402,
  "browser_response_body": "<!doctype html><body>...</body></html>"
}
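The error details can be read into a small structure for logging or retry decisions. This sketch relies only on the three fields shown in the example above; the ScrapeError class and parse_scrape_error function are illustrative names.

```python
import json
from dataclasses import dataclass


@dataclass
class ScrapeError:
    """Details of a failed scrape, mirroring the JSON fields shown above."""
    message: str
    status_code: int
    response_body: str


def parse_scrape_error(body: str) -> ScrapeError:
    """Parse the JSON body of a 500 scrape error response."""
    data = json.loads(body)
    return ScrapeError(
        message=data["browser_error"],
        status_code=data["browser_status_code"],
        response_body=data["browser_response_body"],
    )
```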
Response headers
Our API sends 2 response headers with each successful request.
- SB-Cost - The credit cost of the API request
- SB-Resolved-Url - The final URL of the scraped web page after all redirects have finished
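Both headers can be read from any HTTP client's response headers. The sketch below assumes SB-Cost holds an integer, which this section does not state explicitly; read_sb_headers is a hypothetical helper.

```python
from typing import Mapping, Optional, Tuple


def read_sb_headers(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (credit cost, resolved URL) from a response's headers."""
    # HTTP header names are case-insensitive, so normalise before looking up
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("sb-cost")
    # Assumption: the cost is an integer number of credits
    return (int(cost) if cost is not None else None, lowered.get("sb-resolved-url"))
```

Summing the returned cost over a batch of requests gives a running total of credits spent without having to recompute prices on the client.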
Credit cost
Only successful requests cost credits. A request is successful if one of the following status codes is returned: 2xx, 404, 410.
Furthermore, depending on which API parameters you supply, a request can cost between 1 and 30 credits.
- 1 for basic API requests
- +5 when render_js is enabled
- +25 when proxy_type is set to premium
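The success rule above (2xx, 404 or 410) can be expressed as a small predicate, for example to estimate which requests in a log were charged. This is a sketch of that rule only; is_billable is an illustrative name, and the SB-Cost response header remains the authoritative source for the amount charged.

```python
def is_billable(status_code: int) -> bool:
    """Return True if a response with this status code counts as successful."""
    # Successful, and therefore charged: any 2xx, plus 404 and 410
    return 200 <= status_code <= 299 or status_code in (404, 410)
```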