HTML API
1. Quickstart
The API endpoint is https://api.scraperbox.com/scrape
You can send a GET or POST request.
Throughout this page, replace YOUR_API_TOKEN with your own API token.
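A minimal sketch of a GET call in Python, using only the standard library. The `url` query parameter for the target page is an assumption (this page does not name it), and `build_scrape_url` and `fetch_html` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_scrape_url(token: str, target_url: str) -> str:
    """Build the GET URL for one scrape call.

    Assumption: the target page is passed in a `url` parameter.
    """
    query = urlencode({"token": token, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


def fetch_html(token: str, target_url: str, timeout: float = 60.0) -> str:
    """Send the GET request and return the response body as text."""
    with urlopen(build_scrape_url(token, target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_html("YOUR_API_TOKEN", "https://example.com")` would perform the request; `urlencode` takes care of escaping the target URL.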
2. Authentication
Add the token field to the request to authenticate yourself.
https://api.scraperbox.com/scrape?token=YOUR_API_TOKEN
Alternatively, you can set your token in the Authorization header.
Authorization: YOUR_API_TOKEN
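The header variant can be sketched like this. The token is sent as the bare header value, as in the line above. The `url` parameter and the helper name `build_authorized_request` are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_authorized_request(token: str, target_url: str) -> Request:
    """Build a GET request that authenticates through the Authorization header."""
    # Assumption: the target page is passed in a `url` parameter.
    query = urlencode({"url": target_url})
    # The token is the whole header value, with no scheme prefix.
    return Request(f"{API_ENDPOINT}?{query}", headers={"Authorization": token})
```

This keeps the token out of the query string, which can be useful when URLs end up in logs.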
3. JavaScript Rendering
Default value: javascript_enabled=false
A lot of websites require JavaScript to display their data correctly.
To enable JavaScript rendering, set the javascript_enabled field to true.
An API call with JavaScript enabled will cost 5 credits.
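A small sketch of turning the flag on from Python. Note that `str(True)` is `"True"` in Python, so the helper writes the lowercase form used on this page. The `url` parameter and the helper name `build_js_scrape_url` are assumptions.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_js_scrape_url(token: str, target_url: str, javascript_enabled: bool = False) -> str:
    """Build a GET URL with the javascript_enabled flag set explicitly."""
    params = {
        "token": token,
        "url": target_url,  # assumption: name of the target-page parameter
        # Write "true"/"false" in lowercase, matching the documented values.
        "javascript_enabled": "true" if javascript_enabled else "false",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"
```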
4. Proxies
Each API request uses our proxy network by default. Geotargeting is not possible with our normal proxy network.
5. Residential Proxies
Default value: residential_proxy=false
We also offer residential proxies. A residential proxy lets the API connect through a real residential internet connection, which counters almost all IP-based bot detection systems.
To enable residential proxies, set the residential_proxy field to true.
An API call with residential proxies enabled will cost 10 credits.
You can also set the proxy_location parameter when using residential proxies.
Currently we support the following values for proxy_location
The proxy_location parameter only works with residential_proxy=true.
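The dependency between the two parameters can be checked on the client before a call is made. This is a sketch with a hypothetical helper, `residential_params`; it does not validate the location against the supported values.

```python
from typing import Optional


def residential_params(residential_proxy: bool = False,
                       proxy_location: Optional[str] = None) -> dict:
    """Return the proxy-related query parameters for one call."""
    if proxy_location is not None and not residential_proxy:
        # proxy_location only works with residential_proxy=true.
        raise ValueError("proxy_location requires residential_proxy=true")
    params = {"residential_proxy": "true" if residential_proxy else "false"}
    if proxy_location is not None:
        params["proxy_location"] = proxy_location
    return params
```

For example, `residential_params(True, "us")` returns both parameters, where `"us"` is only a placeholder for one of the supported values.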
6. Additional Parameters
6.1 Custom Headers
All headers sent to our API with an SB- prefix will be forwarded to the target website.
You have to add the SB- prefix so that the API recognizes which headers to forward.
The API will remove the prefix before forwarding the header.
For example, if you only speak Klingon, you can set the SB-Accept-Language header to klingon.
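A sketch of the prefixing step in Python. `with_forward_prefix` and `build_request` are hypothetical helpers, and the `url` parameter is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://api.scraperbox.com/scrape"
FORWARD_PREFIX = "SB-"


def with_forward_prefix(headers: dict) -> dict:
    """Prefix each header name with SB- so the API forwards it."""
    return {f"{FORWARD_PREFIX}{name}": value for name, value in headers.items()}


def build_request(token: str, target_url: str, forwarded: dict) -> Request:
    """Build a GET request whose `forwarded` headers reach the target website."""
    query = urlencode({"token": token, "url": target_url})  # `url` is an assumed name
    # urllib stores header names capitalized ("Sb-accept-language");
    # HTTP header names are case-insensitive.
    return Request(f"{API_ENDPOINT}?{query}", headers=with_forward_prefix(forwarded))
```

With `forwarded={"Accept-Language": "klingon"}`, the request carries SB-Accept-Language: klingon, and the target website should receive Accept-Language: klingon.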
6.2 Post Requests
By default the API will send a GET request to the target website.
You can send a POST request by doing the following:
1. Set the method=POST parameter.
2. Add the POST body to the post_body parameter.
3. Set the correct content type header, for example SB-Content-Type: application/x-www-form-urlencoded
This shows a complete example of a JSON post request.
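A minimal sketch of such a request in Python, following the three steps above. Several details are assumptions rather than documented behavior: that the API accepts its own parameters as a JSON body, that the target page goes in a `url` field, and that post_body carries the serialized JSON as a string. `build_json_post` is a hypothetical helper.

```python
import json
from urllib.request import Request

API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_json_post(token: str, target_url: str, payload: dict) -> Request:
    """Ask the API to POST `payload` as JSON to the target website."""
    api_body = {
        "token": token,
        "url": target_url,  # assumption: name of the target-page field
        "method": "POST",
        # Assumption: post_body holds the already-serialized body as a string.
        "post_body": json.dumps(payload),
    }
    return Request(
        API_ENDPOINT,
        data=json.dumps(api_body).encode("utf-8"),
        headers={
            # Content type of the call to the API itself.
            "Content-Type": "application/json",
            # Forwarded to the target website as Content-Type.
            "SB-Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the call.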
6.3 Sticky Sessions
Default value: sticky_session_id=null
You can supply a random sticky session id to keep using the same proxy.
The session id can be any alphanumeric string. For example, mysessionid and rand0mstr1ng are valid session ids.
This feature only works with residential proxies.
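One way to generate and reuse a session id, sketched with hypothetical helpers. The id length of 12 is an arbitrary choice.

```python
import secrets
import string

ALPHANUMERIC = string.ascii_lowercase + string.digits


def new_sticky_session_id(length: int = 12) -> str:
    """Generate a random alphanumeric session id."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


def sticky_params(session_id: str) -> dict:
    """Query parameters that keep one proxy across calls."""
    if not (session_id.isascii() and session_id.isalnum()):
        raise ValueError("session id must be alphanumeric")
    # Sticky sessions only work with residential proxies.
    return {"residential_proxy": "true", "sticky_session_id": session_id}
```

Generate the id once and pass the same `sticky_params(...)` result with every call that should share a proxy.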
6.4 Resource Blocking
Default value: types_to_block=['font', 'image', 'media']
By default the browser will not download fonts, images, or media. This significantly speeds up requests.
The resource types that you can block:
font, image, stylesheet, script, media, other
When sending a POST request, you can send the array of resource types:
"types_to_block": ["font","image"]
When sending a GET request, you can send multiple values like this:
?types_to_block=font&types_to_block=media&types_to_block=script
This feature only works when javascript_enabled=true
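Both encodings can be produced from one Python list. In this sketch, `urlencode(..., doseq=True)` repeats the parameter once per value for GET. The helper names are hypothetical.

```python
from urllib.parse import urlencode

BLOCKABLE_TYPES = {"font", "image", "stylesheet", "script", "media", "other"}


def check_types(types_to_block: list) -> list:
    """Reject resource types that are not in the documented list."""
    unknown = set(types_to_block) - BLOCKABLE_TYPES
    if unknown:
        raise ValueError(f"unknown resource types: {sorted(unknown)}")
    return types_to_block


def get_query(types_to_block: list) -> str:
    """Query string for a GET request, one types_to_block entry per value."""
    params = {
        "javascript_enabled": "true",  # resource blocking requires this
        "types_to_block": check_types(types_to_block),
    }
    return urlencode(params, doseq=True)


def post_fields(types_to_block: list) -> dict:
    """Array form for a POST request body (javascript_enabled must also be set)."""
    return {"types_to_block": check_types(types_to_block)}
```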
6.5 CSS Selectors
Default value: css_selectors=[]
You can extract data from the web page with CSS selectors. For example, to extract all h1 and h2 tags you can supply the following.
css_selectors=["body h1", "body h2"]
When this parameter is provided, the API will return a JSON response with the following format.
{
  "css_selectors": [
    // First item in the array matches the first css_selectors value.
    // "body h1" in this case.
    [
      "The main Title",
      "Another important title"
    ],
    // This matches the second css_selectors value.
    // "body h2" in this case.
    [
      "My h2 title",
      "My other h2 title"
    ]
  ]
}
When sending a GET request you can supply multiple CSS selectors like this.
Be sure to URL encode the values!
css_selectors=h1&css_selectors=h2
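A sketch of the GET encoding and of reading the result in Python. `urlencode` handles the URL encoding (a space becomes `+`), and the response is paired with the selectors by position, as described above. The helper names are hypothetical, and the sketch assumes the real response contains no comments.

```python
import json
from urllib.parse import urlencode


def selector_query(selectors: list) -> str:
    """URL-encode one css_selectors parameter per selector."""
    return urlencode({"css_selectors": selectors}, doseq=True)


def pair_results(selectors: list, response_body: str) -> dict:
    """Map each selector to its list of matches, relying on matching order."""
    matches = json.loads(response_body)["css_selectors"]
    return dict(zip(selectors, matches))
```

For example, `selector_query(["body h1", "body h2"])` gives `css_selectors=body+h1&css_selectors=body+h2`.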
7. Response Headers
With each request the following response headers are sent.
Header | Description |
---|---|
X-Request-Cost | The cost of the request. |
X-Credits-Remaining | Your remaining credits this period. |
X-Final-Url | The final URL of the scraped webpage. This is useful for detecting redirects. |
X-Solved-Captcha | If no captcha was solved this will be false. Otherwise it contains the type of captcha that was solved, for example: recaptcha |
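These headers can be collected into one value per response. The sketch below assumes that both credit headers hold whole numbers; `ScrapeMeta` and `parse_meta` are hypothetical names.

```python
from typing import Mapping, NamedTuple, Optional


class ScrapeMeta(NamedTuple):
    request_cost: int
    credits_remaining: int
    final_url: str
    solved_captcha: Optional[str]  # None when no captcha was solved


def parse_meta(headers: Mapping[str, str]) -> ScrapeMeta:
    """Read the documented response headers from a header mapping."""
    captcha = headers["X-Solved-Captcha"]
    return ScrapeMeta(
        # Assumption: both credit headers hold whole numbers.
        request_cost=int(headers["X-Request-Cost"]),
        credits_remaining=int(headers["X-Credits-Remaining"]),
        final_url=headers["X-Final-Url"],
        solved_captcha=None if captcha == "false" else captcha,
    )
```

With `urllib`, `response.headers` can be passed in directly. Comparing `final_url` with the requested URL is one way to detect a redirect.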
8. Errors
If the API encounters a 4xx or 5xx status on the website, the request will fail with a JSON error message.
These failed requests will not cost any credits.
For example, if the API encounters a 403 response code it will return the following.
{
"errors": {
"response": `The website returned a 403 status code.
You can try again with:
1) javascript_enabled=true
2) residential_proxy=true
3) check out our troubleshooting guide:
https://scraperbox.com/blog/troubleshooting-guide`
}
}
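A sketch of handling such a failure on the client: read errors.response from the body, then retry with the options the message suggests, in that order. It assumes the real body is valid JSON. `error_message` and `next_attempt` are hypothetical helpers.

```python
import json
from typing import Optional


def error_message(body: str) -> Optional[str]:
    """Return errors.response from a JSON error body, or None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON, e.g. a successful HTML response
    if not isinstance(payload, dict):
        return None
    errors = payload.get("errors")
    if isinstance(errors, dict):
        return errors.get("response")
    return None


def next_attempt(params: dict) -> Optional[dict]:
    """Escalate in the order the error message suggests; None when exhausted."""
    for option in ("javascript_enabled", "residential_proxy"):
        if params.get(option) != "true":
            return {**params, option: "true"}
    return None
```

Keep in mind that each escalation step raises the credit cost of a successful call.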