How I spent $500 per day because of a misconfiguration
By Dirk Hoekstra on 05 Feb, 2021
It was Friday, and I closed my laptop. I had worked pretty hard on Scraperbox that week and decided it was time for a well-earned weekend.
When I logged into Google Cloud the next Monday, my heart sank.
In one weekend, Google Cloud had burned through €1,200, which is roughly $1,500.
Scraperbox is bootstrapped, so this was going to come out of my own pocket, and money was already tight.
After the initial shock wore off, I dove into the problem to fix it as quickly as possible. After all, I was burning €17 an hour.
Running on cloud infrastructure
Scraperbox used to run on a simple MySQL database. But as the number of requests kept growing, the database struggled to keep up.
When our API reached 200K requests per hour, we faced a choice:
- Buy more RAM for the MySQL server
- Switch over to Google Cloud Firestore.
I wanted to try out Firestore, and on paper it sounded amazing: an infinitely scalable database.
I started building, and within a few days Scraperbox was running on top of Google Firestore.
In hindsight, I wish I had looked more closely at Firestore's pricing page.
The Firestore pricing model
The pricing model is pretty simple. You pay $0.036 per 100K document reads and $0.108 per 100K document writes.
So, if you have 1M documents and you run a SELECT ALL query, you are billed for 1M reads, which is 10 batches of 100K reads: $0.036 * 10 = $0.36.
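Here is the same arithmetic as a tiny Python helper. It is a sketch that assumes the flat prices quoted above and ignores Firestore's free daily quota and regional price differences. The function name firestore_cost is made up for illustration.

```python
# Prices quoted in this article, in USD per 100,000 document operations.
READ_PRICE_PER_100K = 0.036
WRITE_PRICE_PER_100K = 0.108


def firestore_cost(reads: int = 0, writes: int = 0) -> float:
    """Return the USD cost of the given number of document reads and writes."""
    return (reads / 100_000) * READ_PRICE_PER_100K + (writes / 100_000) * WRITE_PRICE_PER_100K


# Reading every document of a 1M-document collection is 10 x 100K reads.
print(f"${firestore_cost(reads=1_000_000):.2f}")  # $0.36
```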
Now let's do some calculations to see how much our API would theoretically cost.
For each request, the API checks whether the user already has concurrent requests running. The pseudo-query looks something like this:
COUNT api_requests WHERE active=true AND user=CURRENT_USER
On average, this returns 5 documents, and Firestore bills one read for every document a query returns.
Each API request also performs 4 writes:
- Create a new pending API request document.
- Set the status to active.
- Set the status to completed.
- Increment the user API request counter.
So each API request performs, on average, 5 reads and 4 writes.
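To make the per-request tally concrete, here is a toy in-memory model of a single API request. It assumes the billing rule that each document returned by a query costs one read, and each document write (including the counter increment) costs one write. FakeStore, handle_request and the document fields are hypothetical stand-ins. They are not the real Firestore client or Scraperbox's code.

```python
from dataclasses import dataclass, field


@dataclass
class OpCounter:
    """Tally of billable document operations."""
    reads: int = 0
    writes: int = 0


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for the API's Firestore collections."""
    requests: dict = field(default_factory=dict)  # request_id -> request document
    counters: dict = field(default_factory=dict)  # user -> total request count
    ops: OpCounter = field(default_factory=OpCounter)

    def count_active(self, user: str) -> int:
        # COUNT api_requests WHERE active=true AND user=CURRENT_USER
        # (hypothetical schema: "active" means status == "active").
        # Every matching document the query returns is billed as one read.
        matches = [doc for doc in self.requests.values()
                   if doc["status"] == "active" and doc["user"] == user]
        self.ops.reads += len(matches)
        return len(matches)

    def set_request(self, request_id: str, user: str, status: str) -> None:
        # Creating or updating a request document is one write.
        self.requests[request_id] = {"user": user, "status": status}
        self.ops.writes += 1

    def increment_counter(self, user: str) -> None:
        # Updating the per-user counter document is one more write.
        self.counters[user] = self.counters.get(user, 0) + 1
        self.ops.writes += 1


def handle_request(store: FakeStore, request_id: str, user: str) -> None:
    store.count_active(user)                          # concurrency check (reads)
    store.set_request(request_id, user, "pending")    # write 1: create pending request
    store.set_request(request_id, user, "active")     # write 2: mark active
    # ... the actual scraping work would happen here ...
    store.set_request(request_id, user, "completed")  # write 3: mark completed
    store.increment_counter(user)                     # write 4: bump request counter


# Seed 5 already-running requests for one user (seeding is not counted).
store = FakeStore()
for i in range(5):
    store.requests[f"running-{i}"] = {"user": "alice", "status": "active"}

handle_request(store, "new-request", "alice")
print(store.ops)  # OpCounter(reads=5, writes=4)
```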
The million-dollar question, of course: how much does each API request cost on Google Firestore?
Cost of a single read: R = $0.036 / 100,000 = $0.00000036
Cost of a single write: W = $0.108 / 100,000 = $0.00000108
Cost of an API request: 5 * R + 4 * W = $0.00000612
That number seems pretty small. But we still have to multiply it by the 200K requests per hour coming through.
200,000 * $0.00000612 = $1.224 per hour
When I saw that number, I was confused. It works out to roughly $30 per day, nowhere near the $500 a day Google was billing me.
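The whole back-of-the-envelope estimate fits in a few lines of Python. It uses only the numbers from this article: the quoted prices, 5 reads and 4 writes per request, and 200K requests per hour. It ignores everything else, such as the free tier.

```python
READ_COST = 0.036 / 100_000   # R: cost of one document read (USD)
WRITE_COST = 0.108 / 100_000  # W: cost of one document write (USD)

READS_PER_REQUEST = 5
WRITES_PER_REQUEST = 4
REQUESTS_PER_HOUR = 200_000

cost_per_request = READS_PER_REQUEST * READ_COST + WRITES_PER_REQUEST * WRITE_COST
cost_per_hour = REQUESTS_PER_HOUR * cost_per_request
cost_per_day = 24 * cost_per_hour

print(f"per request: ${cost_per_request:.8f}")  # per request: $0.00000612
print(f"per hour:    ${cost_per_hour:.3f}")     # per hour:    $1.224
print(f"per day:     ${cost_per_day:.2f}")      # per day:     $29.38
```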
This is where I really messed up.
I wanted to try out Google Data Studio. If you don't know it, it allows you to build cool dashboards.
There was a problem, though: you couldn't select Firestore as a data source. So I needed to sync all the data from Firestore into BigQuery.
Being naive and unaware of the pricing structure, I simply decided to do the following:
Every 5 minutes:
1. Clear the BigQuery table.
2. Select ALL documents from Firestore.
3. Write them to the BigQuery table.
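Here is a sketch of that naive full-refresh job. FakeFirestore, FakeBigQueryTable and naive_sync are hypothetical in-memory stand-ins, not the real Google Cloud client libraries. The point is only to count how many document reads each run is billed for.

```python
class FakeFirestore:
    """Hypothetical in-memory stand-in for a Firestore collection."""

    def __init__(self) -> None:
        self.docs: list[dict] = []
        self.reads_billed = 0

    def stream_all(self) -> list[dict]:
        # Reading the whole collection is billed as one read per document.
        self.reads_billed += len(self.docs)
        return list(self.docs)


class FakeBigQueryTable:
    """Hypothetical in-memory stand-in for the BigQuery table behind the dashboard."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def truncate(self) -> None:
        self.rows.clear()

    def insert_rows(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


def naive_sync(source: FakeFirestore, table: FakeBigQueryTable) -> None:
    """Full refresh, run every 5 minutes in the setup described above."""
    table.truncate()            # 1. clear the BigQuery table
    rows = source.stream_all()  # 2. select ALL documents from Firestore
    table.insert_rows(rows)     # 3. write them to the BigQuery table


firestore, bigquery = FakeFirestore(), FakeBigQueryTable()
firestore.docs.extend({"id": i} for i in range(3))
naive_sync(firestore, bigquery)  # bills 3 reads
firestore.docs.extend({"id": i} for i in range(3, 5))
naive_sync(firestore, bigquery)  # bills 5 reads, re-reading the 3 old documents
print(firestore.reads_billed, len(bigquery.rows))  # 8 5
```

The second run pays again for the three documents the first run already copied, and that repeated work grows with every new document.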
Every 5 minutes, I read every single document in the Firestore collection.
Keep in mind that this collection grew by 200K documents per hour.
So my costs kept accelerating: every sync read more documents than the one before.
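Here is a rough model of what that does to the bill. It assumes the collection grows at a steady 200K documents per hour and every sync reads the whole collection at the quoted $0.036 per 100K reads. It ignores the API's own traffic and the free tier, and the names are illustrative.

```python
READ_PRICE_PER_DOC = 0.036 / 100_000  # USD per document read, as quoted above
SYNCS_PER_DAY = 24 * 60 // 5          # one full sync every 5 minutes -> 288 syncs
DOCS_ADDED_PER_HOUR = 200_000         # collection growth quoted above


def daily_sync_cost(collection_size: int) -> float:
    """USD per day spent on sync reads when each sync reads the whole collection."""
    return SYNCS_PER_DAY * collection_size * READ_PRICE_PER_DOC


# Daily cost of the sync once the collection holds 1, 2 and 3 days of requests.
for days_of_data in (1, 2, 3):
    size = days_of_data * 24 * DOCS_ADDED_PER_HOUR
    print(f"{size:,} documents -> ${daily_sync_cost(size):,.2f} per day")
```

Under these assumptions, every million documents in the collection adds about $104 per day in sync reads. A collection holding just one day of requests (4.8M documents) already costs about $498 per day, before counting the API traffic itself.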
After figuring this out, I bashed my head on the table a few times.
Then I quickly deleted the Firestore database and spun up a $30-a-month MySQL server, which has been running without a problem ever since.
I saw Google Cloud as a silver bullet. On paper, it sounded great: a super scalable database.
But I learned an expensive lesson: it's really important to choose the right technology for the job. My read- and write-intensive workload wasn't a good fit for Firestore's pay-per-operation pricing model.
And to be completely fair, a large part of my decision was driven by the thought: "Firestore is new and cool, let's use that."
I won't be making that mistake again anytime soon.
That's it. I hope you enjoyed this story. Happy coding! 👨‍💻
Dirk Hoekstra has a Computer Science and Artificial Intelligence degree and is the co-founder of Scraperbox.
He is a technical author on Medium, where his articles have been read over 100,000 times.
He has founded multiple tech companies, one of which was acquired in 2020.