How I spent $500 per day because of a misconfiguration

By Dirk Hoekstra on 05 Feb, 2021


It was Friday and I closed my laptop. I had worked pretty hard on Scraperbox that week and decided it was time for some well earned weekend time.

When I logged into Google Cloud the next Monday my heart sank.

Billing cost

In one weekend Google Cloud had burned through €1,200, which is roughly $1,500.

Scraperbox is bootstrapped so this was going to come out of my own pocket. And my money was already limited.

After the initial shock wore off I dove into the problem to fix it as quickly as possible. I was burning €17 an hour after all.

Clouds

Running on cloud infrastructure.

What happened?

ScraperBox used to have a simple MySQL database. But as the number of requests kept growing, the database struggled to keep up.

When our API reached 200K requests per hour, we faced a choice:

  1. Buy more RAM for the MySQL server.
  2. Switch over to Google Cloud Firestore.

I wanted to try out Firestore, and on paper it sounded amazing: an infinitely scalable database.

I started building and within a few days, ScraperBox was running on top of Google Firestore.

I now wish I had looked more closely at the Firestore pricing page.

The Firestore pricing model

The pricing model is pretty simple. You pay $0.036 per 100K document reads and $0.108 per 100K document writes.

So, if you have 1M records and run a SELECT ALL query, you pay for 1M reads, which is 10 blocks of 100K: $0.036 * 10 = $0.36.
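A minimal Python sketch of that arithmetic, assuming the read price quoted above. The function name is made up for illustration and is not part of any Firestore API.

```python
# Read price quoted in this article (USD per 100K document reads).
READ_PRICE_PER_100K = 0.036


def full_scan_cost(document_count: int) -> float:
    """Cost in USD of reading every document in a collection once."""
    return document_count / 100_000 * READ_PRICE_PER_100K


# One full read of a collection with 1M documents.
print(f"${full_scan_cost(1_000_000):.2f}")
```

The point to notice is that the cost depends only on how many documents are read, not on how much of the result you actually use.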

Now let's do some calculations to see how much our API would theoretically cost.

For each request, the API will check if there are concurrent requests running. The pseudo-query looks something like this.

COUNT api_requests WHERE active=true AND user=CURRENT_USER

On average this will return 5 documents.

Also, for each API request 4 writes are done:

  1. Create a new pending API request document.
  2. Set the status to active.
  3. Set the status to completed.
  4. Increment the user API request counter.

So, for each API request on average 5 read operations and 4 write operations are performed.

The million-dollar question is, of course: how much does each API request cost using Google Firestore?

Cost of a single read:  R = $0.00000036
Cost of a single write: W = $0.00000108

Cost of an API request: 
5 * R + 4 * W = $0.00000612

That number seems pretty small. But we still have to multiply it by the 200K requests per hour coming through.

200,000 * $0.00000612 = $1.224 per hour

When I saw that number I was confused. That amounted to roughly $30 per day, nowhere near the $500 Google was billing me.
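The same estimate can be reproduced in a few lines of Python. This is only a sketch using the prices and averages from this article; the constant and function names are chosen for illustration.

```python
# Per-operation prices derived from the per-100K prices quoted in the article.
READ_COST = 0.036 / 100_000    # USD per document read
WRITE_COST = 0.108 / 100_000   # USD per document write

READS_PER_REQUEST = 5          # average documents returned by the concurrency check
WRITES_PER_REQUEST = 4         # create, set active, set completed, increment counter
REQUESTS_PER_HOUR = 200_000


def request_cost(reads: int = READS_PER_REQUEST,
                 writes: int = WRITES_PER_REQUEST) -> float:
    """Firestore cost in USD of handling one API request."""
    return reads * READ_COST + writes * WRITE_COST


hourly = REQUESTS_PER_HOUR * request_cost()
daily = 24 * hourly

print(f"per request: ${request_cost():.8f}")
print(f"per hour:    ${hourly:.3f}")
print(f"per day:     ${daily:.2f}")
```

Even at 200K requests per hour, the request path alone stays under $30 per day, so something else had to explain the bill.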

The "misconfiguration"

This is where I really messed up.

I wanted to try out Google Data Studio. If you don't know it, it allows you to build cool dashboards.

There was a problem though: you couldn't select Firestore as a data source. So, I needed to sync all the data from Firestore to BigQuery, which Data Studio can read.

Being naive and not knowing the pricing structure, I simply decided to do the following.

Every 5 minutes:
  1. Clear BigQuery table
  2. Select ALL documents from the Firestore
  3. Write them to the BigQuery table.

Every 5 minutes I would read all documents from the Firestore.

Keep in mind that this collection grew by 200K documents per hour.

Cost graph: my costs were going up faster and faster
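To see why a full reload every 5 minutes gets expensive so fast, here is a rough Python simulation. It is a sketch under stated assumptions, not a reconstruction of the actual bill: it assumes the collection starts empty and grows evenly at 200K documents per hour, and it counts only the reads done by the sync job. The function name is hypothetical.

```python
READ_COST = 0.036 / 100_000    # USD per document read, as quoted in the article
DOCS_PER_HOUR = 200_000        # collection growth mentioned in the article
SYNCS_PER_HOUR = 60 // 5       # one full export every 5 minutes


def cumulative_sync_cost(hours: int) -> float:
    """Total USD spent on full-collection reads after `hours` hours.

    Assumption: the collection starts empty and grows at a constant rate.
    """
    docs_per_interval = DOCS_PER_HOUR / SYNCS_PER_HOUR
    collection_size = 0.0
    total_reads = 0.0
    for _ in range(hours * SYNCS_PER_HOUR):
        collection_size += docs_per_interval  # new request documents arrive
        total_reads += collection_size        # the sync reads every document
    return total_reads * READ_COST


for hours in (12, 24, 48):
    print(f"after {hours:2d}h: ${cumulative_sync_cost(hours):,.2f}")
```

Under these assumptions each sync costs more than the previous one, so the total grows roughly with the square of the elapsed time: doubling the running time roughly quadruples the bill.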

After figuring this out I bashed my head on the table a few times.

Then I quickly deleted the Firestore database and spun up a $30 a month MySQL server. It's been running without a problem ever since.

The takeaway

I thought of Google Cloud as a silver bullet. On paper, it sounded great: a super scalable database.

But I learned an expensive lesson: it's really important to choose the right technology for the job. My read- and write-intensive operations weren't a good fit for the Firestore pricing model.

And to be completely fair a large part of my decision-making was influenced by the thought: "Firestore is new and cool, let's use that."

I won't be making that mistake again anytime soon.

That's it. I hope you enjoyed this story, happy coding! 👨‍💻


Dirk Hoekstra has a Computer Science and Artificial Intelligence degree and is the co-founder of Scraperbox. He is a technical author on Medium, where his articles have been read over 100,000 times. He has founded multiple tech companies, one of which was acquired in 2020.