# Two Problems I Had to Get Right Building a Video Compression SaaS
[squeezeVid](https://squeezevid.com) is a web app I built for compressing videos: you upload a file, it comes back smaller. It's a Django app, it's a paid product with usage quotas backed by Stripe, and most of it is unremarkable CRUD. But two parts had to be exactly right, because getting them wrong means either giving away paid work for free or shipping a product nobody can debug. Here's how I handled both.
## The shape of the system
Two design choices set up everything else. Uploads go straight to S3-compatible object storage (I use Hetzner's) via multipart upload, so large video files never have to stream through the web server. And the actual compression doesn't happen in the Django request - it runs in a separate processing service. Django only dispatches a job to that service's API, passing the encode settings: things like the CRF quality value and the output codec (for example `libvpx-vp9` for WebM output).
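To make the dispatch step concrete, here is a minimal sketch of validating encode settings and building a job payload. This is not the production code: the field names (`source_key`, `codec`, `crf`), the `EncodeSettings` class, and `build_job_payload` are hypothetical, and the real service's request format may look quite different. The only hard fact baked in is that `libvpx-vp9` accepts CRF values from 0 to 63.

```python
import json
from dataclasses import dataclass

# Valid CRF range per codec; lower CRF means higher quality.
# Only libvpx-vp9 is listed here; other codecs would get their own entries.
CRF_RANGES = {"libvpx-vp9": (0, 63)}


@dataclass(frozen=True)
class EncodeSettings:
    """Hypothetical container for the settings sent to the processing service."""

    codec: str
    crf: int

    def __post_init__(self) -> None:
        # Reject bad settings before a job is ever dispatched.
        if self.codec not in CRF_RANGES:
            raise ValueError(f"unsupported codec: {self.codec!r}")
        low, high = CRF_RANGES[self.codec]
        if not low <= self.crf <= high:
            raise ValueError(f"crf {self.crf} outside {low}-{high} for {self.codec}")


def build_job_payload(source_key: str, settings: EncodeSettings) -> str:
    """Serialize a job request; the field names are assumptions for illustration."""
    return json.dumps(
        {"source_key": source_key, "codec": settings.codec, "crf": settings.crf},
        sort_keys=True,
    )
```

Validating locally like this means an out-of-range CRF is caught before it costs a round trip to the processing service.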
## Problem 1: spending quota exactly once
Quota is paid for up front, which means the cardinal sin is letting someone spend quota they don't have - or spend the same unit twice. The naive version ("check quota, then create the job") has a race: two requests arriving at once can both pass the check before either one records its consumption.
I close that race with a database transaction. When a job comes in, I reserve quota inside a single transaction: I lock the relevant quota row, and *in the same transaction* verify there's no other in-flight job already consuming it, before creating the job. The row lock is the whole trick - it forces concurrent requests to take turns, so the second one sees the first one's reservation instead of racing past it.
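Here is a minimal, self-contained sketch of that lock-check-create sequence, not the production code. In Django on PostgreSQL the lock would typically be `select_for_update()` on the quota row inside `transaction.atomic()`. To stay runnable with only the standard library, this sketch uses `sqlite3`, which has no row locks, so `BEGIN IMMEDIATE` (which takes the database write lock up front) stands in for the row lock. The table layout, the `in_flight` status value, and the exception names are all assumptions, and the point at which a unit is actually debited is left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE quota (user_id INTEGER PRIMARY KEY, remaining INTEGER NOT NULL);
CREATE TABLE job (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    status TEXT NOT NULL
);
"""


class QuotaExhausted(Exception):
    """The user has no quota left to reserve."""


class JobAlreadyInFlight(Exception):
    """Another job is already holding this user's reservation."""


def reserve_job(conn: sqlite3.Connection, user_id: int) -> int:
    """Check quota and create the job atomically; returns the new job id.

    Expects a connection opened with isolation_level=None so that
    transactions are controlled explicitly here.
    """
    # Take the write lock *before* reading, so concurrent callers queue up
    # here instead of both passing the checks below.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT remaining FROM quota WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None or row[0] < 1:
            raise QuotaExhausted(user_id)

        in_flight = conn.execute(
            "SELECT COUNT(*) FROM job WHERE user_id = ? AND status = 'in_flight'",
            (user_id,),
        ).fetchone()[0]
        if in_flight:
            raise JobAlreadyInFlight(user_id)

        # The job row itself is the reservation the next caller will see.
        cursor = conn.execute(
            "INSERT INTO job (user_id, status) VALUES (?, 'in_flight')", (user_id,)
        )
        conn.execute("COMMIT")
        return cursor.lastrowid
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The ordering is what matters: lock first, then check, then write, all before the commit. Moving either check outside the transaction reopens the race.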
## Problem 2: talking to a flaky external service
Calls to the processing service fail in two completely different ways, and treating them the same is a mistake.
- **Transient failures** - timeouts, dropped connections, upstream 5xx - get retried with a short backoff over a couple of attempts. These are worth retrying; they often succeed the second time.
- **Deterministic rejections** - 4xx - are *not* retried. The server has already decided the request is invalid; retrying it just wastes time and floods the logs with the same rejection.
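The split above can be sketched as a small retry wrapper. This is an illustration rather than the production code: `call_with_retry`, the exception names, the default of three attempts, and the doubling delay are assumptions. The `send` callable is a hypothetical stand-in for whatever HTTP client makes the request; it returns a `(status, body)` pair or raises on a timeout or dropped connection. Treating non-2xx statuses other than 5xx as non-retryable is also an assumption for statuses outside the 4xx range.

```python
import time
from typing import Callable, Optional, Tuple


class TransientError(Exception):
    """Retries were exhausted on failures that looked temporary."""


class RejectedError(Exception):
    """The service deterministically rejected the request; do not retry."""


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send(), retrying only on timeouts, connection errors, and 5xx."""
    last_failure: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            status, body = send()
        except (TimeoutError, ConnectionError) as exc:
            last_failure = exc  # transient: fall through to the backoff
        else:
            if 200 <= status < 300:
                return body
            if status < 500:
                # 4xx (and any other non-5xx failure): fail fast, keep the body.
                raise RejectedError(f"{status}: {body}")
            last_failure = TransientError(f"{status}: {body}")
        if attempt < attempts:
            # Short exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise TransientError(f"gave up after {attempts} attempts") from last_failure
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.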
The other half of resilience is being able to debug the failures you do get. On any non-2xx response, I capture the response body in the error, not just the status line. I learned this the hard way: I was originally logging only the bare status code, which made the upstream 400s impossible to diagnose. Including the body is what turned "something returned a 400" into an actual reason I could act on.
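As an illustration of carrying the body along with the status, here is a minimal sketch using the standard library's `urllib`. It is not the production code: `UpstreamError`, `submit_job`, the JSON content type, and the 2000-character cap on the body in the message are all assumptions. The one library behavior it relies on is that `urllib.error.HTTPError` is file-like, so the response body can be read from it.

```python
import urllib.error
import urllib.request

# Cap how much of the body ends up in the message so logs stay readable.
MAX_BODY_CHARS = 2000


class UpstreamError(Exception):
    """An upstream failure that keeps the response body, not just the status."""

    def __init__(self, status: int, reason: str, body: str) -> None:
        self.status = status
        self.reason = reason
        self.body = body
        super().__init__(
            f"upstream returned {status} {reason}: {body[:MAX_BODY_CHARS]}"
        )

    @classmethod
    def from_http_error(cls, exc: urllib.error.HTTPError) -> "UpstreamError":
        # HTTPError doubles as the response object, so the body is readable.
        body = exc.read().decode("utf-8", errors="replace")
        return cls(exc.code, str(exc.reason), body)


def submit_job(url: str, payload: bytes, timeout: float = 10.0) -> bytes:
    """POST a job; on an HTTP error, raise with the body attached."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        raise UpstreamError.from_http_error(exc) from exc
```

With the body in the exception message, a log line reads as the upstream service's own explanation instead of a bare status code.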
## The lesson
Neither of these is exotic - a row lock and a retry policy. But they're the difference between a billing system you can trust and a support queue full of mysteries. The boring, careful version is the one that lets you sleep.
If you're building usage-based billing or wiring a Django app up to a flaky external service, I'm happy to compare approaches. You can try the product at [squeezevid.com](https://squeezevid.com), or reach out via my portfolio at [tahayusufkomur.me](https://tahayusufkomur.me).