A no-nonsense server architecture for group based SaaS

If you’re never built a SaaS product before all the server-side stuff might seem overwhelming. How many servers do you need? Magnificent monolith or microservices? VPS, app engines, end point computing, dedicated machines? Do you need a CDN? SQL, noSQL, distributed or not, failover, multi-master or RDS? So many decisions! And if you get it wrong it’s going to cost you.

Thankfully, we’ve built SaaS products before which means we’ve got a rough idea of the kind of problems we’re likely to encounter at different levels of usage.

Our guiding principles:

Don’t overspend

The cloud is great in many ways but the virtual machines are underpowered and expensive. Cloud servers make perfect sense when you don’t want to do sysadmin work or when your load is so unpredictable you need to dynamically spin up additional servers. Neither applies for us. A single $100/month dedicated machine gets you 16 cores, 128GB ECC ram, and 8TB of enterprise NVMe SSD storage. That’s a lot of horsepower!

A single server like that can support more simultaneous users than we’re ever going to realistically have. We could probably get a server that’s one third as powerful and not notice the difference. We won’t need to heavily optimize our server code and we won’t have to worry about communication between microservices. A monolith is simple and that eliminates entire classes of bugs, so that’s what we’re going for.

Boring tech is better

We’ll use Debian stable. We don’t mind that all packages are 9 months behind other distributions. We don’t need the latest point release of bash or Python. We want stable packages and regular security updates. That’s what Debian offers.

Django + MySQL is will be the core of our backend stack. Because our SaaS app will be group based (meaning: no shared data between groups) scaling and caching will be very easy. Servers like twitter are complex because every action potentially affects any of the other billion users in the system. But when you make a service where all user groups are independent you don’t get this explosion in complexity. If we suddenly end up with millions of users (yea, right!) and there is no faster server available (vertical scaling) we can still scale horizontally if necessary. It’s just a matter of buying a bunch of servers and evenly dividing our users between them.

We have used Python/Django for years so that’s what we’re going with. We could have picked golang or any other web stack. It doesn’t matter that much.

We don’t need a cache server, a message queue, or any other services at this point.

We should get 99.99% uptime with this server architecture, simply because there is so little that can break. Provided we don’t do anything dumb :). All downtime is bad, of course, but there are diminishing returns at some point. I think 1 minute of downtime per week is acceptable.

Security Paranoia

We’ll cover this in depth in future posts. Firewall, sshd policies, full disk encryption, and intrusion monitoring are core parts. The mantra is: layers of defense. If system files get changed, we want to know about it. If a program listens to an usual port, we want to know.

We’ll do backups with rdiff-backup (local + remote). We can use it to diff what changed on a server, which means it also functions as an audit tool. What makes rdiff-backup so cool is that it computes reverse diffs, as opposed to forward diffs. This means the latest backup set consists of plain files that can easily be verified to be correct. In addition there is some compressed archival that allows you to go back in time for individual files/directories. You never have to worry about the archival system getting corrupted.

For full disk encryption we’re going to need to employ some tricks. We want to encrypt everything, including the OS. For a remote headless server that requires some ingenuity. More about this later.

Mailgun

Mailgun is what we know so that’s what we’ll use for our planning IDE Thymer. Mailgun has always had reliability issues though and deliverability is mediocre, even when using a dedicated IP address. We can always switch transactional email providers at some later point, so it’s not something for us to worry about now. It would just be a distraction.

Cloudflare

Our app server(s) will be in Europe (because of GDPR), but most of our users will be in the VS. That’s not great from a latency point of view. It takes a couple of round trips to establish an SSL connection and if you have to cross an ocean that will make your site feel sluggish.

Cloudflare has many end points globally, and we want our app to load fast. We’ll move all static content to Cloudflare and we’ll make sure to design our app so it can load as much in parallel as possible. Cloudflare also provides us with free DDoS protection and their DNS service is great. One day Cloudflare will start charging startups like ours and we’ll gladly pay. For now, the free plan suits us just fine.

We’re bootstrapping and we’re running our startup on a shoestring budget. Everything on this list is free, except for the server(s) and those are dirt cheap. It’s almost comical how easy and cheap running online services has become. Most of the hip new tech we just don’t need. Our app will be client heavy and the server will be little more than a REST API and billing logic.

Don’t overspend

Boring tech is better

Security Paranoia

Mailgun

Cloudflare

Read more