Some database design strategies

On Friday, Wim wrote about Data and Event Handling on the client side of Thymer.com. Today is about our approach to data storage on the server side. All storage, with very few exceptions, will go directly into a SQL database. MySQL is what we know best, so that’s what we’ll use.

Almost all of our database tables will have at least these columns:

  • “id” (unique auto-increment primary key)
  • “guid” (unique, user-facing)
  • “group_id” (integer foreign key)
  • “created_at” (datetime)
  • “updated_at” (datetime)
  • “deleted_at” (datetime, nullable)
  • “destroyed_at” (datetime, nullable)
  • “extra_json” (plain text)

I’ll explain briefly what our reasoning is for having these columns.
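To make that concrete, here’s a minimal sketch of such a base table as an abstract Django-style model. This is illustrative only, not our actual code; the exact field types and the new_guid helper are assumptions.

import uuid

from django.db import models

def new_guid():
    # GUIDs are generated application-side; see the prefixing idea below.
    return str(uuid.uuid4())

class BaseModel(models.Model):
    # "id" is added automatically by Django as an auto-increment primary key.
    guid = models.CharField(max_length=64, unique=True, default=new_guid)
    group = models.ForeignKey("Group", on_delete=models.PROTECT)  # the group_id column
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    deleted_at = models.DateTimeField(null=True, blank=True)
    destroyed_at = models.DateTimeField(null=True, blank=True)
    extra_json = models.TextField(blank=True, default="")

    class Meta:
        abstract = True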

GUID with unique index + auto-increment int primary key

Having auto-increment primary keys for internal use is convenient, but exposing these keys to the outside world isn’t great. It’s better to use GUIDs in the (JSON) API. Integer primary keys are considerably faster, though, so you can use both. That’s what we’re doing. A side benefit is that you can use the GUIDs for backup and export/import purposes. You can’t easily import data with auto-increment keys, because you’ll get key collisions, but you don’t have this difficulty when you use GUIDs. They don’t collide by design. If you ever have to shard your service, or want to move data between SQL servers for another reason, having GUIDs to work with will save you a great deal of time.

It also makes for much friendlier APIs. You can prefix each GUID with the name of the table it corresponds to. If the wrong kind of GUID is passed to the API, for instance “user_56b2c555-1ccf-4b9a” instead of “project_854a3923-0be9-4106”, you can generate a friendly error message explaining that the API function expected a Project instance instead of a User. If you just pass a generic integer, you have to return a confusing error message about insufficient permissions, because you can’t determine what the API user is trying to do.
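For illustration, prefixed GUIDs can be generated and checked with a couple of small helpers (the helper names and the error message here are made up):

import uuid

def make_guid(kind):
    # e.g. make_guid("project") -> "project_854a3923-0be9-4106-..."
    return f"{kind}_{uuid.uuid4()}"

def require_guid(guid, expected_kind):
    # Reject ids of the wrong type with a friendly message instead of a
    # generic permission error.
    kind, _, _ = guid.partition("_")
    if kind != expected_kind:
        raise ValueError(f"expected a {expected_kind} id, got a {kind} id")
    return guid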

Redundant group_id columns

Suppose your product allows groups of users to collaborate and you don’t want to accidentally mix up information between groups. You don’t want to put yourself in a situation where users see other people’s data just because you forgot a “WHERE” clause in an SQL statement somewhere. Mistakes are unavoidable, but some mistakes are unacceptable, and we plan to design our system so that the worst kind of mistakes can’t happen. If all tables have a group column (or equivalent), you can enforce through your ORM that all queries are automatically filtered for access. You can make all database queries go through a single point that ensures a user of Group A can only see data from Group A.
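A sketch of that single point as a Django-style manager (hypothetical names, not our actual code):

from django.db import models

class GroupScopedManager(models.Manager):
    def for_group(self, group):
        # Every read in the app goes through this method, so the group
        # filter can never be forgotten in an individual query.
        return super().get_queryset().filter(group=group)

class Project(models.Model):
    group = models.ForeignKey("Group", on_delete=models.PROTECT)
    objects = GroupScopedManager()

# Usage: Project.objects.for_group(current_user.group)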

A side benefit of storing redundant group information is that it makes for very easy clean up when people want to close their account. You just go through all database tables and you drop all records for that user/group. If you have to infer which data belongs to which user through foreign keys you can create a difficult situation for yourself. Especially when you need to delete a lot of data you want to be able to do so incrementally. You don’t want your service to become slow (or crash altogether) because a background task locks your database tables.

Multiple datetime columns

created_at and updated_at speak for themselves. These are timestamps you want to display to the user in some fashion, and they’re too useful not to have. deleted_at is for soft deletions (tombstones). Sometimes deletion is expensive and you want to just flag many items as deleted first and clean up asynchronously. When the data has actually been destroyed you can set destroyed_at. You could delete the actual row from the database, but foreign keys could prevent that. In addition, if you want to show good error messages to users (“this item has been deleted”) it helps to keep the row around. Of course, you only need the metadata for this; the columns that contain actual user data can be overwritten when the user wants the data deleted.
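As a sketch, with made-up field names, the two steps could look like this:

from django.utils import timezone

def soft_delete(item):
    # Cheap: just set the tombstone now, clean up asynchronously later.
    item.deleted_at = timezone.now()
    item.save(update_fields=["deleted_at"])

def destroy(item):
    # Later, in the background: overwrite the columns that hold actual user
    # data, but keep the row and its metadata for foreign keys and friendly
    # "this item has been deleted" errors.
    item.title = ""
    item.body = ""
    item.destroyed_at = timezone.now()
    item.save(update_fields=["title", "body", "destroyed_at"])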

If you provide some kind of full text search it helps to have an indexed_at column. You can use this to trigger re-indexing. You can use datetime columns like this for any kind of asynchronous update you want to run. Just select all rows where indexed_at < updated_at and do the necessary processing. Want to re-index a specific group? Just backdate the updated_at column and a background daemon will jump into action. Want to re-index 10 rows that didn’t get processed correctly the first time? It’s trivial. This is much less fragile than having some kind of message queue where you enqueue rows for additional work. A message queue can restart or get out of sync with the main SQL database really easily, but a simple datetime column works every time.
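With a Django-style ORM the background pass could be as simple as this (a sketch; the Item model and the reindex() function are hypothetical):

from django.db.models import F, Q
from django.utils import timezone

# Rows that were never indexed, or that changed since the last indexing pass.
stale = Item.objects.filter(
    Q(indexed_at__isnull=True) | Q(indexed_at__lt=F("updated_at"))
)
for row in stale.iterator():
    reindex(row)  # hypothetical full-text indexing step
    Item.objects.filter(pk=row.pk).update(indexed_at=timezone.now())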

Append-only database tables

Dealing with immutable (meaning: unchanging) state is easy, and gives you version information for free. Being able to show users how their data has changed over time is a useful feature to add and it doesn’t cost anything. It’s just a matter of selecting the most recent row to get the most current data.
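In a sketch with hypothetical names, reading the current state and the history of an item looks like this:

# Every edit INSERTs a new row instead of UPDATE-ing the existing one.
guid = "item_854a3923-0be9-4106"  # the logical item we're interested in
versions = ItemVersion.objects.filter(item_guid=guid).order_by("-created_at")

current = versions.first()    # the most recent row is the current data
history = versions.reverse()  # the full edit history comes for free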

Storage is cheap and database engines can deal with large tables just fine. You can always fix performance problems when (and if!) they happen.

A surprise benefit is that caches automatically get invalidated if you use append-only strategies. You can’t have stale caches for data that doesn’t exist yet. That’s a nice thing to get for free. (Assuming your app is read-heavy. For write-heavy apps this might actually be a disadvantage, because your caches get invalidated prematurely.)

Multiple queries instead of complex queries

When you use an ORM to build your database queries you’ll get pretty naive queries by default. The ORM will happily select much more data than needed, or do thousands of small queries when one will do (see: N+1 problem). When you save objects most ORMs save the entire object back into the database, and not just the single property you’ve changed. ORMs aren’t sophisticated tools by any stretch of the imagination.

ORMs are really great at generating the most basic SQL queries, and we’ve learned that if you can rework your problem as a sequence of very basic queries you don’t end up fighting your ORM. I’ll give an example.

Suppose you want to generate a list of the 20 most popular books written by any of the top 3 most popular authors. In Django-style ORM code (assuming a popularity field to sort by):

popular_author_ids = list(Book.objects.order_by("-popularity").values_list("author_id", flat=True)[:3])
books_by_popular_authors = Book.objects.filter(author_id__in=popular_author_ids).order_by("-popularity")[:20]

The example is a bit contrived, but I hope it illustrates the point. By first selecting a bunch of ids that you need for the second query you end up with two very simple (and fast) database queries. Modern SQL databases are extremely powerful, and you can accomplish almost anything in a single query. But by splitting the work up into multiple queries you get three main benefits:

1) you know your ORM is going to generate reasonable SQL
2) you often get extra opportunities for caching the intermediate results
3) you can do additional filtering in Python, Ruby, or NodeJS before you make another round-trip to the database to get your final results. This is often faster and easier than doing the same in SQL. Unicode handling in your database and in your target language is always going to be subtly different, and the same goes for regular expressions. You’re better off using the database as dumb storage.

Extra JSON column

There are plenty of situations where you want to tag a couple of rows for whatever reason. Maybe you’re experimenting with a new feature. Maybe you need temporary storage for a migration. Maybe you need some logging for a hard-to-replicate bug. Being able to quickly serialize some metadata without having to add an extra column to the database is a big time-saver. You don’t even need to use the JSON serialization functionality in your database for this to be useful. Just being able to store a blob of data is where the real value is.
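For example, tagging a few rows for an experiment without a schema migration could look like this (the keys and values are made up):

import json

# "row" is any model instance that has an extra_json column.
# Stash some ad-hoc metadata in the plain text column.
row.extra_json = json.dumps({"experiment": "new-sidebar", "cohort": "B"})
row.save(update_fields=["extra_json"])

# Read it back later; the database just sees an opaque blob of text.
meta = json.loads(row.extra_json or "{}")
if meta.get("experiment") == "new-sidebar":
    ...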

Data & event handling in the app

As we continue working on the app, one of the decisions in building Thymer is how the data will flow through the app. Let’s look at some more technical details about all the event and data handling considerations.

Server communication

Because Thymer will work like an editor of sorts, it needs to feel like one you might have on your laptop, even though it can run in the browser. That means interactions with the app should be pretty much instant from the perspective of the user. When a user adds or removes text, we don’t have time to ask the server to do the rendering for us. Even if the server would be fast, when every millisecond to render something on the screen counts, that pesky speed of light limit alone would make a round-trip to the server too slow.

That means we need to do all the rendering in the app itself, and send data between the client and server in the background. All the rendering we do is optimistic, meaning it assumes that at some point we’ll be able to communicate it to the server, but it doesn’t matter when.

Data types

The main data type in Thymer is called a “line” for now. It’s like a line in a text document, except lines in Thymer can have all kinds of additional metadata. For example, we will support different “types” of lines, like a heading, a todo, a comment (and perhaps later on things like files or drawings or whatever else we want). Lines can also contain other information, such as references to other lines, tags, and they can be “scheduled”, have due dates and so on.

Lines can also have a parent line, so we can allow features to work on a tree structure, such as zooming in at a section of the document, or collapsing/expanding sections. A document is a list of lines in a certain order.
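The app itself runs in the browser, but to illustrate the shape of the data, here’s a rough Python sketch of a line (all field names are assumptions):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Line:
    guid: str
    kind: str = "text"                 # or "heading", "todo", "comment", ...
    text: str = ""
    parent_guid: Optional[str] = None  # tree structure: sections, folding, zooming
    tags: list = field(default_factory=list)
    refs: list = field(default_factory=list)   # references to other lines
    scheduled_at: Optional[str] = None
    due_at: Optional[str] = None

# A document is then just an ordered list of lines:
# document = [Line(guid="a1", kind="heading", text="Today"), ...]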

In addition to the data itself, we store the changes to the “document”, which we call events. An event can be something like creating a line, editing a line, and so on. Although we store the latest version of the “document”, if you replayed all events from beginning to end, you would arrive at the same document.
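A minimal sketch of that replay idea (the event names and fields are made up, and lines are plain dicts here to keep the example self-contained):

def apply(document, event):
    # "document" maps a line guid to its data.
    if event["kind"] == "create_line":
        document[event["guid"]] = {"type": event["type"], "text": ""}
    elif event["kind"] == "update_line":
        document[event["guid"]]["text"] = event["text"]
    elif event["kind"] == "delete_line":
        document.pop(event["guid"], None)

def replay(events):
    # Replaying every event from the beginning reproduces the same document.
    document = {}
    for event in events:
        apply(document, event)
    return document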

A copy of all the data

In order to make the app feel instant, it’s important that we store a local copy of the “document” on the client. If something like scrolling down in your document or opening a schedule for next week resulted in a delay while waiting for the server, that would not be a good editing experience. In addition, unlike plain text, it’s possible to link to other parts of the document, so those parts need to be loaded too.

In order to store a local copy of the document, we use the browser’s built-in database, called IndexedDB. Unfortunately, IndexedDB is rather slow, so we also keep an in-memory cache and try to do as few updates to the IndexedDB as possible.

An extra advantage of storing the document like this is that we will be able to easily make the app work offline as well, so you can keep working with Thymer while on the road (in the air).

Because almost all functionality will be client-side anyway, we could even look into something like end-to-end encryption, but we might not have time for that for the first version.

Collaboration

Another factor to consider is that we need the app to be collaborative. That means that not only should we send our own events to the server, we also need to listen for incoming changes the server sends us. For this part we use websockets. Whenever the user makes any changes, we’ll tell the server about them, and the server will then tell the other clients that are online.

To sync with the server, we send our changes and request the server to send us changes from other clients. We’ll go into the exact algorithm and data types to do this in some other post, but in the end we end up with a list of “events” which we should apply to the document we have stored locally.

UI components

Another reason we have to think about “collaboration” is that even when someone uses Thymer by themselves, things still need to work if they have multiple browser tabs open. And even if that weren’t necessary, the whole point of the app is to have features popular in IDEs, such as split-panel views.

That means that when you make a change somewhere in the document, it’s possible that that same section of the document is also visible in another panel, and would need to be updated there as well. From the other panel’s point of view, the change might have come from anywhere, but we need to re-render the line so it’s updated. That means the events need to be handled by all components in the UI.

Combining it all

Changes can come from multiple sources, multiple parts of the app might have to be updated because of a single change, and simply rendering everything to the browser’s DOM would never be fast enough. That’s why we use a simple event system in which data always flows in the same direction.

That way, when we make a change somewhere, we don’t have to think about which of a dozen things we need to update one by one. Instead, we create an event, and send it to the root of the app, which sends it to all visible components which can then update themselves. For performance reasons, we take one shortcut in the editor itself: it will immediately respond by drawing the change, after which it will create an event which will inform other components.
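Roughly, the flow looks like this (a Python sketch of the idea only; the actual app is client-side JavaScript and all names here are made up):

class Component:
    # A panel, sidebar, schedule view, and so on.
    def handle(self, event):
        raise NotImplementedError("each component updates itself here")

class App:
    def __init__(self):
        self.visible_components = []
        self.outbox = []  # events queued to be sent to the server later

    def dispatch(self, event):
        # Data always flows in one direction: every event goes to the root,
        # which hands it to all visible components.
        for component in self.visible_components:
            component.handle(event)
        self.outbox.append(event)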

As an example, when we paste “hello world” on an empty line:

  • The editor will immediately draw ‘hello world’ to the line’s DOM node
  • The editor panel will create an “update line” event, which is dispatched when the browser has time. We’ll experiment a bit with the best heuristic for performance. This could be either a timeout or browser callbacks like requestIdleCallback. Using a timeout, we can replace multiple events happening in a short time with one single event (so we can combine individual keystrokes in quick succession into a single update event)
  • When the “update line” is dispatched, the app will update the local database (in-memory, and make an asynchronous request to add the change to IndexedDB), and queue the event to be sent to the server
  • Each component in the app for which the “update line” event is relevant receives the event, updates its local state and redraws a part of the screen.
  • After a while, the event is sent to the server. Any new event which is received as a reply follows the same flow and all components will self-update.

Metal with a sprinkle of cloud: our CALM server stack

Which stack to pick is a recurring topic, and we’re big believers that dedicated hardware [1] is (still!) a great option, especially for bootstrapped (SaaS) startups.

A few people pointed out we use a few cloud services as well, so which is it, cloud or metal? And what exactly does our favorite stack look like then?

I like to call our stack CALM. The purpose of CALM is in the name: we want the most bang for buck while keeping things simple and boring. We’re a tiny team and want our SaaS to scale to many customers (potentially millions of users) without having to worry about complexity and cost. Let’s look at the different parts:

1U ought to be enough for anybody!

If you’re puzzled by the “one machine” part: yes, that’s how ridiculously cheap and powerful metal is these days.

For ~$150/month you can get an absolute beast of a machine, with 128GB of RAM, 2TB SSDs and dozens of 5GHz cores. You don’t pay for each byte of traffic, each cycle of CPU, or a few extra GBs of RAM. It makes tiny cloud compute nodes look like toys. It’s powerful enough that you can run everything on one machine for a long while, so your architecture becomes super simple at this point. When things start to grow further, we’ll have plenty of RAM to cache queries and dynamic parts of the app.

Sprinkle on the cloud!

This already is enough to handle many thousands of users, no problem. But to make this even more robust, this is where we like to sprinkle a bit of cloud on top:

By using CloudFlare as a reverse proxy, we significantly reduce the number of requests which hit our server. It’s literally set and forget. We set up the CloudFlare proxy in their dashboard and firewall everything else off. It doesn’t add any real complexity to our stack. And it’s cheap (even free for what we need). We can cache and serve static assets from CloudFlare from a location close to the user. There’s really a lot of traffic you can serve this way.

Reliability through simplicity

Servers these days are super reliable. I like to compare it to ETOPS for planes: with modern hardware, you no longer need four clunky engines.

By design, the entire stack has very few moving parts. It’s easier to set up, simpler to maintain, and cheap. The more parts, the more can break. No 3am magic automatic migrations, noisy neighbors, billing alerts, or having to refactor your app because some inefficient query is eating up all your profits in cloud costs, none of it.

Another advantage is that a small dev VM on our laptop can mirror the entire architecture, so even if we want to make changes to the server setup (which we hardly ever do), we can test the exact same thing on our laptop.

We’ve never had significant outages but of course, catastrophe can always strike, and that’s a fair point (and the cloud is not immune to these either!). You can find examples of an OVH datacenter literally catching fire. It’s very rare, but we want to be prepared!

When starting out we simply stored a backup of the entire server at a different location (again, cloud sprinkled on top). That’s OK, but in a worst-case disaster we would have to spend half a day restoring things, which isn’t very calm. Because servers are so cheap, we’ve now simply added a completely identical mirror as a hot standby in a datacenter in a different part of the world.

If something goes terribly wrong, we simply point the DNS to the other server. And in an even worse case, we could switch hosting providers entirely and move our whole architecture somewhere else (not that we’ve ever had to deal with problems like these).

CALM

So there we have it, our CALM stack: Cloudflare (or other CDN/proxy) -> App -> Linux -> Metal.

Of course, some percentage of startups will have special requirements (petabytes of storage, 99.99999% uptime) and this won’t work for everyone. But when your database is only a hundred gigabytes or so and the whole thing fits into RAM, you don’t need to worry about “big data” problems. Similarly, lots of websites and apps can get away with a minute of downtime per month (if that happens at all).

It will be a long time before we outgrow this architecture, which only costs about $500/month, and that means we can worry about the product and customers instead.

[1] Earlier this week, we wrote about how You Don’t Need The Cloud.