Database Soup: "In-memory" is not a feature, it's a bug

Friday, February 13, 2015

So, I'm hearing again about the latest generation of "in-memory databases". Apparently Gartner even has a category for them now. Let me define an in-memory database for you:

An in-memory database is one which lacks the capability of spilling to disk.

As far as I know from my reading of the industry literature, nobody has demonstrated any useful way in which data should be stored differently if it never spills to disk. While the talented engineers of several database products have focused on other performance optimizations to the exclusion of making disk access work, that's not an optimization of the database; it's an optimization of engineer time. The exact same database, with disk access capabilities, would be automatically superior to its predecessor, because users would now have more options.

PostgreSQL can be an "in-memory" database too, if you simply turn all of the disk storage features off. This is known as "running with scissors" mode, and people do it for useful effect on public clouds with disposable replicas.
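The post does not list which features get turned off. One plausible reading, based on the non-durable settings PostgreSQL documents, is sketched below. The three parameter names are real PostgreSQL settings; the helper name render_conf and the choice of exactly these three are assumptions for illustration, not the author's recipe.

```python
# Sketch: settings commonly relaxed to run PostgreSQL without durability.
# fsync, synchronous_commit and full_page_writes are real PostgreSQL
# parameters; render_conf and this particular selection are assumptions.
NON_DURABLE_SETTINGS = {
    "fsync": "off",               # do not force writes out to disk
    "synchronous_commit": "off",  # commit returns before the WAL is flushed
    "full_page_writes": "off",    # skip torn-page protection in the WAL
}


def render_conf(settings):
    """Return the settings as postgresql.conf-style lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    print(render_conf(NON_DURABLE_SETTINGS))
```

A crash with these settings can leave the data directory unrecoverable, which is why this only makes sense for replicas that can be thrown away and rebuilt.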

So an "in-memory" database is a database with a major limitation. It's not a feature, any more than an inability to support SQL access is a feature. Let's define databases by their useful features, not by what they lack, please.

Besides which, with the new types of persistent memory and fast random access storage coming down the pipe in a couple years, there soon won't be any difference between disk and memory anyway.

16 comments:

EJ, February 13, 2015 at 9:22 PM
The data structures you use in memory are in many cases different from what you would use if you had to pay the cost of a disk request for each request.

Unknown, February 14, 2015 at 12:55 AM
Even for cache systems that do want in-memory-only behavior from the OS kernel, the best way to do it is still to use the file-based APIs. See https://www.varnish-cache.org/trac/wiki/ArchitectNotes for some details.
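One way to read "use the file based APIs" is to map a file into memory and let the kernel decide which pages stay resident. The sketch below uses Python's standard mmap module; the helper name file_backed_buffer and the 4096-byte size are assumptions for illustration, and this is not Varnish's actual code.

```python
import mmap
import os
import tempfile


def file_backed_buffer(size):
    """Create a temp file of `size` bytes and map it into memory."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)     # reserve the backing space in the file
        buf = mmap.mmap(fd, size)  # read/write mapping backed by that file
    finally:
        os.close(fd)               # the mapping keeps its own file reference
    return buf, path


if __name__ == "__main__":
    buf, path = file_backed_buffer(4096)
    buf[0:5] = b"hello"            # looks like an ordinary memory write
    print(bytes(buf[0:5]))
    buf.close()
    os.remove(path)
```

The program only ever touches memory; whether a given page currently lives in RAM or on disk is left to the operating system.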

Dmitry Shalashov, February 14, 2015 at 1:56 AM
Josh, what a good definition :-) Precise and kinda obvious in hindsight!

Could you elaborate on turning off disk storage options? Turning fsync and friends off? Unlogged tables? RAM disk?
By the way, does PG work on a RAM disk just as it does on a spinning one? I mean that some common steps are probably useless when working on unreliable storage and could be skipped.

Daniël van Eeden, February 14, 2015 at 8:10 AM
I've used an in-memory database for years: NDB Cluster (aka MySQL Cluster), which can be used and was developed without the MySQL frontend. Everything is always in memory and partitioned over multiple machines. If it didn't spill to disk (to make it durable) it would have been useless. A newer feature is disk data tables, for which it is no longer required to have the full table in memory.

Another thing which comes to mind is the in-memory option of SQLite, which can be used in similar situations as unlogged tables and the MEMORY storage engine for MySQL:
https://www.sqlite.org/inmemorydb.html
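As a minimal sketch of that SQLite option, using Python's standard sqlite3 module: connecting to the special name ":memory:" gives a private database that lives only in RAM. The table name kv and the function name demo are made up for illustration.

```python
import sqlite3


def demo():
    """Store and read back one row in a purely in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # no database file is created
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO kv VALUES (?, ?)", ("answer", "42"))
    conn.commit()
    row = conn.execute("SELECT v FROM kv WHERE k = ?", ("answer",)).fetchone()
    conn.close()                        # the database is discarded here
    return row[0]


if __name__ == "__main__":
    print(demo())
```

Nothing survives the connection being closed, which is the same trade-off as an unlogged or MEMORY table.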

Coder Bob, February 16, 2015 at 2:07 AM
If the data fits in RAM then the capability to spill to disk is a liability. What's wrong with a trade-off where you sacrifice a feature that you don't need for major performance, flexibility, productivity and maintainability benefits?

I'm building OrigoDB, an in-memory database for .NET. The data model is user-defined with .NET types and collections. LINQ is used for queries and precompiled C# stored procedures for modifications. I can run 100K fully serialized write transactions per second AND squeeze in millions of queries while waiting for the transaction log to flush.

Anonymous, February 25, 2015 at 10:41 AM
Constraints and limitations *can* be features. I think you need to use your imagination a bit more. There are legit reasons to run purely in memory and to consider the mere capability of using disk, even solid state, as a liability.

whocares, November 24, 2015 at 10:12 AM
GemFire (aka Apache Geode) has some interesting features on paper:

in-memory processing across multiple nodes (speed and stuff), WAN replication, and a persist-to-disk backing database sync thing (Greenplum).