Comments on Database Soup: PostgreSQL New Development Priorities: Scale It Now

Amazon claims to have made the "scale it now&...

2013-11-18T13:41:12.005-08:00

Amazon claims to have made the "scale it now" button: http://aws.amazon.com/rds/postgresql/

MongoDB-s safe write opion are so slow that you re...

2013-05-19T09:24:27.376-07:00

MongoDB-s safe write opion are so slow that you really can't use them well for scaling. That is with {j=True} you get about 1/20th of write performance of PostgreSQL which likely means that you need 20+ nodes to get the same performance.
And NoSQL-like scaling is easy - do as Skype does and use pl/proxy for near-linear scaling, easy multi-node queries, map/reduce, etc...

I was just talking with Mason Sharp about this sam...

2013-05-16T15:37:39.935-07:00

I was just talking with Mason Sharp about this same thing last night after the NYCPUG meetup. Postgres is currently lacking in the areas of HA and Fault Tolerance, and what's worse is that the majority of Postgres developers seem almost indifferent. Given the operational demands that OmniTI and our clients have, I'm not surprised by this gap between us and some of the other support companies, but it can be frustrating at times, so I am glad to hear someone else bringing up the issue.

That said, we (OmniTI) are working on trying to solve this; we've got trials ready to go with one of our customers on the whole automated failover / rebuild portions of this problem. There are other components in the mix as well; pgbouncer and vippy solve certain parts of the puzzle, it's a matter of finding the right mix. We were able to make WAL management and backups way better with the OmniPITR tools, and we think we close to the point where we can do the same thing for the HA story, and then hopefully tackle fault tolerance at some point after that.

Is this about MySQL replication being easier befor...

2013-05-15T12:54:21.955-07:00

Is this about MySQL replication being easier before Postgres 9.X ? Or still the case today?

> For example, listen to Nelson Elhage, enginee...

2013-05-15T12:28:40.481-07:00

> For example, listen to Nelson Elhage, engineer at Stripe.com: "I love Mongo's HA story. Out of the box I can build a 3-node Mongo cluster with a full replica set. I can add nodes, I can fail over, without losing data."

An interesting contrast are the stories where due to the nature of MongoDB's design, data can be lost without node failures or network partitions. In some cases this is due to developers not reading the docs about the safe vs fast write options, but in others it is simply the nature of the tool which favors speed over consistency (or logical integrity).

IMO caching systems (especially Redis) in conjunction with a relational engine like PostgreSQL are the more interesting and flexible approaches.

In my dream world (where I have a pony and can fly...

2013-05-15T10:59:59.741-07:00

In my dream world (where I have a pony and can fly), some simplified form of Oracle RAC would be a big win. In my day job and for my personal AWS projects I don't really require the pure scalability of Mongo/Redis/Couch/etc. They are great but for all of my projects and work and I would guess about 99% of the devs out there that if you could get PostgreSQL a good out of box HA with 2-4 with some simple form of sharded SELECT would be "good enough". For DML a single node is generally enough, or similar to RAC, whichever node you are connected to is your DML server but all of the clustered servers "could" be used for SELECTS if the query amenable to it.

Actually I take that back, that would "friggin' awesome!". I've been interested in Postgres-XC for a while though but in talking to the owner of the company at SCALE this year the limitations are too extreme for my use cases.

2013-05-15T10:59:52.745-07:00

This comment has been removed by a blog administrator.

One anonymous comment sent to me by email: &quo...

2013-05-15T10:05:02.123-07:00

One anonymous comment sent to me by email:

"I'm running a large scale POC to bring in "new" database technology at a major firm (with over 15,000 Sybase databases), and I probably know MongoDB better than anyone here. I wanted you to know that I think you hit the nail on the head. I've also heard from friends in other sectors that they've tried PG but went back to MySQL because "failover is easier". In my experience Mongo has its ups and downs, but if it does something well, it's that you can have a replicated cluster up in about 15 minutes tops, and that when you need to failover, it almost always "just works". This nearly flat learning curve greatly helped it gain traction IMO, because it does that simply, and the rest of it is close enough to the RDBMS in principal that an average project is off to the races in no time. Bringing this kind of simplicity to HA initial setup and stability could do a massive amount of good for PG's adoption curve."

I completely agree with the ideas in that post. Sc...

2013-05-15T09:49:44.705-07:00

I completely agree with the ideas in that post. Scalability and ability to easily push and then aggregate data to several nodes is something, that is needed more and more.

Actually PostgreSQL already has most important components that are needed and the amount of work to extend them in not really much:

for the quick win:
- readable postgres fdw (there)
- writable postgres fdw (mostly there)
- predicate pushing in fdw (there)
- aggregate pushing in fdw (should be implemented)
- data moving from shard to shard / re-balancing (should be implemented)

As a data architect who has successfully tried to ...

2013-05-15T07:39:35.107-07:00

As a data architect who has successfully tried to push Postgres to be used for astronomy project with a near Petabyte scale eventually (currently 10^9 complex objects, 10^11 unnested, 10sTB currently), I would love to see improvements on all the items from your list, with a twist: I'd move improvements to replication to the last place.

XC is certainly not a way to do replication, but for semi-analytical XLDB-like setups it might be an answer granted it is robust (enough) and supports query partitioning better.

I could have seen spikes of development activity for XC, but at least it has been progressing, so hope this would be a better bet next year and years to come.
XC for replication: amount of traffic to/from GTM just defies the purpose and this has been clear from the beginning of this project, it should never be run on WAN currently (and will never be unless Postgres provides better read-only logic, without need to create snapshots etc).

There's an idea to add a 'standalone' option to XC so it could be deployed as a standard PG but it is low on the list of priorities from what I could deduct as a XC user.

Best

There is one thing that has been somewhere way in ...

2013-05-15T06:22:31.990-07:00

There is one thing that has been somewhere way in the back of my mind a while now; is there anything that can be done by Postgres itself to make it more accomodating to "scale-out" solutions, and perhaps even provide a "stepping-stone path" to an easy to use scale-out solution some releases in the future? Any common feature that could be pulled in and maintained as part of Postgres? Could extensions be expanded so that all one has to do to turn a common PostgreSQL database installation into a cluster node is to load the right extension? Or is each scale-out approach just too different and requires too specific configuration? I don't know the internals of the various components well enough so nothing specific comes to mind, I'm just seeing that there's a lot of side-projects related to scaling Postgres but they all add "curious setup steps and complicated configuration" to the mix.

Agreed. At work we grudgingly moved our main proje...

2013-05-15T02:11:41.465-07:00

Agreed. At work we grudgingly moved our main project to MongoDB, which is a PITA on many levels but is easyer to scale than PG. I'd love to get back to PG someday.

Postgres-XC looks promissing but its development seems slow, and the restrictions and configuration are still too burdensome. Perhaps this can be solved by an influx of developers.

Streaming logical replication sounds easyer to achieve, but it doesn't solve the querying part.