Thursday, November 6, 2014

We need a webapp benchmark

I've been doing some comparative testing of PostgreSQL hosting on different cloud platforms. One deficiency in this effort is that the only benchmarking tool I have is pgbench, which doesn't reflect the kinds of workloads people want to run on cloud hosting. Don't get me wrong: pgbench does everything you could imagine with its simple TPC-B-style benchmark, including statistics and sampling. But the core benchmark still doesn't look much like the Rails and Django apps I deal with on a daily basis.

There's also TPCC-js, which is more sophisticated, but it's ultimately still a back-office OLTP benchmark.

So I'm thinking of developing a "webapp" benchmark. Here's what I see as the core concepts for such a benchmark:
  • Read-mostly
  • No multi-statement transactions
  • A defined "users" concept, with logins and new-user registration
  • A "sessions" table which is frequently updated
  • Read-write, read-only, and session database connections that can be separated, in order to test load-balancing optimization
  • Queries that count, sort, and modify content
  • A measured unit of work, the "user session", which contains some content lookups and minor updates ("likes")
Now, one of the big questions is whether we should base this benchmark on the idea of a social networking (SN) site.  I think we should; SN sites test a number of things, including locking and joins which might not be exercised by other types of applications (and aren't by pgbench).  What do you think?  Does anyone other than me want to work on this?
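As a concrete sketch of the spec above, here's roughly what one measured "user session" unit of work might look like. The schema (users, sessions, posts) and the queries are hypothetical, and sqlite3 stands in for PostgreSQL just to keep the sketch self-contained:

```python
# Sketch of the proposed "user session" unit of work. Schema and
# queries are illustrative only; a real benchmark would target
# PostgreSQL with separable read-write/read-only/session connections.
import sqlite3
import time
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE sessions (token TEXT PRIMARY KEY, user_id INTEGER,
                       last_seen REAL);
CREATE TABLE posts    (id INTEGER PRIMARY KEY, author_id INTEGER,
                       body TEXT, likes INTEGER DEFAULT 0);
""")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.execute("INSERT INTO posts (author_id, body) VALUES (1, 'hello')")

def user_session(conn, name):
    """One measured unit of work: login, session write, reads, a 'like'."""
    # Login: look up the app-level user (not a database user).
    user_id = conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)).fetchone()[0]
    # Session write: the frequently-updated sessions table.
    token = uuid.uuid4().hex
    conn.execute("INSERT INTO sessions VALUES (?, ?, ?)",
                 (token, user_id, time.time()))
    # Read-mostly content lookups: counting and sorting.
    n_posts = conn.execute("SELECT count(*) FROM posts").fetchone()[0]
    top = conn.execute(
        "SELECT id FROM posts ORDER BY likes DESC LIMIT 10").fetchall()
    # Minor update: a "like" on the top post.
    if top:
        conn.execute("UPDATE posts SET likes = likes + 1 WHERE id = ?",
                     (top[0][0],))
    conn.commit()
    return n_posts

print(user_session(conn, "alice"))  # → 1
```

The benchmark driver would run many of these sessions concurrently and report sessions per second, with the read-only lookups routed to a separate connection when testing load balancing.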

23 comments:

  1. What about a spec compliant TPC-W?

Jan published one written in PHP several years ago; it can be obtained on pgfoundry: http://pgfoundry.org/projects/tpc-w-php/

    Replies
1. Yah, this is even worse than pgbench as far as being a real benchmark is concerned. No FKs, no joins. And yet it uses multi-statement transactions.

There are reasons TPC-W has been an obsolete benchmark for a decade now.

2. I think this website/project can be helpful, Josh: http://oltpbenchmark.com/wiki/index.php?title=Main_Page, especially the Epinions and SEATS workloads.

    Replies
    1. Oh! Thank you! I'll dig into that, that looks like exactly what we need.

    2. Feh. Java! Why did it have to be Java?

    3. Also, sadly, the benchmarks I'm interested in seem to be incomplete. For example, there's no Wikipedia trace file.

4. What do you mean by a trace file? I remember running various workloads against PG a year ago or so.

5. Yeah, I'll need to give it some testing time; apparently the file structure is just very opaque. My real concern is that if I don't like some of the workloads (that is, if they're unbalanced or don't scale well), I won't be able to fix them. But test first, fix later ...

6. The structure isn't optimal since it's designed to be generic with respect to both workloads and databases. The good stuff is usually in workloads/*/procedures/*.java

3. Maybe some wiki made with Django? I'm thinking of using some existing open source project, something like https://github.com/django-wiki/django-wiki. Using Django would also allow comparing PostgreSQL with other DBMSes.

I think that using a framework with an ORM may be a little unfair (since ORMs use only basic features), but it would give results closer to more common, real-world workloads.

    Replies
1. Yeah, I started down the road of building a benchmark out of some common Django projects. The problem with that is unless I want to devote 5-6 webservers to running the benchmark, it becomes a benchmark *of Django* rather than of the Postgres service.

4. In an age when we can have servers with hundreds of gigabytes of memory, terabytes of SSD, and dozens of cores for about $10K (the lower end of the possibilities, yes) talking to browsers on broadband with persistent connections, perhaps we should contemplate designing apps to the VT-100/*nix database/RS-232 paradigm of times past. After all, all those so-called OO coders remain in love with the dataObject/actionObject structure, which is not a bit different from the COBOL/VSAM structure their granddaddies used. Some kinds of Back to the Future actually make sense; we can now do what Dr. Codd had in mind with hardware he didn't have.

    A test suite should acknowledge such a structure, rather than promoting the disconnected structure of vanilla HTTP.

    Replies
    1. Um, did you have a point with this?

2. Yeah. Conventional web benchmarks are really measuring the latency among JavaScript, the webserver, and the database server. We need not do it that way any more, at least for a significant number of application types. HTTP was always a bad paradigm for OLTP applications, and with current-day infrastructure we can do away with the disconnectedness.

      Building a benchmark for yesterday's paradigm only serves to support yesterday's paradigm.

    3. Um ... yeah. You do that. Post a link when you have something.

  5. I think you should look into linkbench.

    Replies
1. Thanks, Baron! However, I'm not looking mainly to benchmark graph performance.

  6. I'm not convinced your suggested set of measurements reflects typical web apps.
    * Server-side sessions are rare in my experience; signed or encrypted cookies are the preferred alternative in almost any situation. (Sites with server-side sessions always seem frustratingly unusable, with sessions timing out at inopportune times; the implementation goes against the grain of HTTP.)
    * "Likes" should generally be implemented asynchronously, perhaps via a webserver log which is read in small batches, so one like per transaction under load would be unusual.
    * The login action needs to reflect a usual user table rather than a database user. Very, very few webapps map app users to db users.
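For reference, the signed-cookie approach described in the first bullet can be sketched with just the Python stdlib. This is a minimal illustration, not production code; real apps use their framework's signing machinery (e.g. Django's django.core.signing):

```python
# Minimal sketch of a signed session cookie: the payload travels
# client-side, and an HMAC over it lets the server detect tampering
# without any server-side sessions table.
import base64
import hashlib
import hmac

SECRET = b"server-side secret"  # hypothetical key; keep out of source

def sign(payload):
    """Encode payload and append an HMAC, as 'b64payload.hexmac'."""
    p = base64.urlsafe_b64encode(payload)
    mac = hmac.new(SECRET, p, hashlib.sha256).hexdigest()
    return p.decode() + "." + mac

def verify(cookie):
    """Return the payload if the HMAC checks out, else None."""
    p, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET, p.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(mac, expected):
        return base64.urlsafe_b64decode(p.encode())
    return None

cookie = sign(b"user_id=42")
assert verify(cookie) == b"user_id=42"
```

A benchmark modeling this style would put no session traffic on the database at all, which is exactly why the choice matters for the workload mix.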

    Replies
    1. Lawrence: you seem to have an axe to grind which has little to do with my post. No thanks.

  7. I think it's a great idea. One thing to add to the list: common pagination techniques, with abusing limit/offset being number 1. That's where most of my unexpected performance variation (and corresponding headaches) comes from.

    Replies
1. If I end up writing my own benchmark, I'd want to hold off on pagination until the second version. It's definitely a real problem, but it's also very hard to set up paginated data in a synthetically generated benchmark.

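For what it's worth, the limit/offset abuse mentioned above, versus the usual keyset alternative, can be sketched as follows. The posts table and page size are hypothetical, and sqlite3 stands in for Postgres:

```python
# OFFSET pagination vs. keyset pagination on a toy posts table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts (body) VALUES (?)",
                 [("post %d" % i,) for i in range(1000)])

PAGE = 20

def page_by_offset(conn, page_no):
    # OFFSET pagination: the database still scans and discards all
    # skipped rows, so cost grows with the page number.
    return conn.execute(
        "SELECT id, body FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (PAGE, page_no * PAGE)).fetchall()

def page_after(conn, last_id):
    # Keyset pagination: remember the last id seen; an index seek
    # starts at the right place, so cost stays flat per page.
    return conn.execute(
        "SELECT id, body FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE)).fetchall()

# Both return the same rows for equivalent positions.
assert page_by_offset(conn, 2) == page_after(conn, 40)
```

A benchmark exercising both forms would expose exactly the performance variation described above as the offset grows.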
  8. You can use something like tcpcopy (https://github.com/session-replay-tools/tcpcopy) and a real log from a Rails/Django MySQL intercept: https://github.com/session-replay-tools/mysql-replay-module

    Note: I haven't used it yet, but it seems promising!

  9. Came across this, perhaps it has something useful:

    https://github.com/oltpbenchmark/oltpbench
