Saturday, May 18, 2013

PostgreSQL New Development Priorities 3: Pluggable Parser

Really, when you look at the long-term viability of any platform, pluggability is where it's at.  A lot of the success of PostgreSQL to date has been built on extensions and portability, just as the success of AWS has been built on their comprehensive API.   Our future success will be built on taking pluggability even further. 

In addition to pluggable storage, a second thing we really need is a pluggable parser interface.  That is, it should be possible to generate a statement structure, in binary form, and hand that off to libpq for execution.  There was recently some discussion about this on -hackers.

If there were a way to hand off expression trees directly to the planner, then this would allow creating extensions which actually had additional syntax, without having to fork PostgreSQL.  This would support most of those "compatibility" extensions, as well as potentially allowing extensions like SKYLINE OF which change SQL behavior.

It would also help support PostgreSQL-based clustered databases, by allowing all of the parsing for a particular client to happen on a remote node and get passed to the clustered backends.  The pgPool2 project has asked for this for several years for that reason.

More intriguingly, it would allow for potentially creating an "ORM" which doesn't have to serialize everything to SQL, but can instead build expression trees directly based on client code.  This would both improve response times, and encourage developers to use a lot of PostgreSQL's more sophisticated features since they could access them directly in their code.
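
To make the ORM idea a bit more concrete, here is a minimal sketch of client code building an expression tree directly instead of concatenating SQL text. Everything in it is hypothetical: the node classes (`Column`, `Const`, `OpExpr`, `Select`) are invented for illustration and are not PostgreSQL's internal node types, and the JSON encoding merely stands in for whatever binary form a real interface would define. No such entry point exists in PostgreSQL or libpq today.

```python
from dataclasses import dataclass, fields, is_dataclass
import json

# Hypothetical node types, loosely modeled on what a parse tree contains.
# These are NOT PostgreSQL's real structures.

@dataclass(frozen=True)
class Column:
    table: str
    name: str

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class OpExpr:
    op: str
    left: object
    right: object

@dataclass(frozen=True)
class Select:
    targets: tuple
    from_table: str
    where: object = None

def to_wire(node):
    """Recursively convert a node tree into plain dicts and lists,
    tagging each dict with the name of the node type."""
    if is_dataclass(node):
        out = {"node": type(node).__name__}
        for f in fields(node):
            out[f.name] = to_wire(getattr(node, f.name))
        return out
    if isinstance(node, (tuple, list)):
        return [to_wire(item) for item in node]
    return node

def encode(tree):
    """Serialize the tree to bytes. JSON is only a placeholder for a
    real wire format, which would have to be defined by the server."""
    return json.dumps(to_wire(tree), sort_keys=True).encode("utf-8")

# Roughly: SELECT users.name FROM users WHERE users.age > 30
tree = Select(
    targets=(Column("users", "name"),),
    from_table="users",
    where=OpExpr(">", Column("users", "age"), Const(30)),
)
payload = encode(tree)
```

The point of the sketch is that the client never produces or escapes a SQL string; what it would hand to the server is already structured data.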

Taking things a step further, we could extend this to allow users to hand a plan tree directly to the executor.  This would fix things for all of the users who actually need query hints (as opposed to those who think they need them), as well as taking efficiency a step beyond cached plans.

There are a lot of reasons this would be just as difficult to do as pluggable storage.  Currently parsing depends on context-dependent knowledge of the system catalogs, including things like search_path.  So I have no idea what it would even look like.  But a parser API is something that people who hack on Postgres and fork it will continue to ask for.
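
As a toy illustration of the search_path problem, consider how an unqualified table name gets resolved. The sketch below is a deliberate simplification: the `CATALOG` set and the `resolve_relation` function are made up for this example, and it ignores details of real name resolution such as implicitly searched system schemas and temporary tables.

```python
# A pretend catalog: the set of (schema, table) pairs that exist.
CATALOG = {
    ("public", "orders"),
    ("archive", "orders"),
    ("archive", "invoices"),
}

def resolve_relation(name, search_path, catalog=CATALOG):
    """Return the (schema, table) a name refers to, or None if not found.

    Schema-qualified names are looked up directly. Unqualified names
    are tried against each schema in search_path, in order, and the
    first match wins.
    """
    if "." in name:
        schema, table = name.split(".", 1)
        return (schema, table) if (schema, table) in catalog else None
    for schema in search_path:
        if (schema, name) in catalog:
            return (schema, name)
    return None

# The same identifier means different tables in different sessions.
session_a = resolve_relation("orders", ["public", "archive"])
session_b = resolve_relation("orders", ["archive", "public"])
```

Here `session_a` and `session_b` end up pointing at different tables even though the statement text is identical, which is why a tree built by an external parser would either have to carry fully resolved names or leave resolution to the backend.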


  1. Being able to plug in different storage or a different parser without forking Postgres is my dream. Good point, Josh!

  2. Academic Postgres has a direct API for bypassing the parser (our FastCall API is a strongly reduced descendant of it), but this is actually a hard task. We probably need an extensible parser and rewriter: you would not want to write a complete SQL parser just to support one feature.

    1. My main concern is that SQL allows you only a limited subset of all possible parse / execution trees and everything not expressible via SQL has got no testing.

  3. Antony T. Curtis had a proof-of-concept pluggable parser implementation for MySQL ready as far back as around 2005 or so AFAIR ...

    Unfortunately he was never given the time or resources to finish this ... and unlike his pluggable procedure language work that will now finally emerge in MariaDB, I haven't heard of any plans to revive the pluggable parser stuff yet ... but he might be able to provide some insight into his implementation plans from back then nonetheless ...

  4. I had a (rejected) talk proposal on these (and then some) pluggability features presented to last year's European PostgreSQL conference.

    Nice you have taken up elaborating on these much needed issues ;)

  5. I would not do it as a "pluggable parser" but rather as a "pluggable top-level language", so in addition to pl/v8 we could also have tl/v8 ("tl" for top-level), which would then be able to directly construct and manipulate parse trees and pass them to the planner, or even directly construct execution plans, bypassing the optimizer.
    Then you could do "ALTER USER bob SET main_language = tlv8" to always switch to the new language on connect.

  6. I agree with Hannu Krosing's last comment, "pluggable top-level language". This is exactly the kind of thing I need to most effectively make my new Muldis D language run in a DBMS.

  7. I will note here that said -hackers thread is still going on, and I just contributed a post to it myself. In brief, I believe that my work in designing Muldis D, a homoiconic language whose native source code form is structured data, would be a good foundation for designing Postgres' whole pluggable language/parser system, and being an intermediary for arbitrary language translation or generation besides.