Monday, March 26, 2018

Greenplum, the Postgres Community and PostgresConf

1. You are an Open Source variant of Postgres; what benefits do you bring to the table over vanilla Postgres?

Postgres is a powerful ORDBMS, but as your data scales, the only way to keep up is to buy bigger and bigger machines to run on. It suffers from the same problems that all SMP databases do: you can only get as big as the machine you’re running on.

With Greenplum you can put a subset of your data on a Postgres database on one reasonably-sized machine, and another subset on a second machine, and so on. All of your users and applications can then query one of these Postgres databases as if all the data were in a single location, letting your data scale well beyond what a single machine can hold. Greenplum manages the distribution, data shuffling, and querying of all of your data across a magically sharded implementation of Postgres databases.
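To make the sharding idea concrete, here is a toy sketch in Python. It is not Greenplum code and does not use Greenplum's actual hashing or planner. The segment count, the `segment_for`, `distribute`, and `total_sales` helpers, and the sample rows are all hypothetical, chosen only to illustrate placing each row on one "segment" by hashing a distribution key and then answering a query by combining per-segment partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 3  # hypothetical number of segment databases


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment by hashing the distribution key.

    CRC32 is used purely for illustration; it is an assumption,
    not the hash function Greenplum uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place each row on exactly one segment based on its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_field]), num_segments)].append(row)
    return segments


def total_sales(segments):
    """Scatter-gather: each segment sums its own rows, then partial sums are combined."""
    partials = [sum(r["amount"] for r in rows) for rows in segments.values()]
    return sum(partials)


# Made-up sample data for the sketch.
rows = [
    {"customer": "alice", "amount": 120},
    {"customer": "bob", "amount": 75},
    {"customer": "carol", "amount": 210},
    {"customer": "alice", "amount": 30},
]
segments = distribute(rows, "customer")
print(total_sales(segments))  # 435
```

The caller sees one answer, while the work is split across segments. Rows with the same key always land on the same segment, which is what lets joins and aggregations on that key run locally without moving data between machines.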

2. Greenplum has its own community; what do you hope to achieve by joining the Postgres community and PostgresConf?

The Postgres community represents some of the most passionate and knowledgeable creators, developers, and users of database technology of our time. We believe that the combination of Postgres and Greenplum becomes the software equivalent of what Oracle Exadata purported to be: an all-purpose database that can do both transactional and analytical workloads across multi-structured data. Simply put, the Greenplum community is looking to join with the Postgres community to further the understanding and adoption of these technologies.

3. Do you have plans for cross pollination of technologies with the two open source projects?

Greenplum forked from Postgres over 10 years ago, circa Postgres 8.2. Greenplum 5.0 is based on Postgres 8.3, with our next major release slated for Postgres 9.4 (the current open source Greenplum is compatible with 9.0 as of this writing).

Likewise, we have Postgres committers working at Pivotal looking for opportunities to improve the Postgres code specifically for analytics. We are also ensuring that other projects related to Greenplum, like Apache MADlib, continue to be compatible with Postgres.

4. What challenges do you see working with the Postgres community as an open source fork?

The Postgres community is a long-running and very passionate group, and we want to be both collaborative and respectful in how we continue to grow our participation. We see the products as having synergies which complement each other very well, with some use cases that best fit Postgres, and others that best fit Greenplum. The use of either benefits the other as they both further adoption.

5. What would you tell a user who has a choice between Postgres and Greenplum about when they should use which?

Postgres is a great ORDBMS that will scale to the performance of a single server. For analytical needs, being restricted to a small number of terabytes does not allow for the type of exploration that most organizations need. Because Greenplum is a Postgres compatible database, you can start out using Postgres and either convert to Greenplum underneath or leverage Greenplum alongside your Postgres systems (making data ETL a ton easier). This then makes the choice of which product to use for your particular use case clearer and clearer.

6. What is the number one barrier you see to contributing to the Postgres community?

The number one barrier to our contributing would be not seeing the corresponding adoption of our technologies. We feel very strongly that both the transparency and removal of vendor lock-in make our open source commitment the only choice for users. I’m here to implore the community to embrace our technology with zeal and help us continue to drive more and more Postgres adoption in the world.

7. What is the best thing about working with the Postgres community?

Because Greenplum is based on Postgres, we get to interact with this vast community of talent. We are also able to more seamlessly interact with ecosystem products that already work with Postgres, making the adoption of Greenplum that much easier.

8. Tell us why you believe people should attend PostgresConf 2018 in April.

PostgresConf is going to be awesome, with both Pivotal and Amazon headlining as Diamond sponsors and a high-quality lineup of speakers and content. I wouldn’t miss it for anything.

We’re thrilled to organize the first annual Greenplum Summit at PostgresConf. Greenplum co-founder Scott Yara will give a keynote on April 18th on how data tells the story at the organizations that we help enable (#DataTellsTheStory), and on his journey from SMP to MPP. Greenplum Summit on April 19th will be a full day packed with great use case sessions and tech talks for novices and experts alike.

Jacque Istok, Head of Data, Pivotal