8VC Emerging Builders Spotlight: Piyush Jain (Yugabyte)

Mar 30, 2023

In supporting the industry-defining companies of the 8VC portfolio, we are fortunate to work with the brightest, most dedicated people in the world. We’re excited to feature some of the most promising engineering and product talent we have the pleasure of collaborating with, not only at 8VC, but within our broader network.

Today we are highlighting Piyush Jain from Yugabyte. Piyush is a software engineer at Yugabyte working in the database group. Before this, he worked at Nutanix, after which he completed his master’s at UT Austin with a focus on distributed systems. In his free time, Piyush keeps an eye on developments in layer 1 blockchains and enjoys hiking, practicing yoga, and playing tennis, which he picked up recently.

In your words, what is Yugabyte and why are you excited about the company?

Yugabyte is the company behind YugabyteDB, our open source distributed transactional database. You might be wondering “Why do we need another database?” Let me go back into some history to explain how YugabyteDB came to be.

Databases are complex pieces of software, which almost all technical systems rely on for their data storage and access needs.

Databases have gone through a long and interesting journey and have evolved in two ways: 1) in functionality, to cater to new data management needs, and 2) in scalability, to support a larger population on the internet while still maintaining good response times.

Until the late 2000s, we simply had single-node databases. If you wanted to support more data and traffic, you had to switch to a beefier single-node machine with more compute and storage. But these single-node databases were transactional, i.e., ACID compliant.

Having an ACID-compliant transactional database brought two major user benefits. First, you could apply a set of operations atomically on the database without worrying about the operations being only partially applied; this matters in a bank transfer, for example. Second, the database could serve multiple such atomic sets of operations in parallel for various clients while ensuring that it behaves as if all transactions occurred serially, one after the other. This is important to ensure that when you transfer money to someone, two independent people don’t have to be blocked for your transfer to go through. This needs to be done while ensuring that two sets of operations on the same piece of data don’t trample on each other and lead to unintuitive outcomes, such as double spending from one account by performing simultaneous transfers that add up to more than the account balance.
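To make the atomicity benefit concrete, here is a minimal sketch of a bank transfer using Python's built-in sqlite3 module, a single-node database. The `accounts` table, the account names, and the `transfer` and `balances` helpers are assumptions made up for this illustration; they are not taken from YugabyteDB.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The debit above has already run; the rollback undoes it.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical starting state for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 70)` moves the money, while a later `transfer(conn, "alice", "bob", 50)` fails and leaves both balances exactly as they were, even though the debit had already been executed inside the transaction.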

As more people came onto the internet, applications grew, and you could vertically scale up your database node to handle the traffic. But there was a hard limit: the beefiest machine available on the market at any given time. These bigger servers also became more specialized and thus more expensive.

This paved the way for a class of distributed NoSQL databases that scaled horizontally by compromising on transactional guarantees. The application wouldn’t get the same intuitive guarantees that traditional databases provided, and would need some rewriting to bake some or all of the required ACID guarantees into the application itself.

Reinventing the wheel for transactional requirements in each application is highly time-consuming and error-prone, as transactions are hard to get right.

This is where YugabyteDB comes in: it provides the old gold standard of being ACID compliant while still being horizontally scalable, similar to NoSQL databases. So, you get the best of both worlds. It also goes further, providing not just transactions, but also the classical features of relational databases in a distributed environment, which would again have required extra logic in the app if using a NoSQL database. Moreover, it runs almost anywhere: bare metal, any cloud, Kubernetes, etc.

Databases are all about tradeoffs. With the transition from SQL to NoSQL, you’re trading off consistency for scalability. What are the tradeoffs as you move to a distributed SQL paradigm (i.e., horizontally scaled SQL), and where does Yugabyte do the most work to minimize the tradeoffs that you’re forced to make in that transition?

Let’s discuss two different paths to distributed SQL databases: from NoSQL and from single-node relational databases.

When moving from NoSQL to distributed SQL, we are not really compromising on anything; instead, we are strictly gaining some benefits. A distributed SQL database can do everything that NoSQL does while achieving the same performance and providing the same scalability. This can be easily seen via the YCQL Cassandra API that YugabyteDB provides. Moreover, YCQL goes further by providing distributed transactions, which traditional NoSQL doesn't offer.

However, a trade-off exists when comparing with single-node relational databases. Distributed SQL has to incur higher latencies, which stem from the distributed nature of the system required for scale and availability, and ultimately from the basic laws of physics and the speed of light. Inter-node interaction is needed for all writes, since all data is replicated by a consensus algorithm that provides resilience to node failures and high availability. Moreover, distributed transactions that touch data on multiple nodes require additional inter-node coordination along with the usual consensus replication. All distributed databases have to face this.

At Yugabyte, a lot of work has been put into ensuring that we pay only the penalty justifiable by this theoretical trade-off and nothing more. One example is that we make sure not to start a distributed transaction if we can determine that it touches a single shard, thus saving the network round trips associated with the lifecycle of a distributed transaction. Another is that, since a distributed transaction incurs overhead even during creation, the database keeps a pool of new distributed transactions around on each node to avoid that latency. Further, there are multiple micro-optimizations baked in to solve the problems that distributed SQL databases face, for example, higher latencies stemming from clock skew between nodes.
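The single-shard optimization can be pictured with a small sketch. This is not YugabyteDB's code: the shard count, the CRC-based `shard_for` mapping, and the `plan_write` helper are all hypothetical, chosen only to show the shape of the decision.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count for the illustration

def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

def plan_write(keys):
    """Pick a commit path based on how many shards the keys touch."""
    shards = {shard_for(k) for k in keys}
    if len(shards) == 1:
        # All keys live on one shard: no cross-shard coordination needed.
        return ("single_shard", shards)
    # Keys span shards: the full distributed transaction path is required.
    return ("distributed", shards)
```

A write to one key always takes the `"single_shard"` path here, while a batch of keys spread across shards takes the `"distributed"` path and pays for the extra coordination.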

How did you first hear about Yugabyte and how did you initially decide to join?

From my previous role and during my master’s at UT Austin, I was deeply into distributed systems, but not so much into databases. At that point, I was reading literature on Google Spanner and other distributed databases, and realized that transactional distributed databases were a space that I wanted to work in.

This was due to many reasons: they are a cutting-edge piece of technology that touches a lot of pieces of distributed systems theory, and this amplifies the complexity of the already complex field of databases. It is better to deal with that complexity once, when making the database, rather than have it spill out into all applications, as NoSQL users have to do. If distributed SQL databases had been created earlier, tech companies wouldn’t have taken the NoSQL detour, but at that time we didn’t have the technical breakthroughs needed to make it happen. I could see that many people had moved or were starting to move to distributed SQL, given that it is a win-win situation: you get both scale and consistency.

Karthik, our CTO, gave a talk at UT, which led me to start following Yugabyte and learning about the database world. I followed the company and its competitors for about five months and finally decided to reach out. I was inclined towards Yugabyte because of its strategy of reusing the PostgreSQL code base, which is in line with this philosophy and has historically proven successful.

How has your work evolved over time at Yugabyte? What does your day-to-day look like?

Thanks to Yugabyte, I’ve grown a lot in the past two years here. I’ve worked on various features from start to finish, including roadmap-level planning and design as well as core implementation.

I started with simple features, like adding partial indexes for YCQL, our Cassandra wire-compatible query language. Then I began working on major additions to our distributed transactions layer: specifically, a new isolation level and, orthogonally, a new concurrency control scheme. This helped me gain an understanding of the innermost workings of the database, the parts that make it a true distributed SQL database with a truly differentiated architecture compared to old databases or the earlier NewSQL offerings.

PostgreSQL has three isolation levels at the core:

Serializable (strongest guarantees)
Repeatable Read
Read Committed (weakest guarantees)

At the time, Yugabyte only had two. My task was to add the third isolation level, Read Committed, to bring us to parity with PostgreSQL.

Currently, I am working on improving our cost-based optimizer for better query planning. As for my day-to-day work, I must mention that a major chunk of our new feature development time is spent ensuring the correctness guarantees of the database, since that doesn’t come easy.

Even though Read Committed has the weakest guarantees, it’s still important for many personas, especially application developers given this is the default in Postgres. Can you tell us a bit more about the impact of adding this for end users?

There’s a huge impact. Although it is the weakest isolation level, many applications use Read Committed and give up stronger isolation guarantees to ensure better performance. This is ideal when the use case is straightforward enough to add some logic in the application and ensure required isolation guarantees on a case-by-case basis.

We have seen some very large workloads run Read Committed isolation on YugabyteDB. One example is our partnership with the financial banking application Temenos. Also, given that Read Committed is the default isolation in PostgreSQL, it is simple to lift and shift many existing PostgreSQL applications to YugabyteDB without needing to re-write the app.

Also, this isolation level brings YugabyteDB to parity with PostgreSQL on isolation levels, which are a major pillar of OLTP databases.

Can you tell our readers a bit more about the three isolation levels to contextualize this discussion?

Assume you have a simple ledger of accounts, and a thousand transactions to process at some point. The simplest way would be to execute the transactions in their totality, in order (serially), which ensures no conflicts, i.e., no unintuitive behavior. For example, in banking, this ensures that you don’t allow double spending. But this sequential execution results in a useless system: transactions between strangers will block your transactions, and every transaction blocks all the others that come after it.

This is where transactional databases come in with their isolation levels and why they differ from the simple ledger. They allow transactions (sets of operations) to execute atomically and simultaneously, while keeping varying levels of checks on the unintuitive behaviors based on the chosen isolation level.

Serializable is the gold standard: the strongest level, but also the easiest to understand, because it doesn’t allow any unintuitive behavior. It guarantees that transactions behave as if they occurred serially, one after the other, even though they are executed simultaneously. When that is not possible, the database throws an error to the client.
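A toy model can show the kind of unintuitive behavior that serial execution rules out. The sketch below is not how any database implements isolation; the dict-based account and the `withdraw_steps`, `run_serial`, and `run_interleaved` helpers are assumptions invented for the illustration. Each withdrawal reads the balance, pauses, and then writes based on what it read.

```python
def withdraw_steps(account, amount):
    """A withdrawal as a generator: read the balance, pause, then write."""
    seen = account["balance"]                  # read
    yield                                      # another transaction may run here
    if seen >= amount:
        account["balance"] = seen - amount     # write based on the earlier read
        account["paid_out"] += amount

def run_serial(account, amounts):
    """Run each withdrawal to completion before starting the next."""
    for amount in amounts:
        for _ in withdraw_steps(account, amount):
            pass

def run_interleaved(account, amounts):
    """Let every withdrawal read first, then let every withdrawal write."""
    txns = [withdraw_steps(account, a) for a in amounts]
    for t in txns:
        next(t)        # all reads happen before any write
    for t in txns:
        for _ in t:    # resume each transaction so it performs its write
            pass
```

Starting from a balance of 100, two withdrawals of 80 run serially pay out 80 in total, because the second one sees a balance of 20 and is refused. Run interleaved with no isolation, both see 100 and 160 is paid out: the double spend described above. A Serializable database would either order the two withdrawals or return an error for one of them.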

Repeatable Read and Read Committed aren't explainable in simple terms like Serializable, but the thing to note is that they offer more concurrency at the cost of allowing specific unintuitive behaviors.

Here’s a good post, which covers this in more depth and helps you to understand the trade-offs.

What are some surprising or unexpected things you’ve learned while at Yugabyte so far?

Databases are hard to build, and a distributed setting amplifies that complexity. Even with these challenges, I’m constantly impressed by how quickly and efficiently things are built at Yugabyte. This velocity comes from the brilliant minds that work here, not from compromising on quality and long-term vision.

It is surprising and inspiring to see people single-handedly own and drive whole features, which would normally be a whole team effort. And although a large chunk of work might be driven by a single individual, the benefit of Yugabyte’s collaborative approach is that there is still a lot of deliberation within the wider team to ensure the right design choices are made.

Another thing that is not unexpected, but worth mentioning, is the level of transparency and technical insight in the leadership team, along with their trust-building attitude with employees, which in turn gives each individual an opportunity to grow.

What’s something on the Yugabyte product roadmap that you’re excited to work on?

Now that we have parity with PostgreSQL in terms of isolation levels and concurrency control, some of my colleagues are working on providing top-notch observability to complete the picture.

This includes various notable items, such as a view that provides information about the locks the database is holding and which transactions are blocking which others at any instant. Perhaps further in the future, it will also include reporting historical metrics on conflicting transactions to find hotspots in the workload. This will serve as a feedback loop for writing better applications.

An area I am focused on is better cost-based query planning. The PostgreSQL query planner is sophisticated and requires some table-level statistics, like histograms, cardinalities of columns, etc., to make informed choices between various query plans. One item I will work on is finding ways to efficiently fetch a random sample of rows from a table in a distributed setting. Another item is to automatically perform such statistical data collection when a table’s data changes by a significant amount.
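One textbook way to draw a uniform random sample when rows are spread across shards is to tag every row with a random number and keep the rows with the smallest tags. The sketch below illustrates that idea only; it is an assumption for the sake of example, not the approach Yugabyte uses, and it still scans every row, so it does not address the efficiency problem mentioned above. The `shard_sample` and `merge_samples` names are hypothetical.

```python
import heapq
import random

def shard_sample(rows, k, rng):
    """On one shard: tag each row with a random number, keep the k smallest tags."""
    tagged = ((rng.random(), row) for row in rows)
    return heapq.nsmallest(k, tagged, key=lambda pair: pair[0])

def merge_samples(per_shard, k):
    """On the coordinator: the k smallest tags overall form the global sample."""
    merged = heapq.nsmallest(
        k,
        (pair for sample in per_shard for pair in sample),
        key=lambda pair: pair[0],
    )
    return [row for _, row in merged]
```

Because every row receives an independent random tag, the k rows with the smallest tags are a uniform sample of the whole table, and each of them is guaranteed to appear in its own shard's local top k. Each shard therefore sends at most k rows to the coordinator, regardless of how many rows it holds.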

As Yugabyte grows, how do you and the team think about engineering culture?

We have a culture of humility, even amongst a team full of brilliant people, and I look for this humility and self-awareness whenever I interview. It’s all about getting things done in the right way, and getting them done together.
