Unlock 2025 Benchmark data → Access insights to stay ahead in the digital experience race.

Get the Report
skip to content
Loading...
    • Why Product Analytics And what can it do for you?
    • How Heap Works A video guide
    • How Heap Compares Heap vs. competitors
    • Product Analytics + Digital Experience Analytics A deeper dive
    • The Future of Insights A comic book guide
    Watch a Demo
  • Data Insights

    • Journeys Visual maps of all user flows
    • Sense AI Analytics for everyone
    • Web Analytics Integrate key web metrics
    • Session Replay Complete context with a single click
    • Heatmaps Visualize user behavior instantly
    • Heap Illuminate Data science that pinpoints unknown friction

    Data Analysis

    • Segments User cohorts for actionable insights
    • Dashboards Share insights on critical metrics
    • Charts Analyze everything about your users
    • Playbooks Plug-and-play templates and analyses

    Data Foundation

    • Capture Automatic event tracking and apis
    • Mobile Track and analyze your users across devices
    • Enrichment Add context to your data
    • Integrations Connect bi-directionally to other tools

    Data Management

    • Governance Keep data clean and trusted
    • Security & Privacy Security and compliance made simple
    • Infrastructure How we build for scale
    • Heap Connect Send Heap data directly to your warehouse
  • Solutions

    • Funnel Optimization Improve conversion in user flows
    • Product Adoption Maximize adoption across your site
    • User Behavior Understand what your users do
    • Product Led Growth Manage PLG with data

    Industries

    • SaaS Easily improve acquisition, retention, and expansion
    • Retail and eComm Increase purchases and order value
    • Healthcare Build better digital patient experiences
    • Financial Services Raise share of wallet and LTV

    Heap For Teams

    • Product Teams Optimize product activation, conversion and retention
    • Marketing Teams Optimize acquisition performance and costs
    • Data Teams Optimize behavioral data without code
  • Pricing
  • Support

    • Heap University Video Tutorials
    • Help Center How to use Heap
    • Heap Plays Tactical how-to guides
    • Professional Services

    Resources

    • Down the Funnel Our complete blog and content library
    • Webinars & Events Events and webinar recordings
    • Press News from and about Heap
    • Careers Join us

    Ecosystem

    • Customer Community Join the conversation
    • Partners Technology and Solutions Partners
    • Developers
    • Customers Stories from over 9,000 successful companies
  • Free TrialRequest Demo
  • Log In
  • Free Trial
  • Request Demo
  • Log In

All Blogs

Heap’s Next Generation Data Platform: How we re-architected Heap to make it 5x faster

David Fullerton
September 19, 20236 min read
  • Facebook
  • Twitter
  • LinkedIn

Since we started Heap in 2013, we’ve seen the use of behavioral data expand exponentially, especially at the enterprise level. This is great - at large companies, there’s no better solution for quickly understanding what your users are doing and why.

But managing all of that data comes with its own challenges: as the volume of data grows, queries get slower and slower. For our enterprise users, this can be a real problem. That’s why for the past year we’re been hard at work rebuilding the core Heap Data Platform from the ground up.

Introducing Heap’s Next Generation Data Platform. It’s an analytics platform fully re-architected to handle enormous datasets, and return answers immediately. We’ve been quietly rolling it out to an initial set of customers over the past few months, and are excited to announce the rollout to all customers beginning today!

For you this means: faster queries, greater flexibility, and access to even more advanced analytics capabilities like Heap Illuminate. 

That said, rearchitecting from the ground up wasn't without its difficulties. Here are some of the challenges we faced, and what we did to overcome them.

Heap’s unique platform

From its inception, Heap’s product and platform have been built on the promise of a complete, retroactive, autocaptured dataset. These core capabilities are what make Heap so powerful, allowing teams to be more agile in how they capture and analyze information. But they also introduce significant challenges in designing a platform that performs at scale.

  1. Autocaptured data: Heap automatically captures a complete behavioral dataset (views, clicks, taps, form submissions, etc.). This creates a powerful dataset for analysis, but also generates a significant volume of data: a typical Heap session tracks dozens of autocaptured events, much more than a legacy manual tracking platform would typically instrument. This means the Heap platform must be able to process and store billions of events every day in a format that allows for easy querying.

  2. Heap Illuminate-powered insights directly from autocaptured data: Heap’s Illuminate capabilities proactively surface suggestions directly from your data, allowing you to find insights in data you might never have thought to manually track. This means we need to be able to query the autocaptured data in real time based on whatever you’re currently looking at, and not rely on pre-computed insights that don’t reflect the problem you’re trying to solve.

  3. Retroactive, virtual events: Heap's platform provides the flexibility to define events on top of the autocaptured dataset and use them in analysis – for example, defining a conversion event based on button text and a page url pattern. These event definitions are retroactive, which means that Heap applies them back in time on your dataset allowing you to analyze from the beginning of data collection, not just from when you defined the event.

  4. Retroactive user and account identification across devices and platforms: Heap also allows you to pass in a user identity in order to analyze their behavior across devices and platforms. This identity is also applied retroactively, allowing you to understand what a user did in the sessions before they signed up or converted. (note: user identification is done via a customer-provided identity, like an account id after sign up – Heap does not do any device or user fingerprinting)

All of this adds up to an incredibly complex platform to design, implement, and operate at scale.

Heap’s new platform

Heap has run for years on Postgres using the Citus extension to turn it into a massively distributed database. Postgres is a tried-and-true database that powers some of the largest products in the world, and we still use it wherever we can. But as our core analytical query system, we were starting to hit its limits at scale.

Heap’s original architecture stored every event in one place: a single giant distributed Citus table. We made heavy use of partial indexes to make it efficient to query virtual events, and partitioned data by user id to ensure that as much of the query workload as possible was pushed down to the individual nodes. This system had a few limitations: Postgres is row-based, which meant that data didn’t compress well and many types of queries were slower than they needed to be if they were stored in a columnar format. It also fundamentally coupled compute and storage, which meant we couldn’t scale them independently or take advantage of cheaper cloud storage.

The new system splits the architecture across two components, a Hudi data lake for offline processing, plus SingleStore as our core “live” analytics system. We chose SingleStore after an extensive evaluation of a number of different platforms, including the option of building something ourselves from scratch. We chose it for several key advantages:

  1. Faster querying. While Postgres continued to hold up well on simpler queries, SingleStore stood out for its ability to join data across distributed tables. This significantly simplified our architecture (allowing us to store user and account data separately from the event data) and enabled us to massively speed up some of our slowest query types.

  2. More efficient scaling. SingleStore separates where data is stored from where queries are run (the “separation of storage and compute”). It also natively supports columnar storage, which allows for much more efficient storage for an analytics workload. These two things combined mean that we can store data much more efficiently, taking advantage of cheap cloud storage, and scale out our query capacity separately from the underlying storage.

  3. Flexibility. Heap runs some very complex queries, and we make heavy use of UDFs (user-defined functions) to push that complexity down into the database engine. SingleStore has both a powerful set of query primitives in SQL and support for us to create our own UDFs for more advanced cases.

In a follow-up post we’ll dive more into the technical details, but the end result is that we now have a platform that is faster, cheaper, and more flexible than our prior system.

Rolling out the new platform

For the past few months, we’ve been running the new platform alongside our existing platform, running the same queries on both systems and comparing the results. This has allowed us to verify that the new system is operating correctly, and to do direct comparisons of the performance at scale. 

The biggest impact of the new system is felt by our largest customers, who are typically seeing 5-10x improvements in query performance on their slowest queries:

NextGen image

Comparing query performance of some of the first large customers running on the new system.

We’ve also achieved our goals around data latency, meaning that the data is more up-to-date when you query it. This is important for customers who are trying to make realtime decisions based on user behavior. And we’re seeing decreased costs as we roll customers over to the new system, particularly in data storage.

We’re just starting to scratch the surface of what’s possible with this new architecture – we’re still finding new performance tweaks just about every day, and are rolling out several new query abilities that take advantage of the new architecture to customers who are already on it. We’re excited to push the limits of the system over the next few months as we complete the roll out to all customers.

Try it now

We’re still putting the new platform through its paces, and are excited to have as many customers as possible trying it out. If you’re an existing customer, there’s no need to do anything – we’re rolling through the customer base and migrating existing customers as fast as we can. The change is completely transparent so all you should notice is queries getting faster. 

If you’re not yet a Heap customer, give it a try: you can sign up for a free trial of Heap and you’ll automatically be started on our new platform.

David Fullerton, Chief Technology Officer at Heap

Was this helpful?
PreviousNext

Getting started is easy

Start today and elevate your analytics from reactive reporting to proactive insights. What are you waiting for?

Free Trial

Related Stories

See All

  • Creative visualization of AI CoPilot capability
    article

    Heap announces new generative AI CoPilot

    Heap, the leader in product analytics, unveils AI CoPilot’s open beta today.

  • Heap.io
    article

    What’s Next in Experience Analytics?

    What does the future of analytics hold, and what does it mean for you?

  • Heap.io
    article

    Building a Retention Strategy, Part 2: Connecting Activities to Revenue with a Metrics Tree

    If you read one post from this series, it should be this one.

Better insights. Faster.

Request Demo
  • Platform
  • Capture
  • Enrichment
  • Integrations
  • Governance
  • Security & Privacy
  • Infrastructure
  • Heap Illuminate
  • Segments
  • Charts
  • Dashboards
  • Playbooks
  • Use Cases
  • Funnel Optimization
  • Product Adoption
  • User Behavior
  • Product Led Growth
  • Customer 360
  • SaaS
  • Retail and eComm
  • Financial Services
  • Why Heap
  • Why Product Analytics
  • How Heap Works
  • How Heap Compares
  • ROI Calculator
  • The Future of Insights
  • Resources
  • Blog
  • Content Library
  • Events
  • Topics
  • Heap University
  • Community
  • Professional Services
  • Company
  • About
  • Partners
  • Press
  • Careers
  • Customers
  • DEI
  • Support
  • Request Demo
  • Help Center
  • Contact Us
  • Pricing
  • Social
    • Twitter
    • Facebook
    • LinkedIn
    • YouTube

© 2025 Heap Inc. All Rights Reserved.

  • Legal
  • Privacy Policy
  • Status
  • Trust