NEW: Heap for mobile. Track every interaction, on every platform.

Learn more
skip to content
Loading...
    • The Digital Insights Platform Transform your digital experience
    • How Heap Works A video guide
    • How Heap Compares Heap vs. competitors
    • The Future of Insights A comic book guide
  • Data Insights

    • Session Replay Complete context with a single click
    • Illuminate Data science that pinpoints unknown friction
    • Journeys Visual maps of all user flows

    Data Analysis

    • Segments User cohorts for actionable insights
    • Dashboards Share insights on critical metrics
    • Charts Analyze everything about your users
    • Playbooks Plug-and-play templates and analyses

    Data Foundation

    • Capture Automatic event tracking and apis
    • Mobile Track and analyze your users across devices
    • Enrichment Add context to your data
    • Integrations Connect bi-directionally to other tools

    Data Management

    • Governance Keep data clean and trusted
    • Security & Privacy Security and compliance made simple
    • Infrastructure How we build for scale
    • Heap Connect Send Heap data directly to your warehouse
  • Solutions

    • Funnel Optimization Improve conversion in user flows
    • Product Adoption Maximize adoption across your site
    • User Behavior Understand what your users do
    • Product Led Growth Manage PLG with data

    Industries

    • SaaS Easily improve acquisition, retention, and expansion
    • eCommerce Increase purchases and order value
    • Financial Services Raise share of wallet and LTV

    Heap For Teams

    • Product Teams Optimize product activation, conversion and retention
    • Marketing Teams Optimize acquisition performance and costs
    • Data Teams Optimize behavioral data without code
  • Pricing
  • Support

    • Heap University Video Tutorials
    • Help Center How to use Heap
    • Heap Plays Tactical how-to guides
    • Heap Updates
    • Professional Services

    Resources

    • Blog A community for digital builders
    • Content Library Ebooks, whitepapers, videos, guides
    • Press News from and about Heap
    • Webinars & Events Virtual and live events
    • Careers Join us

    Ecosystem

    • Customer Community Join the conversation
    • Partners Technology and Solutions Partners
    • Developers
    • Customers Over 8,000 successful companies
  • Free TrialRequest Demo
  • Log In
  • Free Trial
  • Request Demo
  • Log In

All Blogs

Engineering

Creating PostgreSQL Arrays Without A Quadratic Blowup

Dan Robinson
March 12, 20142 min read
  • Facebook
  • Twitter
  • LinkedIn

At Heap, we lean on PostgreSQL for most of the backend heavy lifting.[1] We store each event as an hstore blob, and we keep a PostgreSQL array of events done by each user we track, sorted by time. Hstore lets us attach properties to events in a flexible way, and arrays of events give us great performance, especially for funnel queries, in which we compute the drop off between different steps in a conversion funnel.

In this post, we’ll take a look at a PostgreSQL function that hangs on large inputs and rewrite it in an efficient, idiomatic way.

If you’re writing a PL/pgSQL function that returns an array, it can be tempting to create an empty array and build up results in a loop with [array\_append] or [array\_cat]. But, as is often the case, procedural idioms in a relational database result in bad times.

Consider an example function in which we create an array of hstores such that the entry at position i is "num=>i".

add blowup.sql git

Looks simple enough, but this is bad news. This takes a quadratic amount of time to run, blowing up to 36 seconds to generate an array of 100k elements.

blowup execution times graph

Test queries were timed on a MacBook Pro with a 2.4GHz i7 and 16 GB of RAM.

What’s going on here? It turns out the repeated calls to array\_append cause this quadratic explosion. When we call result := array\_append(result, ...), PostgreSQL allocates a new array that’s wide enough for the result of the array\_append call and then copies the data in. That is, array\_append(array, new\_element) is linear in the length of array, which makes the implementation above O(N2).

A lot of languages handle this idiom more gracefully. A common strategy is to resize the array that backs a list to double the existing size. With a list written this way, repeated appends would only require us to execute the “grow the array and copy over your data” operation a logarithmic number of times, and our amortized runtime would be linear.

So, PostgreSQL could be smarter, but this is not an idiomatic implementation, so we shouldn’t expect it to be. The correct way to do this is with [array\_agg] — an aggregate function that takes a set and returns all of the entries as an array.

This is fast, and it scales linearly. It takes 300 ms to generate an array of 100k elements — a 100x reduction. This query allows PostgreSQL to generate the complete set of results before materializing any arrays, so it doesn’t need to do so more than once. PostgreSQL can compute upfront exactly how long the resulting array needs to be and only needs to do one allocation.

Lesson learned: if you find yourself calling array_append or array_cat in a loop, use array_agg instead.

When you’re working with any relational database, you should reconsider any procedural code you find yourself writing. Also, it helps to have an intuition for how long something “should” take. Generating a 100k element array (with around one megabyte of total data) shouldn’t take thirty seconds, and, indeed, it doesn’t need to. We like learning stuff. Have any feedback or other PostgreSQL tips? Shoot us a note @heap.

[1] In particular, we use a lovely tool called Citus Data. More on that in another blog post!

Dan Robinson

Was this helpful?
PreviousNext

Related Stories

See All

  • Heap.io

    Data Stories

    Celebrating H&R Block as the inaugural winner of the Digital Innovator Award

    March 22, 2023

  • Heap.io

    Product Updates

    Introducing Heap for mobile: see Everything, Everywhere all at once

    March 14, 2023

  • Heap.io

    Data Stories

    How I shipped a mobile app without tracking and bad things™ happened

    March 15, 2023

Subscribe

Sign up to stay on top of the latest posts.

Better insights. Faster.

Request Demo
  • Platform
  • Capture
  • Enrichment
  • Integrations
  • Governance
  • Security & Privacy
  • Infrastructure
  • Illuminate
  • Segments
  • Charts
  • Dashboards
  • Playbooks
  • Use Cases
  • Funnel Optimization
  • Product Adoption
  • User Behavior
  • Product Led Growth
  • Customer 360
  • SaaS
  • eCommerce
  • Financial Services
  • Why Heap
  • The Digital Insights Platform
  • How Heap Works
  • How Heap Compares
  • The Future of Insights
  • Resources
  • Blog
  • Content Library
  • Events
  • Topics
  • Heap University
  • Community
  • Professional Services
  • Company
  • About
  • Partners
  • Press
  • Careers
  • Customers
  • Support
  • Request Demo
  • Help Center
  • Contact Us
  • Pricing
  • Social
  • Twitter
  • Facebook
  • LinkedIn
  • YouTube

© 2023 Heap Inc. All Rights Reserved.

  • Legal
  • Privacy Policy
  • Status
  • Trust