
The 4 Flavors of Untrustworthy Data

Kevin Moyer
July 11, 2019 · 7 min read

For too many product teams, a data-first approach to decision-making is more of a soundbite than a reality. There are a number of reasons why data-first strategies never materialize: lack of resources, lack of tooling, lack of direction, and treating data as a project rather than a foundation are all blockers to getting off the ground. But even when an organization gets all of these things right, one culprit can bring the whole thing down: untrustworthy data.

Untrustworthy data is the root of all evil when it comes to a data-driven product strategy. One of my favorite explanations of this phenomenon comes from Brian Balfour. He calls it the Data Wheel of Death. The basic premise is this: when data isn’t trustworthy, teams use the data less. When teams use data less, it gets de-prioritized and grows stale. When data grows stale, it becomes less trustworthy, and the cycle continues. (You can watch a video of Brian talking about this phenomenon here.) But what does “untrustworthy data” actually mean?

This post will explore the four types of untrustworthy data, specifically as they relate to behavioral product data, meaning data that tells you about users and the things they do within a website or application. At Heap, a large number of our clients have partnered with us specifically to solve this problem.

Across all of the teams we’ve spoken to that have dealt with data woes, “untrustworthy data” falls into four categories:

  • Stale Data

  • Unclear Data

  • Inaccurate Data

  • No Data

Stale Data

Stale data is data that is out of date and no longer being collected.

There’s a sinking feeling that comes with running a report and seeing a flatline: a single line that holds steady at 0 for the entire time period in the report.

What do you do next?

Surely, this event did something at some point. Was it in reference to an old version of the feature that has since been deprecated? Is the tracking code broken? Am I looking in the wrong place? It’s usually hard to tell why the data doesn’t exist, and the situation can be an annoying blocker to making a product decision grounded in truth. All too often, we hear about PMs giving up on the data altogether and making a decision based on gut feeling.

This problem grows worse when an analytics environment is riddled with a lot of stale events; users are less likely to get their hands dirty exploring the data if they keep hitting a flat “0” line.

The root cause of the stale data problem is typically a lack of process around tracking product data. More specifically, it happens when old events are never maintained and no effort is put into updating the tracking plan as the product evolves.

In a paradigm that involves tracking code and manual instrumentation, it’s common for event tracking whack-a-mole to take precedence as new features and use cases pop up, while the effort involved in cleaning data moves to the back burner.
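
If you have programmatic access to event counts, a scheduled audit can surface flatlined events before users stumble into them. Below is a minimal sketch in TypeScript; the EventCounts shape and the idea of pulling daily counts from your analytics tool are assumptions for illustration, not a real Heap API.

```typescript
// Assumed shape: daily counts per event, pulled from your analytics tool.
interface EventCounts {
  eventName: string;
  dailyCounts: number[]; // counts for the last N days, oldest first
}

// Flag events with zero occurrences across the whole window. These are
// candidates for deprecation, or for a broken-tracking investigation.
function findStaleEvents(events: EventCounts[]): string[] {
  return events
    .filter((e) => e.dailyCounts.every((count) => count === 0))
    .map((e) => e.eventName);
}

// Example: "Old Checkout Click" has flatlined and should be reviewed.
const report: EventCounts[] = [
  { eventName: "Signup Completed", dailyCounts: [120, 98, 143] },
  { eventName: "Old Checkout Click", dailyCounts: [0, 0, 0] },
];
console.log(findStaleEvents(report)); // ["Old Checkout Click"]
```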

Unclear Data

Raise your hand if you’ve dealt with an analytics environment that has multiple versions of the same data point. Something like “Signup”, “Sign Up”, “signup”, and “Signup – NEW” might all be separate events, even though they would seem to tell you the same thing.

Which one is the correct one? Are the other ones “right” too, but telling some different version of the story? Even if you know which version is correct, how can you be sure your teammates do? Data points that don’t clearly refer to one specific thing are common in product analytics, and the above example is just one version of this tricky problem.

Unclear data tends to lead to two bad outcomes.

  • In one scenario, a user ends up analyzing an event that does not tell them what they think it does. This situation is obviously pretty nasty, and in the worst-case scenario it leads to a decision being made with the wrong data.

  • On the other hand, even in the best case scenario, the user may choose the “right” event, but only those close to the implementation typically have enough confidence in the event to actually use it. For everyone else, trust in the entire dataset is eroded, even if everything else is perfectly correct.

When events are manually instrumented, this is very hard to avoid. A rigid, code-generated dataset tends to be a hotbed for unclear event data, mostly because implementation of new events is managed through code, and by only a small number of people.

Any inconsistency, duplication, or poor naming convention starts as a quick decision, a simple mistake, or someone saying “good enough”, but the problems that result tend to grow in scope and sneakily infect an entire analytics environment over time.
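
One cheap guardrail against this drift is to normalize event names and flag collisions during review. The sketch below is purely illustrative; the normalization rules (lowercasing, stripping separators, dropping a trailing “NEW”) are assumptions you would adapt to your own naming conventions.

```typescript
// Normalize an event name: lowercase, strip spaces/underscores/dashes,
// and drop a trailing "new" suffix. (Assumed rules; adjust to taste.)
function normalizeEventName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[\s_–-]+/g, "")
    .replace(/new$/, "");
}

// Group raw names by normalized form; any group with more than one
// member is a likely duplicate that deserves a cleanup pass.
function findLikelyDuplicates(names: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const key = normalizeEventName(name);
    groups.set(key, [...(groups.get(key) ?? []), name]);
  }
  return new Map([...groups].filter(([, members]) => members.length > 1));
}

const dupes = findLikelyDuplicates(["Signup", "Sign Up", "signup", "Signup – NEW"]);
console.log(dupes); // Map { "signup" => ["Signup", "Sign Up", "signup", "Signup – NEW"] }
```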

Inaccurate Data

Inaccurate data is a problem that comes in two varieties:

  • Data that is inaccurate and you know it.

  • Data that is inaccurate, but you have no idea.

Both kinds of inaccurate data lead to the same two bad outcomes as the “unclear data” problem mentioned above: decreased trust in the data and faulty conclusions based on misleading data.

Let’s start with the type of inaccurate data that is obvious. This is the lesser of two evils, but an annoying obstacle nonetheless. If you’re someone who has used an analytics solution in the past, chances are at some point you’ve looked at a report and thought something like “There’s no chance that only 30 people viewed our homepage last month” or on the other end of the spectrum, “Oh nice, it looks like everyone on Earth clicked our new call-to-action…twice.”

The depth of these inaccuracies is hard to quantify (Are the numbers just a little bit off, or am I looking at the wrong thing entirely?), but also typically hard to fix, especially when events are created by a small team of engineers operating on their own. Obviously inaccurate data, like unclear data, doesn’t always lead to wrong conclusions, but almost always leads to decreased adoption and usage of product data.

The second version of inaccurate data is the most formidable of all. When data is sneakily inaccurate, it can produce business decisions based in complete fallacy.

Imagine if you thought one call-to-action at the bottom of your app’s homepage was outperforming another similar one at the top of the page. You got this information from a report that clearly showed that the bottom CTA was more commonly clicked, so you decided to deprecate the top one.

You never find out, but the inverse was actually true: the top CTA was more effective than the bottom one. Maybe the event names got mixed up during implementation, or maybe the tracking code on the top CTA was flawed, and not every occurrence was logged.

The potential causes are many, but the scariest part is that you will probably never even know you were wrong.
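
The obvious variety can at least be caught early with basic sanity checks. Here is an illustrative sketch that compares an event’s volume against a baseline denominator like sessions and flags implausible ratios; the thresholds are made-up assumptions you would tune per product.

```typescript
// Assumed inputs: an event's total count alongside total sessions for
// the same period. Ratios far below or above a plausible range suggest
// broken tracking or double counting, respectively.
interface SanityCheck {
  eventName: string;
  eventCount: number;
  sessionCount: number;
}

function flagImplausibleCounts(
  checks: SanityCheck[],
  minRatio = 0.001, // assumed lower bound: below this, tracking is likely broken
  maxRatio = 5 // assumed upper bound: above this, events are likely double-counted
): string[] {
  return checks
    .filter(({ eventCount, sessionCount }) => {
      const ratio = eventCount / sessionCount;
      return ratio < minRatio || ratio > maxRatio;
    })
    .map((c) => c.eventName);
}

const suspicious = flagImplausibleCounts([
  { eventName: "Homepage View", eventCount: 30, sessionCount: 500_000 }, // suspiciously low
  { eventName: "CTA Click", eventCount: 9_000_000, sessionCount: 500_000 }, // suspiciously high
]);
console.log(suspicious); // ["Homepage View", "CTA Click"]
```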

No Data

Simply not having the data required to answer a business question is one of the most common challenges that teams run into.

Too many times, an analytics implementation is treated as a one-time project with an end date. Early on, everyone expects that once the project is complete, they’ll always have the necessary information for every question that will ever pop up. After all, you’ve spent time scoping the requirements, building a tracking plan, and working with engineering to implement it all.

Then you hit a brick wall – now what?

This is when the one-time implementation comes back to bite. It’s common to not have processes in place for instrumenting new events when gaps in the dataset inevitably pop up. Instead, teams end up in an endless cycle of event tracking whack-a-mole, where new requests get put into an engineering queue and eventually end up in a sprint. Once the new events get implemented, data still needs to build up, and by the time the question can be answered, it might not even be relevant anymore.

At this point, you might have a new set of questions that can’t be answered, and the cycle continues.

Addressing These Challenges

These challenges are tough and pervasive. So the question becomes: “How can any team manage to avoid the seemingly inevitable pitfalls of product analytics?” There are generally three approaches that teams take when they want to get ahead of these issues.

  1. Putting resources and money towards preventing and fixing the problem

  2. Spending a lot of time on the problem

  3. Implementing a virtual dataset

Putting resources and money towards preventing and fixing the problem

For some large companies, the best approach is to staff the problem with lots of resources and money. For these organizations, there are hundreds of engineers and countless resources dedicated to maintaining a clean and complete dataset. When humans spend their days collecting, monitoring, cleaning, and analyzing the data at this scale, you typically end up with a pretty useful set of information. The reality is that, for the vast majority of companies, this approach isn’t feasible.

Spending a lot of time on the problem

The second approach makes sense for a broader set of companies, specifically those who aren’t able to staff hundreds of engineers on analytics. This approach consists of spending lots of time planning, implementing, and building processes for updating and cleaning the data. In this scenario, the intention is usually to focus the entire company on customer data, but when fatigue sets in, it’s common to see other priorities take precedence.

Implementing a virtual dataset

Some teams choose to implement a virtual dataset, meaning they capture all event data up front, without any manual tracking code, and then later on pick and choose which events they’d like to analyze. The benefit of a virtual dataset is that far fewer resources are needed to track all of the information, and the time from implementation to insight is much shorter. Plus, since the data is available retroactively, there’s no need to wait for it to build up before a question can be answered.
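
Conceptually, the pattern looks like the sketch below: capture raw interactions indiscriminately, then define named events as filters over that raw stream after the fact. This is a simplified illustration of the autocapture idea in browser TypeScript, not Heap’s actual implementation; the “Signup Click” definition is a hypothetical example.

```typescript
// Raw interaction captured automatically for every click, with enough
// metadata to define events retroactively.
interface RawEvent {
  selector: string; // rough CSS selector of the clicked element
  pageUrl: string;
  timestamp: number;
}

const rawEvents: RawEvent[] = [];

// Autocapture: record every click up front, no per-event tracking code.
document.addEventListener("click", (e) => {
  const target = e.target as HTMLElement;
  rawEvents.push({
    selector: target.tagName.toLowerCase() + (target.id ? `#${target.id}` : ""),
    pageUrl: window.location.href,
    timestamp: Date.now(),
  });
});

// A "virtual event" is just a named filter defined later and applied to
// data that was already collected, so there is no waiting for new data.
type VirtualEvent = { name: string; matches: (e: RawEvent) => boolean };

const signupClick: VirtualEvent = {
  name: "Signup Click",
  matches: (e) => e.selector === "button#signup" && e.pageUrl.includes("/home"),
};

// Answer a question retroactively: how many signup clicks so far?
const signupClickCount = rawEvents.filter(signupClick.matches).length;
```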

At Heap, we care about helping teams make business decisions with truth. This means that we spend a lot of time thinking about how to help teams avoid all flavors of untrustworthy data. If you’d like to hear more about how we can help you, reach out to us at sales@heap.io!
