The 4 biggest challenges you face as a Data Engineer, and how to solve them

Garrett McClintock
April 19, 2023 · 5 min read

As a Data Engineer, you know how crucial it is to have reliable customer data. Without it, it’s almost impossible to do your job! But capturing clean behavioral data is often easier said than done. It usually requires complicated manual work, and that nearly always leads to human error and corrupted data.

In our experience working with hundreds of data teams, we’ve seen four challenges consistently get in the way. Even if they’re not your fault (they’re usually not!), these still tend to be the things that most prevent data engineers from doing their best work. 

Let’s talk about them, and address some possible solutions.

Challenge #1: Your data collection process isn’t scalable 

Considering everything on your plate, manual collection probably isn’t your favorite thing. For one, it’s time-consuming. You have to define everything upfront, then make sure your tagging schemas are consistent, and after that you get the joy of implementing the tags. It might not be the hardest work in the world, but one missed or broken tag can corrupt your entire dataset.

You also face challenges with scalability. As data volumes increase, manual data collection and management become more impractical. As your business grows, so does the number of elements that need to be tagged. And all that work falls on … you. Even a little mistake could accidentally duplicate data or cause a major data gap. If someone in the org makes a decision based on that bad data, it could have serious consequences for everybody.   
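To make the duplicate and gap risks concrete, here is a minimal Python sketch of two checks you could run on a batch of collected events. The record shape (`event_id`, `timestamp`), the one-hour threshold, and the function names are assumptions for illustration, not part of any particular tool.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_duplicates(events):
    """Return event IDs that appear more than once in the batch."""
    counts = Counter(e["event_id"] for e in events)
    return sorted(eid for eid, n in counts.items() if n > 1)


def find_gaps(events, max_gap=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive events are further apart than max_gap."""
    times = sorted(e["timestamp"] for e in events)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]


# Hypothetical sample batch: one event recorded twice, then a long silence.
events = [
    {"event_id": "a1", "timestamp": datetime(2023, 4, 19, 9, 0)},
    {"event_id": "a2", "timestamp": datetime(2023, 4, 19, 9, 30)},
    {"event_id": "a2", "timestamp": datetime(2023, 4, 19, 9, 30)},  # fired twice by a duplicated tag
    {"event_id": "a3", "timestamp": datetime(2023, 4, 19, 14, 0)},  # nothing recorded for 4.5 hours
]

duplicates = find_duplicates(events)
gaps = find_gaps(events)
```

Checks like these don’t prevent a broken tag, but they can surface the damage before anyone builds a decision on it.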

Challenge #2: Data silos keep multiplying, and it's on you to connect the dots

These days, each function within an org typically has a preferred tool for tracking and reporting on performance. Each of those systems becomes its own data silo. A data silo that’s now your problem to fix. If you don’t, teams might not have access to the same information. That leads to misalignment, poor collaboration, and slowed decision-making. Once again, the fate of the business rests on your shoulders.

You need to break down those silos and give your org a trustworthy, single source of truth. Of course, that’s tricky. Different systems or tools might be duplicating the same data. On top of that, each application might have different naming conventions. So before any data transformation can begin, you have to go through the time-consuming process of identity resolution.

And as with all of these risks, more data = more problems. As your business scales, typically so does its tech stack. With each new data source, identity resolution becomes more complex. Now you need to complete a much larger internal mapping of what’s what so you can clearly identify what data goes where and why. 
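As a rough illustration of that mapping work, here is a minimal Python sketch that renames each tool’s fields to one canonical schema and merges records on a normalized email address. The source names (`crm`, `billing`), the field names, and the choice of email as the join key are all hypothetical. Real identity resolution usually has to handle missing or conflicting keys as well.

```python
# Hypothetical per-tool field names mapped to one canonical schema.
FIELD_MAP = {
    "crm": {"Email": "email", "FullName": "name"},
    "billing": {"customer_email": "email", "plan_name": "plan"},
}


def normalize(source, record):
    """Rename a source record's fields to canonical names and clean the join key."""
    mapping = FIELD_MAP[source]
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    out["email"] = out["email"].strip().lower()
    return out


def resolve(records_by_source):
    """Merge records from every source into one profile per normalized email."""
    profiles = {}
    for source, records in records_by_source.items():
        for record in records:
            clean = normalize(source, record)
            profiles.setdefault(clean["email"], {}).update(clean)
    return profiles


profiles = resolve({
    "crm": [{"Email": " Ada@Example.com ", "FullName": "Ada Lovelace"}],
    "billing": [{"customer_email": "ada@example.com", "plan_name": "pro"}],
})
```

Every new data source means another entry in the mapping and another set of edge cases, which is why this work grows with your tech stack.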

Ultimately, you’ll need more and more custom ETL pipelines to join data together. And those introduce even more risks, which brings us to our next point.

Challenge #3: Your custom ETL pipelines are a struggle to maintain

Consolidating your data can be just as challenging as obtaining it. Before your data can be loaded into your warehouse, you have to conform it to a standardized schema. ETL tools can assist with this, but you’ll need to build a custom pipeline to join everything together.

Custom ETL pipelines can be a major bottleneck in your data processing workflow. If your pipeline is slow or unreliable, downstream teams won’t have access to the data they need. If something goes wrong with a pipeline, you could end up wasting days trying to identify the issue, especially if you weren’t there for the initial build and aren’t completely sure how the data was set up to flow through it.

As time passes, custom ETL pipelines create even more challenges. As source data changes, you may need to update your pipeline logic to ensure it’s still handling data correctly, a problem made harder once again if you aren’t familiar with the pipeline’s configuration. If you don’t reconfigure the pipeline, it might fail to capture necessary data because it wasn’t built to handle the changes.
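One common guard against this kind of silent drift is to check incoming records against the schema the pipeline was built for. Here is a minimal Python sketch, assuming a hypothetical three-field contract (`EXPECTED_SCHEMA`) and a hypothetical helper name (`check_schema`):

```python
# Hypothetical contract: the fields and types the pipeline was built to handle.
EXPECTED_SCHEMA = {"user_id": str, "event": str, "amount": float}


def check_schema(record, expected=EXPECTED_SCHEMA):
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Fields the source added or renamed that the pipeline knows nothing about.
    for field in sorted(record.keys() - expected.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


ok = check_schema({"user_id": "u1", "event": "purchase", "amount": 9.99})
# The source system renamed "amount" to "amount_usd" without telling anyone.
drifted = check_schema({"user_id": "u1", "event": "purchase", "amount_usd": 9.99})
```

Failing loudly on a renamed field is usually cheaper than discovering weeks later that a column has been empty since the change.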

Challenge #4: The burden of answering everyone’s questions falls on your SQL expertise

Now that you’ve sent your data downstream, it’s time for segmentation and analysis. The problem is, that usually requires SQL. And not many people know SQL. So once again, this falls on you. This is where things can get really bottlenecked. While the queries are typically straightforward, the queues of requests can grow massive. That means you’re having to spend most of your time dealing with these simple requests. With all of your technical expertise, does this really feel like the best use of your time?

Like any manual process, it’s easy to make a mistake with SQL. But even little mistakes can add hours or even days to the time it takes for you to fulfill a request. Let’s say the query you just finished writing takes 4 hours to execute. And when it’s finally done, you realize a filter or join condition was wrong. After you fix that mistake, it takes another 4 hours to execute again. You’ve now wasted a whole day trying to get the requester their data.
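Not every mistake needs a full run to surface. Here is a minimal Python sketch, using the standard library’s `sqlite3` module as a stand-in for your warehouse, that asks the database to plan a query without executing it. The `events` table and the `dry_run` helper are hypothetical, and other databases expose planning through their own `EXPLAIN` variants.

```python
import sqlite3


def dry_run(conn, query):
    """Ask SQLite to plan the query without running it; return an error message or None."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {query}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


# Hypothetical in-memory table standing in for a much larger warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "purchase", 10.0), ("u2", "purchase", 5.0), ("u1", "view", 0.0)],
)

bad_keyword = dry_run(conn, "SELECT user_id, SUM(amount) FORM events GROUP BY user_id")
bad_column = dry_run(conn, "SELECT user_id, SUM(amuont) FROM events GROUP BY user_id")
good = dry_run(conn, "SELECT user_id, SUM(amount) FROM events GROUP BY user_id")
```

Planning catches misspelled keywords and column names almost instantly. It won’t catch a wrong filter or join, so it’s still worth checking the result on a small sample before committing to the full run.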

Teams need to be able to answer their own questions, in near real-time. Instead, only a few motivated requesters will end up with the data they need. It’s not your fault. You’re only human. But ultimately teams will miss key learnings on their user segments. Those missed learnings become missed opportunities for optimization and personalization. And those missed opportunities become missed revenue.

Again, this isn’t your fault. But still, does it really have to be this way? 

How to conquer your challenges

Legacy data solutions aren’t cutting it anymore. Luckily, new solutions like Heap have entered the market. And they’re ready to help make your job a whole lot easier.

Take it from our implementation partners over at Brooklyn Data Co. They’ve heard from Data Teams from a wide range of industries about the challenges of legacy data solutions. According to Scott Breitenother, CEO of Brooklyn Data Co, almost everyone shares the same pains.

As Scott describes, “For many of our clients capturing rich, reliable event data means it can be a daunting task involving front-end developers or custom pipelines. What draws them to tools like Heap is their ability to capture events automatically and send them straight to a data warehouse like Snowflake. We think of tools like Heap as the easy button, accelerating time to value and reducing ongoing maintenance.”

Do you want to streamline data capture and transformation, and sync everything to your warehouse in just one click? Then it’s time to explore a solution like Heap. 

Visit our Heap for Data Teams page to learn more about how we solve these problems and free you up to do great work.

Garrett McClintock, Analytics Engineering Manager at Heap
