To turn customer data into business value, you need data quality and governance — the key questions are: Is the data accurate? And can the right people access and use it while still keeping it secure? With the tech and approaches available over the past decade, too often the answers have been no.
The Customer Data Platform (CDP) was supposed to deliver that value, but CDP has historically been a loose category. There have been different types of CDPs, each offering its own way to make customer data work, and each with its own limitations. Some didn’t fit well into the existing stack, some couldn’t deliver sufficient data quality, and some took too long to produce results.
Even with all the variation, we can categorize the different approaches into three eras. Earlier waves of CDP development may not have solved all the challenges, but they did highlight the gaps and gotchas in getting value from customer data, pointing the way to a more effective approach.
At a glance:
First era: Packaged CDP, a pre-built, end-to-end suite of customer data tools bundled together for marketers, designed to collect, unify, analyze, and activate customer data but often becoming another silo.
Second era: Composable CDP, a modular approach to CDP using a la carte tools connected to a data warehouse to replicate Packaged CDP functionality, offering flexibility but leaving the hardest task — achieving data quality — up to the user to solve on their own.
Now: Lakehouse CDP, a suite of data quality and governance tools that can access and share live Lakehouse data across an ecosystem without replication. Instead of relying on complex business logic, a Lakehouse CDP unifies and enriches customer data in a Lakehouse, without code, for activation, analytics, and AI use cases. This combines the benefits of the previous approaches without the drawbacks.
Taking a look at the phases of CDP evolution makes it easy to understand where the tech for getting value from customer data has been, and where it’s going.
First wave: Packaged CDP
A Packaged CDP is a one-stop-shop SaaS platform for end-to-end customer data management, bundling up connector libraries, customer data features, and marketer interfaces. It’s designed to be deployed and configured quickly, often with minimal IT involvement. Functionality is bundled together and tailored for specific use cases or industries, making it easier to set up and operate.
The drawback of Packaged CDPs is that they were not designed as part of a broader enterprise data management strategy. Many have fixed schemas that make data difficult to onboard, requiring multiple ETL jobs to load it. Once data is onboarded, these CDPs operate as a “black box”: much of the data they generate is a copy of other data and is not directly accessible to other tools. This ultimately creates another silo of customer data.
Plus, many started out as other tools, like tag managers, event routers, or activation tools, and their makers lacked the expertise needed to properly unify data and deliver on data quality.
Second wave: Composable CDP
Composable CDPs aimed to help businesses eliminate data silos by building a CDP around a data warehouse. Most businesses recognized that they only needed some of the functionality in a Packaged CDP, so instead of purchasing a whole platform, they could purchase “best-in-class” components.
Core components include ETL and reverse ETL tooling. ETL tools load data into the data warehouse. Reverse ETL tools help marketers access and activate that data through business-friendly interfaces for segmentation and journey orchestration. The composable approach promises cost savings, since businesses buy only the tools they need and avoid duplicating data.
But it often involves hidden costs, because the important tasks that component tools don’t cover get pushed back onto the user. Making components work together requires deep SQL knowledge to format data for a reverse ETL tool, plus data casting to prepare data for each downstream activation tool, so it’s not nearly as automatic as its proponents claim.
Composable CDPs also de-emphasize the hardest task, building a unified foundation, because of the time and resources needed to transform raw customer data into a usable asset. Data teams are left to normalize records, resolve identities, and aggregate data on their own, which can take hundreds of hours of effort and drive up compute costs in a data warehouse. A sketch of that hand-written cleanup work follows.
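To make the hidden work concrete, here is a minimal sketch, in Python with pandas, of the kind of normalization and type casting a data team might hand-write before a reverse ETL sync. The column names and the downstream tool’s expected formats are hypothetical, and real pipelines are far larger:

```python
# A minimal sketch of the hand-written cleanup a composable stack often
# pushes onto the data team before a reverse ETL sync. Column names and
# the downstream tool's expected formats are hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "EMAIL": [" Pat@Example.COM ", "kim@example.com", None],
    "phone": ["(206) 555-0100", "206.555.0199", ""],
    "last_order_ts": ["2024-01-05", "2024-02-11", "2024-03-02"],
})

clean = pd.DataFrame()
# Normalize identifiers so downstream matching behaves consistently.
clean["email"] = raw["EMAIL"].str.strip().str.lower()
clean["phone"] = raw["phone"].str.replace(r"\D", "", regex=True)
# Cast to the ISO-8601 string format the (hypothetical) activation tool expects.
clean["last_order_at"] = (
    pd.to_datetime(raw["last_order_ts"]).dt.strftime("%Y-%m-%dT%H:%M:%SZ")
)

print(clean)
```

Multiply this by every source system and every downstream destination, and the “hidden cost” of the composable approach becomes clear.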
Now: Lakehouse CDP
This new approach answers the persistent challenges of Packaged and Composable CDPs: it improves data quality for use across the whole stack without creating any new silos.
It starts from the recognition that Data Lakehouses combine the best elements of Data Warehouses and Data Lakes — able to handle both raw and structured data, able to perform analytics with both SQL and AI/ML — and thus make the best data storage solution around which to build a stack.
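As an illustration of that dual role, here is a minimal sketch, assuming a Spark session with access to a lakehouse catalog, of how one table can serve both a SQL aggregate and ML feature preparation. The table and column names are hypothetical:

```python
# A minimal sketch: the same lakehouse table feeds SQL analytics and
# ML feature prep. Assumes a configured Spark environment; the table
# and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.table("lakehouse.customer_orders")  # hypothetical table

# BI-style SQL analytics over the governed table...
orders.createOrReplaceTempView("orders")
ltv = spark.sql(
    "SELECT customer_id, SUM(order_total) AS lifetime_value "
    "FROM orders GROUP BY customer_id"
)

# ...and ML feature preparation against the same live data, no copies made.
features = orders.groupBy("customer_id").agg(
    F.count("*").alias("order_count"),
    F.avg("order_total").alias("avg_order_value"),
)

ltv.show(5)
features.show(5)
```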
A Lakehouse CDP plugs directly into the Lakehouse and uses data sharing capabilities to unify and enrich customer data. It also serves as both ETL and reverse ETL, with native connectors moving enriched data throughout the stack. The CDP unifies all of its capabilities on an enterprise-ready platform, but each capability is composable, allowing customers to pick and choose what they want to build with.
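One common way to achieve that replication-free access is the open Delta Sharing protocol. The following is a minimal sketch using the delta-sharing Python client; the profile file and the share, schema, and table names are hypothetical, and this illustrates the general pattern rather than any specific vendor integration:

```python
# A minimal sketch of zero-copy access via the open Delta Sharing protocol.
# The profile file and share/schema/table names are hypothetical.
import delta_sharing

# A share profile issued by the lakehouse provider (endpoint + credentials).
profile = "customer_data.share"

# Read the live table directly; no ETL copy lands in the CDP first.
profiles = delta_sharing.load_as_pandas(
    f"{profile}#cdp_share.unified.customer_profiles"
)
print(profiles.head())
```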
Meanwhile, a pre-built, comprehensive Customer Data Operations suite helps everything function properly, including workflow monitoring, dependency validation, role-based access, and dev & staging environments with sandboxes for testing and version control rollbacks. Each of these capabilities is built on a common platform to ensure governance across the stack.
This combination of data quality, governance, connectivity, and built-in maintenance workflows incorporates the benefits of the earlier two waves of CDP development without any of the drawbacks.
How the Lakehouse CDP makes it easier to get value from customer data
Amperity pioneered the Lakehouse CDP to make good on the promise of composability, letting users build the tech stack they want without sacrificing data quality. In practice, this is anchored on four problem areas that were critical to solve after watching how CDPs have evolved over the past ten years:
Automating identity resolution. It needs to be easy to maintain high data quality in your Lakehouse. AI-powered ID resolution lets data teams unify raw customer data and produce a stable, universal identifier (see the sketch after this list).
Building data assets quickly. It can’t take a long time to shape data for activation. Pre-built industry- and use case-specific data assets can be easily shared with and enriched in a Lakehouse.
Syncing enriched data to any tool. With business-friendly reporting and reverse ETL tools, business users can easily access and activate high-quality data from the CDP or the Lakehouse.
Keeping data secure and flowing. Safely share data between a Lakehouse and the CDP without replication, track every data transformation, and get guidance on how to resolve customer data errors.
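To give a feel for what identity resolution involves, here is a minimal sketch of deterministic matching with union-find in Python. Production systems, including AI-powered ones like the approach described above, use probabilistic matching over many more signals; the records and matching rules below are hypothetical:

```python
# A minimal sketch of rules-based identity resolution with union-find.
# Records that share an exact identifier are clustered, and each cluster
# gets a deterministic ID. Real systems use probabilistic, AI-driven
# matching; the records and rules here are hypothetical.
import hashlib
from collections import defaultdict

records = [
    {"id": "crm-1", "email": "pat@example.com", "phone": None},
    {"id": "pos-7", "email": None, "phone": "2065550100"},
    {"id": "web-3", "email": "pat@example.com", "phone": "2065550100"},
]

parent = {r["id"]: r["id"] for r in records}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Link records that share any exact identifier (email or phone).
by_key = defaultdict(list)
for r in records:
    for key in ("email", "phone"):
        if r[key]:
            by_key[(key, r[key])].append(r["id"])
for ids in by_key.values():
    for other in ids[1:]:
        union(ids[0], other)

# Derive a deterministic cluster ID for the sketch; a production system
# keeps identifiers stable even as clusters evolve over time.
clusters = defaultdict(list)
for r in records:
    clusters[find(r["id"])].append(r["id"])
for members in clusters.values():
    cluster_id = hashlib.sha1("|".join(sorted(members)).encode()).hexdigest()[:12]
    print(cluster_id, sorted(members))
```

Here all three records collapse into one customer because "web-3" shares an email with "crm-1" and a phone number with "pos-7"; the hard part a Lakehouse CDP automates is doing this reliably when identifiers are messy, partial, or conflicting.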
A Lakehouse CDP provides the flexibility of a Composable CDP and the infrastructure of a Packaged CDP. This new approach lets data teams choose how to store and process data, improving data quality in a Lakehouse and lowering the total cost of ownership. On a day-to-day basis, it means less time spent integrating and managing connections in and out of a central data storage hub, faster time to value, and fewer headaches. It’s taken some time for the CDP space to evolve to the point where the original goal of getting value out of customer data could be easily achieved. The good news is that the wait is over.
Watch a demo of the Amperity Lakehouse CDP in action.