October 31, 2024 | 3 min read

How AI Can Make Data Modeling So Much Faster

Building and maintaining a customer data asset that adds value to the business doesn’t have to be a slog of custom jobs.


Life could be a lot easier for technical teams in consumer businesses. As data owners, it’s their job to make customer data a valuable asset for the business, but the dominant practices are manual and repetitive. 

Part of this has to do with the complexity that’s unique to customer data: the volume and velocity across channels are massive, it’s always changing as people go through life and their preferences evolve, and it’s subject to a patchwork of regulations. Yet customer data is only valuable to a company if it’s accurate, up-to-date, secure, and accessible to business teams for appropriate use.

Adding to the data modeling and management challenge is the fact that it’s never one-and-done; it has to be repeated over and over again. New data sources, new destinations, new models, new regulations, not to mention new information about existing customers: all of it results in a never-ending stream of change requests from business teams. Bottom line, data assets need constant updates and maintenance.

This collection of jobs to be done is extremely time- and resource-intensive when approached manually. The unfortunate reality is that data engineers and CTOs are used to this manual method. Most tools dealing with customer data leave data quality and fidelity upkeep to the practitioners. Properly unifying the data requires maintaining both merge rules and match rules, with lots of custom code jobs to make it work. The dominant mindset is that the only ways to approach this challenge are more elbow grease and better processes. 
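To make that concrete, here is a minimal sketch of the kind of match rule and merge rule teams end up hand-maintaining; the field names and the last-write-wins merge policy are illustrative assumptions, not a prescribed implementation.

```python
# Illustrative only: a hand-rolled match rule and merge rule of the kind
# teams maintain as custom code. Field names are hypothetical.
from typing import Optional

def normalize_email(email: Optional[str]) -> Optional[str]:
    return email.strip().lower() if email else None

def records_match(a: dict, b: dict) -> bool:
    """Match rule: same normalized email, or same phone number."""
    email_a, email_b = normalize_email(a.get("email")), normalize_email(b.get("email"))
    if email_a and email_a == email_b:
        return True
    return bool(a.get("phone")) and a.get("phone") == b.get("phone")

def merge_profiles(a: dict, b: dict) -> dict:
    """Merge rule: newest record wins per field, but never overwrite with blanks."""
    newer, older = (a, b) if a.get("updated_at", "") >= b.get("updated_at", "") else (b, a)
    merged = dict(older)
    merged.update({k: v for k, v in newer.items() if v not in (None, "")})
    return merged
```

Multiply rules like these across every identity feed and its edge cases, and the maintenance burden adds up quickly.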

The good news is that the AI revolution offers approaches that can make data modeling and management more efficient, more cost-effective, and more successful at driving better business results. New methods mean it doesn’t have to be like Sisyphus pushing his boulder up the hill day in and day out.

The real time-cost of data modeling & management

It’s worth backing up to take in a full view of all the jobs and sub-jobs involved in making customer data a valuable asset. 

  • Co-locate all of your customer and behavior data

  • Model and clean data

  • Resolve identities

  • Merge into unified profile

  • Link new identity back to behavior data

  • Generate business-level aggregated data

  • Build views for marketers (and other stakeholders)

  • Feed resulting customer profiles into action applications

  • Manage change on an ongoing basis 

And these phases are really just subject headers, each containing multiple sub-steps and representing hours of work for your teams. Much of this work requires time-intensive custom coding, since standard customer data tools don’t cover these capabilities out of the box.
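As a rough illustration (not any particular vendor’s pipeline), the phases above can be thought of as a chain of steps like the sketch below; every function name is hypothetical, and in a hand-coded setup each stub hides hours of custom work.

```python
# Hypothetical skeleton of the phases listed above. The stub bodies are
# placeholders; in a legacy, hand-coded setup each one hides custom jobs.
def colocate(feeds):                  # 1. land identity and behavior feeds together
    return [row for feed in feeds for row in feed]

def model_and_clean(rows):            # 2. standardize schemas, flag bad PII
    return rows

def resolve_and_merge(rows):          # 3-5. match records across feeds, merge into
    return rows                       #      unified profiles, re-key behavior data

def aggregate_and_publish(profiles):  # 6-7. business-level aggregates and views
    return profiles

def run_pipeline(feeds, destinations):
    profiles = aggregate_and_publish(resolve_and_merge(model_and_clean(colocate(feeds))))
    for destination in destinations:  # 8. feed profiles into action applications
        destination.extend(profiles)
    return profiles                   # 9. ...and re-run whenever anything changes
```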

Architecture diagram: legacy data refinement requires extensive custom code.

For example, let’s zoom in on phase 2 in the list above, Model and Clean Data, to take a more granular look at the sub-tasks involved and the time they typically take to perform. 

We’ll assume a data infrastructure with 5 identity feeds (eCommerce profiles, point-of-sale profiles, loyalty profiles, campaigns, clicks/views) and 5 behavior parameters (in-store purchase, online purchase, email, customer support, social media engagement).

2. Model and Clean Data

Job to be done | Example level of effort per table
Convert source data models into a format that works well with the identity resolution process | 16 hours
Extract customer table from behavior tables if necessary | 16 hours
Identify bad PII values | 8 hours
Define data standardization and cleaning rules | 8 hours
QA data standardization and cleaning rules | 8 hours
QA data modeling | 8 hours
Total per table | 64 hours
Total for 5 tables | 320 hours

320 hours for one phase of the process, involving six sub-steps. Each of the other eight phases has between four and ten sub-steps. A conservative estimate for the entire process would be upwards of 1,200 hours to onboard and ~200 hours per change (get in touch, I’ll be happy to walk you through it step by step), and that’s assuming there are no complications along the way.  
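For a sense of what the “identify bad PII values” and “define data standardization and cleaning rules” hours go toward, here is a minimal hand-written sketch; the placeholder values and patterns are assumptions for illustration, not a complete rule set.

```python
import re

# Illustrative hand-written cleaning rules of the kind each source table needs.
# Placeholder values and patterns are assumptions, not an exhaustive rule set.
BAD_PII_PLACEHOLDERS = {"test@test.com", "none@none.com", "0000000000", "n/a"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def standardize_phone(raw):
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else None  # keep last 10 digits (US-style)

def clean_record(record):
    email = (record.get("email") or "").strip().lower()
    phone = standardize_phone(record.get("phone"))
    return {
        **record,
        "email": email if EMAIL_RE.match(email) and email not in BAD_PII_PLACEHOLDERS else None,
        "phone": phone if phone and phone not in BAD_PII_PLACEHOLDERS else None,
    }
```

Now repeat that for every table and every feed, and again each time a source system changes its export format.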

Where AI tools can save you hours, days, weeks

One of the main reasons why the legacy method of data modeling is so labor intensive is all the custom code jobs. At several points in the process, data engineers need to write code to move data between locations, standardize data formats, and build models. For each of these areas, AI tools built for these jobs can turn a manual process into an automated one, drastically reducing the amount of manual work and the time involved.  

Architecture diagram: AI tools dramatically reduce custom code to make data refinement faster.
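As a rough sketch of the “AI analyzes and annotates data with validation by user” pattern that shows up in the table below, the step might look something like this; ask_llm and suggest_column_mapping are hypothetical names, not a specific product’s API.

```python
import json

# Minimal sketch of the "AI proposes, engineer approves" pattern for mapping a
# source table onto a canonical schema. ask_llm() is a hypothetical stand-in
# for whatever model endpoint a toolkit uses; it is not a real vendor API.
CANONICAL_FIELDS = ["customer_id", "email", "phone", "first_name", "last_name"]

def suggest_column_mapping(source_columns, sample_rows, ask_llm):
    prompt = (
        f"Map these source columns to the canonical fields {CANONICAL_FIELDS}. "
        f"Columns: {source_columns}. Sample rows: {json.dumps(sample_rows[:3])}. "
        'Return JSON like {"source_column": "canonical_field_or_null"}.'
    )
    return json.loads(ask_llm(prompt))

def apply_with_review(mapping, approve):
    # Human-in-the-loop validation: keep only the suggestions an engineer approves.
    return {src: dst for src, dst in mapping.items() if dst and approve(src, dst)}
```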

This is especially helpful since so many of these processes need to be done over and over again — data infrastructure is never set-it-and-forget-it. Beyond routine maintenance, the tech teams responsible need to repeatedly revisit key phases to account for new data sources and destinations, new models to meet new business requirements, updates in customer profiles due to new behaviors or life changes, etc. 

Let’s take another look at phase 2, Model and Clean Data, and how AI automation can substantially reduce the burden on tech teams to perform manual coding jobs. 

Job to be done | Legacy method (example level of effort per table) | AI toolkit (example level of effort per table)
Convert source data models into a format that works well with the identity resolution process | 16 hours | AI analyzes and annotates data with validation by user: 30 minutes
Extract customer table from behavior tables if necessary | 16 hours | AI analyzes and annotates data with validation by user: 30 minutes
Identify bad PII values | 8 hours | AI ID Resolution and OOTB rules identify common patterns with an option to customize: 30 minutes
Define data standardization and cleaning rules | 8 hours | AI ID Resolution provides a framework; the only manual work is any customizations: 30 minutes
QA data standardization and cleaning rules | 8 hours | 4 hours
QA data modeling | 8 hours | 4 hours
Total per table | 64 hours | 10 hours
Total for 5 tables | 320 hours | 44 hours

That’s an 86.25% improvement in the time needed to complete this phase. Extrapolating this across the rest of the onboarding process brings the total down from more than 1,200 hours to roughly 350. There are still elements that AI automation cannot cover and that data engineers need to handle, generally anything that requires human oversight for QA or involves strategic choices about which attributes or marketing use cases to build for. But the tedium of extensive custom code jobs is taken out of the picture.

Tech teams don’t need to struggle to unlock the value of customer data

The next AI revolution for consumer businesses is in making persistent data quality a reality. There are more customer apps generating more data than ever before, and technical leaders need a shortcut to prepare and organize customer data. Rules-based approaches have never been easy or reliable, but today’s profusion of data makes them truly obsolete. In order to succeed in data modeling & management – creating and maintaining a data asset of high enough quality and easy enough access to be of value to the business – technical teams need methods that offer a dramatic boost in both efficiency and effectiveness. 

Implementing an AI toolkit to replace custom code jobs and rules-based approaches means:

  • Faster time to value by removing complexity 

  • Lower costs through automation

  • Better results from higher-quality data

Generative AI, data governance, and personalization will only be successful if businesses can build a high-quality data foundation. Accomplishing that means moving away from incremental process improvements for cumbersome legacy methods and instead taking advantage of advancements in AI tooling for customer data. It will make the work far faster and the outcome much more valuable. 

Get in touch to find out more about how an AI toolkit can save you massive time and effort on data refinement and management.