October 31, 2024 | 3 min read

How AI Can Make Data Modeling So Much Faster

Building and maintaining a customer data asset that adds value to the business doesn’t have to be a slog of custom jobs.


Life could be a lot easier for technical teams in consumer businesses. As data owners, it’s their job to make customer data a valuable asset for the business, but the dominant practices are manual and repetitive. 

Part of this has to do with the complexity that’s unique to customer data: the volume and velocity across channels are massive, it’s always changing as people go through life and their preferences evolve, and it’s subject to a patchwork of regulations. Yet customer data is only valuable to a company if it’s accurate, up-to-date, secure, and accessible to business teams for appropriate use.

Adding to the data modeling and management challenge is the fact that it’s never one-and-done; it has to be repeated over and over again. New data sources, new destinations, new models, new regulations, not to mention new information about existing customers: all of it results in a never-ending stream of change requests from business teams. Bottom line, data assets need constant updates and maintenance.

This collection of jobs to be done is extremely time- and resource-intensive when approached manually. The unfortunate reality is that data engineers and CTOs are used to this manual method. Most tools dealing with customer data leave data quality and fidelity upkeep to the practitioners. Properly unifying the data requires maintaining both merge rules and match rules, with lots of custom code jobs to make it work. The dominant mindset is that the only ways to approach this challenge are more elbow grease and better processes. 
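To make that concrete, here is a minimal sketch of the kind of match rule and merge rule teams end up hand-maintaining; the field names and the last-write-wins merge policy are illustrative assumptions, not a prescribed implementation.

```python
# Illustrative only: a hand-rolled match rule and merge rule of the kind
# teams maintain as custom code. Field names are hypothetical.
from typing import Optional

def normalize_email(email: Optional[str]) -> Optional[str]:
    return email.strip().lower() if email else None

def records_match(a: dict, b: dict) -> bool:
    """Match rule: same normalized email, or same phone number."""
    email_a, email_b = normalize_email(a.get("email")), normalize_email(b.get("email"))
    if email_a and email_a == email_b:
        return True
    return bool(a.get("phone")) and a.get("phone") == b.get("phone")

def merge_profiles(a: dict, b: dict) -> dict:
    """Merge rule: newest record wins per field, but never overwrite with blanks."""
    newer, older = (a, b) if a.get("updated_at", "") >= b.get("updated_at", "") else (b, a)
    merged = dict(older)
    merged.update({k: v for k, v in newer.items() if v not in (None, "")})
    return merged
```

Multiply rules like these across every identity feed and its edge cases, and the maintenance burden adds up quickly.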

The good news is that the AI revolution offers approaches that can make data modeling and management more efficient, more cost-effective, and more successful at driving better business results. New methods mean it doesn’t have to be like Sisyphus pushing his boulder up the hill day in and day out.

The real time-cost of data modeling & management

It’s worth backing up to take in a full view of all the jobs and sub-jobs involved in making customer data a valuable asset. 

  • Co-locate all of your customer and behavior data

  • Model and clean data

  • Resolve identities

  • Merge into unified profile

  • Link new identity back to behavior data

  • Generate business-level aggregated data

  • Build views for marketers (and other stakeholders)

  • Feed resulting customer profiles into action applications

  • Manage change on an ongoing basis 

And these phases are really just subject headers, each containing multiple sub-steps and representing hours of work for your teams. Much of this work requires time-intensive custom coding, since standard customer data tools don’t cover these capabilities out of the box.
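As a rough illustration (not any particular vendor’s pipeline), the phases above can be thought of as a chain of steps like the sketch below; every function name is hypothetical, and in a hand-coded setup each stub hides hours of custom work.

```python
# Hypothetical skeleton of the phases listed above. The stub bodies are
# placeholders; in a legacy, hand-coded setup each one hides custom jobs.
def colocate(feeds):                  # 1. land identity and behavior feeds together
    return [row for feed in feeds for row in feed]

def model_and_clean(rows):            # 2. standardize schemas, flag bad PII
    return rows

def resolve_and_merge(rows):          # 3-5. match records across feeds, merge into
    return rows                       #      unified profiles, re-key behavior data

def aggregate_and_publish(profiles):  # 6-7. business-level aggregates and views
    return profiles

def run_pipeline(feeds, destinations):
    profiles = aggregate_and_publish(resolve_and_merge(model_and_clean(colocate(feeds))))
    for destination in destinations:  # 8. feed profiles into action applications
        destination.extend(profiles)
    return profiles                   # 9. ...and re-run whenever anything changes
```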

Architecture diagram: legacy data refinement requires extensive custom code.

For example, let’s zoom in on phase 2 in the list above, Model and Clean Data, to take a more granular look at the sub-tasks involved and the time they typically take to perform. 

We’ll assume a data infrastructure with 5 identity feeds (eCommerce profiles, point-of-sale profiles, loyalty profiles, campaigns, clicks/views) and 5 behavior parameters (in-store purchase, online purchase, email, customer support, social media engagement).

2. Model and Clean Data

Job to be done | Example level of effort per table
Convert source data models into a format that works well with the identity resolution process | 16 hours
Extract customer table from behavior tables if necessary | 16 hours
Identify bad PII values | 8 hours
Define data standardization and cleaning rules | 8 hours
QA data standardization and cleaning rules | 8 hours
QA data modeling | 8 hours
Total per table | 64 hours
Total for 5 tables | 320 hours

320 hours for one phase of the process, involving six sub-steps. Each of the other eight phases has between four and ten sub-steps. A conservative estimate for the entire process would be upwards of 1,200 hours to onboard and ~200 hours per change (get in touch, I’ll be happy to walk you through it step by step), and that’s assuming there are no complications along the way.  
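For a sense of what the “identify bad PII values” and “define data standardization and cleaning rules” hours go toward, here is a minimal hand-written sketch; the placeholder values and patterns are assumptions for illustration, not a complete rule set.

```python
import re

# Illustrative hand-written cleaning rules of the kind each source table needs.
# Placeholder values and patterns are assumptions, not an exhaustive rule set.
BAD_PII_PLACEHOLDERS = {"test@test.com", "none@none.com", "0000000000", "n/a"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def standardize_phone(raw):
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else None  # keep last 10 digits (US-style)

def clean_record(record):
    email = (record.get("email") or "").strip().lower()
    phone = standardize_phone(record.get("phone"))
    return {
        **record,
        "email": email if EMAIL_RE.match(email) and email not in BAD_PII_PLACEHOLDERS else None,
        "phone": phone if phone and phone not in BAD_PII_PLACEHOLDERS else None,
    }
```

Now repeat that for every table and every feed, and again each time a source system changes its export format.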

Where AI tools can save you hours, days, weeks

One of the main reasons why the legacy method of data modeling is so labor intensive is all the custom code jobs. At several points in the process, data engineers need to write code to move data between locations, standardize data formats, and build models. For each of these areas, AI tools built for these jobs can turn a manual process into an automated one, drastically reducing the amount of manual work and the time involved.  

Architecture diagram: AI tools dramatically reduce custom code to make data refinement faster.
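As a rough sketch of the “AI analyzes and annotates data with validation by user” pattern that shows up in the table below, the step might look something like this; ask_llm and suggest_column_mapping are hypothetical names, not a specific product’s API.

```python
import json

# Minimal sketch of the "AI proposes, engineer approves" pattern for mapping a
# source table onto a canonical schema. ask_llm() is a hypothetical stand-in
# for whatever model endpoint a toolkit uses; it is not a real vendor API.
CANONICAL_FIELDS = ["customer_id", "email", "phone", "first_name", "last_name"]

def suggest_column_mapping(source_columns, sample_rows, ask_llm):
    prompt = (
        f"Map these source columns to the canonical fields {CANONICAL_FIELDS}. "
        f"Columns: {source_columns}. Sample rows: {json.dumps(sample_rows[:3])}. "
        'Return JSON like {"source_column": "canonical_field_or_null"}.'
    )
    return json.loads(ask_llm(prompt))

def apply_with_review(mapping, approve):
    # Human-in-the-loop validation: keep only the suggestions an engineer approves.
    return {src: dst for src, dst in mapping.items() if dst and approve(src, dst)}
```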

This is especially helpful since so many of these processes need to be done over and over again — data infrastructure is never set-it-and-forget-it. Beyond routine maintenance, the tech teams responsible need to repeatedly revisit key phases to account for new data sources and destinations, new models to meet new business requirements, updates in customer profiles due to new behaviors or life changes, etc. 

Let’s take another look at phase 2, Model and Clean Data, and how AI automation can substantially reduce the burden on tech teams to perform manual coding jobs. 

Job to be done | Legacy method (example level of effort per table) | AI toolkit (example level of effort per table)
Convert source data models into a format that works well with the identity resolution process | 16 hours | AI analyzes and annotates data with validation by user: 30 minutes
Extract customer table from behavior tables if necessary | 16 hours | AI analyzes and annotates data with validation by user: 30 minutes
Identify bad PII values | 8 hours | AI ID Resolution and OOTB rules identify common patterns with an option to customize: 30 minutes
Define data standardization and cleaning rules | 8 hours | AI ID Resolution provides a framework; the only manual work is any customizations: 30 minutes
QA data standardization and cleaning rules | 8 hours | 4 hours
QA data modeling | 8 hours | 4 hours
Total per table | 64 hours | 10 hours
Total for 5 tables | 320 hours | 44 hours

That’s an 86.25% improvement in the time needed to complete this phase. Extrapolating this across the rest of the onboarding process brings the total down from more than 1,200 hours to roughly 350. There are still elements that AI automation cannot cover and that data engineers need to handle, generally anything that requires human oversight for QA or involves strategic choices about which attributes or marketing use cases to build for. But the tedium of extensive custom code jobs is taken out of the picture.

Tech teams don’t need to struggle to unlock the value of customer data

The next AI revolution for consumer businesses is in making persistent data quality a reality. There are more customer apps generating more data than ever before, and technical leaders need a shortcut to prepare and organize customer data. Rules-based approaches have never been easy or reliable, but today’s profusion of data makes them truly obsolete. In order to succeed in data modeling & management – creating and maintaining a data asset of high enough quality and easy enough access to be of value to the business – technical teams need methods that offer a dramatic boost in both efficiency and effectiveness. 

Implementing an AI toolkit to replace custom code jobs and rules-based approaches means:

  • Faster time to value by removing complexity 

  • Lower costs through automation

  • Better results from higher-quality data

Generative AI, data governance, and personalization will only be successful if businesses can build a high-quality data foundation. Accomplishing that means moving away from incremental process improvements for cumbersome legacy methods and instead taking advantage of advancements in AI tooling for customer data. It will make the work far faster and the outcome much more valuable. 

Get in touch to find out more about how an AI toolkit can save you massive time and effort on data refinement and management.