Stanislav Petrov is a seasoned Senior Data Scientist and Machine Learning Engineer specializing in marketing measurement, recommendation systems, and forecasting with over 6 years of experience. Currently at Capital.com, he focuses on customer lifetime value prediction, time series anomaly detection, and marketing mix modeling.

Introduction

The world is changing faster than ever. We are witnessing technological advances, shifts in business paradigms, and cultural transformations. Predicting what will happen next is a demanding task that calls for a mix of approaches. In customer relationship management, knowing and predicting the value of each customer over their lifetime has become critically important.

Businesses are always seeking better ways to create value and refine their offerings to attract customers and turn a profit. Many companies use customer lifetime value (CLV) to shape their marketing strategies and gauge their overall success, because it tells them how much net benefit they can expect from their customers. CLV reflects both present and future customer value, which has made it a cornerstone of many companies’ strategies. In this article, I’ll discuss the fundamentals of using machine learning to forecast future customer lifetime value.

What is Customer Lifetime Value

To put it simply, CLV represents the total value a customer brings to a company over their entire relationship. This concept has been discussed extensively in recent customer relationship management literature. In its simplest form, it is calculated by multiplying the average transaction value by the number of transactions per period and by the number of periods the customer is retained:

CLV = Average Transaction Value × Number of Transactions per Period × Retention Period

Consider an example. Suppose you own a coffee shop where the average customer spends $5 per visit and visits twice a week, on average, for 2 years. Here’s how you would calculate the CLV:

CLV = $5 (average transaction) × 2 (visits per week) × 52 (weeks in a year) × 2 (years) = $1,040
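The same arithmetic can be written as a small function. This is only a sketch of the simple formula above; the function and argument names are my own and are chosen for illustration.

```python
def simple_clv(avg_transaction_value, transactions_per_period, periods):
    """Value per transaction x transactions per period x number of periods."""
    return avg_transaction_value * transactions_per_period * periods

# Coffee shop example from the text: $5 per visit, 2 visits a week,
# retained for 2 years of 52 weeks each (104 weekly periods).
coffee_shop_clv = simple_clv(5, 2, 52 * 2)
print(coffee_shop_clv)  # 1040
```

The period used for the transaction count and for the retention time must match (here, weeks), otherwise the result is off by a constant factor.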

Calculating CLV matters because it informs many business tasks:

– CLV assists in optimising advertising spend to ensure a positive return on investment (ROI) and prevent overspending.
– Insights into customer churn patterns empower marketing teams to create personalised campaigns aimed at improving customer retention.
– CLV guides sales teams in acquiring customers effectively, ensuring that acquisition costs align with customer value.
– A deeper understanding of customer behaviour enables support teams to boost engagement and satisfaction, leading to increased customer loyalty and CLV.
– Predicting future revenues and customer behaviour aids realistic business planning, including setting revenue targets and acquisition strategies.

Machine Learning for Predicting CLV

Using machine learning to analyse and predict customer value can offer a more accurate insight into future business prospects. By scrutinising customer purchasing behaviour and interactions, algorithms can effectively interpret this data to forecast future trends.

Key Information for Data Analysis

Now let us look at the data needed for analysis. First of all, we need historical transactional data in enough detail to support probabilistic modelling. Essential data points include invoice details, product information, customer identifiers, and attributes like location.

Further on I will focus on machine learning, exploring its application by cohorts and by users.

Predicting CLV with Machine Learning

Now let’s explore two ways to predict CLV using machine learning: by cohorts and by users.

The fundamental difference between these approaches is that in the first, we form cohorts of users based on a certain characteristic (e.g., users who registered on the same day). In the second, we do not create such groups and treat each user individually. The advantage of the first approach is that it can usually achieve greater prediction accuracy. The downside is that we must fix in advance the characteristic by which we group users into cohorts. In the second approach, it is generally more challenging to predict the CLV of each user accurately; however, this method allows us to analyse the predicted CLV by various characteristics (e.g., the user’s country of origin, registration day, or the advertisement they clicked on).

It is also worth mentioning that CLV predictions are rarely made without a time constraint. A user can experience several “lifetimes” throughout their lifecycle, so CLV is usually considered over a specific period, such as 30, 90, or 365 days.
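To make the idea of a time-limited CLV concrete, here is a minimal sketch that sums a user’s revenue within a fixed number of days after registration. The data layout (a list of date and amount pairs) and all names are assumptions made for this example.

```python
from datetime import date, timedelta

def clv_within_horizon(registered_on, transactions, horizon_days):
    """Sum the revenue generated in the first `horizon_days` after registration.

    `transactions` is a list of (purchase_date, amount) tuples.
    """
    cutoff = registered_on + timedelta(days=horizon_days)
    return sum(amount for purchased_on, amount in transactions
               if registered_on <= purchased_on < cutoff)

registered = date(2023, 1, 1)
purchases = [
    (date(2023, 1, 10), 20.0),   # 9 days after registration
    (date(2023, 3, 15), 35.0),   # 73 days after registration
    (date(2023, 11, 1), 50.0),   # 304 days after registration
]
clv_by_horizon = {h: clv_within_horizon(registered, purchases, h) for h in (30, 90, 365)}
print(clv_by_horizon)  # {30: 20.0, 90: 55.0, 365: 105.0}
```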

By Cohorts

One of the most common ways to form user cohorts is by grouping them based on their registration day. This allows us to frame the task of predicting CLV as a time series prediction task. Essentially, our time series will represent the CLV of users over past periods, and the task will be to predict (extend) this time series into the future.

In this case, simple time series prediction methods like moving average prediction can be helpful.
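A moving-average forecast could look like the sketch below. The cohort values are made up, and feeding earlier forecasts back into the window is one of several possible design choices, not the only way to do it.

```python
def moving_average_forecast(history, window, steps):
    """Extend `history` by `steps` values.

    Each forecast is the mean of the last `window` values; earlier forecasts
    are fed back in as if they were observations.
    """
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    extended = list(history)
    for _ in range(steps):
        extended.append(sum(extended[-window:]) / window)
    return extended[len(history):]

# Hypothetical 30-day CLV per registered user for the last six daily cohorts.
cohort_clv = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
forecast = moving_average_forecast(cohort_clv, window=3, steps=2)
print(forecast)  # [5.0, 5.0]
```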

Alternatively, we can build a regression model that takes into account various factors such as:

1. The number of registered users

2. The number of users who converted to purchase

3. Market factors, if any (Black Friday, weekends, holidays)

4. Marketing expenditures

5. Seasonality

6. Other factors affecting the demand for your product
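As an illustration, the factors above can be assembled into one feature row per cohort before fitting any regression model. The holiday calendar, the feature names, and the input values below are hypothetical.

```python
from datetime import date

# Hypothetical holiday calendar; replace with the dates relevant to your market.
HOLIDAYS = {date(2023, 11, 24), date(2023, 12, 25)}

def cohort_features(cohort_day, registered, converted, marketing_spend):
    """Turn one daily registration cohort into a flat feature dict."""
    return {
        "registered_users": registered,
        "converted_users": converted,
        "conversion_rate": converted / registered if registered else 0.0,
        "marketing_spend": marketing_spend,
        "is_weekend": int(cohort_day.weekday() >= 5),  # Saturday=5, Sunday=6
        "is_holiday": int(cohort_day in HOLIDAYS),
        "month": cohort_day.month,                     # coarse seasonality signal
    }

row = cohort_features(date(2023, 11, 24), registered=200, converted=30,
                      marketing_spend=1500.0)
print(row)
```

Rows like this, one per cohort, form the training table; the target column would be the cohort’s observed CLV over the chosen horizon.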

It is also possible to build a hierarchical forecasting model that predicts CLV at several levels at once, such as country and region, and keeps the levels consistent with each other. One library that addresses this problem is HierarchicalForecast from Nixtla.
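The simplest way to keep forecasts consistent across levels is bottom-up aggregation: forecast at the lowest level and sum upwards. The sketch below is plain Python and does not use the API of any forecasting library; the hierarchy and the numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical hierarchy (country -> region) and per-country CLV forecasts.
REGION_OF = {"DE": "EU", "FR": "EU", "US": "NA", "CA": "NA"}
country_forecast = {"DE": 120.0, "FR": 80.0, "US": 300.0, "CA": 50.0}

def bottom_up(leaf_forecast, parent_of):
    """Aggregate leaf-level forecasts to their parents and to a grand total."""
    parents = defaultdict(float)
    for leaf, value in leaf_forecast.items():
        parents[parent_of[leaf]] += value
    return dict(parents), sum(leaf_forecast.values())

region_forecast, total_forecast = bottom_up(country_forecast, REGION_OF)
print(region_forecast, total_forecast)  # {'EU': 200.0, 'NA': 350.0} 550.0
```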

By Users

Buy Till You Die (BTYD)

Within this approach, we model two processes:

1) The repeat purchase process while the customer is active

2) The churn process, which essentially marks the end of the customer’s life cycle

Depending on the distributions underlying these processes, such models include:

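1. Beta-Geometric/Negative Binomial Distribution (BG/NBD) model

2. Pareto/Negative Binomial Distribution (Pareto/NBD) model

3. Gamma-Gamma model, which describes the monetary value of transactions and is typically combined with one of the two models above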
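To show how the purchase and churn processes interact, here is a small simulation under BG/NBD-style assumptions: while active, a customer buys at a random rate, and after each repeat purchase they may churn with some probability. The parameter values are illustrative assumptions, not fitted estimates, and the function name is my own.

```python
import random

def simulate_bgnbd_customer(rng, r, alpha, a, b, horizon):
    """Simulate repeat-purchase times for one customer.

    Purchase rate lam ~ Gamma(shape=r, rate=alpha); churn probability p ~ Beta(a, b).
    While active, gaps between purchases are exponential with rate lam; after
    each repeat purchase the customer churns with probability p.
    """
    lam = rng.gammavariate(r, 1.0 / alpha)  # gammavariate takes (shape, scale)
    p = rng.betavariate(a, b)
    t, purchases = 0.0, []                  # t = 0 is the first purchase
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            break                           # window ended while still active
        purchases.append(t)
        if rng.random() < p:
            break                           # churned right after this purchase
    return purchases

rng = random.Random(42)
customers = [simulate_bgnbd_customer(rng, r=2.0, alpha=8.0, a=0.8, b=2.5, horizon=52.0)
             for _ in range(1000)]
repeat_counts = [len(c) for c in customers]
print(sum(repeat_counts) / len(repeat_counts))  # average repeat purchases per customer
```

Fitting a BTYD model is the reverse task: given the observed purchase histories, estimate the parameters of the rate and churn distributions.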

Typically, the input data for these models consist of a dataset with the following columns:

– Recency – How recently did the customer purchase?

– Frequency – How often do they purchase?

– Monetary Value – How much do they spend?

– Time (Age) – The age of the customer since their first purchase to the current date

Implementations of these models are available in the Lifetimes and PyMC-Marketing libraries.
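The input table can be derived from raw transactions. The sketch below is plain Python rather than a library call. It assumes a convention commonly used for BTYD models: frequency counts repeat purchase days, recency is the customer’s age at their last purchase, T is their age at the end of the observation period, and monetary value averages repeat purchases only. Check the documentation of the library you use, as conventions may differ.

```python
from collections import defaultdict
from datetime import date

def rfm_summary(transactions, observation_end):
    """Per-customer frequency, recency, T (all in days) and monetary value.

    `transactions` is an iterable of (customer_id, purchase_date, amount).
    """
    daily = defaultdict(lambda: defaultdict(float))
    for customer_id, purchased_on, amount in transactions:
        if purchased_on <= observation_end:
            daily[customer_id][purchased_on] += amount  # merge same-day purchases

    summary = {}
    for customer_id, by_day in daily.items():
        days = sorted(by_day)
        first, last = days[0], days[-1]
        repeat_values = [by_day[d] for d in days[1:]]
        summary[customer_id] = {
            "frequency": len(repeat_values),
            "recency": (last - first).days,
            "T": (observation_end - first).days,
            "monetary_value": (sum(repeat_values) / len(repeat_values)
                               if repeat_values else 0.0),
        }
    return summary

transactions = [
    ("alice", date(2023, 1, 1), 10.0),
    ("alice", date(2023, 1, 15), 20.0),
    ("alice", date(2023, 2, 14), 40.0),
    ("bob", date(2023, 2, 1), 15.0),
]
summary = rfm_summary(transactions, observation_end=date(2023, 3, 1))
print(summary["alice"])  # {'frequency': 2, 'recency': 44, 'T': 59, 'monetary_value': 30.0}
print(summary["bob"])    # {'frequency': 0, 'recency': 0, 'T': 28, 'monetary_value': 0.0}
```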

Treating CLV Prediction as a Regression Task

When predicting by users, we can also build a predictive model that forecasts the CLV of a user. Here, we can use data that describe the user:

1. Purchases

2. Behaviour on the website (if available)

3. Behaviour before registration (e.g., the user registered after viewing ad X)

4. Socio-demographic indicators

Features from the cohort model can also be used, as the registration day is a factor that describes the user.

If we treat the CLV prediction task as a regression task, any machine learning model capable of regression can be used for prediction. As a starting point, gradient boosting models (XGBoost, LightGBM, or CatBoost) are commonly used, as they have proven effective on tabular data.

Once this baseline is built, attempts are made to outperform it with other methods. The main limitation of gradient boosting and similar tabular models is that they cannot work with sequences directly, while customer data contains many sequences:

1. Sequence of purchases

2. Sequence of actions/transitions on the application/website

3. Sequence of marketing touches before user registration

In the classic approach, we create features from these sequences, but in doing so we inevitably lose information. For example, by calculating the average order value and its variance per user, we lose information on how it changed as the number of purchases increased: whether it grew or decreased. The intervals between purchases are also important. If we assume that all these sequences matter when predicting the CLV of a user, it makes sense to employ neural networks capable of processing them, namely Transformers and recurrent neural networks.
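A tiny example makes the information loss visible. The two hypothetical users below have identical order values in opposite order, so order-insensitive aggregates cannot tell them apart, while a simple order-aware feature can.

```python
from statistics import mean, pvariance

growing = [10, 20, 30, 40]     # order value rises with every purchase
shrinking = [40, 30, 20, 10]   # same values, opposite direction

def aggregate(values):
    """Order-insensitive summary: mean and population variance."""
    return mean(values), pvariance(values)

def trend(values):
    """Order-sensitive summary: average change between consecutive purchases."""
    return mean(b - a for a, b in zip(values, values[1:]))

print(aggregate(growing) == aggregate(shrinking))  # True
print(trend(growing), trend(shrinking))            # positive for growing, negative for shrinking
```

Hand-crafted features like `trend` recover part of the signal, but each one has to be invented in advance; sequence models learn such patterns from the raw sequence.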

Building a Machine Learning Model

Building your own machine learning model is complex and time-consuming and can involve months of planning and development. Beyond the initial development, there are ongoing costs for maintaining the model and adapting it to different analytics and business objectives.

Steps to create a predictive machine learning model for Customer Lifetime Value include:

1. Data Collection

First, we need to gather relevant data. Collect historical data on customer transactions, interactions, and demographics. Include variables such as purchase history, order frequency, order value, customer demographics, and any other data that might be relevant to predicting CLV.

2. Data Preprocessing

– Data cleaning. Handle outliers, missing values, and inconsistencies in the dataset.

– Feature engineering. Create new features or transform existing ones that might enhance predictive power. For example, calculate metrics like average order value, recency of last purchase, or frequency of purchases.

– Scaling. Standardise numerical features so that they have a similar scale, which can improve the performance of some machine learning algorithms.

3. Model Selection

– Choose an algorithm suited to predicting CLV, based on the nature of your data and the problem you are trying to solve.

– Divide the dataset into training and testing sets to evaluate model performance. Typically, 70-80% of the data is used for training and the remaining 20-30% for testing.

4. Model Training

– Fit the selected machine learning model to the training data so that it learns the patterns and relationships within the data.

– Improve performance by tuning the model’s hyperparameters through techniques like grid search or random search.

5. Model Evaluation

– Assess the model’s performance on the test dataset using appropriate evaluation metrics such as mean squared error, root mean squared error, or R-squared score.

– If the model’s performance is not satisfactory, consider revisiting feature selection and feature engineering, or trying different algorithms.

6. Prediction

– Once the model is trained and evaluated, use it to make predictions on new or unseen data. This will generate predicted CLV values for individual customers.

7. Deployment

– Integrate the predictive model into your business operations to generate ongoing CLV predictions.

– Continuously monitor the model’s performance and update it as needed to ensure its accuracy and relevance over time.
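The steps above can be sketched end to end on synthetic data. This is a deliberately small example using only the Python standard library: one feature, a closed-form linear regression instead of a tuned model, and made-up data in which 90-day CLV grows with first-week spend. All names and numbers are assumptions for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)
# Data collection (synthetic here): first-week spend and 90-day CLV per user.
first_week_spend = [rng.uniform(0, 100) for _ in range(200)]
clv_90d = [20 + 2.5 * x + rng.gauss(0, 15) for x in first_week_spend]

# Train/test split: keep the last 20% of rows for testing.
split = int(len(first_week_spend) * 0.8)
x_train, x_test = first_week_spend[:split], first_week_spend[split:]
y_train, y_test = clv_90d[:split], clv_90d[split:]

# Preprocessing: standardise using statistics from the training set only.
mu, sigma = mean(x_train), pstdev(x_train)

def scale(xs):
    return [(x - mu) / sigma for x in xs]

def fit_ols(xs, ys):
    """Closed-form least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Training.
slope, intercept = fit_ols(scale(x_train), y_train)

# Prediction and evaluation on unseen data.
predictions = [intercept + slope * x for x in scale(x_test)]
errors = [y - p for y, p in zip(y_test, predictions)]
rmse = (sum(e ** 2 for e in errors) / len(errors)) ** 0.5
ss_tot = sum((y - mean(y_test)) ** 2 for y in y_test)
r_squared = 1 - sum(e ** 2 for e in errors) / ss_tot
print(f"RMSE={rmse:.2f}  R^2={r_squared:.3f}")
```

Computing the scaling statistics on the training set only avoids leaking information from the test set into the model.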

For effective testing, use 6-12 months of data, keeping roughly three months of observed data for every six months of predicted horizon. You can use a dataset from the UCI Machine Learning Repository as sample data, but prioritise using your own data for better results.

When designing machine learning models it is important to incorporate quantifiable data such as:

– Number of customer support tickets

– Total customer spending

– Software engagements

– Complaint tickets within a specific period

– New customer acquisitions

Qualitative data such as customer comments cannot be fed to a model directly, but it can be coded into quantitative data by assigning numerical values to keywords or sentiments.
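A minimal sketch of such coding is shown below. The keyword list and weights are hypothetical; in practice they would come from labelled data or a sentiment model.

```python
import re

# Hypothetical keyword weights chosen for illustration.
KEYWORD_SCORES = {"great": 1, "love": 1, "fast": 1,
                  "slow": -1, "broken": -1, "refund": -1}

def comment_score(comment):
    """Sum the weights of known keywords found in a free-text comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(KEYWORD_SCORES.get(word, 0) for word in words)

print(comment_score("Love the app, support was fast"))      # 2
print(comment_score("App is slow and checkout is broken"))  # -2
print(comment_score("Delivered on Tuesday"))                # 0
```

The resulting score can then be used as one more numerical feature alongside ticket counts and spending.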

Real-World Examples

Here are a few notable examples of organisations already using AI to predict Customer Lifetime Value:

Google

Analyses user search history, demographics, and online behaviour to predict CLV and offers personalised advertising and search results for users.

Airbnb

Predicts guest CLV by analysing booking history, demographics, and travel preferences. It also targets promotions and personalised recommendations for accommodations.

Facebook

Uses user data such as interactions, demographics, and interests to predict CLV and to recommend content to users.

LinkedIn

Analyses user engagement, demographics, and professional interests to predict CLV. Targets job recommendations and premium subscription offers to users.

Tesla

The company analyses customer driving patterns, demographics, and vehicle preferences to predict CLV.

Wrapping Up

As we have seen, predictive models for CLV equip businesses with tools to understand the potential long-term value of their customers. By using data analytics and predictive algorithms, it is possible to personalise customer experiences and enhance customer retention efforts. These models make it possible to identify high-value customers, tailor marketing campaigns accordingly, and allocate resources more efficiently to maximise ROI.

Moreover, predictive CLV models facilitate optimised pricing strategies, informed financial planning, and strategic decision-making, driving sustainable growth.