How to Identify Your Best Customers

by Trey Pruitt


Introduction

In a previous article, we saw how the top 20% of a company's customers can generate 50% to 80% of its revenue. Now, we aim to identify who these customers are. What sets them apart from lower-value customers? And is it possible to identify them in advance?

[Figure: Customer Percentile vs. Percent of Revenue]

Why You Would Want to Identify High-Value Customers

Identifying high-value customers is crucial for several reasons:

  1. To understand which attributes distinguish high-value customers from low-value ones, and to what extent.

  2. To determine which attributes are the most significant predictors of future revenue.

  3. To use these attributes to create customer segments based on predicted revenue for organizational communication purposes.

  4. To establish different "allowable" cost-per-acquisition (CPA) targets based on expected future revenue and profit, taking customer attributes into account.

  5. To improve forecasting by using customer segments to predict future revenue.

  6. To recognize and reward high-value customers through loyalty or rewards programs.

  7. To enhance product development by focusing on features that are important to high-value customers.

Segmenting High-Value Customers

First, we need to define what "high-value" means. In this example, we use cumulative revenue over the first 24 months since a customer's first purchase transaction, including only customers who have a tenure of at least 24 months. We then sort the customers by cumulative revenue and classify the top 20% as "high-value".
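The labeling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `label_high_value` function, the record fields (`customer_id`, `tenure_months`, `revenue_24m`), and the sample figures are all hypothetical, and ties at the cutoff are broken by sort order.

```python
def label_high_value(customers, top_fraction=0.20, min_tenure_months=24):
    """Return {customer_id: bool}, True for the top `top_fraction` by revenue.

    Each customer is a dict with hypothetical keys: customer_id,
    tenure_months, and revenue_24m (cumulative revenue in the first 24 months).
    """
    # Keep only customers with a full 24-month history.
    eligible = [c for c in customers if c["tenure_months"] >= min_tenure_months]
    # Sort from highest to lowest cumulative revenue.
    ranked = sorted(eligible, key=lambda c: c["revenue_24m"], reverse=True)
    # Size of the top group, rounded to the nearest whole customer.
    n_top = round(len(ranked) * top_fraction)
    return {c["customer_id"]: i < n_top for i, c in enumerate(ranked)}


# Made-up sample: ten eligible customers plus one who is too new to be scored.
sample = [
    {"customer_id": i, "tenure_months": 30, "revenue_24m": 100 * i}
    for i in range(1, 11)
]
sample.append({"customer_id": 99, "tenure_months": 6, "revenue_24m": 5000})

labels = label_high_value(sample)
high_value_ids = sorted(cid for cid, is_hv in labels.items() if is_hv)
print(high_value_ids)
```

Customers without 24 months of tenure are left out of the labels entirely rather than marked as low-value, since their 24-month revenue is not yet known.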

One step toward identifying your best customers is to segment them based on the attributes that are most predictive of being a high-value customer. To do this, we build a predictive model that does the following:

  • Is trained on historical customer data to classify customers as high-value or not.
  • Identifies the most crucial customer features for determining high-value status.
  • Predicts the likelihood, from 0% to 100%, that a customer will be high-value.
  • Validates its predictions using separate test data not used in model training.

A useful output of a high-value customer prediction model is a customer segmentation table with the following structure:

| Customer Attribute 1 | Customer Attribute 2 | Customer Attribute 3 | Percent of Customers | Predicted Probability of High-Value | Lift Score |
|---|---|---|---|---|---|
| ? | ? | ? | ? | ? | ? |
| ? | ? | ? | ? | ? | ? |
| ... | ... | ... | ... | ... | ... |
| All Customers | | | 100% | 20% | 1.0x |

Here, "Lift Score" refers to the predicted probability of being a high-value customer divided by the average probability of being a high-value customer. We are looking for segments with Lift Scores greater than 1.0x, which means that they are more likely than average to be high-value customers. The higher the Lift Score, the better.
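As a rough sketch of how such a table could be assembled, the snippet below groups customers by their attributes and computes each segment's share, mean predicted probability, and Lift Score. It assumes every customer record already carries a predicted probability in a hypothetical `p_high_value` field, and it uses the mean prediction across all customers as the average probability. The `segment_table` function, the field names, and the sample values are illustrative only.

```python
from collections import defaultdict


def segment_table(customers, attributes):
    """Group customers by attribute values; return rows sorted by lift."""
    # Baseline: average predicted probability across all customers (assumption).
    baseline = sum(c["p_high_value"] for c in customers) / len(customers)

    # Collect the predicted probabilities for each attribute combination.
    groups = defaultdict(list)
    for c in customers:
        key = tuple(c[a] for a in attributes)
        groups[key].append(c["p_high_value"])

    rows = []
    for key, probs in groups.items():
        mean_p = sum(probs) / len(probs)
        rows.append({
            "segment": key,
            "pct_customers": len(probs) / len(customers),
            "p_high_value": mean_p,
            "lift": mean_p / baseline,  # segment probability / average probability
        })
    return sorted(rows, key=lambda r: r["lift"], reverse=True)


# Made-up predictions for five customers.
customers = [
    {"term": "Annual", "plan": "Pro", "p_high_value": 0.5},
    {"term": "Annual", "plan": "Pro", "p_high_value": 0.4},
    {"term": "Monthly", "plan": "Basic", "p_high_value": 0.1},
    {"term": "Monthly", "plan": "Basic", "p_high_value": 0.1},
    {"term": "Monthly", "plan": "Basic", "p_high_value": 0.1},
]

table = segment_table(customers, ["term", "plan"])
for row in table:
    print(row["segment"], f'{row["pct_customers"]:.0%}',
          f'{row["p_high_value"]:.0%}', f'{row["lift"]:.2f}x')
```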

Training and Evaluating the Model

Predicting a binary outcome, such as whether a customer is high-value, is known as a classification problem. The features (customer attributes) used in the model can be identified based on business domain knowledge or through automated tools. Generally, we include only those features that enhance the predictive model's accuracy.

We split the data into training and testing sets and build a machine learning model using Python-based data science tools such as scikit-learn. After choosing a model performance metric, we train the classification model on the training data and evaluate its performance on the testing data. We will likely need to experiment with several kinds of classification models, including Logistic Regression, Random Forests, and Naive Bayes, to see which algorithm performs best for our particular situation.
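A minimal sketch of this workflow with scikit-learn might look like the following. Everything about the data is made up: the three binary attributes and their relationship to high-value status are synthetic stand-ins for real customer records. The choice of ROC AUC as the performance metric and of `BernoulliNB` as the Naive Bayes variant are assumptions for illustration, not choices stated in this article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data: three binary attributes per customer.
rng = np.random.default_rng(0)
n = 2000
X = rng.integers(0, 2, size=(n, 3))  # columns: is_annual, is_pro, is_usa
# Invented relationship between the attributes and high-value status.
logit = -2.5 + 1.8 * X[:, 0] + 0.9 * X[:, 1] + 0.3 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Hold out 25% of customers for testing, keeping the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": BernoulliNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Predicted probability of the positive (high-value) class.
    p_test = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, p_test)

for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.3f}")
```

The `predict_proba` output is also what supplies the 0% to 100% likelihood for each customer that feeds the segmentation table.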

Which Customer Features Are Most Important?

Once we have a prediction model with good accuracy, we want to discover which customer attributes are most crucial. We do this by extracting the importance of each feature from the model.

[Figure: Feature Importance]

In this example, the most important customer attribute is the subscription term (annual or monthly) of the first purchase. First Subscription Term is roughly 3x as important to the prediction model's accuracy as the next most important feature, First Subscription Plan (Basic or Pro). First Subscription Plan is in turn about 2x as important as the third customer attribute, Customer Geography (USA or International), which makes Customer Geography roughly 0.15x as important as First Subscription Term. Understanding the feature importance of a prediction model helps us gauge the impact of each customer attribute on the likelihood of being a high-value customer.
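One way to extract and compare importances is sketched below. The article does not say which importance measure was used; this example assumes the impurity-based `feature_importances_` attribute of a scikit-learn random forest, which is one common option. The data is synthetic and the feature names are illustrative, so the printed ratios will not match the figures quoted above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = [
    "first_subscription_term",
    "first_subscription_plan",
    "customer_geography",
]

# Synthetic stand-in data with an invented effect size per attribute.
rng = np.random.default_rng(0)
n = 2000
X = rng.integers(0, 2, size=(n, 3))
logit = -2.5 + 1.8 * X[:, 0] + 0.9 * X[:, 1] + 0.3 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Pair each feature with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda kv: kv[1],
    reverse=True,
)

# Express every importance relative to the top feature.
top_importance = ranked[0][1]
for name, importance in ranked:
    print(f"{name}: {importance:.3f} ({importance / top_importance:.2f}x of top feature)")
```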

Bringing It All Together: High-Value Customer Segments

Now that we understand the relative importance of features, we can complete the Customer Segmentation table using predictions from the model:

| First Subscription Term | First Subscription Plan | Customer Region | Percent of Customers | Predicted Probability of High-Value | Lift Score |
|---|---|---|---|---|---|
| Annual | Pro | USA | 15% | 47% | 2.4x |
| Annual | Pro | International | 6% | 41% | 2.1x |
| Annual | Basic | USA | 25% | 28% | 1.4x |
| All Customers | | | 100% | 20% | 1.0x |

For example, customers located in the USA who initially purchase annual "Pro" subscriptions (representing 15% of all customers) are 2.4x as likely as the average customer to be high-value. Similarly, international customers purchasing the same products in their first transaction are 2.1x as likely to be high-value. The largest group, Annual-Basic-USA (25% of customers), is 40% more likely than average to be high-value. Altogether, these segments account for 46% of customers and are, on average, roughly 1.8x as likely to be high-value customers.
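The combined figure is a customer-weighted average, and the arithmetic can be checked with a short script. The inputs are the rounded values from the table; the variable names are illustrative. With these rounded inputs, the weighted lift works out to just under 1.8x.

```python
# Rows from the segment table: (segment, share of customers, predicted probability).
segments = [
    ("Annual / Pro / USA", 0.15, 0.47),
    ("Annual / Pro / International", 0.06, 0.41),
    ("Annual / Basic / USA", 0.25, 0.28),
]
baseline = 0.20  # probability of being high-value across all customers

# Combined share of customers covered by the three segments.
total_share = sum(share for _, share, _ in segments)
# Customer-weighted average probability across the three segments.
combined_p = sum(share * p for _, share, p in segments) / total_share
# Lift of the combined group relative to the average customer.
combined_lift = combined_p / baseline

print(f"share of customers: {total_share:.0%}")
print(f"combined probability: {combined_p:.1%}")
print(f"combined lift: {combined_lift:.2f}x")
```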

Conclusion

Now that we know the most critical features for identifying high-value customers, should we exclusively focus on those? Not necessarily. Although high-value customers contribute the majority of revenue, other customers remain important. However, when making investments in customer acquisition, retention, upselling, etc., we should be mindful of relative customer value.

Next Steps

What are the most important attributes of your best customers? If you don't know, get in touch with me and we can discuss.

