How to Build a Revenue Forecast Using Historical Customer Cohort Data

by Trey Pruitt

Detail from Zur Farbenlehre Pl.10 (1810) by Johann Wolfgang von Goethe

Introduction

If you are reading this article, you probably understand the consequences of ignoring your customer cohorts. And you have likely decided that you want the benefits of a revenue forecast model that uses customer cohorts.

You may ask: what's the next step? How do I actually implement a cohort-based forecast model? Here are the steps to follow when building a revenue forecast using historical customer cohort data.

Steps to Build a Revenue Forecast Using Historical Customer Cohort Data

Project Goals, Stakeholders, and Parameters

Step 1: Establish Project Goals

The primary objective of this step is to establish the goals for the analysis and forecast process. Goals might include:

Forecasting revenue with greater accuracy
Gaining deeper insights into sales, retention rates, and Average Revenue Per User (ARPU) at the cohort level
Understanding the sensitivity of key forecast model drivers (e.g., new-to-file customer acquisition, customer retention, upselling, price changes)
Projecting cohort revenue performance in order to measure marketing efficiency at the channel level

Step 2: Identify Stakeholders

Identifying the stakeholders is a critical component of this phase. The audience for the analysis could range from internal teams focused on strategic planning and operational efficiency to external partners or investors interested in the company's performance and growth potential. Tailoring the presentation of historical data analyses and forecasts to meet the informational needs and decision-making processes of these stakeholders is crucial for the project's success.

Step 3: Decide Time Frame and Granularity of the Forecast

Deciding on the time frame and the time granularity of the forecast (weekly, monthly, or quarterly) before starting the project work is essential for consistency and accuracy in data analysis and forecasting. This foundational step aligns the time-based resolution of the data with the specific objectives and analytical needs of the project.

Similarly, the granularity of cohort segments should match the grain of the forecast to maintain coherence in the analysis. For instance, if the forecast is on a monthly basis, then the cohorts should also be segmented monthly. This alignment is crucial for drawing meaningful insights and making accurate predictions, as it ensures that the temporal dynamics within the data are accurately represented and that the analysis reflects the true behavior and trends of the customer base. Establishing these parameters upfront facilitates a structured approach to the analysis, enabling more precise forecasts and more strategic decision-making.

Step 4: Establish a Forecast Accuracy Metric

For measuring and evaluating the forecast's accuracy, selecting an appropriate quantitative metric is key. Options such as Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), or Weighted MAPE (wMAPE) could be considered based on the project's specific needs and the nature of the data.
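As a minimal sketch, the three metrics can be computed from paired lists of actual and forecast values. The function names and sample numbers are illustrative, not from a specific library:

```python
def mae(actual, forecast):
    """Mean Absolute Error: the average size of a miss, in revenue units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error; undefined if any actual value is 0."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def wmape(actual, forecast):
    """Weighted MAPE: total absolute error divided by total actuals."""
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / sum(abs(a) for a in actual)


actual = [100.0, 200.0, 50.0]    # e.g. revenue for three cohorts
forecast = [110.0, 190.0, 40.0]
print(mae(actual, forecast))     # 10.0
print(mape(actual, forecast))
print(wmape(actual, forecast))
```

MAPE breaks down when an actual value is zero and lets small cohorts with large percentage misses dominate; wMAPE divides total absolute error by total actuals, so larger cohorts carry proportionally more weight.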

Data Preparation

Step 5: Collect Data

Beginning with monthly sales data by customer is an effective approach to understanding purchasing patterns. A crucial element for analysis is the date each customer was acquired. If the customer acquisition date isn't readily available in your CRM system or database, it can be inferred from the date a customer's ID first appears in transaction records.
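A minimal sketch of inferring the acquisition month from transaction records, assuming transactions are available as (customer_id, month) pairs with months stored as zero-padded "YYYY-MM" strings (an illustrative format; the function name is hypothetical):

```python
def acquisition_months(transactions):
    """Map each customer_id to the earliest month it appears in.

    transactions: iterable of (customer_id, month) pairs, month as "YYYY-MM".
    Zero-padded "YYYY-MM" strings sort chronologically, so plain string
    comparison finds the earliest month.
    """
    first_month = {}
    for customer_id, month in transactions:
        if customer_id not in first_month or month < first_month[customer_id]:
            first_month[customer_id] = month
    return first_month


txns = [("c1", "2023-03"), ("c2", "2023-02"), ("c1", "2023-01"), ("c2", "2023-04")]
print(acquisition_months(txns))  # {'c1': '2023-01', 'c2': '2023-02'}
```

This inference only holds if the transaction history starts before the customers being analyzed were acquired; customers already active in the first month of the data would otherwise be assigned that month as their acquisition date.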

In addition to the acquisition date, gathering details on the marketing channel through which each customer was attracted, their geographical location, industry sector, receipt of discounts or incentives, and the products purchased in the initial month can greatly enhance customer segmentation efforts.

Step 6: Clean and Organize Data

Before embarking on any analysis, evaluating the data quality is essential to uncover potential errors, outliers, and gaps. This assessment can be efficiently conducted by calculating basic statistics for numeric fields and analyzing the distribution of values within categorical fields. Additionally, visualizing the data through various plots is a powerful method to detect underlying issues that might not be evident from statistics alone. Data profiling tools designed for machine learning projects can significantly expedite this process. These tools quickly surface discrepancies, helping ensure that the analysis is built on a solid foundation of reliable data.
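The basic statistics and category counts described above can be sketched with the Python standard library alone. The column values and the checks chosen here (missing values, negative values, inconsistent category spellings) are illustrative assumptions:

```python
from collections import Counter
from statistics import mean, median


def profile_numeric(values):
    """Basic statistics plus simple red flags for one numeric column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
        "negative": sum(1 for v in present if v < 0),
    }


def profile_categorical(values):
    """Frequency of each category; None counts as a missing value."""
    return Counter(values)


net_sales = [120.0, 95.5, None, -40.0, 110.0]
channels = ["paid_search", "email", "email", None, "Email"]
print(profile_numeric(net_sales))
print(profile_categorical(channels))
```

In the sample, the category counts surface both a missing channel and an inconsistent label ("email" vs. "Email"), the kind of issue that statistics on numeric fields alone would miss.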

Step 7: Segment Customer Data into Cohorts

This step focuses on structuring customer sales data into defined cohorts to facilitate the analysis. The cornerstone of this segmentation is identifying the acquisition period (labeled as `first_month` in a monthly model), which serves as a primary dimension for cohort analysis. This approach allows for the creation of additional dimensions that remain static for each customer or are identifiable within the first period, such as marketing channel, geography, industry, initial period discounts, and the first period product selection.

Next, numeric data is aggregated for each cohort. This typically includes summing critical sales metrics like units sold and net sales by cohort for each period, alongside counting the unique customers making purchases within those periods. It's useful to include the original number of customers in each cohort (the cohort size), which makes metrics such as survival rates straightforward to compute.

To facilitate analysis, a `customer_tenure` field is created to quantify the duration since acquisition, calculated by subtracting the `first_month` from the current month, with the inception month indexed as 0. This metric not only enhances the understanding of customer loyalty and longevity but also assists in identifying trends and patterns in customer behavior over time.
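The segmentation and aggregation steps can be sketched as follows, assuming one row per transaction in the form (customer_id, month, units, net_sales) with "YYYY-MM" months; the function names are hypothetical. The cohort key here is `first_month` alone, and static dimensions such as marketing channel could be added to the key in the same way:

```python
from collections import defaultdict


def month_index(month):
    """Convert "YYYY-MM" into a running month count so months can be subtracted."""
    year, mon = month.split("-")
    return int(year) * 12 + int(mon) - 1


def build_cohort_table(transactions):
    """Aggregate (customer_id, month, units, net_sales) rows into cohort rows.

    Returns {(first_month, customer_tenure): {...}} holding summed units and
    net_sales, unique purchasing customers, and the original cohort size.
    """
    first_month = {}
    for cust, month, _, _ in transactions:
        if cust not in first_month or month < first_month[cust]:
            first_month[cust] = month

    cohort_size = defaultdict(int)
    for fm in first_month.values():
        cohort_size[fm] += 1

    agg = defaultdict(lambda: {"units": 0, "net_sales": 0.0, "buyers": set()})
    for cust, month, units, sales in transactions:
        fm = first_month[cust]
        tenure = month_index(month) - month_index(fm)  # acquisition month = 0
        cell = agg[(fm, tenure)]
        cell["units"] += units
        cell["net_sales"] += sales
        cell["buyers"].add(cust)

    return {
        key: {
            "units": cell["units"],
            "net_sales": cell["net_sales"],
            "customers": len(cell["buyers"]),
            "cohort_size": cohort_size[key[0]],
        }
        for key, cell in sorted(agg.items())
    }


txns = [
    ("c1", "2023-01", 2, 40.0),
    ("c2", "2023-01", 1, 20.0),
    ("c1", "2023-02", 1, 20.0),
    ("c3", "2023-02", 3, 60.0),
]
for key, row in build_cohort_table(txns).items():
    print(key, row)
```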

Cohort Behavior Analysis

Step 8: Calculate Cohort Metrics

This step involves creating metrics that generate insights into customer behavior and business health. For example, the following metrics could be computed for each customer cohort at each period of cohort tenure:

Retention rate (customers purchasing in period divided by customers purchasing in prior period)
Churn rate (1 minus retention rate)
Survival rate (customers purchasing in period divided by customers purchasing in first period)
Average units per customer (units divided by customers purchasing)
Average price per unit (sales divided by units)
Average revenue per customer [aka ARPU] (revenue divided by customers purchasing)
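A minimal sketch of these calculations for a single cohort, assuming the aggregated rows are already ordered by customer tenure (the field and function names are illustrative). Ratios with a zero denominator are returned as None rather than raising an error:

```python
def safe_div(num, den):
    """Return num / den, or None when den is zero or None."""
    return num / den if den else None


def cohort_metrics(rows):
    """Compute the metrics above for one cohort.

    rows: list ordered by customer_tenure (0, 1, 2, ...); each dict holds
    "customers" (purchasing that period), "units" and "revenue".
    """
    out = []
    for tenure, row in enumerate(rows):
        prior = rows[tenure - 1]["customers"] if tenure > 0 else None
        retention = safe_div(row["customers"], prior)
        out.append({
            "customer_tenure": tenure,
            "retention_rate": retention,
            "churn_rate": None if retention is None else 1 - retention,
            "survival_rate": safe_div(row["customers"], rows[0]["customers"]),
            "units_per_customer": safe_div(row["units"], row["customers"]),
            "price_per_unit": safe_div(row["revenue"], row["units"]),
            "arpu": safe_div(row["revenue"], row["customers"]),
        })
    return out


rows = [
    {"customers": 100, "units": 150, "revenue": 3000.0},
    {"customers": 60, "units": 84, "revenue": 1680.0},
    {"customers": 45, "units": 63, "revenue": 1323.0},
]
for m in cohort_metrics(rows):
    print(m)
```

Because retention here is measured against the prior period, it can exceed 100% in transactional businesses when lapsed customers return; survival rate always uses the first-period count.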

Step 9: Identify Trends and Patterns

In this step, the focus shifts to leveraging the calculated cohort metrics to understand underlying trends and patterns that can inform strategic decision-making. Survival curves emerge as a critical tool in this step, plotting the longevity of customer engagement over time and highlighting how various cohorts differ in their long-term relationship with the business. Similarly, retention curves provide invaluable insights into how effectively the business maintains its customer base across different periods. Analysis of average revenue and cumulative revenue per customer (aka CLTR) reveals the financial impact of these engagement patterns, offering a direct measure of the value generated by different cohorts.

Seasonality plays a significant role as well; for instance, analyzing variations during holiday periods can uncover opportunities for strategic promotions or adjustments in customer engagement tactics. Additionally, cohort heat maps serve as a powerful visual tool, allowing for the at-a-glance comparison of cohort behaviors across multiple dimensions, such as acquisition period, retention, and revenue generation. Together, these analyses help paint a comprehensive picture of customer behavior, guiding the development of targeted interventions to foster growth and sustainability.

An analysis that can be helpful at this point is determining to what extent various customer dimensions are correlated with longer-term cohort revenue. For example, one might calculate average cumulative revenue over the first 24 months for each customer cohort and fit a regression or decision tree model to identify the most important predictors of revenue. If customers who purchase the "Pro" version of a product in their first month generate considerably more cumulative revenue, and customers who receive initial discounts generate less, those dimensions can serve as a basis for further segmenting cohorts for analysis and forecasting.
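As a simpler stand-in for the regression or decision tree described above, the sketch below compares mean 24-month cumulative revenue across the values of each customer dimension and ranks dimensions by the spread between the highest and lowest group means. The field names, function name, and data are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def revenue_lift_by_dimension(customers, dimensions):
    """Rank dimensions by how much mean cumulative revenue differs across values.

    customers: list of dicts holding dimension values plus "cum_rev_24"
    (cumulative revenue over the first 24 months of tenure).
    Returns (dimension, spread, {value: mean cum_rev_24}) sorted by spread.
    """
    results = []
    for dim in dimensions:
        groups = defaultdict(list)
        for c in customers:
            groups[c[dim]].append(c["cum_rev_24"])
        means = {value: mean(revs) for value, revs in groups.items()}
        spread = max(means.values()) - min(means.values())
        results.append((dim, spread, means))
    return sorted(results, key=lambda r: r[1], reverse=True)


customers = [
    {"first_product": "Pro", "discounted": False, "cum_rev_24": 900.0},
    {"first_product": "Pro", "discounted": True, "cum_rev_24": 700.0},
    {"first_product": "Basic", "discounted": False, "cum_rev_24": 400.0},
    {"first_product": "Basic", "discounted": True, "cum_rev_24": 200.0},
]
for dim, spread, means in revenue_lift_by_dimension(customers, ["first_product", "discounted"]):
    print(dim, spread, means)
```

Unlike a regression or decision tree, this comparison looks at one dimension at a time, so it cannot separate overlapping effects (for example, if most discounted customers also bought "Basic"); treat it as a first screen only.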

Revenue Forecast Model Creation

Step 10: Set Up the Cohort Forecast Model Structure

Before developing the cohort model, it's necessary to decide how we want to build up revenue. For example, for each period and customer cohort, the model might be structured around the following drivers:

customers purchasing count (for new-to-file cohorts the first period is an assumption; later periods are calculated from the expected retention rate at cohort tenure 1, 2, 3, etc.)
average units per customer (based on cohort tenure at period 0, 1, 2, 3, etc.)
average price per unit (based on cohort tenure at period 0, 1, 2, 3, etc., and on the calendar period if price changes are planned)

From these drivers, we can then derive the following calculated fields for each cohort for each period:

units (customers purchasing count x average units per customer)
revenue (average price per unit x units)
average revenue per customer purchasing (revenue divided by customers purchasing count)

To determine total revenue for a period, we sum up revenue for all cohorts for that period. For total revenue for a cohort across time periods (e.g., first 12 months), we sum up revenue for that cohort for cohort tenure periods from 0 to 11.
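The driver structure above can be sketched for a single cohort as follows. This is an illustrative implementation under simplifying assumptions: drivers are indexed by cohort tenure only (no period-specific price changes), periods are integer indexes, and the function names are hypothetical:

```python
def project_cohort(start_period, new_customers, retention, units_per_cust,
                   price_per_unit, horizon):
    """Build one cohort's rows from tenure-indexed drivers.

    retention[t]: expected retention rate at tenure t (index 0 is unused).
    units_per_cust[t], price_per_unit[t]: driver values at tenure t.
    Returns {period: row} for periods start_period .. horizon - 1.
    """
    rows = {}
    customers = float(new_customers)
    for period in range(start_period, horizon):
        tenure = period - start_period               # customer_tenure
        if tenure > 0:
            customers *= retention[tenure]           # retained from prior period
        units = customers * units_per_cust[tenure]
        revenue = units * price_per_unit[tenure]
        rows[period] = {
            "customer_tenure": tenure,
            "customers": customers,
            "units": units,
            "revenue": revenue,
            "arpu": revenue / customers if customers else 0.0,
        }
    return rows


def total_revenue_by_period(cohorts):
    """Sum revenue across all cohorts for each period."""
    totals = {}
    for rows in cohorts:
        for period, row in rows.items():
            totals[period] = totals.get(period, 0.0) + row["revenue"]
    return dict(sorted(totals.items()))


def cohort_revenue_first_n(rows, n):
    """Total revenue for one cohort over tenure 0 .. n - 1 (e.g. n = 12)."""
    return sum(r["revenue"] for r in rows.values() if r["customer_tenure"] < n)


retention = [None, 0.5, 0.8]
units = [1.0, 1.5, 2.0]
price = [10.0, 10.0, 12.0]
a = project_cohort(0, 100, retention, units, price, horizon=3)
b = project_cohort(1, 80, retention, units, price, horizon=3)
print(total_revenue_by_period([a, b]))  # {0: 1000.0, 1: 1550.0, 2: 1560.0}
print(cohort_revenue_first_n(a, 2))     # 1750.0
```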

Step 11: Fit the Forecast Model

We have historical metrics for each cohort. We want to set the assumptions for the forecast model drivers as a "best fit" of historical data. This helps us set a baseline model for "current course and speed" if things stay roughly the same in the future.

For new-to-file customer acquisition, you can use a seasonally adjusted rate of new customer acquisition by marketing channel; this can be adjusted in a later step when incorporating business assumptions. For other assumptions, a good starting point is to calculate the weighted average of each driver by cohort tenure and use it as a lookup table for each cell in the forecast, based on each cohort's tenure in that period. For example, to calculate the number of customers purchasing for cohort X in period Y, we would multiply the retention rate for cohort X's tenure in period Y by the number of customers purchasing in the prior period. The same process can be used for average units per customer and average price per unit.
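A minimal sketch of the weighted-average lookup for retention, assuming history is stored as customers purchasing by tenure for each cohort (a hypothetical layout and function names). Weighting each cohort's rate by its prior-period customers is equivalent to dividing summed customers at tenure t by summed customers at tenure t - 1:

```python
from collections import defaultdict


def fit_retention_by_tenure(history):
    """Weighted-average retention rate for each customer_tenure.

    history: {cohort: [customers purchasing at tenure 0, 1, 2, ...]}.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for counts in history.values():
        for t in range(1, len(counts)):
            num[t] += counts[t]
            den[t] += counts[t - 1]
    return {t: num[t] / den[t] for t in sorted(num) if den[t]}


def next_period_customers(counts, retention_lookup):
    """Project one more period: retention at the next tenure x prior customers."""
    next_tenure = len(counts)
    return counts[-1] * retention_lookup[next_tenure]


history = {
    "2023-01": [100, 60, 45],
    "2023-02": [200, 110],
    "2023-03": [150],
}
lookup = fit_retention_by_tenure(history)
print(lookup)  # tenure 1: 170/300, tenure 2: 45/60
print(next_period_customers(history["2023-02"], lookup))  # 82.5
```

The lookup only covers tenures observed in the data, and longer tenures are estimated from fewer cohorts, so they are noisier. Projecting beyond the last observed tenure needs an extra assumption, such as holding the last rate constant. The same ratio-of-sums approach applies to average units per customer (summed units over summed customers) and average price per unit (summed revenue over summed units).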

The calculated fields (units and revenue) are then derived from these drivers for each cohort and summed by period. This gives us a revenue forecast by cohort, and in total, for each period. If forecasting monthly, we can sum months into quarters and years for an aggregated revenue forecast.
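Rolling a monthly forecast up to quarters and years is a simple aggregation. This sketch assumes calendar quarters and "YYYY-MM" month keys (a fiscal calendar would need a different mapping); the function name is hypothetical:

```python
from collections import defaultdict


def roll_up(monthly_revenue):
    """Sum a {"YYYY-MM": revenue} forecast into calendar quarters and years."""
    quarters = defaultdict(float)
    years = defaultdict(float)
    for month, revenue in monthly_revenue.items():
        year, mon = month.split("-")
        quarter = (int(mon) - 1) // 3 + 1   # months 1-3 -> Q1, 4-6 -> Q2, ...
        quarters[f"{year}-Q{quarter}"] += revenue
        years[year] += revenue
    return dict(quarters), dict(years)


monthly = {"2024-01": 100.0, "2024-02": 120.0, "2024-03": 130.0, "2024-04": 90.0}
q, y = roll_up(monthly)
print(q)  # {'2024-Q1': 350.0, '2024-Q2': 90.0}
print(y)  # {'2024': 440.0}
```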

Forecast Validation, Assumptions, and Sensitivity

Step 12: Validate the Forecast Against Historical Data

When validating a revenue forecast model, especially one built on customer cohort data, it's crucial to adopt a method similar to that used in machine learning, where data is divided into "train" and "test" sets. The model is trained on the "train" set, a collection of historical data, and validated against the "test" set, which comprises more recent data. This division mitigates the risk of "overfitting," a common issue where a model fits past data accurately but fails to generalize to new, unseen data. Because a cohort forecast works with time-series data and aims to predict future sales by learning from past patterns, the most recent periods serve as the "test" set rather than a random sample. This approach checks whether the model can capture and project the underlying trends in customer behavior over time, and therefore whether it can provide a reliable forecast of future monthly sales by cohort.

In this step, we use the forecast accuracy metric from Step 4 to measure how well the forecast fits the most recent historical data. For example, if the metric is mean absolute error (MAE), we compare the model's predictions for each cohort in the most recent period with the actual data and compute the mean absolute deviation. The best-fit forecast model is one that is derived from the "train" periods and validated on the "test" period.
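A minimal sketch of this holdout check using MAE on customers purchasing, assuming cohorts are stored as (start period, customers by tenure) and retention is fitted as a weighted average by tenure. All names and numbers are hypothetical, and the same pattern applies to revenue:

```python
from collections import defaultdict


def split_train_test(cohorts, test_period):
    """Hold out the most recent calendar period.

    cohorts: {name: (start_period, [customers at tenure 0, 1, ...])}.
    Train keeps observations before test_period; test holds each cohort's
    observation in test_period (tenure >= 1 only, because a new cohort's
    first period is an acquisition assumption, not a retention prediction).
    """
    train, test = {}, {}
    for name, (start, counts) in cohorts.items():
        train[name] = [c for t, c in enumerate(counts) if start + t < test_period]
        t_test = test_period - start
        if 1 <= t_test < len(counts):
            test[name] = (t_test, counts[t_test])
    return train, test


def fit_retention(train):
    """Weighted-average retention by tenure (summed t over summed t - 1)."""
    num, den = defaultdict(float), defaultdict(float)
    for counts in train.values():
        for t in range(1, len(counts)):
            num[t] += counts[t]
            den[t] += counts[t - 1]
    return {t: num[t] / den[t] for t in num if den[t]}


def holdout_mae(cohorts, test_period):
    train, test = split_train_test(cohorts, test_period)
    retention = fit_retention(train)
    errors = []
    for name, (t, actual) in test.items():
        if t not in retention:
            continue  # no training cohort has reached this tenure yet
        predicted = train[name][t - 1] * retention[t]
        errors.append(abs(actual - predicted))
    return sum(errors) / len(errors) if errors else None


cohorts = {"A": (0, [100, 60, 45, 40]), "B": (1, [200, 110, 80]), "C": (2, [150, 90])}
print(holdout_mae(cohorts, test_period=3))
```

In the sample, cohort A is skipped because no training cohort has reached tenure 3, which illustrates a practical limit of lookup-table models that the fitted forecast needs to handle explicitly.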

Step 13: Incorporate Business Assumptions

In this step, you will modify the forecast model assumptions to include planned changes to the business which could include:

New Product Launches: Introducing new products can attract new customers, increase the spending of existing customers, and potentially change the overall revenue forecast.
Changes to Marketing Strategies:

New Customer Acquisition: Initiatives aimed at attracting new customers could increase the size of future cohorts.
Retention Efforts: Enhanced customer retention strategies may decrease churn rates and increase lifetime value.
Upselling and Cross-selling: Tactics focused on selling more or higher-value products to existing customers could raise average revenue per user (ARPU).
Price Changes: Adjusting the pricing of products or services, either upwards or downwards, can directly impact revenue by altering customer demand and purchase behavior.

Market Expansion or Contraction: Entering new markets or retracting from existing ones can significantly affect the size and characteristics of customer cohorts, thereby influencing revenue forecasts.
Economic Conditions: Fluctuations in the overall economy can affect customer purchasing power and willingness to spend, which in turn impacts revenue predictions.
Regulatory Changes: New laws or regulations could alter operational costs, market accessibility, or product compliance requirements, impacting revenue potential.
Competitor Actions: New entrants to the market or changes in competitor strategies (e.g., aggressive pricing, new offerings) could influence customer loyalty and acquisition rates.
Technological Advancements: Implementing new technologies could improve product offerings or operational efficiency, potentially affecting customer satisfaction and retention.
Supply Chain Variability: Changes in the availability or cost of goods due to supply chain disruptions can affect product pricing and margins, thus altering revenue forecasts.
Customer Behavior Trends: Shifts in consumer preferences or behaviors, such as an increased demand for sustainable products, can impact the sales of existing or future product lines.
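One way to encode planned changes is as multipliers applied to baseline drivers from a given period onward. The sketch below assumes drivers are stored as per-period lists; the driver names, function name, and adjustments are hypothetical:

```python
def apply_assumptions(baseline, adjustments):
    """Layer planned business changes onto baseline drivers.

    baseline: {driver_name: [value for period 0, 1, 2, ...]}.
    adjustments: list of (driver_name, from_period, multiplier); a planned
    5% price increase from period 6 would be ("price_per_unit", 6, 1.05).
    Returns a new dict and leaves the baseline unchanged.
    """
    scenario = {name: list(values) for name, values in baseline.items()}
    for driver, from_period, multiplier in adjustments:
        values = scenario[driver]
        for p in range(from_period, len(values)):
            values[p] *= multiplier
    return scenario


baseline = {
    "new_customers": [100.0, 100.0, 100.0, 100.0],
    "price_per_unit": [20.0, 20.0, 20.0, 20.0],
}
plan = [
    ("new_customers", 2, 1.10),   # hypothetical acquisition campaign
    ("price_per_unit", 3, 1.05),  # hypothetical price increase
]
print(apply_assumptions(baseline, plan))
```

Keeping the baseline untouched makes it easy to report the plan against the "current course and speed" forecast.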

Step 14: Perform Sensitivity Analysis

Performing sensitivity analysis is an important step in evaluating the robustness of your revenue forecast model. It involves systematically varying key model drivers by a fixed percentage (10% is a common choice) to assess how changes in these drivers affect the overall revenue forecast. For example, you would increase or decrease assumptions such as customer acquisition rates, churn rates, average revenue per customer, and price per unit by 10% and observe the change in forecasted revenue. This process identifies which drivers the forecast is most sensitive to, helping analysts understand the potential risk and variability in the forecast.
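A minimal sketch of a one-at-a-time ±10% sensitivity test. It uses a deliberately simple toy model (one new cohort per period with constant retention, units per customer, and price per unit) rather than a full cohort forecast; the driver names and values are hypothetical. Flexing the retention rate also covers churn, since churn is 1 minus retention:

```python
def forecast_revenue(d, horizon=12):
    """Toy cohort model: a new cohort every period with constant drivers.

    Returns total revenue over the horizon.
    """
    total = 0.0
    for start in range(horizon):
        customers = d["new_customers"]
        for period in range(start, horizon):
            if period > start:
                customers *= d["retention_rate"]
            total += customers * d["units_per_customer"] * d["price_per_unit"]
    return total


def sensitivity(drivers, pct=0.10, horizon=12):
    """Flex each driver down and up by pct; report the % change in revenue."""
    base = forecast_revenue(drivers, horizon)
    results = []
    for name in drivers:
        row = {"driver": name}
        for label, factor in (("down", 1 - pct), ("up", 1 + pct)):
            flexed = dict(drivers)
            flexed[name] = drivers[name] * factor
            if name == "retention_rate":
                flexed[name] = min(flexed[name], 1.0)  # a rate cannot exceed 100%
            row[label] = forecast_revenue(flexed, horizon) / base - 1
        results.append(row)
    # Largest swing first: these drivers deserve the most scrutiny.
    return sorted(results, key=lambda r: r["up"] - r["down"], reverse=True)


drivers = {"new_customers": 500, "retention_rate": 0.85,
           "units_per_customer": 1.3, "price_per_unit": 25.0}
for row in sensitivity(drivers):
    print(f'{row["driver"]:<20} -10%: {row["down"]:+.1%}  +10%: {row["up"]:+.1%}')
```

In this toy model, revenue scales linearly with new customers, units, and price, so those drivers move revenue by ±10%, while retention compounds across tenure periods and produces a larger swing.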

Visualization and Reporting

Step 15: Create Visualizations

Translating your data into compelling charts and graphs is crucial for effectively communicating the nuances of cohort behavior and revenue forecasts. Visual tools help stakeholders quickly grasp complex information and make informed decisions.

Cohort Heatmap: This chart provides a useful visual representation of metrics such as retention rates or revenue per cohort over time. In a cohort heatmap, rows represent different cohorts, while columns represent time intervals (e.g., months or years). By using color intensity to indicate the magnitude of the metric being measured (darker colors could indicate higher values), you can quickly identify patterns, such as which cohorts are performing better or how retention rates change over time.
Survival Curves: For each cohort and cohort tenure period, this visualization shows the number of "active" customers as a percentage of the cohort's starting number of customers. It answers the question: What percent of customers survive until X periods after their first purchase? Since lifetime revenue is highly correlated with retention (in subscription businesses) or repeat rates (in transaction businesses), comparing survival curves of cohorts over time or by customer attribute can be illuminating.
Average Cumulative Revenue per Customer: This chart shows how much revenue each cohort generates starting with the first period of cohort tenure through N periods. To illustrate how customer cohorts generate revenue over time, use a line chart displaying cumulative sales by cohort. A useful visualization technique is the "small multiple" or "trellis" chart in which cohorts with similar attributes are grouped, and each cohort within the group is plotted on its own axis.
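The data behind the survival curve and the average cumulative revenue chart can be computed per cohort as below, after which any plotting library can draw the series. The cohort layout and function name are hypothetical, and cumulative revenue is divided by the original cohort size (an assumption) so that churned customers are included in the average:

```python
def chart_series(cohort):
    """Survival rates and average cumulative revenue per customer by tenure.

    cohort: {"size": customers acquired, "customers": [...], "revenue": [...]}
    with both lists indexed by customer_tenure.
    """
    survival, cum_rev_per_cust = [], []
    running = 0.0
    for customers, revenue in zip(cohort["customers"], cohort["revenue"]):
        survival.append(customers / cohort["size"])
        running += revenue
        cum_rev_per_cust.append(running / cohort["size"])
    return survival, cum_rev_per_cust


cohorts = {
    "2023-01": {"size": 100, "customers": [100, 60, 45], "revenue": [3000.0, 1680.0, 1323.0]},
    "2023-02": {"size": 200, "customers": [200, 110], "revenue": [5600.0, 3300.0]},
}
# One text row per cohort: rows = cohorts, columns = tenure, as in a heatmap.
for name, c in cohorts.items():
    survival, cum_rev = chart_series(c)
    print(name, " ".join(f"{s:5.0%}" for s in survival), "|",
          " ".join(f"{v:7.2f}" for v in cum_rev))
```

Printing one row per cohort also shows the triangular shape of a cohort heatmap: recent cohorts have fewer observed tenure periods.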

When creating these visualizations, ensure clarity by choosing appropriate scales and labeling axes, cohorts, and time periods clearly. Utilize contrasting colors for different cohorts and metrics for easy differentiation. Tooltips or hover-texts that provide additional data points can enhance the interactivity and informativeness of the visualizations. These visual strategies not only aid in the immediate understanding of complex cohort behaviors and forecasted revenues but also facilitate deeper analytical discussions among stakeholders.

Step 16: Create Reports and Presentation Materials

In this step, the goal is to synthesize and communicate the insights derived from the revenue forecast and its underlying analyses to stakeholders in a clear, concise, and impactful manner. This step involves crafting detailed reports and presentation materials that highlight key findings, assumptions, sensitivities, and strategic recommendations. The reports should include an executive summary that provides a high-level overview of the revenue forecast, detailed sections on methodology, data analysis, cohort behaviors, and forecast assumptions. Visualizations such as cohort heatmaps, survival curves, and cumulative revenue charts should be prominently featured to illustrate trends and support conclusions.

Presentation materials, on the other hand, should distill this information into digestible slides that emphasize strategic insights, risks, and opportunities. These materials might include scenario analyses showing best-case and worst-case forecasts, sensitivity analyses highlighting key variables, and strategic plans. Whether for internal strategy meetings, board reviews, or investor briefings, these reports and presentations should be tailored to the audience, ensuring that the message is clear and actionable, and that stakeholders are well-informed to make strategic decisions.

Forecast Model Maintenance

Step 17: Update Historical Data

Regularly incorporating new data after each period keeps your revenue forecast model current and accurate. This process involves adding the latest data on sales, customer acquisitions, churn rates, and any other relevant metrics to your existing datasets. Take a systematic approach: use a standardized format for new data so it integrates seamlessly with the historical data, and automate data collection and entry where possible to reduce errors and save time. It's also important to run quality checks on the new data to identify and correct discrepancies or anomalies before they are added to the model. Updating historical data regularly not only refines forecast accuracy but also reveals trends, shifts in customer behavior, and the impact of recent business decisions or external events on revenue, keeping your forecasts relevant and reliable as a basis for strategic planning and decision-making.
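A minimal sketch of validating a new month of data before appending it, assuming one row per customer per month in the form (customer_id, month, units, net_sales). The function name and the specific checks are illustrative; for example, rejecting negative sales assumes returns are handled separately:

```python
def append_period(history, new_rows, expected_month):
    """Validate one new month of rows, then return history plus the new rows.

    Raises ValueError listing every problem instead of loading suspect data.
    """
    seen = {(row[0], row[1]) for row in history}
    problems = []
    for i, (cust, month, units, sales) in enumerate(new_rows):
        if month != expected_month:
            problems.append(f"row {i}: month {month} != {expected_month}")
        if (cust, month) in seen:
            problems.append(f"row {i}: duplicate customer-month {cust} {month}")
        if units < 0 or sales < 0:
            problems.append(f"row {i}: negative units or sales")
        seen.add((cust, month))
    if problems:
        raise ValueError("; ".join(problems))
    return history + list(new_rows)


history = [("c1", "2024-01", 2, 40.0)]
updated = append_period(
    history, [("c1", "2024-02", 1, 20.0), ("c2", "2024-02", 3, 60.0)], "2024-02")
print(len(updated))  # 3

try:
    append_period(updated, [("c2", "2024-02", 1, 10.0)], "2024-02")
except ValueError as err:
    print(err)  # row 0: duplicate customer-month c2 2024-02
```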

Step 18: Revise Forecast

In this step, the focus is on adjusting your revenue forecast in light of new data, insights, or changes in business conditions. Begin by comparing your forecasted figures with actual outcomes to identify variances. Analyze these discrepancies to understand their causes, whether they stem from changes in market dynamics, customer behavior, or internal strategy shifts. Incorporate the latest historical data and reassess your assumptions and model parameters accordingly. This may involve adjusting for new product launches, marketing initiatives, or alterations in pricing strategies. It's also an opportunity to refine your model's assumptions based on performance feedback. Regular revision of your forecast ensures it reflects the most current information and assumptions, making it a more reliable tool for guiding strategic decisions and resource allocation. Engage stakeholders in this process to ensure that the revised forecast aligns with broader business goals, fostering a culture of data-driven decision-making.
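Comparing forecast to actual by cohort can be sketched as below. The function name, cohort labels, and the 10% flag threshold are illustrative assumptions, not a standard:

```python
def variance_report(forecast, actual, threshold=0.10):
    """Compare forecast and actual revenue by cohort and flag large misses.

    forecast, actual: {cohort: revenue} for the period just closed.
    A cohort is flagged when its absolute percent variance exceeds the
    threshold, or when it had no forecast (percent variance undefined).
    """
    report = []
    for cohort in sorted(set(forecast) | set(actual)):
        f = forecast.get(cohort, 0.0)
        a = actual.get(cohort, 0.0)
        variance = a - f
        pct = variance / f if f else None
        flagged = pct is None or abs(pct) > threshold
        report.append((cohort, f, a, variance, pct, flagged))
    return report


forecast = {"2023-01": 1000.0, "2023-02": 2000.0, "2023-03": 500.0}
actual = {"2023-01": 1050.0, "2023-02": 1600.0, "2023-03": 480.0}
for cohort, f, a, variance, pct, flagged in variance_report(forecast, actual):
    print(cohort, variance, pct, "REVIEW" if flagged else "")
```

Flagged cohorts are the starting point for the root-cause analysis described above.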

Step 19: Refresh Reports for Stakeholders

In this step, the objective is to update the reports and presentation materials previously shared with stakeholders to reflect the latest forecasts, analyses, and strategic insights. This involves incorporating the most recent data, revising forecasts based on new information or changes in the business environment, and updating visualizations to highlight current trends and projections. The refreshed reports should provide a clear and updated overview of the revenue forecast, including any adjustments made during the "Revise Forecast" step, and explain the rationale behind these changes. It's important to maintain a clear, concise, and accessible format, ensuring that key findings and recommendations stand out for easy comprehension. This step is critical for keeping stakeholders informed about the latest developments, supporting transparency, and facilitating strategic discussions. By regularly refreshing these reports, you ensure that decision-makers have the most current and relevant information at their fingertips, enabling them to make informed decisions and adapt to changing market conditions effectively.

Conclusion

This article provides a guide for analysts looking to leverage historical customer cohort data to create revenue forecasts. By following these steps, analysts can derive actionable insights from their data and make informed predictions about future revenue streams.

Next Steps

For assistance with your revenue forecasting needs, contact me.