Vitoria Lima

Identifying High-Potential Accounts for Sales Targeting

A Data-Driven Approach using Additive Logistic Regression

Figure 1: Overview of the Sales Targeting Project

Overview

This project identifies high-potential accounts for sales targeting using a data-driven approach. We used a Target Account List (TAL) to identify 30 high-potential accounts, representing an estimated ~5.4 million USD in potential yearly revenue.

Data Overview

The data covers over 3,000 companies drawn from six datasets. For training and testing purposes, the data was divided as follows:

Feature engineering produced around 100 features, including lag and rolling features, days between engagements, and average weekly engagement rate.
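As a rough illustration of the feature types named above, the sketch below builds a lag feature, a rolling mean, days between engagements, and a weekly engagement rate with pandas. The table layout, the column names (`account_id`, `date`, `engagements`), the window sizes, and the exact definition of the weekly rate are all assumptions made for this example; they are not taken from the project data.

```python
import pandas as pd

# Hypothetical engagement log: one row per account per engagement date.
events = (
    pd.DataFrame(
        {
            "account_id": ["a", "a", "a", "b", "b"],
            "date": pd.to_datetime(
                ["2024-01-01", "2024-01-08", "2024-01-22",
                 "2024-01-03", "2024-01-10"]
            ),
            "engagements": [2, 5, 1, 4, 4],
        }
    )
    .sort_values(["account_id", "date"])
    .reset_index(drop=True)
)

# Lag feature: engagement count at the previous event of the same account.
events["engagements_lag1"] = events.groupby("account_id")["engagements"].shift(1)

# Rolling feature: mean over the current and previous event, per account.
events["engagements_roll2"] = events.groupby("account_id")["engagements"].transform(
    lambda s: s.rolling(window=2, min_periods=1).mean()
)

# Days between consecutive engagements of the same account.
previous_date = events.groupby("account_id")["date"].shift(1)
events["days_since_prev"] = (events["date"] - previous_date).dt.days


def weekly_rate(group: pd.DataFrame) -> float:
    """Total engagements divided by weeks observed (assumed definition)."""
    span_days = (group["date"].max() - group["date"].min()).days
    weeks = max(span_days / 7.0, 1.0)  # avoid dividing by a zero-length span
    return group["engagements"].sum() / weeks


# One weekly engagement rate per account.
weekly = {acct: weekly_rate(g) for acct, g in events.groupby("account_id")}
```

The first event of each account has no predecessor, so its lag and day-gap values are missing; how to impute them is a separate modelling decision.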

Feature Selection

Random Forest was used for feature selection to identify the most important predictors for the model.
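One common way to do this with scikit-learn is to fit a `RandomForestClassifier` and rank features by its impurity-based `feature_importances_`. The sketch below uses synthetic data from `make_classification` as a stand-in for the engineered features; the number of trees, the cutoff `k`, and the feature names are assumptions for illustration, and the project may have used a different importance measure or threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the engineered feature matrix (not project data).
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank features by impurity-based importance, highest first.
order = np.argsort(forest.feature_importances_)[::-1]

# Keep the k most important features (k is an assumed cutoff).
k = 5
selected = [feature_names[i] for i in order[:k]]
X_selected = X[:, order[:k]]
```

Impurity-based importances can favour high-cardinality features, so permutation importance on held-out data is a reasonable cross-check.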

Model: Additive Logistic Regression

The model used for predicting high-potential accounts is Additive Logistic Regression. This model allows us to handle the sparsity in the data and provides a probabilistic framework for prediction.

The logistic regression model is defined as:

\[P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \sum_{i=1}^{n} \beta_i X_i)}}\]

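Where \(P(Y = 1 | X)\) is the probability that an account is high-potential given its features \(X\), \(\beta_0\) is the intercept, and \(\beta_i\) is the coefficient of feature \(X_i\), for \(i = 1, \dots, n\).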

In our additive model, we include interaction terms and polynomial terms to capture non-linear relationships:

\[P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \sum_{i=1}^{n} \beta_i X_i + \sum_{j=1}^{m} \gamma_j Z_j)}}\]

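Where \(Z_j\) are the interaction and polynomial terms built from the features \(X_i\), and \(\gamma_j\) are their coefficients, for \(j = 1, \dots, m\).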
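A minimal sketch of a model of this form, assuming scikit-learn: `PolynomialFeatures(degree=2)` generates the original features together with their squares and pairwise products, and `LogisticRegression` fits one coefficient per term. The synthetic data, the degree, and the default regularisation are assumptions for illustration, not the project's actual configuration. The block also recomputes the predicted probability by hand to show that it matches the formula above, and ranks rows by predicted probability as one possible way to pick a shortlist of 30.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the selected features (not project data).
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=0, random_state=0
)

model = make_pipeline(
    StandardScaler(),
    # Degree 2 adds squared terms and pairwise interactions to the X_i.
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)

# P(Y = 1 | X) as returned by the fitted pipeline.
proba = model.predict_proba(X)[:, 1]

# Recompute the same probability from the formula: the sigmoid of the
# intercept plus the weighted sum of linear, polynomial and interaction terms.
expanded = model[:-1].transform(X)
classifier = model[-1]
linear_predictor = classifier.intercept_[0] + expanded @ classifier.coef_[0]
manual_proba = 1.0 / (1.0 + np.exp(-linear_predictor))

# Rank rows by predicted probability and keep the 30 highest (assumed rule).
top_n = 30
top_accounts = np.argsort(proba)[::-1][:top_n]
```

With 4 input features, the expanded design matrix has 14 columns: 4 linear terms, 4 squares, and 6 pairwise interactions.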

Figure 2: Additive logistic regression outperforming Random Forest
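A comparison of this kind could be set up roughly as follows, assuming scikit-learn, 5-fold cross-validation, and ROC AUC as the metric. The source does not state which metric or validation scheme produced the figure, so both are assumptions here, as are the synthetic data and hyperparameters. The sketch only shows the mechanics and does not reproduce the reported result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the selected features (not project data).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

candidates = {
    # Logistic regression with squared and interaction terms.
    "additive_logistic": make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2, include_bias=False),
        LogisticRegression(max_iter=1000),
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Mean cross-validated ROC AUC per model (metric choice is an assumption).
mean_auc = {
    name: cross_val_score(estimator, X, y, cv=5, scoring="roc_auc").mean()
    for name, estimator in candidates.items()
}
```

Which model scores higher depends on the data; on sparse, mostly additive signals a regularised logistic model can be competitive with tree ensembles.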

Prediction Results

The model provided the following prediction results:

The potential revenue from these high-potential accounts is estimated to be:

Key Features and Insights

The most important features identified through the analysis include:

Business Impact

The project identified the factors that most strongly drive sales and helped prioritize the accounts with the highest potential, sharpening the sales strategy. This data-driven approach supports more focused resource allocation and improved conversion rates.

Technical Implementation

The implementation involved several key steps:

Resources

GitHub Repository
Full Analysis (PDF)