Udacity Data Scientist Capstone Project

7 min read · Mar 25, 2021

Starbucks Offers Analysis based on Arvato dataset

Introduction

In this capstone project, I apply what I learned in the Udacity Data Scientist Nanodegree.

I chose the “Starbucks Challenge” as my final project. The data was collected by Arvato and consists of 3 main datasets: the first contains the customer data, the second contains the offer data and the third contains the event log of customer purchases.

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks.

Not all users receive the same offer, and that is the challenge to solve with this data set.

The main task of the project is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.
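As a rough illustration of what combining the three datasets can look like, here is a minimal pandas sketch. The tiny frames and every column name in them (`customer_id`, `offer_id`, `time_hours` and so on) are hypothetical stand-ins, not the actual schema of the project files.

```python
import pandas as pd

# Toy stand-ins for the three datasets; all column names are hypothetical.
customers = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "gender": ["F", "M"],
    "age": [52, 41],
    "income": [72000, 58000],
})
offers = pd.DataFrame({
    "offer_id": ["o1", "o2"],
    "offer_type": ["bogo", "discount"],
    "duration_days": [5, 7],
})
events = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c2"],
    "event": ["offer received", "offer completed", "offer received", "transaction"],
    "offer_id": ["o1", "o1", "o2", None],
    "time_hours": [0, 30, 0, 12],
})

# Left joins keep every event, including transactions with no offer attached.
combined = (
    events
    .merge(customers, on="customer_id", how="left")
    .merge(offers, on="offer_id", how="left")
)
print(combined[["customer_id", "event", "offer_type", "age"]])
```

Using left joins from the event log means plain transactions survive the merge with empty offer columns, which matters because many purchases are not tied to any offer.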

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You’ll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.
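The validity rule above can be sketched as a small check. This assumes event times are measured in hours since the start of the simulation and durations in days; both units, and the function name, are assumptions made for illustration.

```python
def is_within_validity(received_hour: float, event_hour: float, duration_days: int) -> bool:
    """Return True if the event happens while the offer is still valid."""
    # Convert the validity period from days to hours (assumed units).
    expires_hour = received_hour + duration_days * 24
    return received_hour <= event_hour <= expires_hour


# A 5-day offer received at hour 0 expires at hour 120.
print(is_within_validity(0, 100, 5))  # purchase at hour 100
print(is_within_validity(0, 130, 5))  # purchase at hour 130
```

A check like this is one way to decide whether a purchase should be attributed to an offer, including informational offers that have no explicit completion event.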

The given dataset contains transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer.

Keep in mind as well that someone using the app might make a purchase through the app without having received an offer or seen an offer.

Problem Statement

The program used to create the data simulates how people make purchasing decisions and how those decisions are influenced by promotional offers.

Each person in the simulation has some hidden traits that influence their purchasing patterns and are associated with their observable traits. People produce various events, including receiving offers, opening offers, and making purchases.

As a simplification, there are no explicit products to track. Only the amounts of each transaction or offer are recorded.

There are three types of offers that can be sent: buy-one-get-one (BOGO), discount, and informational. In a BOGO offer, a user needs to spend a certain amount to get a reward equal to that threshold amount. In a discount, a user gains a reward equal to a fraction of the amount spent. In an informational offer, there is no reward, but neither is there a requisite amount that the user is expected to spend. Offers can be delivered via multiple channels.
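To make the three reward rules concrete, here is a minimal sketch that follows the description above literally. The function name and its parameters (`threshold`, `fraction`) are hypothetical and are not taken from the simulator.

```python
def offer_reward(offer_type: str, amount_spent: float,
                 threshold: float = 0.0, fraction: float = 0.0) -> float:
    """Reward earned for one purchase, following the rules described in the text."""
    if offer_type == "bogo":
        # Reward equals the spending threshold once that threshold is reached.
        return threshold if amount_spent >= threshold else 0.0
    if offer_type == "discount":
        # Reward is a fraction of the amount spent.
        return fraction * amount_spent
    if offer_type == "informational":
        # No reward and no required spend.
        return 0.0
    raise ValueError(f"unknown offer type: {offer_type}")


print(offer_reward("bogo", 12.0, threshold=10.0))
print(offer_reward("discount", 20.0, fraction=0.25))
print(offer_reward("informational", 20.0))
```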

The basic task is to use the data to identify which groups of people are most responsive to each type of offer, and how best to present each type of offer.

So in this project, we will use machine learning to predict a customer’s response to an offer as one of “offer received”, “offer viewed” or “offer completed”. The prediction is based on demographic information about the users as well as other purchasing data.

Data Exploration and Understanding

Before building our machine learning model and making predictions, we first need to explore the datasets to gather some insights and observations.

What are the available offers and their types?

This dataset has 10 offers: 4 BOGO, 4 discount and 2 informational.

What are the purchasing types of customers and their counts?

We can see that transaction records are the most numerous, because not all transactions are tied to offers. Many transactions are made by people who never received an offer, or who received one but did not use it, for example because they did not reach the minimum amount needed to redeem it.

What is the most completed type of offer?

We can see from the above chart that “discount” offers have the highest number of completions, which suggests that people prefer discount offers to BOGO. As expected, the number of received offers is greater than the number viewed, which in turn is greater than the number completed, because not all offers sent to customers are redeemed in real life.
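Counts like these can be tabulated per offer type with a cross-tabulation. The sketch below uses a made-up event log whose column names (`offer_type`, `event`) and rows are hypothetical, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical event log: one row per offer event, already joined to its offer type.
log = pd.DataFrame({
    "offer_type": ["bogo", "bogo", "bogo",
                   "discount", "discount", "discount", "discount"],
    "event": ["offer received", "offer viewed", "offer received",
              "offer received", "offer viewed", "offer completed", "offer received"],
})

# Count events per offer type, with columns ordered received -> viewed -> completed.
funnel = pd.crosstab(log["offer_type"], log["event"])
funnel = funnel.reindex(
    columns=["offer received", "offer viewed", "offer completed"], fill_value=0
)

# Share of received offers that end up completed.
funnel["completion_rate"] = funnel["offer completed"] / funnel["offer received"]
print(funnel)
```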

Let’s now explore the customers data

Age distribution of customers

Most customers are between 40 and 70 years old, with a peak at 50.

Gender distribution of customers

There are more male customers than female customers.

Income distribution of customers

Data Modeling and Evaluation

In this part of the project, we will create a machine learning model to predict the purchase type of customers based on demographic attributes as well as other purchasing factors.

Here, we use three different classifiers: Random Forest, K Nearest Neighbours and Decision Tree.

We used grid search to find the best parameters for each classifier.
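A grid search of this kind might look like the sketch below. The `classifier__` prefix in the parameter names reported in the results suggests a scikit-learn `Pipeline` with a step called `classifier`; the scaler step, the synthetic data and the candidate values in the grid are assumptions added only to make the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the real feature matrix.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The step name "classifier" gives the "classifier__" prefix used in the grid.
pipeline = Pipeline([
    ("scaler", StandardScaler()),  # assumed preprocessing step
    ("classifier", RandomForestClassifier(random_state=0)),
])

# Hypothetical candidate values; only the parameter names come from the results.
param_grid = {
    "classifier__max_depth": [5, 10],
    "classifier__n_estimators": [10, 50],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))
```

Swapping in another classifier means replacing the `classifier` step and using parameter names that estimator actually accepts, for example `classifier__n_neighbors` for `KNeighborsClassifier`.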

For evaluation, we used accuracy and F1 score to compare the classifiers.

The results of each classifier are shown below:

Using classifier Random Forest: 

The best parameters for this classifier are: 

{'classifier__max_depth': 10, 'classifier__n_estimators': 10}
The target value counts
      Target Value  Counts
0  offer completed    6643
1   offer received   23595
2     offer viewed    3279

 classification report 

                 precision    recall  f1-score   support

offer completed       1.00      1.00      1.00      6643
 offer received       0.64      0.99      0.78     15258
   offer viewed       0.97      0.27      0.43     11616

       accuracy                           0.75     33517
      macro avg       0.87      0.76      0.74     33517
   weighted avg       0.83      0.75      0.70     33517


 confusion matrix 

[[ 6643     0     0]
 [    0 15169    89]
 [    0  8426  3190]]

 Accuracy 

0.7459498165110242

 Model F1 Score 

0.7459498165110242

 ---------------------------------------------------------------- 

Using classifier KNN: 

The best parameters for this classifier are: 

{'classifier__max_depth': 10, 'classifier__n_estimators': 10}
The target value counts
      Target Value  Counts
0  offer completed    6711
1   offer received   22829
2     offer viewed    3977

 classification report 

                 precision    recall  f1-score   support

offer completed       1.00      1.00      1.00      6711
 offer received       0.66      0.99      0.80     15222
   offer viewed       0.98      0.34      0.50     11584

       accuracy                           0.77     33517
      macro avg       0.88      0.78      0.77     33517
   weighted avg       0.84      0.77      0.73     33517


 confusion matrix 

[[ 6711     0     0]
 [    0 15136    86]
 [    0  7693  3891]]

 Accuracy 

0.7679088223886386

 Model F1 Score 

0.7679088223886386

 ---------------------------------------------------------------- 

Using classifier Decision Tree: 

The best parameters for this classifier are: 

{'classifier__max_depth': 10, 'classifier__n_estimators': 10}
The target value counts
      Target Value  Counts
0  offer completed    6642
1   offer received   23566
2     offer viewed    3309

 classification report 

                 precision    recall  f1-score   support

offer completed       1.00      1.00      1.00      6642
 offer received       0.64      0.99      0.78     15183
   offer viewed       0.96      0.27      0.42     11692

       accuracy                           0.74     33517
      macro avg       0.87      0.75      0.73     33517
   weighted avg       0.82      0.74      0.70     33517


 confusion matrix 

[[ 6642     0     0]
 [    0 15041   142]
 [    0  8525  3167]]

 Accuracy 

0.7414148044276039

 Model F1 Score 

0.7414148044276039
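For every classifier above, the reported “Model F1 Score” is identical to the accuracy. That is what a micro-averaged F1 produces in a single-label multiclass problem, so micro-averaging is assumed in the sketch below, which re-derives both numbers from the Random Forest confusion matrix (rows are true classes, columns are predicted classes).

```python
# Random Forest confusion matrix as reported above
# (rows: true class, columns: predicted class).
confusion = [
    [6643, 0, 0],
    [0, 15169, 89],
    [0, 8426, 3190],
]

n = len(confusion)
total = sum(sum(row) for row in confusion)

# Per-class true positives, false positives and false negatives.
tp = [confusion[i][i] for i in range(n)]
fp = [sum(confusion[r][i] for r in range(n)) - tp[i] for i in range(n)]
fn = [sum(confusion[i]) - tp[i] for i in range(n)]

accuracy = sum(tp) / total

# Micro-averaging pools the counts over all classes before computing the score.
micro_precision = sum(tp) / (sum(tp) + sum(fp))
micro_recall = sum(tp) / (sum(tp) + sum(fn))
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(accuracy, micro_f1)
```

Because every misclassified sample counts once as a false positive and once as a false negative, the pooled precision and recall both equal the accuracy. A macro-averaged F1 (0.74 for Random Forest in the classification report) weights the three classes equally and is more sensitive to the weak “offer viewed” class.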

Conclusion

In this project, we used machine learning to predict the purchasing type of users based on customer properties as well as other purchasing attributes. We trained the model with 3 different classifiers:

Random Forest
K Nearest Neighbours
Decision Tree

We can see that the three models perform almost the same, with accuracy between 74% and 77%, which is acceptable.

One caveat concerns class imbalance: “offer received” is the most frequent event, and the tables of predicted target value counts above show that all three models assign most events to it. As a result, many “offer viewed” events are predicted as “offer received”, and recall for “offer viewed” is only 0.27 to 0.34.
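The gap between predicted and actual class counts can be read straight from a confusion matrix. The sketch below uses the Random Forest matrix from the results, taking rows as true classes and columns as predicted classes; the variable names are only illustrative.

```python
labels = ["offer completed", "offer received", "offer viewed"]

# Random Forest confusion matrix from the results
# (rows: true class, columns: predicted class).
confusion = [
    [6643, 0, 0],
    [0, 15169, 89],
    [0, 8426, 3190],
]

# Row sums give the actual class counts; column sums give the predicted counts.
actual = [sum(row) for row in confusion]
predicted = [sum(row[c] for row in confusion) for c in range(len(labels))]

# Recall per class: correctly predicted events divided by actual events.
recall = [confusion[i][i] / actual[i] for i in range(len(labels))]

for name, a, p, r in zip(labels, actual, predicted, recall):
    print(f"{name:16s} actual={a:6d} predicted={p:6d} recall={r:.2f}")
```

The column sums reproduce the “target value counts” table, and the row sums reproduce the support column of the classification report: 23,595 events are predicted as “offer received” although only 15,258 carry that label.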

To see more about this analysis, see the link to my GitHub available here.