In [55]:
In [56]:Out[56]:ConstituentIDMembershipOrganizationMembershipLevelInceptionDateInitiationDateExpirationDateCustomCategory010 68233 IndividualMembership Supporter 2004-10-20 1948-12-30 1949-12-31 (none)1 8056 IndividualMembership Senior 2004-04-06 1948-12-30 1949-12-29 (none)2 54161 IndividualMembership Supporter 2004-04-13 1948-12-30 1949-12-31 (none)#importing librariesimport pandas as pdimport matplotlib.pyplot as plt import seaborn as snsfrom sklearn.model_selection import train_test_splitfrom sklearn.metrics import classification_report, confusion_matrix from sklearn.preprocessing import StandardScaler, OneHotEncoder from sklearn.compose import ColumnTransformerfrom sklearn.pipeline import Pipeline from sklearn.naive_bayes import GaussianNBfrom sklearn.metrics import accuracy_score, confusion_matrix, classification_repor from sklearn.feature_selection import SelectKBest, mutual_info_classif#Importing the member_history datasetmember_history = pd.read_excel('Member_History.xlsx', sheet_name='MemHistory' member_history["Initiation Date"] = pd.to_datetime(member_history["Initiation Date member_history.head(3)Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#1 of 30 10/18/2023, 8:12 AMIn [57]:In [58]:Out[57]:Campaign nameAd SetName Ad name Month Delivery status Delivery level Reach Impressions0 NaN NaN NaN NaN NaN NaN 857447 70345381SFJAZZ At Home | Retargeting | 23-24 Always On...SFJAZZ | Watch Page Visitor 7dLB | West CoastVideo | SFJAZZ At Home Sizzle Reel 20232023-10-01-2023-10-05active ad 1382Concert | Prospecting | Meshell Ndegeocello (1...SFJAZZ | All Members CRM List LAL | 50 MileRa...Soundcard | Meshell Ndegeocello (10/27-10/29)2023-10-01-2023-10-05active ad 1719 19983SFJAZZ At Home | Prospecting | Sep 2023 Always...Interest | Jazz Targeted | West CoastVideo | SFJAZZ At Home Sep Broadcasts2023-10-01-2023-10-05not_delivering ad 04SFJAZZ At Home | Prospecting | Oct 2023 Always...SFJAZZ | All Members CRM List LAL | West CoastVideo | SFJAZZ At Home Oct Broadcasts2023-10-01-2023-10-05active ad 2435 42085 rows × 25 columnsOut[58]: Date Facebook reach0 2023-01-01 353901 2023-01-02 292412 2023-01-03 21768#Importing the jazz_membership datasetjazz_membership = pd.read_excel('sf_jazz_membership_data.xlsx') jazz_membership["Starts"] = pd.to_datetime(jazz_membership["Starts"], format jazz_membership.head(5)#Importing the facebook_Reach datasetfacebook_Reach = pd.read_csv('Facebook_Reach.csv', encoding='ISO-8859-1') facebook_Reach["Date"] = pd.to_datetime(facebook_Reach["Date"], format="%Y-%m-%d" facebook_Reach.head(3)Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#2 of 30 10/18/2023, 8:12 AMIn [59]:In [60]:In [61]:Out[59]: Date Facebook Page likes0 2023-01-01 2171 2023-01-02 2152 2023-01-03 118Out[60]: Date New Facebook Page likes0 2023-01-01 111 2023-01-02 142 2023-01-03 11Out[61]: Date New Facebook Page likes0 2023-01-01 111 2023-01-02 142 2023-01-03 113 2023-01-04 194 2023-01-05 14... ... ...552 2023-10-05 55553 2023-10-06 69554 2023-10-07 55555 2023-10-08 66556 2023-10-09 59557 rows × 2 columns#Importing the page_Profile_visits datasetpage_Profile_visits = pd.read_csv('page_Profile_visits.csv', encoding='ISO-8859-1' page_Profile_visits["Date"] = pd.to_datetime(page_Profile_visits["Date"], format page_Profile_visits.head(3)#Importing the New_likes_and_follows datasetNew_likes_and_follows = pd.read_csv('New_likes_and_follows.csv', encoding= New_likes_and_follows["Date"] = pd.to_datetime(New_likes_and_follows["Date" New_likes_and_follows.head(3)#converting the date to datetime object formatNew_likes_and_follows["Date"] = pd.to_datetime(New_likes_and_follows["Date"New_likes_and_followsJazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#3 of 30 10/18/2023, 8:12 AMIn [62]:In [63]:Out[63]:Campaign nameAd SetNameAd nameMonth Delivery statusDelivery level Reach Impressions Frequency0SFJAZZ At Home | Retargeting | 23-24 Always On...SFJAZZ | Watch Page Visitor 7dLB | West CoastVideo | SFJAZZ At Home Sizzle Reel 20232023-10-01-2023-10-05active ad 138 842 6.1014491SFJAZZ At Home | Retargeting | 23-24 Always On...SFJAZZ | Watch Page Visitor 7dLB | West CoastVideo | SFJAZZ At Home Sizzle Reel 20232023-10-01-2023-10-05active ad 138 842 6.1014492SFJAZZ At Home | Retargeting | 23-24 Always On...SFJAZZ | Watch Page Visitor 7dLB | West CoastVideo | SFJAZZ At Home Sizzle Reel 20232023-10-01-2023-10-05active ad 138 842 6.1014493SFJAZZ At Home | Retargeting | 23-24 Always On...SFJAZZ | Watch Page Visitor 7dLB | West CoastVideo | SFJAZZ At Home Sizzle Reel 20232023-10-01-2023-10-05active ad 138 842 6.1014494SFJAZZ At Home | Retargeting | 23-24 Always On...SFJAZZ | Watch Page Visitor 7dLB | West CoastVideo | SFJAZZ At Home Sizzle Reel 20232023-10-01-2023-10-05active ad 138 842 6.1014495 rows × 38 columns# Performing the merge based on Initiation Date and Startscombined_data = jazz_membership.merge(member_history, left_on='Starts', right_on combined_data = combined_data.merge(facebook_Reach, left_on='Starts', right_on combined_data = combined_data.merge(page_Profile_visits, left_on='Starts', combined_data = combined_data.merge(New_likes_and_follows, left_on='Starts'combined_data.head()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#4 of 30 10/18/2023, 8:12 AMIn [85]:In [86]:Out[86]:Month Delivery status Reach Impressions FrequencyAmount spent (USD)Cost per result Starts5922023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285932023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285942023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285952023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285962023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285 rows × 31 columns#Dropping unnecessary columnscolumns_to_drop = [ "Campaign name", "Ad Set Name", "Ad name","Delivery level", "Attribution setting", "Result type", "Results", "Constituent ID", "Expiration Date", "Custom Category 01"]combined_data = combined_data.drop(columns=columns_to_drop)combined_data.head(5)Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#5 of 30 10/18/2023, 8:12 AMIn [87]:In [88]:In [89]:Missing Values:Month 0Delivery status 0Reach 0 Impressions 0 Frequency 0Amount spent (USD) 0Cost per result 0Starts 0 Ends 0Link clicks 0CPC (cost per link click) 0CTR (all) 0CPM (cost per 1,000 impressions) 0Result rate 0Clicks (all) 0CPC (All) 0Reporting starts 0Reporting ends 0Membership Organization 0Membership Level 0Inception Date 0Initiation Date 0Date_x 0Facebook reach 0Date_y 0Facebook Page likes 0Date 0New Facebook Page likes 0Membership Reg 0Membership Reg Label 0Membership Reg Binary 0dtype: int64Out[88]: (184048, 31)Out[89]: (184048, 31)# Checking for missing valuesmissing_values = combined_data.isnull().sum() print("Missing Values:\n", missing_values)combined_data.shape#dropping null valuescombined_data = combined_data.dropna()#Checking the shape of the datasetcombined_data.shapeJazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#6 of 30 10/18/2023, 8:12 AMIn [90]:Exploratory Data AnalysisOut[90]:Reach Impressions Frequency Amount spent (USD)Cost per result Link clickscount 184048.000000 184048.000000 184048.000000 184048.000000 184048.000000 184048.000000mean 20190.627967 45421.637964 2.310083 434.256041 18.409868 436.720508std 23907.713927 57929.976816 1.153889 504.985397 14.888930 564.666127min 95.000000 513.000000 1.034668 17.270000 3.246533 6.00000025% 3193.000000 5830.000000 1.697043 84.820000 10.326667 42.00000050% 10078.000000 23722.000000 1.980823 281.780000 15.350000 207.00000075% 31982.000000 64649.000000 2.652004 566.130000 21.165846 652.000000max 107170.000000 304363.000000 11.826733 2372.460000 149.520000 2973.000000#Summary Statisticssummary_stats = combined_data.describe() summary_statsJazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#7 of 30 10/18/2023, 8:12 AMIn [91]: # Distribution of 'Amount spent (USD)' plt.figure(figsize=(12, 10))sns.histplot(data=combined_data, x='Amount spent (USD)', bins=10, kde=True plt.title('Distribution of Amount spent (USD)')plt.xlabel('Amount spent (USD)') plt.ylabel('Frequency') plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#8 of 30 10/18/2023, 8:12 AMIn [92]: # Relationship between 'Amount spent (USD)' and 'Impressions' plt.figure(figsize=(10, 6))sns.scatterplot(data=combined_data, x='Amount spent (USD)', y='Impressions' plt.title('Relationship between Amount spent (USD) and Impressions') plt.xlabel('Amount spent (USD)')plt.ylabel('Impressions')plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#9 of 30 10/18/2023, 8:12 AMIn [93]: # Box plot of 'Delivery status' vs. 'Impressions' plt.figure(figsize=(10, 6))sns.boxplot(data=combined_data, x='Delivery status', y='Impressions') plt.title('Delivery Status vs. Impressions')plt.xlabel('Delivery status') plt.ylabel('Impressions') plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#10 of 30 10/18/2023, 8:12 AMIn [94]: # Count of 'Delivery status'plt.figure(figsize=(8, 5)) sns.countplot(data=combined_data, x='Delivery status') plt.title('Count of Delivery Status') plt.xlabel('Delivery status')plt.ylabel('Count')plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#11 of 30 10/18/2023, 8:12 AMIn [95]: # Distribution of 'Link clicks' by 'Delivery status' plt.figure(figsize=(10, 6))sns.violinplot(data=combined_data, x='Delivery status', y='Link clicks') plt.title('Distribution of Link clicks by Delivery status') plt.xlabel('Delivery status')plt.ylabel('Link clicks')plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#12 of 30 10/18/2023, 8:12 AMIn [98]: # Correlation heatmap of numeric variablesnumeric_data = combined_data[['Impressions', 'Amount spent (USD)', 'Link clicks'correlation_matrix = numeric_data.corr() plt.figure(figsize=(8, 6))sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm') plt.title('Correlation Heatmap')plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#13 of 30 10/18/2023, 8:12 AMIn [99]: import matplotlib.pyplot as plt import seaborn as snsfrom matplotlib.ticker import MaxNLocator# Creating monthly datamonthly_data = combined_data.groupby('Month')[['Impressions', 'Amount spent (USD)'# Setting up the figure and the first y-axis fig, ax1 = plt.subplots(figsize=(12, 6)) ax1.set_xlabel('Month') ax1.set_ylabel('Impressions', color='tab:blue')sns.lineplot(x='Month', y='Impressions', data=monthly_data, marker='o', color# Creating the second y-axisax2 = ax1.twinx()ax2.set_ylabel('Amount spent (USD)', color='tab:red') sns.lineplot(x='Month', y='Amount spent (USD)', data=monthly_data, marker=# Rotating x-axis labels to be vertical and reduce the frequency ax1.xaxis.set_major_locator(MaxNLocator(nbins=10, integer=True)) plt.xticks(rotation=90)# Showing the plotplt.title('Monthly Trend of Impressions and Amount Spent') plt.tight_layout()plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#14 of 30 10/18/2023, 8:12 AMIn [100]: # Pairplot of selected numeric variablessns.pairplot(data=numeric_data, diag_kind='kde') plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#15 of 30 10/18/2023, 8:12 AMIn [101]: # Visualizing 'Facebook reach' plt.figure(figsize=(10, 6))sns.histplot(data=combined_data, x='Facebook reach', bins=20, kde=True) plt.title('Distribution of Facebook Reach')plt.xlabel('Facebook Reach') plt.ylabel('Frequency') plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#16 of 30 10/18/2023, 8:12 AMIn [102]: # Visualizing 'Facebook Page likes' plt.figure(figsize=(10, 6))sns.histplot(data=combined_data, x='Facebook Page likes', bins=20, kde=True plt.title('Distribution of Facebook Page Likes')plt.xlabel('Facebook Page Likes') plt.ylabel('Frequency') plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#17 of 30 10/18/2023, 8:12 AMIn [103]: # Visualizing 'New Facebook Page likes'plt.figure(figsize=(10, 6))sns.histplot(data=combined_data, x='New Facebook Page likes', bins=20, kde= plt.title('Distribution of New Facebook Page Likes')plt.xlabel('New Facebook Page Likes')plt.ylabel('Frequency')plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#18 of 30 10/18/2023, 8:12 AMIn [104]: # Box plot for Amount spent by Membership Organizationplt.figure(figsize=(12, 10))sns.boxplot(data=combined_data, x='Membership Organization', y='Amount spent (USD) plt.title('Amount Spent by Membership Organization')plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#19 of 30 10/18/2023, 8:12 AMIn [105]: # Box plot for Link clicks by Membership Organization plt.figure(figsize=(12, 10))sns.boxplot(data=combined_data, x='Membership Organization', y='Link clicks' plt.title('Link Clicks by Membership Organization')plt.show()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#20 of 30 10/18/2023, 8:12 AMIn [106]:Initiation Date 2023-01-09 1440 2023-01-24 3536 2023-02-17 7280 2023-03-02 5520 2023-03-03 4800 2023-03-28 6000 2023-05-01 54208 2023-05-09 992 2023-05-31 416 2023-06-07 1168 2023-06-10 1248 2023-06-23 17632 2023-07-03 2720 2023-07-05 1344 2023-07-14 2560 2023-07-20 2240 2023-07-21 4800 2023-07-28 7104 2023-08-01 56952 2023-08-21 216 2023-09-01 1872Name: Initiation Date, dtype: int64# Converting "Initiation Date" to a datetime objectcombined_data['Initiation Date'] = pd.to_datetime(combined_data['Initiation Date'# Grouping by day and count the occurrencesinitiation_date_grouped = combined_data.groupby(combined_data['Initiation Date'# Displaying the resultprint(initiation_date_grouped)Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#21 of 30 10/18/2023, 8:12 AMIn [107]:Out[107]:Month Delivery status Reach Impressions FrequencyAmount spent (USD)Cost per result Starts5922023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285932023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285942023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285952023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285962023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285 rows × 31 columns# Grouping by day and counting the occurrences, then transforming it to create a ncombined_data['Membership Reg'] = combined_data.groupby(combined_data['Initiation# Displaying the updated DataFramecombined_data.head()Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#22 of 30 10/18/2023, 8:12 AMIn [108]:Out[108]:Month Delivery status Reach Impressions FrequencyAmount spent (USD)Cost per result5922023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285932023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285942023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285952023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285962023-09-01 - 2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-28... ... ... ... ... ... ... ...2074752023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074762023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074772023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074782023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074792023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-09184048 rows × 31 columns# Defining the threshold for "High" membership registration count threshold = 20000# Creating a new column "Membership Reg Label" based on the thresholdcombined_data['Membership Reg Label'] = combined_data['Membership Reg'].apply# Displaying the updated DataFramecombined_dataJazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#23 of 30 10/18/2023, 8:12 AMIn [109]:Out[109]:Month Delivery status Reach Impressions FrequencyAmount spent (USD)Cost per result5922023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285932023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285942023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285952023-09-01-2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-285962023-09-01 - 2023-09-30active 494 5510 11.153846 192.61 64.203333 2023-07-28... ... ... ... ... ... ... ...2074752023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074762023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074772023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074782023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-092074792023-01-01-2023-01-31not_delivering 4147 17250 4.159633 206.34 9.379091 2023-01-09184048 rows × 31 columns# Mapping 'High' to 1 and 'Low' to 0 in a new column 'Membership Reg Binary'combined_data['Membership Reg Binary'] = combined_data['Membership Reg Label'# Displaying the updated DataFramecombined_dataJazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#24 of 30 10/18/2023, 8:12 AMIn [111]:One Hot Encoding# Selecting the independent variablesX = ['Reach', 'Impressions', 'Frequency','Amount spent (USD)','CPC (cost per link click)','CPM (cost per 1,000 impressions)', 'Clicks (all)','Facebook reach', 'Facebook Page likes', 'New Facebook Page likes', 'Membership Organization', 'Membership Level',]# Defining the dependent variabley = 'Membership Reg Binary'# Creating a new DataFrame with selected independent and dependent variablesselected_data = combined_data[X + [y]]Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#25 of 30 10/18/2023, 8:12 AMIn [113]:Splitting the datasetIn [114]:In [115]:1. Logistic Regression ModelOut[113]:Reach Impressions FrequencyAmount spent (USD)CPC (cost per link click)CPM (cost per 1,000 impressions)Clicks (all)Facebook reach592 -0.823863 593 -0.823863 594 -0.823863 595 -0.823863 596 -0.823863 | -0.688965 -0.688965 -0.688965 -0.688965 -0.688965 |