Tuesday, October 17, 2023

 In [55]:

In [56]:

Out[56]:

   Constituent ID  Membership Organization  Membership Level  Inception Date  Initiation Date  Expiration Date  Custom Category 01
0           68233    Individual Membership         Supporter      2004-10-20       1948-12-30       1949-12-31              (none)
1            8056    Individual Membership            Senior      2004-04-06       1948-12-30       1949-12-29              (none)
2           54161    Individual Membership         Supporter      2004-04-13       1948-12-30       1949-12-31              (none)

# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Importing the member_history dataset
# (both calls are cut off in the printout; only the closing brackets are restored)
member_history = pd.read_excel('Member_History.xlsx', sheet_name='MemHistory')
member_history["Initiation Date"] = pd.to_datetime(member_history["Initiation Date"])
member_history.head(3)

Jazz Membership(1) - Jupyter Notebook http://localhost:8888/notebooks/Downloads/Jazz%20Membership(1).ipynb#

1 of 30 10/18/2023, 8:12 AM

In [57]:

In [58]:

Out[57]:

   Campaign name                                       Ad Set Name                                        Ad name                                        Month                    Delivery status  Delivery level  Reach   Impressions
0  NaN                                                 NaN                                                NaN                                            NaN                      NaN              NaN             857447  7034538
1  SFJAZZ At Home | Retargeting | 23-24 Always On...   SFJAZZ | Watch Page Visitor 7dLB | West Coast      Video | SFJAZZ At Home Sizzle Reel 2023        2023-10-01 - 2023-10-05  active           ad              138
2  Concert | Prospecting | Meshell Ndegeocello (1...   SFJAZZ | All Members CRM List LAL | 50 Mile Ra...  Soundcard | Meshell Ndegeocello (10/27-10/29)  2023-10-01 - 2023-10-05  active           ad              1719    1998
3  SFJAZZ At Home | Prospecting | Sep 2023 Always...   Interest | Jazz Targeted | West Coast              Video | SFJAZZ At Home Sep Broadcasts          2023-10-01 - 2023-10-05  not_delivering   ad              0
4  SFJAZZ At Home | Prospecting | Oct 2023 Always...   SFJAZZ | All Members CRM List LAL | West Coast     Video | SFJAZZ At Home Oct Broadcasts          2023-10-01 - 2023-10-05  active           ad              2435    4208

5 rows × 25 columns

Out[58]:
         Date  Facebook reach
0  2023-01-01           35390
1  2023-01-02           29241
2  2023-01-03           21768

# Importing the jazz_membership dataset
# (the format= argument is cut off in the printout and is omitted here)
jazz_membership = pd.read_excel('sf_jazz_membership_data.xlsx')
jazz_membership["Starts"] = pd.to_datetime(jazz_membership["Starts"])
jazz_membership.head(5)

# Importing the facebook_Reach dataset
facebook_Reach = pd.read_csv('Facebook_Reach.csv', encoding='ISO-8859-1')
facebook_Reach["Date"] = pd.to_datetime(facebook_Reach["Date"], format="%Y-%m-%d")
facebook_Reach.head(3)


In [59]:

In [60]:

In [61]:

Out[59]:
         Date  Facebook Page likes
0  2023-01-01                  217
1  2023-01-02                  215
2  2023-01-03                  118

Out[60]:
         Date  New Facebook Page likes
0  2023-01-01                       11
1  2023-01-02                       14
2  2023-01-03                       11

Out[61]:
           Date  New Facebook Page likes
0    2023-01-01                       11
1    2023-01-02                       14
2    2023-01-03                       11
3    2023-01-04                       19
4    2023-01-05                       14
..          ...                      ...
552  2023-10-05                       55
553  2023-10-06                       69
554  2023-10-07                       55
555  2023-10-08                       66
556  2023-10-09                       59

557 rows × 2 columns

# Importing the page_Profile_visits dataset
# (the format= argument is cut off in the printout and is omitted here)
page_Profile_visits = pd.read_csv('page_Profile_visits.csv', encoding='ISO-8859-1')
page_Profile_visits["Date"] = pd.to_datetime(page_Profile_visits["Date"])
page_Profile_visits.head(3)

# Importing the New_likes_and_follows dataset
# (the encoding value is cut off in the printout; 'ISO-8859-1' is assumed to match the other CSV imports)
New_likes_and_follows = pd.read_csv('New_likes_and_follows.csv', encoding='ISO-8859-1')
New_likes_and_follows["Date"] = pd.to_datetime(New_likes_and_follows["Date"])
New_likes_and_follows.head(3)

# Converting the date to datetime object format
New_likes_and_follows["Date"] = pd.to_datetime(New_likes_and_follows["Date"])
New_likes_and_follows


In [62]:

In [63]:

Out[63]:

   Campaign name                                       Ad Set Name                                    Ad name                                  Month                    Delivery status  Delivery level  Reach  Impressions  Frequency
0  SFJAZZ At Home | Retargeting | 23-24 Always On...   SFJAZZ | Watch Page Visitor 7dLB | West Coast  Video | SFJAZZ At Home Sizzle Reel 2023  2023-10-01 - 2023-10-05  active           ad              138    842          6.101449
1  SFJAZZ At Home | Retargeting | 23-24 Always On...   SFJAZZ | Watch Page Visitor 7dLB | West Coast  Video | SFJAZZ At Home Sizzle Reel 2023  2023-10-01 - 2023-10-05  active           ad              138    842          6.101449
2  SFJAZZ At Home | Retargeting | 23-24 Always On...   SFJAZZ | Watch Page Visitor 7dLB | West Coast  Video | SFJAZZ At Home Sizzle Reel 2023  2023-10-01 - 2023-10-05  active           ad              138    842          6.101449
3  SFJAZZ At Home | Retargeting | 23-24 Always On...   SFJAZZ | Watch Page Visitor 7dLB | West Coast  Video | SFJAZZ At Home Sizzle Reel 2023  2023-10-01 - 2023-10-05  active           ad              138    842          6.101449
4  SFJAZZ At Home | Retargeting | 23-24 Always On...   SFJAZZ | Watch Page Visitor 7dLB | West Coast  Video | SFJAZZ At Home Sizzle Reel 2023  2023-10-01 - 2023-10-05  active           ad              138    842          6.101449

5 rows × 38 columns

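# Performing the merge based on Initiation Date and Starts
# (each call is cut off after left_on/right_on in the printout; the right-hand keys are
# taken from the comment above and from the Date_x / Date_y / Date columns in the output,
# and any further arguments such as how= are not recoverable)
combined_data = jazz_membership.merge(member_history, left_on='Starts', right_on='Initiation Date')
combined_data = combined_data.merge(facebook_Reach, left_on='Starts', right_on='Date')
combined_data = combined_data.merge(page_Profile_visits, left_on='Starts', right_on='Date')
combined_data = combined_data.merge(New_likes_and_follows, left_on='Starts', right_on='Date')

combined_data.head()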
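
The previews of the merged frame repeat the same ad row many times, which is what a merge on a non-unique date key produces: every left row is paired with every right row that shares the date. A minimal sketch on toy data (the frames `ads` and `members` and their values are hypothetical stand-ins, not the notebook's files) shows the row multiplication and one way to detect it with the `validate` argument of `DataFrame.merge`:

```python
import pandas as pd

# Hypothetical stand-ins for the ad table and the member table.
ads = pd.DataFrame({
    "Starts": pd.to_datetime(["2023-07-28", "2023-07-28", "2023-08-01"]),
    "Reach": [494, 120, 300],
})
members = pd.DataFrame({
    "Initiation Date": pd.to_datetime(["2023-07-28"] * 3 + ["2023-08-01"]),
    "Membership Level": ["Supporter", "Senior", "Student", "Supporter"],
})

merged = ads.merge(members, left_on="Starts", right_on="Initiation Date")

# Each ad row is paired with every member row sharing its date:
# 2 ads x 3 members on 2023-07-28, plus 1 x 1 on 2023-08-01.
print(len(ads), len(members), len(merged))

# validate= raises MergeError when the left key is not unique as declared.
try:
    ads.merge(members, left_on="Starts", right_on="Initiation Date",
              validate="one_to_many")
    left_key_is_unique = True
except pd.errors.MergeError:
    left_key_is_unique = False
print("left key unique:", left_key_is_unique)
```

If the repeated rows are not intended, aggregating one side to a single row per date before merging is one possible remedy; whether that suits this analysis is a modelling decision the notebook does not state.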


In [85]:

In [86]:

Out[86]:

     Month                    Delivery status  Reach  Impressions  Frequency  Amount spent (USD)  Cost per result  Starts
592  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
593  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
594  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
595  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
596  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28

5 rows × 31 columns

# Dropping unnecessary columns
columns_to_drop = [
    "Campaign name", "Ad Set Name", "Ad name",
    "Delivery level", "Attribution setting", "Result type", "Results",
    "Constituent ID", "Expiration Date", "Custom Category 01"
]
combined_data = combined_data.drop(columns=columns_to_drop)
combined_data.head(5)


In [87]:

In [88]:

In [89]:

Missing Values:

Month 0

Delivery status 0

Reach 0

Impressions 0

Frequency 0

Amount spent (USD) 0

Cost per result 0

Starts 0

Ends 0

Link clicks 0

CPC (cost per link click) 0

CTR (all) 0

CPM (cost per 1,000 impressions) 0

Result rate 0

Clicks (all) 0

CPC (All) 0

Reporting starts 0

Reporting ends 0

Membership Organization 0

Membership Level 0

Inception Date 0

Initiation Date 0

Date_x 0

Facebook reach 0

Date_y 0

Facebook Page likes 0

Date 0

New Facebook Page likes 0

Membership Reg 0

Membership Reg Label 0

Membership Reg Binary 0

dtype: int64

Out[88]: (184048, 31)

Out[89]: (184048, 31)

# Checking for missing values

missing_values = combined_data.isnull().sum()
print("Missing Values:\n", missing_values)

combined_data.shape

#dropping null values

combined_data = combined_data.dropna()

#Checking the shape of the dataset

combined_data.shape


In [90]:

Exploratory Data Analysis

Out[90]:

               Reach    Impressions      Frequency  Amount spent (USD)  Cost per result    Link clicks
count  184048.000000  184048.000000  184048.000000       184048.000000    184048.000000  184048.000000
mean    20190.627967   45421.637964       2.310083          434.256041        18.409868     436.720508
std     23907.713927   57929.976816       1.153889          504.985397        14.888930     564.666127
min        95.000000     513.000000       1.034668           17.270000         3.246533       6.000000
25%      3193.000000    5830.000000       1.697043           84.820000        10.326667      42.000000
50%     10078.000000   23722.000000       1.980823          281.780000        15.350000     207.000000
75%     31982.000000   64649.000000       2.652004          566.130000        21.165846     652.000000
max    107170.000000  304363.000000      11.826733         2372.460000       149.520000    2973.000000

#Summary Statistics

summary_stats = combined_data.describe()
summary_stats


In [91]:

# Distribution of 'Amount spent (USD)'
plt.figure(figsize=(12, 10))
sns.histplot(data=combined_data, x='Amount spent (USD)', bins=10, kde=True)
plt.title('Distribution of Amount spent (USD)')
plt.xlabel('Amount spent (USD)')
plt.ylabel('Frequency')
plt.show()


In [92]:

# Relationship between 'Amount spent (USD)' and 'Impressions'
plt.figure(figsize=(10, 6))
sns.scatterplot(data=combined_data, x='Amount spent (USD)', y='Impressions')
plt.title('Relationship between Amount spent (USD) and Impressions')
plt.xlabel('Amount spent (USD)')
plt.ylabel('Impressions')
plt.show()


In [93]:

# Box plot of 'Delivery status' vs. 'Impressions'
plt.figure(figsize=(10, 6))
sns.boxplot(data=combined_data, x='Delivery status', y='Impressions')
plt.title('Delivery Status vs. Impressions')
plt.xlabel('Delivery status')
plt.ylabel('Impressions')
plt.show()


In [94]:

# Count of 'Delivery status'
plt.figure(figsize=(8, 5))
sns.countplot(data=combined_data, x='Delivery status')
plt.title('Count of Delivery Status')
plt.xlabel('Delivery status')
plt.ylabel('Count')
plt.show()


In [95]:

# Distribution of 'Link clicks' by 'Delivery status'
plt.figure(figsize=(10, 6))
sns.violinplot(data=combined_data, x='Delivery status', y='Link clicks')
plt.title('Distribution of Link clicks by Delivery status')
plt.xlabel('Delivery status')
plt.ylabel('Link clicks')
plt.show()


In [98]:

# Correlation heatmap of numeric variables
# (the column list is cut off in the printout after 'Link clicks'; any further columns are not recoverable)
numeric_data = combined_data[['Impressions', 'Amount spent (USD)', 'Link clicks']]
correlation_matrix = numeric_data.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()


In [99]:

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import MaxNLocator

# Creating monthly data
# (the aggregation is cut off in the printout; sum() and reset_index() are assumed)
monthly_data = combined_data.groupby('Month')[['Impressions', 'Amount spent (USD)']].sum().reset_index()

# Setting up the figure and the first y-axis
fig, ax1 = plt.subplots(figsize=(12, 6))
ax1.set_xlabel('Month')
ax1.set_ylabel('Impressions', color='tab:blue')
# (arguments after marker='o' are cut off; color and ax are assumed from the axis label)
sns.lineplot(x='Month', y='Impressions', data=monthly_data, marker='o', color='tab:blue', ax=ax1)

# Creating the second y-axis
ax2 = ax1.twinx()
ax2.set_ylabel('Amount spent (USD)', color='tab:red')
# (arguments from marker= onward are cut off; marker, color and ax are assumed)
sns.lineplot(x='Month', y='Amount spent (USD)', data=monthly_data, marker='o', color='tab:red', ax=ax2)

# Rotating x-axis labels to be vertical and reducing their frequency
ax1.xaxis.set_major_locator(MaxNLocator(nbins=10, integer=True))
plt.xticks(rotation=90)

# Showing the plot
plt.title('Monthly Trend of Impressions and Amount Spent')
plt.tight_layout()
plt.show()


In [100]:

# Pairplot of selected numeric variables
sns.pairplot(data=numeric_data, diag_kind='kde')
plt.show()


In [101]:

# Visualizing 'Facebook reach'
plt.figure(figsize=(10, 6))
sns.histplot(data=combined_data, x='Facebook reach', bins=20, kde=True)
plt.title('Distribution of Facebook Reach')
plt.xlabel('Facebook Reach')
plt.ylabel('Frequency')
plt.show()


In [102]:

# Visualizing 'Facebook Page likes'
plt.figure(figsize=(10, 6))
sns.histplot(data=combined_data, x='Facebook Page likes', bins=20, kde=True)
plt.title('Distribution of Facebook Page Likes')
plt.xlabel('Facebook Page Likes')
plt.ylabel('Frequency')
plt.show()


In [103]:

# Visualizing 'New Facebook Page likes'
plt.figure(figsize=(10, 6))
# (the kde value is cut off in the printout; True is assumed to match the other histograms)
sns.histplot(data=combined_data, x='New Facebook Page likes', bins=20, kde=True)
plt.title('Distribution of New Facebook Page Likes')
plt.xlabel('New Facebook Page Likes')
plt.ylabel('Frequency')
plt.show()


In [104]:

# Box plot for Amount spent by Membership Organization
plt.figure(figsize=(12, 10))
sns.boxplot(data=combined_data, x='Membership Organization', y='Amount spent (USD)')
plt.title('Amount Spent by Membership Organization')
plt.show()


In [105]:

# Box plot for Link clicks by Membership Organization
plt.figure(figsize=(12, 10))
sns.boxplot(data=combined_data, x='Membership Organization', y='Link clicks')
plt.title('Link Clicks by Membership Organization')
plt.show()


In [106]:

Initiation Date
2023-01-09     1440
2023-01-24     3536
2023-02-17     7280
2023-03-02     5520
2023-03-03     4800
2023-03-28     6000
2023-05-01    54208
2023-05-09      992
2023-05-31      416
2023-06-07     1168
2023-06-10     1248
2023-06-23    17632
2023-07-03     2720
2023-07-05     1344
2023-07-14     2560
2023-07-20     2240
2023-07-21     4800
2023-07-28     7104
2023-08-01    56952
2023-08-21      216
2023-09-01     1872
Name: Initiation Date, dtype: int64

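# Converting "Initiation Date" to a datetime object
combined_data['Initiation Date'] = pd.to_datetime(combined_data['Initiation Date'])

# Grouping by day and counting the occurrences
# (the line is cut off in the printout; .dt.date and the count of 'Initiation Date'
# are assumed from the comment and from the name and dtype of the printed result)
initiation_date_grouped = combined_data.groupby(combined_data['Initiation Date'].dt.date)['Initiation Date'].count()

# Displaying the result
print(initiation_date_grouped)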


In [107]:

Out[107]:

     Month                    Delivery status  Reach  Impressions  Frequency  Amount spent (USD)  Cost per result  Starts
592  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
593  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
594  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
595  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
596  2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28

5 rows × 31 columns

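# Grouping by day and counting the occurrences, then transforming it to create a new column
# (the line is cut off in the printout; the grouping key and transform('count') are assumed from the comment)
combined_data['Membership Reg'] = combined_data.groupby(combined_data['Initiation Date'].dt.date)['Initiation Date'].transform('count')

# Displaying the updated DataFrame
combined_data.head()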
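
The cell above attaches a per-day registration count to every row. The original line is cut off in the printout, so the following is only a sketch of the groupby-then-transform pattern its comment describes, run on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical miniature of the merged frame: one row per merged record.
df = pd.DataFrame({
    "Initiation Date": pd.to_datetime(
        ["2023-01-09", "2023-01-09", "2023-01-24", "2023-01-09"]
    ),
})

# transform("count") returns one value per input row (the size of that
# row's group), so the result can be assigned straight back as a column.
df["Membership Reg"] = (
    df.groupby(df["Initiation Date"].dt.date)["Initiation Date"]
      .transform("count")
)
print(df)
```

Because the count is taken over rows of the merged frame, it reflects how many merged records share a date rather than how many distinct members joined that day. That may explain daily counts in the tens of thousands, although the notebook does not say so.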


In [108]:

Out[108]:

        Month                    Delivery status  Reach  Impressions  Frequency  Amount spent (USD)  Cost per result  Starts
592     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
593     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
594     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
595     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
596     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
...     ...                      ...              ...    ...          ...        ...                 ...              ...
207475  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207476  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207477  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207478  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207479  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09

184048 rows × 31 columns

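# Defining the threshold for "High" membership registration count
threshold = 20000

# Creating a new column "Membership Reg Label" based on the threshold
# (the function passed to apply is cut off in the printout; the 'High'/'Low' labels come
# from the mapping comment in the next cell, and the strict > comparison is assumed)
combined_data['Membership Reg Label'] = combined_data['Membership Reg'].apply(lambda x: 'High' if x > threshold else 'Low')

# Displaying the updated DataFrame
combined_data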


In [109]:

Out[109]:

        Month                    Delivery status  Reach  Impressions  Frequency  Amount spent (USD)  Cost per result  Starts
592     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
593     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
594     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
595     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
596     2023-09-01 - 2023-09-30  active           494    5510         11.153846  192.61              64.203333        2023-07-28
...     ...                      ...              ...    ...          ...        ...                 ...              ...
207475  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207476  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207477  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207478  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09
207479  2023-01-01 - 2023-01-31  not_delivering   4147   17250        4.159633   206.34              9.379091         2023-01-09

184048 rows × 31 columns

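# Mapping 'High' to 1 and 'Low' to 0 in a new column 'Membership Reg Binary'
# (the mapping call is cut off in the printout; it is restored from the comment above)
combined_data['Membership Reg Binary'] = combined_data['Membership Reg Label'].map({'High': 1, 'Low': 0})

# Displaying the updated DataFrame
combined_data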
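
The labelling and binary-mapping steps can be sketched together on a few of the daily counts printed earlier. The threshold of 20000 is the notebook's; the strict `>` comparison and the use of `np.where` in place of `apply` are assumptions made for this illustration:

```python
import numpy as np
import pandas as pd

threshold = 20000  # value defined in the notebook

# Four of the per-day counts from the grouped output shown earlier.
df = pd.DataFrame({"Membership Reg": [1440, 54208, 17632, 56952]})

# Vectorised labelling: 'High' above the threshold, 'Low' otherwise.
df["Membership Reg Label"] = np.where(
    df["Membership Reg"] > threshold, "High", "Low"
)

# Mapping the two labels to a 0/1 target column.
df["Membership Reg Binary"] = df["Membership Reg Label"].map({"High": 1, "Low": 0})
print(df)
```

In the grouped output only two dates exceed 20000 (2023-05-01 with 54208 rows and 2023-08-01 with 56952), which together make up 111160 of the 184048 rows, or about 60%. That roughly matches the share of class 1 in the test-set support reported by the models (22182 of 36810).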


In [111]:

One Hot Encoding

# Selecting the independent variables

X = [
    'Reach', 'Impressions', 'Frequency',
    'Amount spent (USD)',
    'CPC (cost per link click)',
    'CPM (cost per 1,000 impressions)', 'Clicks (all)',
    'Facebook reach', 'Facebook Page likes', 'New Facebook Page likes',
    'Membership Organization', 'Membership Level',
]

# Defining the dependent variable

y = 'Membership Reg Binary'

# Creating a new DataFrame with selected independent and dependent variables

selected_data = combined_data[X + [y]]


In [113]:

Splitting the dataset

In [114]:

In [115]:

1. Logistic Regression Model

Out[113]:

Columns shown: Reach, Impressions, Frequency, Amount spent (USD), CPC (cost per link click), CPM (cost per 1,000 impressions), Clicks (all), Facebook reach, ...

         Reach  Impressions  ...
592  -0.823863    -0.688965  ...
593  -0.823863    -0.688965  ...
594  -0.823863    -0.688965  ...
595  -0.823863    -0.688965  ...
596  -0.823863    -0.688965  ...

Other values surviving in the printout, with columns not recoverable: 1.413193, 1.413193, 1.413193, 1.413193, -0.758285

5 rows × 34 columns

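# One-hot encoding of categorical variables
data = pd.get_dummies(selected_data, columns=['Membership Organization', 'Membership Level'])

# Separating the numerical and one-hot encoded features
# (the list is cut off in the printout after 'Amount spent (USD)'; the remaining names
# are assumed to be the other numeric columns selected in the previous cell)
numerical_features = ['Reach', 'Impressions', 'Frequency', 'Amount spent (USD)',
                      'CPC (cost per link click)', 'CPM (cost per 1,000 impressions)',
                      'Clicks (all)', 'Facebook reach', 'Facebook Page likes',
                      'New Facebook Page likes']
categorical_features = [col for col in data.columns if col not in numerical_features]

# Scaling the numerical features using StandardScaler
scaler = StandardScaler()
data[numerical_features] = scaler.fit_transform(data[numerical_features])

# Displaying the encoded data
data.head()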

# Splitting the dataset into features (X) and target (y)
X = data.drop(columns=['Membership Reg Binary'])
y = data['Membership Reg Binary']

# Splitting the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_st
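
The cells above fit `StandardScaler` on the full dataset and split afterwards, so the test rows contribute to the scaling statistics. The notebook imports `Pipeline`, `ColumnTransformer` and `OneHotEncoder` without using them in the cells shown. A minimal sketch of how they could keep preprocessing inside the training fold follows; the toy frame, its values, `test_size=0.25` and `random_state=0` are all hypothetical, chosen only to make the example run:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical miniature dataset using three of the notebook's column names.
toy = pd.DataFrame({
    "Reach": [494, 4147, 138, 1719, 2435, 95, 3193, 10078],
    "Amount spent (USD)": [192.61, 206.34, 17.27, 84.82,
                           281.78, 566.13, 434.25, 64.20],
    "Membership Level": ["Supporter", "Senior", "Supporter", "Student",
                         "Senior", "Supporter", "Student", "Senior"],
    "Membership Reg Binary": [1, 0, 1, 0, 1, 0, 1, 0],
})
X_toy = toy.drop(columns=["Membership Reg Binary"])
y_toy = toy["Membership Reg Binary"]

# Split first, so the test rows never influence the fitted scaler or encoder.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)

numeric_cols = ["Reach", "Amount spent (USD)"]
categorical_cols = ["Membership Level"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# fit() learns scaling and encoding from the training rows only;
# predict() reuses those fitted statistics on the test rows.
model.fit(X_tr, y_tr)
predictions = model.predict(X_te)
print(predictions)
```

With this arrangement the same `model` object can be passed to cross-validation utilities, which refit the preprocessing inside each fold.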


In [116]:

2. Naive Bayes Model

Accuracy: 90.20%
Logistic Regression:
              precision    recall  f1-score   support

           0       0.91      0.84      0.87     14628
           1       0.90      0.94      0.92     22182

    accuracy                           0.90     36810
   macro avg       0.90      0.89      0.90     36810
weighted avg       0.90      0.90      0.90     36810

[[12285  2343]
 [ 1263 20919]]

C:\Users\DELL\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
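
The figures in the classification report can be recomputed from the printed confusion matrix, which is a useful consistency check. The sketch below uses only the four counts shown in the output and follows scikit-learn's convention that rows are true classes and columns are predicted classes:

```python
import numpy as np

# Confusion matrix printed for the logistic regression model.
cm = np.array([[12285, 2343],
               [1263, 20919]])

# Accuracy: correct predictions (the diagonal) over all test rows.
accuracy = np.trace(cm) / cm.sum()

# Precision per class: diagonal over column sums (all rows predicted as that class).
precision = np.diag(cm) / cm.sum(axis=0)

# Recall per class: diagonal over row sums (all rows truly of that class).
recall = np.diag(cm) / cm.sum(axis=1)

print(f"accuracy:  {accuracy:.4f}")
print("precision:", np.round(precision, 2))
print("recall:   ", np.round(recall, 2))
```

For example, the accuracy is (12285 + 20919) / 36810 = 33204 / 36810, which rounds to the reported 90.20%.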

from sklearn.linear_model import LogisticRegression

# Create and fit the Logistic Regression model
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)

# Predict on the test set
logistic_predictions = logistic_model.predict(X_test)

# Calculate accuracy
accuracy_log = accuracy_score(y_test, logistic_predictions)

# Evaluate Logistic Regression
print("Accuracy: {:.2f}%".format(accuracy_log * 100))
print("Logistic Regression:")
print(classification_report(y_test, logistic_predictions))
print(confusion_matrix(y_test, logistic_predictions))
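
The output of this cell includes a `ConvergenceWarning`: the default lbfgs solver stopped at its iteration limit before converging. The warning itself suggests raising `max_iter`. A small sketch on synthetic data (generated with `make_classification`, not the notebook's data; `max_iter=1000` is an arbitrary illustrative value) shows how to raise the limit and check whether the warning still fires:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; the notebook's X_train / y_train are not used here.
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)

# Record any warnings raised while fitting with a higher iteration limit.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("converged:", converged)
print("iterations used:", demo_model.n_iter_)
```

On the notebook's data the number of iterations needed may differ; inspecting `n_iter_` after fitting shows whether the limit was reached.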


In [118]:

3. K-Nearest Neighbour

Accuracy: 87.09%
Confusion Matrix:
[[10605  4023]
 [  728 21454]]
Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.72      0.82     14628
           1       0.84      0.97      0.90     22182

    accuracy                           0.87     36810
   macro avg       0.89      0.85      0.86     36810
weighted avg       0.88      0.87      0.87     36810

# Initializing and training a Gaussian Naive Bayes classifier
naive_bayes_classifier = GaussianNB()
naive_bayes_classifier.fit(X_train, y_train)

# Making predictions on the testing set
y_pred = naive_bayes_classifier.predict(X_test)

# Calculating accuracy
accuracy = accuracy_score(y_test, y_pred)

# Creating a confusion matrix
confusion_matrix_result = confusion_matrix(y_test, y_pred)

# Generating a classification report
classification_report_result = classification_report(y_test, y_pred)

# Displaying the results
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Confusion Matrix:")
print(confusion_matrix_result)
print("Classification Report:")
print(classification_report_result)


In [119]:

Feature: Facebook reach, Score: 0.6737807119570731

Feature: Amount spent (USD), Score: 0.6716786730812395

Feature: Frequency, Score: 0.6716515061813902

Feature: Impressions, Score: 0.6716413185939468

Feature: Reach, Score: 0.6716175475565788

Feature: CPM (cost per 1,000 impressions), Score: 0.6715632137568803

Feature: CPC (cost per link click), Score: 0.6715088799571819

Feature: Facebook Page likes, Score: 0.6664605683017847

Feature: Clicks (all), Score: 0.6265928324696309

Feature: New Facebook Page likes, Score: 0.5981146874141803

Feature: Membership Organization_Auto Renew Digital, Score: 0.14653950529041393

Feature: Membership Level_Monthly Digital, Score: 0.10963007117148704

Feature: Membership Organization_Auto Renew Core, Score: 0.09113822980519659

Feature: Membership Level_Annual Digital, Score: 0.03146238743828755

Feature: Membership Level_Supporter, Score: 0.0218728648286588

Feature: Membership Level_Contributor, Score: 0.01100593560702312

Feature: Membership Level_Senior, Score: 0.010553147368204119

from sklearn.neighbors import KNeighborsClassifier

# Initializing a KNN classifier
knn_classifier = KNeighborsClassifier(n_neighbors=3)

# Initializing SelectKBest with mutual information as the scoring function
selector = SelectKBest(score_func=mutual_info_classif, k='all')

# Fitting the selector to the training data
selector.fit(X_train, y_train)

# Getting feature scores
feature_scores = selector.scores_

# Getting the names of features and their scores
feature_names = X.columns
feature_scores_dict = dict(zip(feature_names, feature_scores))

# Sorting features by their scores, highest first
sorted_features = sorted(feature_scores_dict.items(), key=lambda x: x[1], reverse=True)

# Printing the sorted features and their scores
for feature, score in sorted_features:
    print(f"Feature: {feature}, Score: {score}")

# Training KNN with the selected features
selected_features = selector.transform(X_train)
knn_classifier.fit(selected_features, y_train)

# Making predictions on the testing set
selected_test_features = selector.transform(X_test)
y_pred = knn_classifier.predict(selected_test_features)

# Calculating accuracy with the selected features
accuracy = accuracy_score(y_test, y_pred)

# Displaying accuracy
print("Accuracy with selected features: {:.2f}%".format(accuracy * 100))
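
With `k='all'`, `SelectKBest` scores every feature but removes none, so the KNN model above is trained on the full feature set and the scores serve only as a ranking. The sketch below contrasts `k='all'` with a numeric `k` on synthetic data; the dataset and the choice `k=3` are illustrative assumptions, not values from the notebook:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in data with 8 features, 3 of them informative.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)

# k='all' keeps every column; the selector only computes scores.
keep_all = SelectKBest(score_func=mutual_info_classif, k="all").fit(X_demo, y_demo)

# A numeric k keeps only the k highest-scoring columns.
keep_three = SelectKBest(score_func=mutual_info_classif, k=3).fit(X_demo, y_demo)

all_shape = keep_all.transform(X_demo).shape
three_shape = keep_three.transform(X_demo).shape
print(all_shape)    # same number of columns as the input
print(three_shape)  # only 3 columns remain
```

To train on a reduced feature set, a numeric `k` (or a score cut-off chosen from the printed ranking) would be needed.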


In [ ]:

Feature: Membership Organization_Digital Membership, Score: 0.004014567777799849

Feature: Membership Level_Presenter, Score: 0.003670884719077838

Feature: Membership Level_Student, Score: 0.003644116868627645

Feature: Membership Organization_Complimentary Membership, Score: 0.0034127659588549797

Feature: Membership Level_Benefactor, Score: 0.003381116702806608

Feature: Membership Organization_Individual Membership, Score: 0.0028656769728065967

Feature: Membership Level_Director, Score: 0.002366614215765006

Feature: Membership Level_Visionary, Score: 0.002102882854883248

Feature: Membership Level_Producer, Score: 0.0019864536065339333

Feature: Membership Level_Legend, Score: 0.000676206656551992

Feature: Membership Level_Artist, Score: 0.0006015666572929401

Feature: Membership Level_Emerging Patron, Score: 0.000439211408163942

Feature: Membership Level_One Month Digital, Score: 0.00019062877399766975

Feature: Membership Level_Master, Score: 0.0

Feature: Membership Level_Patron, Score: 0.0

Feature: Membership Level_Presenters Circle, Score: 0.0

Accuracy with selected features: 99.84%
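
An accuracy this close to 100% from a 3-nearest-neighbour model is worth checking against the repeated rows visible in the merged-frame previews. If a test row has exact copies in the training set, its nearest neighbours sit at distance zero and share its label. Whether that is what happens here is not established by the notebook; the sketch below only shows, on a hypothetical toy frame, how such overlap could be measured after a random split:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with heavy row repetition, mimicking the merged previews.
toy = pd.DataFrame({
    "Reach": [494] * 6 + [4147] * 6,
    "Impressions": [5510] * 6 + [17250] * 6,
    "Membership Reg Binary": [1] * 6 + [0] * 6,
})
train, test = train_test_split(toy, test_size=0.25, random_state=0)

# Count the test rows whose full content (features and label) also occurs in training.
train_rows = set(map(tuple, train.to_numpy()))
shared = sum(tuple(row) in train_rows for row in test.to_numpy())
print(f"{shared} of {len(test)} test rows also appear in the training set")
```

Running the same count on the notebook's `X_train` and `X_test` would show how much of the test set is duplicated in training. If the overlap is large, deduplicating before splitting or splitting by date are two possible ways to get a less optimistic estimate.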
