Aller au contenu

Les règles d'association

Les règles d'association sont un algorithme d'apprentissage non supervisé. Il est utilisé pour analyser les habitudes d'achats des clients.

png

Itemset

U itemset est un utiliser l'algorithme du KMeans pour former un modèle, il faut des données numériques.

Support

Le support d'un itemset est la fréquence d'un itemset.

Méthode d'optimisation

Condition d'arrêt

Association Rules

Packages

# !pip install -q pandas mlxtend
import re

import pandas as pd

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth

Paths & Config

DATA_PATH = "/kaggle/input/sales-forecasting/train.csv"

SUPPORT = 0.2

Get Data

sales_data = pd.read_csv(DATA_PATH)
sales_data.sample(10)
Row ID Order ID Order Date Ship Date Ship Mode Customer ID Customer Name Segment Country City State Postal Code Region Product ID Category Sub-Category Product Name Sales
2624 2625 CA-2018-127180 22/10/2018 24/10/2018 First Class TA-21385 Tom Ashbrook Home Office United States New York City New York 10024.0 East TEC-PH-10001494 Technology Phones Polycom CX600 IP Phone VoIP phone 2399.600
4891 4892 CA-2017-135776 23/12/2017 30/12/2017 Standard Class EH-13765 Edward Hooks Corporate United States Seattle Washington 98103.0 West OFF-PA-10001295 Office Supplies Paper Computer Printout Paper with Letter-Trim Perfo... 37.940
3462 3463 CA-2016-152611 20/02/2016 23/02/2016 Second Class KA-16525 Kelly Andreada Consumer United States Perth Amboy New Jersey 8861.0 East OFF-AR-10003903 Office Supplies Art Sanford 52201 APSCO Electric Pencil Sharpener 286.790
2295 2296 CA-2016-113145 01/11/2016 05/11/2016 Standard Class VD-21670 Valerie Dominguez Consumer United States New York City New York 10011.0 East OFF-PA-10002659 Office Supplies Paper Avoid Verbal Orders Carbonless Minifold Book 13.520
223 224 CA-2016-169397 24/12/2016 27/12/2016 First Class JB-15925 Joni Blumstein Consumer United States Dublin Ohio 43017.0 East TEC-MA-10001148 Technology Machines Swingline SM12-08 MicroCut Jam Free Shredder 479.988
3559 3560 CA-2018-152737 07/11/2018 12/11/2018 Standard Class TS-21505 Tony Sayre Consumer United States San Francisco California 94122.0 West TEC-AC-10004975 Technology Accessories Plantronics Audio 995 Wireless Stereo Headset 439.800
9642 9643 CA-2015-104563 07/03/2015 12/03/2015 Standard Class CM-12715 Craig Molinari Corporate United States Seattle Washington 98103.0 West FUR-CH-10002780 Furniture Chairs Office Star - Task Chair with Contemporary Loo... 436.704
3761 3762 CA-2018-104577 12/05/2018 17/05/2018 Standard Class CK-12205 Chloris Kastensmidt Consumer United States Everett Massachusetts 2149.0 East OFF-PA-10000659 Office Supplies Paper TOPS Carbonless Receipt Book, Four 2-3/4 x 7-1... 87.600
3551 3552 CA-2017-152555 29/03/2017 02/04/2017 Second Class ME-17320 Maria Etezadi Home Office United States Chicago Illinois 60653.0 Central FUR-CH-10002965 Furniture Chairs Global Leather Highback Executive Chair with P... 844.116
7010 7011 US-2015-135881 23/05/2015 27/05/2015 Standard Class GT-14710 Greg Tran Consumer United States New York City New York 10035.0 East OFF-BI-10000829 Office Supplies Binders Avery Non-Stick Binders 17.960

Explore Data

sales_data.shape
(9800, 18)
sales_data.describe()
Row ID Postal Code Sales
count 9800.000000 9789.000000 9800.000000
mean 4900.500000 55273.322403 230.769059
std 2829.160653 32041.223413 626.651875
min 1.000000 1040.000000 0.444000
25% 2450.750000 23223.000000 17.248000
50% 4900.500000 58103.000000 54.490000
75% 7350.250000 90008.000000 210.605000
max 9800.000000 99301.000000 22638.480000
sales_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Row ID         9800 non-null   int64  
 1   Order ID       9800 non-null   object 
 2   Order Date     9800 non-null   object 
 3   Ship Date      9800 non-null   object 
 4   Ship Mode      9800 non-null   object 
 5   Customer ID    9800 non-null   object 
 6   Customer Name  9800 non-null   object 
 7   Segment        9800 non-null   object 
 8   Country        9800 non-null   object 
 9   City           9800 non-null   object 
 10  State          9800 non-null   object 
 11  Postal Code    9789 non-null   float64
 12  Region         9800 non-null   object 
 13  Product ID     9800 non-null   object 
 14  Category       9800 non-null   object 
 15  Sub-Category   9800 non-null   object 
 16  Product Name   9800 non-null   object 
 17  Sales          9800 non-null   float64
dtypes: float64(2), int64(1), object(15)
memory usage: 1.3+ MB

Clean Data

new_columns = [re.sub(' |-', '_', column).strip() for column in list(sales_data.columns)]
sales_data.columns = new_columns
sales_data.head()
Row_ID Order_ID Order_Date Ship_Date Ship_Mode Customer_ID Customer_Name Segment Country City State Postal_Code Region Product_ID Category Sub_Category Product_Name Sales
0 1 CA-2017-152156 08/11/2017 11/11/2017 Second Class CG-12520 Claire Gute Consumer United States Henderson Kentucky 42420.0 South FUR-BO-10001798 Furniture Bookcases Bush Somerset Collection Bookcase 261.9600
1 2 CA-2017-152156 08/11/2017 11/11/2017 Second Class CG-12520 Claire Gute Consumer United States Henderson Kentucky 42420.0 South FUR-CH-10000454 Furniture Chairs Hon Deluxe Fabric Upholstered Stacking Chairs,... 731.9400
2 3 CA-2017-138688 12/06/2017 16/06/2017 Second Class DV-13045 Darrin Van Huff Corporate United States Los Angeles California 90036.0 West OFF-LA-10000240 Office Supplies Labels Self-Adhesive Address Labels for Typewriters b... 14.6200
3 4 US-2016-108966 11/10/2016 18/10/2016 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale Florida 33311.0 South FUR-TA-10000577 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 957.5775
4 5 US-2016-108966 11/10/2016 18/10/2016 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale Florida 33311.0 South OFF-ST-10000760 Office Supplies Storage Eldon Fold 'N Roll Cart System 22.3680
sales_data.columns
Index(['Row_ID', 'Order_ID', 'Order_Date', 'Ship_Date', 'Ship_Mode',
       'Customer_ID', 'Customer_Name', 'Segment', 'Country', 'City', 'State',
       'Postal_Code', 'Region', 'Product_ID', 'Category', 'Sub_Category',
       'Product_Name', 'Sales'],
      dtype='object')

Transform Data

def create_cart(item):
    return [elt for elt in list(item)]
df = sales_data[["Customer_ID", "Sub_Category"]].groupby(by="Customer_ID").agg({"Sub_Category": create_cart}).reset_index().rename(columns={"Sub_Category": "cart"})
# df = sales_data[["Customer_ID", "Product_Name"]].groupby(by="Customer_ID").agg({"Product_Name": create_cart}).reset_index().rename(columns={"Product_Name": "cart"})
df.head()
Customer_ID cart
0 AA-10315 [Appliances, Binders, Storage, Binders, Applia...
1 AA-10375 [Storage, Furnishings, Accessories, Binders, A...
2 AA-10480 [Paper, Furnishings, Paper, Storage, Paper, Pa...
3 AA-10645 [Chairs, Phones, Chairs, Furnishings, Envelope...
4 AB-10015 [Chairs, Art, Storage, Storage, Phones, Bookca...
df['cart_size'] = df['cart'].apply(lambda x: len(x))
df.sample(15)
Customer_ID cart cart_size
378 JK-15625 [Chairs, Chairs, Bookcases, Art, Paper, Chairs... 12
363 JF-15565 [Fasteners, Paper, Appliances, Paper, Binders,... 16
422 KD-16270 [Paper, Tables, Paper, Storage, Machines, Art,... 16
505 MG-17875 [Appliances, Storage, Tables, Paper, Furnishin... 7
306 GM-14680 [Furnishings, Art, Fasteners, Phones, Tables, ... 11
415 KB-16315 [Paper, Furnishings, Binders, Art, Labels, Env... 22
594 PJ-18835 [Paper, Accessories, Accessories, Paper, Envel... 13
542 MT-17815 [Envelopes, Binders, Storage, Paper, Tables, P... 10
769 TT-21070 [Paper, Furnishings, Paper, Fasteners, Machine... 14
435 KM-16225 [Furnishings, Furnishings, Phones, Storage, Fa... 19
696 SJ-20500 [Appliances, Accessories, Binders, Chairs, Bin... 7
317 HA-14920 [Storage, Accessories, Furnishings, Binders, A... 18
384 JL-15175 [Paper, Furnishings, Chairs, Appliances, Binde... 7
251 EB-13975 [Binders, Binders, Copiers, Supplies, Binders,... 6
493 MC-18100 [Phones, Chairs, Paper, Furnishings, Storage, ... 19
X = df['cart'].values
te = TransactionEncoder()
te_ary = te.fit(X).transform(X)
df_x = pd.DataFrame(te_ary, columns=te.columns_)
df_x.head()
Accessories Appliances Art Binders Bookcases Chairs Copiers Envelopes Fasteners Furnishings Labels Machines Paper Phones Storage Supplies Tables
0 True True False True False False False False True True False False True True True True False
1 True False True True False False False False False True False False True True True False False
2 True False True False False False False False False True False False True True True False True
3 False False True True True True False True False True False False True True True False False
4 False False True False True True False False False False False False False True True False False
frequent_itemsets = fpgrowth(df_x, min_support=SUPPORT, use_colnames=True)
#### alternatively:
#frequent_itemsets = apriori(df, min_support=SUPPORT, use_colnames=True)
# frequent_itemsets = fpmax(df, min_support=SUPPORT, use_colnames=True)

frequent_itemsets.sort_values(by='support', ascending=False).head(10)
support itemsets
0 0.815889 (Binders)
1 0.757881 (Paper)
2 0.658260 (Furnishings)
3 0.641866 (Storage)
4 0.641866 (Phones)
14 0.640605 (Paper, Binders)
8 0.617907 (Art)
5 0.590164 (Accessories)
15 0.544767 (Binders, Furnishings)
18 0.537201 (Binders, Storage)
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7).head(10)
antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
0 (Paper) (Binders) 0.757881 0.815889 0.640605 0.845258 1.035996 0.022258 1.189792 0.143506
1 (Binders) (Paper) 0.815889 0.757881 0.640605 0.785162 1.035996 0.022258 1.126983 0.188720
2 (Furnishings) (Binders) 0.658260 0.815889 0.544767 0.827586 1.014337 0.007700 1.067844 0.041359
3 (Furnishings) (Paper) 0.658260 0.757881 0.519546 0.789272 1.041419 0.020663 1.148963 0.116379
4 (Paper, Furnishings) (Binders) 0.519546 0.815889 0.438840 0.844660 1.035264 0.014948 1.185214 0.070896
5 (Binders, Furnishings) (Paper) 0.544767 0.757881 0.438840 0.805556 1.062904 0.025971 1.245181 0.130003
6 (Storage) (Binders) 0.641866 0.815889 0.537201 0.836935 1.025795 0.013509 1.129066 0.070216
7 (Storage) (Paper) 0.641866 0.757881 0.527112 0.821218 1.083571 0.040654 1.354267 0.215353
8 (Paper, Binders) (Storage) 0.640605 0.641866 0.451450 0.704724 1.097930 0.040267 1.212879 0.248182
9 (Paper, Storage) (Binders) 0.527112 0.815889 0.451450 0.856459 1.049725 0.021385 1.282640 0.100171

References

Let get in touch

Github Badge Facebook Badge Linkedin Badge Twitter Badge Gmail Badge