Notebook for session 8
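import pandas
# Load the trial data from the Excel workbook.
df = pandas.read_excel("Donnees_M2_RD.xlsx")
# A trial is marked as successful ("reussi") when Response is 2 and Dist_A > Dist_B,
# or when Response is 1 and Dist_A < Dist_B.
df["reussi"] = (((df["Dist_A"] > df["Dist_B"]) & (df["Response"] == 2))
                | ((df["Dist_A"] < df["Dist_B"]) & (df["Response"] == 1)))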
Building an index from Series
s123 = pandas.Series([1,2,3])
sabcd = pandas.Series(["A","B","C","D"])
pandas.MultiIndex.from_product(
    [s123, sabcd],
    names=["chiffres", "lettres"])
MultiIndex([(1, 'A'),
(1, 'B'),
(1, 'C'),
(1, 'D'),
(2, 'A'),
(2, 'B'),
(2, 'C'),
(2, 'D'),
(3, 'A'),
(3, 'B'),
(3, 'C'),
(3, 'D')],
names=['chiffres', 'lettres'])
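As a cross-check, the sketch below rebuilds the same pairs with `itertools.product` from the standard library. It assumes that `from_product` enumerates the pairs in the same order, with the last level varying fastest, as the output above suggests. The names `index`, `pairs`, `n_expected` and `same_pairs` are introduced here for illustration only.

```python
import itertools
import pandas

s123 = pandas.Series([1, 2, 3])
sabcd = pandas.Series(["A", "B", "C", "D"])

index = pandas.MultiIndex.from_product(
    [s123, sabcd], names=["chiffres", "lettres"])

# The same pairs, built with the standard library for comparison.
pairs = list(itertools.product(s123, sabcd))

# One entry per (chiffre, lettre) pair: 3 * 4 = 12.
n_expected = len(s123) * len(sabcd)
same_pairs = list(index) == pairs
```

The length of the index is the product of the lengths of the input Series, which is why the number of combinations grows quickly once more columns are involved.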
Creating a list from an existing list
valeurs = [ 3, 4, 5, 6 , 7, 8 ]
valeurs_plus_3 = [ v+3 for v in valeurs ]
valeurs_plus_3
[6, 7, 8, 9, 10, 11]
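For comparison, here is a minimal sketch of the same computation written as an explicit loop. The name `valeurs_plus_3_boucle` is hypothetical and only serves to keep both results side by side.

```python
valeurs = [3, 4, 5, 6, 7, 8]

# Explicit loop: start from an empty list and append one result per element.
valeurs_plus_3_boucle = []
for v in valeurs:
    valeurs_plus_3_boucle.append(v + 3)

# List comprehension: the same result in a single expression.
valeurs_plus_3 = [v + 3 for v in valeurs]
```

Both lists hold the same values; the comprehension is the form reused below to collect the distinct values of several columns at once.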
df["Subject"].drop_duplicates()
0 P_ADI_331
400 P_ALM_345
800 P_AMY_346
1200 P_BAM_347
1600 P_BEH_340
2000 P_BLC_325
2399 P_BLR_321
2798 P_BOA_321
3197 P_BOC_342
3597 P_CAR_327
3995 P_CAV_333
4395 P_CON_336
4795 P_GAM_338
5195 P_GHM_334
5595 P_GRC_341
5995 P_GRF_322
6394 P_LAC_354
6794 P_LEG_335
7194 P_MOE_339
7594 P_ROS_336
7994 P_SOA_337
8394 P_TAI_343
8794 P_VAL_329
9194 P_VAR_330
Name: Subject, dtype: object
df["Name_A"].drop_duplicates()
0 0
1 1
2 4
3 2
17 3
Name: Name_A, dtype: int64
colonnes = ["Subject", "Space", "Name_A", "Name_B", "Dist_A", "Dist_B", "Mode", "Side"]
# Distinct values observed in each of these columns, one Series per column.
valeurs_colonnes = [df[col].drop_duplicates() for col in colonnes]
Building an index from every combination of these values
combinaisons = pandas.MultiIndex.from_product(
    valeurs_colonnes,
    names=colonnes)
combinaisons
MultiIndex([('P_ADI_331', 'E', 0, 2, 2, 4, 'Dic', 'D'),
('P_ADI_331', 'E', 0, 2, 2, 4, 'Dic', 'G'),
('P_ADI_331', 'E', 0, 2, 2, 4, 'Dio', 'D'),
('P_ADI_331', 'E', 0, 2, 2, 4, 'Dio', 'G'),
('P_ADI_331', 'E', 0, 2, 2, 1, 'Dic', 'D'),
('P_ADI_331', 'E', 0, 2, 2, 1, 'Dic', 'G'),
('P_ADI_331', 'E', 0, 2, 2, 1, 'Dio', 'D'),
('P_ADI_331', 'E', 0, 2, 2, 1, 'Dio', 'G'),
('P_ADI_331', 'E', 0, 2, 2, 2, 'Dic', 'D'),
('P_ADI_331', 'E', 0, 2, 2, 2, 'Dic', 'G'),
...
('P_VAR_330', 'I', 3, 0, 5, 2, 'Dio', 'D'),
('P_VAR_330', 'I', 3, 0, 5, 2, 'Dio', 'G'),
('P_VAR_330', 'I', 3, 0, 5, 3, 'Dic', 'D'),
('P_VAR_330', 'I', 3, 0, 5, 3, 'Dic', 'G'),
('P_VAR_330', 'I', 3, 0, 5, 3, 'Dio', 'D'),
('P_VAR_330', 'I', 3, 0, 5, 3, 'Dio', 'G'),
('P_VAR_330', 'I', 3, 0, 5, 5, 'Dic', 'D'),
('P_VAR_330', 'I', 3, 0, 5, 5, 'Dic', 'G'),
('P_VAR_330', 'I', 3, 0, 5, 5, 'Dio', 'D'),
('P_VAR_330', 'I', 3, 0, 5, 5, 'Dio', 'G')],
names=['Subject', 'Space', 'Name_A', 'Name_B', 'Dist_A', 'Dist_B', 'Mode', 'Side'], length=120000)
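The length of such an index can be predicted before building it: it is the product of the number of distinct values in each column. The sketch below checks this on a small made-up table, since the Excel file is not available here; `toy` and `taille_attendue` are hypothetical names and the values are invented.

```python
import math
import pandas

# Hypothetical toy data standing in for the Excel file.
toy = pandas.DataFrame({
    "Subject": ["S1", "S1", "S2", "S2", "S2"],
    "Mode": ["Dic", "Dio", "Dic", "Dic", "Dio"],
    "Side": ["D", "D", "G", "D", "G"],
})
colonnes = ["Subject", "Mode", "Side"]
valeurs_colonnes = [toy[col].drop_duplicates() for col in colonnes]
combinaisons = pandas.MultiIndex.from_product(valeurs_colonnes, names=colonnes)

# Expected size: product of the number of distinct values per column (2 * 2 * 2).
taille_attendue = math.prod(len(v) for v in valeurs_colonnes)
```

In the notebook, the 24 subjects listed above and the reported length of 120000 correspond to 120000 / 24 = 5000 combinations per subject.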
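# Count 1 for a successful trial and 0 otherwise.
df.loc[df["reussi"], "nombre_reussis"] = 1
df.loc[~df["reussi"], "nombre_reussis"] = 0
# Every row of df is one trial.
df["essais"] = 1
# Index the trials by their combination of values, then align them on the full set
# of combinations; combinations with no trial are filled with 0.
df2 = df.set_index(colonnes)[["nombre_reussis", "essais"]]
df3 = df2.reindex(index=combinaisons, fill_value=0)
df3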
| Subject | Space | Name_A | Name_B | Dist_A | Dist_B | Mode | Side | nombre_reussis | essais |
|---|---|---|---|---|---|---|---|---|---|
| P_ADI_331 | E | 0 | 2 | 2 | 4 | Dic | D | 0.0 | 1 |
| | | | | | | | G | 0.0 | 0 |
| | | | | | | Dio | D | 0.0 | 0 |
| | | | | | | | G | 0.0 | 0 |
| | | | | | 1 | Dic | D | 0.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| P_VAR_330 | I | 3 | 0 | 5 | 3 | Dio | G | 0.0 | 0 |
| | | | | | 5 | Dic | D | 0.0 | 0 |
| | | | | | | | G | 0.0 | 0 |
| | | | | | | Dio | D | 0.0 | 0 |
| | | | | | | | G | 0.0 | 0 |

120000 rows × 2 columns
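The `set_index` then `reindex` step can be tried on a small made-up table. This is only a sketch: `toy`, `toy_indexe` and `toy_complet` are hypothetical names and the values are invented. It also makes explicit one assumption the step relies on, namely that each combination appears at most once in the data, because `reindex` raises an error when the index contains duplicate labels.

```python
import pandas

# Hypothetical toy trials: one row per observed (Subject, Mode) pair.
toy = pandas.DataFrame({
    "Subject": ["S1", "S2", "S2"],
    "Mode": ["Dic", "Dic", "Dio"],
    "nombre_reussis": [1.0, 0.0, 1.0],
    "essais": [1, 1, 1],
})
colonnes = ["Subject", "Mode"]
combinaisons = pandas.MultiIndex.from_product(
    [toy[col].drop_duplicates() for col in colonnes], names=colonnes)

# reindex needs unique index labels, so check before calling it.
toy_indexe = toy.set_index(colonnes)
assert toy_indexe.index.is_unique

# Combinations absent from the data get 0 in every column instead of NaN.
toy_complet = toy_indexe.reindex(index=combinaisons, fill_value=0)
```

Here the pair `("S1", "Dio")` has no trial, so it appears in `toy_complet` with `essais` equal to 0, while the three observed pairs keep their original values.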
Combinations with no trial
df3[df3["essais"] == 0]
| Subject | Space | Name_A | Name_B | Dist_A | Dist_B | Mode | Side | nombre_reussis | essais |
|---|---|---|---|---|---|---|---|---|---|
| P_ADI_331 | E | 0 | 2 | 2 | 4 | Dic | G | 0.0 | 0 |
| | | | | | | Dio | D | 0.0 | 0 |
| | | | | | | | G | 0.0 | 0 |
| | | | | | 1 | Dic | D | 0.0 | 0 |
| | | | | | | | G | 0.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| P_VAR_330 | I | 3 | 0 | 5 | 3 | Dio | G | 0.0 | 0 |
| | | | | | 5 | Dic | D | 0.0 | 0 |
| | | | | | | | G | 0.0 | 0 |
| | | | | | | Dio | D | 0.0 | 0 |
| | | | | | | | G | 0.0 | 0 |

110406 rows × 2 columns
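The two row counts printed so far are enough to work out how many combinations were actually tested. The arithmetic below uses only those two numbers; the variable names are introduced here for illustration.

```python
# Row counts reported by the two outputs above.
total_combinaisons = 120000
sans_essai = 110406

# Combinations with at least one trial, and their share of all combinations.
avec_essai = total_combinaisons - sans_essai
taux_global = avec_essai / total_combinaisons
```

This gives 9594 tested combinations, that is roughly 8 % of the 120000 possible ones.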
Adding a column that counts combinations
df3["combinaisons"] = 1
Proportion of combinations explored, per subject
comb_subj = df3.groupby(by="Subject")["combinaisons"].sum()
essais_subj = df3.groupby(by="Subject")["essais"].sum()
essais_subj / comb_subj
Subject
P_ADI_331 0.0800
P_ALM_345 0.0800
P_AMY_346 0.0800
P_BAM_347 0.0800
P_BEH_340 0.0800
P_BLC_325 0.0798
P_BLR_321 0.0798
P_BOA_321 0.0798
P_BOC_342 0.0800
P_CAR_327 0.0796
P_CAV_333 0.0800
P_CON_336 0.0800
P_GAM_338 0.0800
P_GHM_334 0.0800
P_GRC_341 0.0800
P_GRF_322 0.0798
P_LAC_354 0.0800
P_LEG_335 0.0800
P_MOE_339 0.0800
P_ROS_336 0.0800
P_SOA_337 0.0800
P_TAI_343 0.0800
P_VAL_329 0.0800
P_VAR_330 0.0800
dtype: float64
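Because every row counts as one combination, dividing the sum of `essais` by the sum of `combinaisons` is the same as taking the mean of `essais` within each group. The sketch below checks this on made-up data. `toy3`, `taux_ratio` and `taux_moyenne` are hypothetical names, and `Subject` is an ordinary column here, whereas in `df3` it is an index level.

```python
import pandas

# Hypothetical toy version of df3: one row per combination, essais is 0 or 1.
toy3 = pandas.DataFrame({
    "Subject": ["S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2"],
    "essais": [1, 0, 0, 1, 1, 0, 0, 0],
})
toy3["combinaisons"] = 1

groupes = toy3.groupby(by="Subject")

# Ratio of sums, as in the notebook.
taux_ratio = groupes["essais"].sum() / groupes["combinaisons"].sum()

# Equivalent shortcut: the mean of a 0/1 column is the share of ones.
taux_moyenne = groupes["essais"].mean()
```

With these values, S1 has 2 tested combinations out of 4 and S2 has 1 out of 4, and both computations return the same rates.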
Proportion of combinations explored, per subject and Mode
comb_subj = df3.groupby(by=["Subject","Mode"])["combinaisons"].sum()
essais_subj = df3.groupby(by=["Subject", "Mode"])["essais"].sum()
essais_subj / comb_subj
Subject Mode
P_ADI_331 Dic 0.0800
Dio 0.0800
P_ALM_345 Dic 0.0800
Dio 0.0800
P_AMY_346 Dic 0.0800
Dio 0.0800
P_BAM_347 Dic 0.0800
Dio 0.0800
P_BEH_340 Dic 0.0800
Dio 0.0800
P_BLC_325 Dic 0.0796
Dio 0.0800
P_BLR_321 Dic 0.0796
Dio 0.0800
P_BOA_321 Dic 0.0796
Dio 0.0800
P_BOC_342 Dic 0.0800
Dio 0.0800
P_CAR_327 Dic 0.0796
Dio 0.0796
P_CAV_333 Dic 0.0800
Dio 0.0800
P_CON_336 Dic 0.0800
Dio 0.0800
P_GAM_338 Dic 0.0800
Dio 0.0800
P_GHM_334 Dic 0.0800
Dio 0.0800
P_GRC_341 Dic 0.0800
Dio 0.0800
P_GRF_322 Dic 0.0796
Dio 0.0800
P_LAC_354 Dic 0.0800
Dio 0.0800
P_LEG_335 Dic 0.0800
Dio 0.0800
P_MOE_339 Dic 0.0800
Dio 0.0800
P_ROS_336 Dic 0.0800
Dio 0.0800
P_SOA_337 Dic 0.0800
Dio 0.0800
P_TAI_343 Dic 0.0800
Dio 0.0800
P_VAL_329 Dic 0.0800
Dio 0.0800
P_VAR_330 Dic 0.0800
Dio 0.0800
dtype: float64
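A result grouped on two keys can be easier to read as a table with one row per subject and one column per Mode. The sketch below does this with `unstack` on made-up data; `toy3`, `taux` and `tableau` are hypothetical names and the values are invented.

```python
import pandas

# Hypothetical toy version of df3 with three index levels.
index = pandas.MultiIndex.from_product(
    [["S1", "S2"], ["Dic", "Dio"], ["D", "G"]],
    names=["Subject", "Mode", "Side"])
toy3 = pandas.DataFrame({"essais": [1, 0, 1, 1, 0, 0, 1, 0]}, index=index)

# Group on two index levels by name; the mean of a 0/1 column is the explored share.
taux = toy3.groupby(by=["Subject", "Mode"])["essais"].mean()

# Move the Mode level from the row index to the columns.
tableau = taux.unstack("Mode")
```

`tableau` then has the subjects as rows and `Dic` and `Dio` as columns, which makes differences between modes for a given subject easier to spot.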