UE8: session 5 notebook
import pandas
df = pandas.read_excel("Donnees_M2_RD.xlsx")
Distance in metres
dist_i_m = pandas.DataFrame(
{ "Dist_I_m": [ 0.2, 0.3, 0.4, 0.6, 0.8 ] },
index = [1,2,3,4,5]
)
dist_e_m = pandas.DataFrame(
{ "Dist_E_m": [ 2, 3, 4, 6, 8 ] },
index = [1,2,3,4,5]
)
df_i = df[df["Space"] == "I"]
dfa = pandas.merge(df_i, dist_i_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_I_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_i_m, left_on='Dist_B', right_index=True)
df_i_m = dfab.rename(columns={ 'Dist_I_m': 'Dist_B_m' })
df_e = df[df["Space"] == "E"]
dfa = pandas.merge(df_e, dist_e_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_E_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_e_m, left_on='Dist_B', right_index=True)
df_e_m = dfab.rename(columns={ 'Dist_E_m': 'Dist_B_m' })
df_m = pandas.concat([df_i_m, df_e_m])
df_m
| | Subject | Name_A | Name_B | Dist_A | Dist_B | Mode | Space | Side | Response | RT | Dist_A_m | Dist_B_m |
---|---|---|---|---|---|---|---|---|---|---|---|---|
150 | P_ADI_331 | 2 | 3 | 3 | 4 | Dic | I | D | 2 | 14608 | 0.4 | 0.6 |
180 | P_ADI_331 | 4 | 3 | 3 | 4 | Dic | I | D | 1 | 9086 | 0.4 | 0.6 |
207 | P_ADI_331 | 4 | 2 | 3 | 4 | Dic | I | G | 1 | 7251 | 0.4 | 0.6 |
213 | P_ADI_331 | 0 | 4 | 3 | 4 | Dic | I | G | 1 | 9298 | 0.4 | 0.6 |
250 | P_ADI_331 | 1 | 0 | 3 | 4 | Dio | I | D | 1 | 11246 | 0.4 | 0.6 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9364 | P_VAR_330 | 1 | 0 | 5 | 2 | Dic | E | D | 2 | 8296 | 8.0 | 3.0 |
9374 | P_VAR_330 | 2 | 1 | 5 | 2 | Dic | E | D | 2 | 12260 | 8.0 | 3.0 |
9412 | P_VAR_330 | 4 | 3 | 5 | 2 | Dio | E | D | 2 | 9414 | 8.0 | 3.0 |
9433 | P_VAR_330 | 3 | 1 | 5 | 2 | Dio | E | D | 2 | 16334 | 8.0 | 3.0 |
9437 | P_VAR_330 | 1 | 2 | 5 | 2 | Dio | E | D | 1 | 8802 | 8.0 | 3.0 |
9594 rows × 12 columns
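The merge-and-rename steps above convert the distance codes (1 to 5) into metres, separately for the "I" and "E" spaces. As a possible alternative, the same lookup can be sketched with `Series.map` and a dictionary. The frame `toy`, the dictionaries `codes_to_m_i` and `codes_to_m_e`, and the helper `to_metres` are made up for illustration and are not part of the original notebook.

```python
import pandas

# Lookup tables: distance code -> metres, same values as dist_i_m / dist_e_m.
codes_to_m_i = {1: 0.2, 2: 0.3, 3: 0.4, 4: 0.6, 5: 0.8}
codes_to_m_e = {1: 2.0, 2: 3.0, 3: 4.0, 4: 6.0, 5: 8.0}

# Made-up rows that reuse the column names of the real data.
toy = pandas.DataFrame({
    "Space":  ["I", "I", "E", "E"],
    "Dist_A": [3, 1, 5, 2],
    "Dist_B": [4, 2, 2, 5],
})

def to_metres(frame, column):
    """Convert the codes in `column` to metres according to each row's Space."""
    inner = frame[column].map(codes_to_m_i)
    outer = frame[column].map(codes_to_m_e)
    # Keep the "I" value where Space is "I"; every other row gets the "E" value.
    return inner.where(frame["Space"] == "I", outer)

toy["Dist_A_m"] = to_metres(toy, "Dist_A")
toy["Dist_B_m"] = to_metres(toy, "Dist_B")
print(toy)
```

Unlike the merge version, this sketch keeps the rows in their original order and does not need a final `concat`. It assumes that `Space` only ever takes the values "I" and "E".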
End-of-session 4 exercises
Give the maximum reaction time per subject.
df_m.groupby("Subject")["RT"].max()
Subject
P_ADI_331 18865
P_ALM_345 51807
P_AMY_346 29020
P_BAM_347 24332
P_BEH_340 50600
P_BLC_325 32306
P_BLR_321 44985
P_BOA_321 89205
P_BOC_342 103152
P_CAR_327 65254
P_CAV_333 36168
P_CON_336 37280
P_GAM_338 52262
P_GHM_334 33643
P_GRC_341 38528
P_GRF_322 42200
P_LAC_354 93540
P_LEG_335 49580
P_MOE_339 34250
P_ROS_336 54405
P_SOA_337 31123
P_TAI_343 62619
P_VAL_329 80916
P_VAR_330 80729
Name: RT, dtype: int64
Give the mean reaction time per first name (Name_A).
df_m.groupby("Name_A")["RT"].mean()
Name_A
0 12154.498436
1 11884.791667
2 11746.023983
3 12335.281918
4 12129.829078
Name: RT, dtype: float64
Give the mean reaction time according to the distance in metres of the first name spoken.
df_m.groupby(by="Dist_A_m")["RT"].mean()
Dist_A_m
0.2 10430.084375
0.3 11383.621086
0.4 11689.054223
0.6 11451.791667
0.8 11121.932292
2.0 11842.881126
3.0 13238.736458
4.0 12948.668405
6.0 13187.716371
8.0 13206.540625
Name: RT, dtype: float64
Give the subject with the lowest mean reaction time (use .idxmin()).
df_m.groupby("Subject")["RT"].mean()
Subject
P_ADI_331 10042.732500
P_ALM_345 13103.350000
P_AMY_346 9561.292500
P_BAM_347 9096.540000
P_BEH_340 13187.195000
P_BLC_325 11366.338346
P_BLR_321 9657.929825
P_BOA_321 15787.130326
P_BOC_342 12779.680000
P_CAR_327 12457.844221
P_CAV_333 10551.257500
P_CON_336 8352.742500
P_GAM_338 11805.215000
P_GHM_334 11220.800000
P_GRC_341 11213.115000
P_GRF_322 15652.556391
P_LAC_354 15617.710000
P_LEG_335 10904.325000
P_MOE_339 8470.447500
P_ROS_336 16019.425000
P_SOA_337 10051.175000
P_TAI_343 16413.327500
P_VAL_329 13661.730000
P_VAR_330 12240.965000
Name: RT, dtype: float64
df_m.groupby("Subject")["RT"].mean().idxmin()
'P_CON_336'
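To make the role of `.idxmin()` explicit: it returns the index label of the smallest value, whereas `.min()` returns the value itself. Here is a minimal sketch on made-up data; the subjects `S1` to `S3` and their reaction times are invented for illustration.

```python
import pandas

# Made-up reaction times for three hypothetical subjects.
toy = pandas.DataFrame({
    "Subject": ["S1", "S1", "S2", "S2", "S3", "S3"],
    "RT":      [900, 1100, 700, 800, 1200, 1000],
})

# Mean reaction time per subject, indexed by subject.
mean_rt = toy.groupby("Subject")["RT"].mean()

fastest = mean_rt.idxmin()  # index label (the subject) with the smallest mean
lowest = mean_rt.min()      # the smallest mean itself
print(fastest, lowest)
```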
Back to the M1 data
Loading the data
expes = []
for i in range(8):
    filename = "expe/subject-"+str(i)+".csv"
    print("Loading "+filename)
    df = pandas.read_csv(filename)
    # Tag every row with its subject number before stacking the files
    df['Subject'] = i
    expes.append(df)
dfm1 = pandas.concat(expes)
dfm1 = dfm1.reset_index()
Loading expe/subject-0.csv
Loading expe/subject-1.csv
Loading expe/subject-2.csv
Loading expe/subject-3.csv
Loading expe/subject-4.csv
Loading expe/subject-5.csv
Loading expe/subject-6.csv
Loading expe/subject-7.csv
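After `concat`, each file's rows keep their own 0-based index, so index values repeat across subjects; this is why the loading cell calls `reset_index()`. The sketch below, on two made-up frames, shows that `reset_index()` keeps the old index in a new `index` column, while `concat(..., ignore_index=True)` simply renumbers the rows. The frames `a` and `b` and the column name `rt` are hypothetical.

```python
import pandas

# Two made-up per-subject frames; "rt" is a hypothetical column name.
a = pandas.DataFrame({"rt": [510, 620]})
b = pandas.DataFrame({"rt": [480, 700]})

# Plain concat: the original indices are kept, so they repeat.
stacked = pandas.concat([a, b])
print(stacked.index.tolist())

# reset_index(): fresh 0..n-1 index, old index saved in a column named "index".
kept = stacked.reset_index()
print(kept.columns.tolist())

# ignore_index=True: fresh 0..n-1 index, old index discarded.
renumbered = pandas.concat([a, b], ignore_index=True)
print(renumbered.index.tolist())
```

Which variant fits better depends on whether the per-file row number (for instance the trial order) is needed later in the analysis.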
We create two columns:
- Sourire (smile), which contains a boolean
- Genre (gender), which is either F or M
The values of these columns are derived from the names of the images used.
If we have a series of text values, we can use .str to access the usual Python text-manipulation operations.
Here, we use .get(n) to retrieve the character at position n (counting from 0):
dfm1['Sourire'] = dfm1['image'].str.get(0) == 'S'
dfm1['Genre'] = dfm1['image'].str.get(1)
dfm1[['image','Sourire','Genre']]
| | image | Sourire | Genre |
---|---|---|---|
0 | SF-1018.jpg | True | F |
1 | NF-1060.jpg | False | F |
2 | NF-1043.jpg | False | F |
3 | NF-1012.jpg | False | F |
4 | SF-1108.jpg | True | F |
... | ... | ... | ... |
3499 | NF-1032.jpg | False | F |
3500 | SM-1064.jpg | True | M |
3501 | SF-1030.jpg | True | F |
3502 | SF-1106.jpg | True | F |
3503 | SF-1006.jpg | True | F |
3504 rows × 3 columns
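A few other `.str` operations can be sketched in the same way on a small made-up series. The file names below only imitate the pattern seen in the `image` column, and the variable names are illustrative.

```python
import pandas

# Made-up file names following the pattern of the `image` column.
images = pandas.Series(["SF-1018.jpg", "NM-1060.jpg", "SM-1064.jpg"])

first = images.str.get(0)              # character at position 0
same = images.str[0]                   # indexing form, same result as .get(0)
smiling = images.str.startswith("S")   # boolean Series, like the Sourire column
number = images.str.slice(3, 7)        # characters at positions 3 to 6

print(first.tolist())
print(smiling.tolist())
print(number.tolist())
```

Note that `.str.slice(3, 7)` assumes the numeric part always has four digits, which holds for the rows displayed above but is not checked here.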