UE8: notebook séance 4
Notebook de la séance 4
import pandas
df = pandas.read_excel("Donnees_M2_RD.xlsx")
Jointures (merge)
On fabrique une Dataframe avec les correspondance entre indice de distance et distance en mètres. Deux dataframes: une pour la situation péripersonnel (space = I) et extrapersonnel (space = E)
dist_i_m = pandas.DataFrame(
{ "Dist_I_m": [ 0.2, 0.3, 0.4, 0.6, 0.8 ] },
index = [1,2,3,4,5]
)
dist_e_m = pandas.DataFrame(
{ "Dist_E_m": [ 2, 3, 4, 6, 8 ] },
index = [1,2,3,4,5]
)
dist_ie_m = pandas.concat([dist_i_m,dist_e_m],axis=1)
dist_i_m
Dist_I_m | |
---|---|
1 | 0.2 |
2 | 0.3 |
3 | 0.4 |
4 | 0.6 |
5 | 0.8 |
dist_e_m
Dist_E_m | |
---|---|
1 | 2 |
2 | 3 |
3 | 4 |
4 | 6 |
5 | 8 |
dist_ie_m
Dist_I_m | Dist_E_m | |
---|---|---|
1 | 0.2 | 2 |
2 | 0.3 | 3 |
3 | 0.4 | 4 |
4 | 0.6 | 6 |
5 | 0.8 | 8 |
# ou bien
dist_ie_m2 = dist_i_m.copy()
dist_ie_m2["Dist_E_m"] = dist_ie_m2["Dist_I_m"] * 10
dist_ie_m2
Dist_I_m | Dist_E_m | |
---|---|---|
1 | 0.2 | 2.0 |
2 | 0.3 | 3.0 |
3 | 0.4 | 4.0 |
4 | 0.6 | 6.0 |
5 | 0.8 | 8.0 |
On combine df et dist_i_m pour récupérer une distance en mètres
pandas.merge(df, dist_i_m, left_on='Dist_A', right_index=True)
Subject | Name_A | Name_B | Dist_A | Dist_B | Mode | Space | Side | Response | RT | Dist_I_m | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | P_ADI_331 | 0 | 2 | 2 | 4 | Dic | E | D | 2 | 18865 | 0.3 |
4 | P_ADI_331 | 1 | 2 | 2 | 4 | Dic | E | D | 1 | 11801 | 0.3 |
5 | P_ADI_331 | 2 | 1 | 2 | 3 | Dic | E | D | 2 | 12117 | 0.3 |
7 | P_ADI_331 | 0 | 3 | 2 | 4 | Dic | E | D | 1 | 13237 | 0.3 |
19 | P_ADI_331 | 3 | 1 | 2 | 3 | Dic | E | D | 2 | 12041 | 0.3 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9566 | P_VAR_330 | 4 | 2 | 5 | 1 | Dio | I | D | 2 | 9526 | 0.8 |
9574 | P_VAR_330 | 0 | 3 | 5 | 2 | Dio | I | D | 2 | 5526 | 0.8 |
9581 | P_VAR_330 | 3 | 2 | 5 | 4 | Dio | I | D | 1 | 11327 | 0.8 |
9590 | P_VAR_330 | 3 | 2 | 5 | 1 | Dio | I | D | 2 | 6349 | 0.8 |
9593 | P_VAR_330 | 0 | 3 | 5 | 1 | Dio | I | D | 2 | 10139 | 0.8 |
9594 rows × 11 columns
Plusieurs problèmes se posent:
- On a calculé une distance concrète que pour
Dist_A
, mais pas pourDist_B
- La distance est fausse si
Space = E
Si on fait deux jointures (une pour Dist_A
et une pour Dist_B
) on aura deux colonnes Dist_I_m
dans le resultat, il faut donc renommer.
dfa = pandas.merge(df, dist_i_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_I_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_i_m, left_on='Dist_B', right_index=True)
dfm = dfab.rename(columns={ 'Dist_I_m': 'Dist_B_m' })
dfm
Subject | Name_A | Name_B | Dist_A | Dist_B | Mode | Space | Side | Response | RT | Dist_A_m | Dist_B_m | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P_ADI_331 | 0 | 2 | 2 | 4 | Dic | E | D | 2 | 18865 | 0.3 | 0.6 |
4 | P_ADI_331 | 1 | 2 | 2 | 4 | Dic | E | D | 1 | 11801 | 0.3 | 0.6 |
7 | P_ADI_331 | 0 | 3 | 2 | 4 | Dic | E | D | 1 | 13237 | 0.3 | 0.6 |
36 | P_ADI_331 | 3 | 1 | 2 | 4 | Dic | E | D | 1 | 9523 | 0.3 | 0.6 |
51 | P_ADI_331 | 2 | 4 | 2 | 4 | Dio | E | G | 1 | 11331 | 0.3 | 0.6 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9437 | P_VAR_330 | 1 | 2 | 5 | 2 | Dio | E | D | 1 | 8802 | 0.8 | 0.3 |
9509 | P_VAR_330 | 4 | 2 | 5 | 2 | Dic | I | G | 2 | 5336 | 0.8 | 0.3 |
9522 | P_VAR_330 | 3 | 4 | 5 | 2 | Dic | I | G | 2 | 6058 | 0.8 | 0.3 |
9543 | P_VAR_330 | 4 | 1 | 5 | 2 | Dic | I | G | 2 | 5379 | 0.8 | 0.3 |
9574 | P_VAR_330 | 0 | 3 | 5 | 2 | Dio | I | D | 2 | 5526 | 0.8 | 0.3 |
9594 rows × 12 columns
Les valeurs sont OK pour Space
= I
, mais toujours fausses pour E
.
df_i = df[df["Space"] == "I"]
dfa = pandas.merge(df_i, dist_i_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_I_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_i_m, left_on='Dist_B', right_index=True)
df_i_m = dfab.rename(columns={ 'Dist_I_m': 'Dist_B_m' })
df_i_m
Subject | Name_A | Name_B | Dist_A | Dist_B | Mode | Space | Side | Response | RT | Dist_A_m | Dist_B_m | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
150 | P_ADI_331 | 2 | 3 | 3 | 4 | Dic | I | D | 2 | 14608 | 0.4 | 0.6 |
180 | P_ADI_331 | 4 | 3 | 3 | 4 | Dic | I | D | 1 | 9086 | 0.4 | 0.6 |
207 | P_ADI_331 | 4 | 2 | 3 | 4 | Dic | I | G | 1 | 7251 | 0.4 | 0.6 |
213 | P_ADI_331 | 0 | 4 | 3 | 4 | Dic | I | G | 1 | 9298 | 0.4 | 0.6 |
250 | P_ADI_331 | 1 | 0 | 3 | 4 | Dio | I | D | 1 | 11246 | 0.4 | 0.6 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9498 | P_VAR_330 | 4 | 2 | 4 | 3 | Dic | I | G | 2 | 4811 | 0.6 | 0.4 |
9503 | P_VAR_330 | 3 | 0 | 4 | 3 | Dic | I | G | 2 | 6628 | 0.6 | 0.4 |
9526 | P_VAR_330 | 0 | 1 | 4 | 3 | Dic | I | G | 2 | 5458 | 0.6 | 0.4 |
9535 | P_VAR_330 | 2 | 4 | 4 | 3 | Dic | I | G | 1 | 8956 | 0.6 | 0.4 |
9546 | P_VAR_330 | 2 | 3 | 4 | 3 | Dio | I | D | 2 | 13525 | 0.6 | 0.4 |
4797 rows × 12 columns
df_e = df[df["Space"] == "E"]
dfa = pandas.merge(df_e, dist_e_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_E_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_e_m, left_on='Dist_B', right_index=True)
df_e_m = dfab.rename(columns={ 'Dist_E_m': 'Dist_B_m' })
df_e_m
Subject | Name_A | Name_B | Dist_A | Dist_B | Mode | Space | Side | Response | RT | Dist_A_m | Dist_B_m | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P_ADI_331 | 0 | 2 | 2 | 4 | Dic | E | D | 2 | 18865 | 3 | 6 |
4 | P_ADI_331 | 1 | 2 | 2 | 4 | Dic | E | D | 1 | 11801 | 3 | 6 |
7 | P_ADI_331 | 0 | 3 | 2 | 4 | Dic | E | D | 1 | 13237 | 3 | 6 |
36 | P_ADI_331 | 3 | 1 | 2 | 4 | Dic | E | D | 1 | 9523 | 3 | 6 |
51 | P_ADI_331 | 2 | 4 | 2 | 4 | Dio | E | G | 1 | 11331 | 3 | 6 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9364 | P_VAR_330 | 1 | 0 | 5 | 2 | Dic | E | D | 2 | 8296 | 8 | 3 |
9374 | P_VAR_330 | 2 | 1 | 5 | 2 | Dic | E | D | 2 | 12260 | 8 | 3 |
9412 | P_VAR_330 | 4 | 3 | 5 | 2 | Dio | E | D | 2 | 9414 | 8 | 3 |
9433 | P_VAR_330 | 3 | 1 | 5 | 2 | Dio | E | D | 2 | 16334 | 8 | 3 |
9437 | P_VAR_330 | 1 | 2 | 5 | 2 | Dio | E | D | 1 | 8802 | 8 | 3 |
4797 rows × 12 columns
df_m = pandas.concat([df_i_m, df_e_m])
df_m
Subject | Name_A | Name_B | Dist_A | Dist_B | Mode | Space | Side | Response | RT | Dist_A_m | Dist_B_m | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
150 | P_ADI_331 | 2 | 3 | 3 | 4 | Dic | I | D | 2 | 14608 | 0.4 | 0.6 |
180 | P_ADI_331 | 4 | 3 | 3 | 4 | Dic | I | D | 1 | 9086 | 0.4 | 0.6 |
207 | P_ADI_331 | 4 | 2 | 3 | 4 | Dic | I | G | 1 | 7251 | 0.4 | 0.6 |
213 | P_ADI_331 | 0 | 4 | 3 | 4 | Dic | I | G | 1 | 9298 | 0.4 | 0.6 |
250 | P_ADI_331 | 1 | 0 | 3 | 4 | Dio | I | D | 1 | 11246 | 0.4 | 0.6 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9364 | P_VAR_330 | 1 | 0 | 5 | 2 | Dic | E | D | 2 | 8296 | 8.0 | 3.0 |
9374 | P_VAR_330 | 2 | 1 | 5 | 2 | Dic | E | D | 2 | 12260 | 8.0 | 3.0 |
9412 | P_VAR_330 | 4 | 3 | 5 | 2 | Dio | E | D | 2 | 9414 | 8.0 | 3.0 |
9433 | P_VAR_330 | 3 | 1 | 5 | 2 | Dio | E | D | 2 | 16334 | 8.0 | 3.0 |
9437 | P_VAR_330 | 1 | 2 | 5 | 2 | Dio | E | D | 1 | 8802 | 8.0 | 3.0 |
9594 rows × 12 columns
GroupBy
df.groupby(by='Subject').mean()
Name_A | Name_B | Dist_A | Dist_B | Response | RT | |
---|---|---|---|---|---|---|
Subject | ||||||
P_ADI_331 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.530000 | 10042.732500 |
P_ALM_345 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.480000 | 13103.350000 |
P_AMY_346 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.467500 | 9561.292500 |
P_BAM_347 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.540000 | 9096.540000 |
P_BEH_340 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.437500 | 13187.195000 |
P_BLC_325 | 2.005013 | 2.002506 | 3.000000 | 3.005013 | 1.458647 | 11366.338346 |
P_BLR_321 | 2.005013 | 2.000000 | 3.005013 | 3.000000 | 1.581454 | 9657.929825 |
P_BOA_321 | 2.000000 | 2.005013 | 3.002506 | 2.997494 | 1.508772 | 15787.130326 |
P_BOC_342 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.457500 | 12779.680000 |
P_CAR_327 | 1.997487 | 2.000000 | 3.002513 | 3.007538 | 1.592965 | 12457.844221 |
P_CAV_333 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.502500 | 10551.257500 |
P_CON_336 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.442500 | 8352.742500 |
P_GAM_338 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.495000 | 11805.215000 |
P_GHM_334 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.505000 | 11220.800000 |
P_GRC_341 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.500000 | 11213.115000 |
P_GRF_322 | 1.994987 | 1.997494 | 2.997494 | 2.994987 | 1.551378 | 15652.556391 |
P_LAC_354 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.537500 | 15617.710000 |
P_LEG_335 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.467500 | 10904.325000 |
P_MOE_339 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.475000 | 8470.447500 |
P_ROS_336 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.472500 | 16019.425000 |
P_SOA_337 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.442500 | 10051.175000 |
P_TAI_343 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.570000 | 16413.327500 |
P_VAL_329 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.470000 | 13661.730000 |
P_VAR_330 | 2.000000 | 2.000000 | 3.000000 | 3.000000 | 1.570000 | 12240.965000 |
df.groupby(by='Subject').mean()["RT"]
Subject
P_ADI_331 10042.732500
P_ALM_345 13103.350000
P_AMY_346 9561.292500
P_BAM_347 9096.540000
P_BEH_340 13187.195000
P_BLC_325 11366.338346
P_BLR_321 9657.929825
P_BOA_321 15787.130326
P_BOC_342 12779.680000
P_CAR_327 12457.844221
P_CAV_333 10551.257500
P_CON_336 8352.742500
P_GAM_338 11805.215000
P_GHM_334 11220.800000
P_GRC_341 11213.115000
P_GRF_322 15652.556391
P_LAC_354 15617.710000
P_LEG_335 10904.325000
P_MOE_339 8470.447500
P_ROS_336 16019.425000
P_SOA_337 10051.175000
P_TAI_343 16413.327500
P_VAL_329 13661.730000
P_VAR_330 12240.965000
Name: RT, dtype: float64
df_gb = df.groupby(by='Subject')
df_mean = df_gb.mean()
mean_rt = df_mean["RT"]
mean_rt
Subject
P_ADI_331 10042.732500
P_ALM_345 13103.350000
P_AMY_346 9561.292500
P_BAM_347 9096.540000
P_BEH_340 13187.195000
P_BLC_325 11366.338346
P_BLR_321 9657.929825
P_BOA_321 15787.130326
P_BOC_342 12779.680000
P_CAR_327 12457.844221
P_CAV_333 10551.257500
P_CON_336 8352.742500
P_GAM_338 11805.215000
P_GHM_334 11220.800000
P_GRC_341 11213.115000
P_GRF_322 15652.556391
P_LAC_354 15617.710000
P_LEG_335 10904.325000
P_MOE_339 8470.447500
P_ROS_336 16019.425000
P_SOA_337 10051.175000
P_TAI_343 16413.327500
P_VAL_329 13661.730000
P_VAR_330 12240.965000
Name: RT, dtype: float64