UE8: notebook séance 4

Notebook de la séance 4

import pandas
df = pandas.read_excel("Donnees_M2_RD.xlsx")

Jointures (merge)

On fabrique une Dataframe avec les correspondance entre indice de distance et distance en mètres. Deux dataframes: une pour la situation péripersonnel (space = I) et extrapersonnel (space = E)

dist_i_m = pandas.DataFrame(
    { "Dist_I_m": [ 0.2, 0.3, 0.4, 0.6, 0.8 ] },
    index = [1,2,3,4,5]
)
dist_e_m = pandas.DataFrame(
    { "Dist_E_m": [ 2, 3, 4, 6, 8 ] },
    index = [1,2,3,4,5]
)
dist_ie_m = pandas.concat([dist_i_m,dist_e_m],axis=1)
dist_i_m

Dist_I_m
1 0.2
2 0.3
3 0.4
4 0.6
5 0.8
dist_e_m

Dist_E_m
1 2
2 3
3 4
4 6
5 8
dist_ie_m

Dist_I_m Dist_E_m
1 0.2 2
2 0.3 3
3 0.4 4
4 0.6 6
5 0.8 8
# ou bien
dist_ie_m2 = dist_i_m.copy()
dist_ie_m2["Dist_E_m"] = dist_ie_m2["Dist_I_m"] * 10
dist_ie_m2

Dist_I_m Dist_E_m
1 0.2 2.0
2 0.3 3.0
3 0.4 4.0
4 0.6 6.0
5 0.8 8.0

On combine df et dist_i_m pour récupérer une distance en mètres

pandas.merge(df, dist_i_m, left_on='Dist_A', right_index=True)

Subject Name_A Name_B Dist_A Dist_B Mode Space Side Response RT Dist_I_m
0 P_ADI_331 0 2 2 4 Dic E D 2 18865 0.3
4 P_ADI_331 1 2 2 4 Dic E D 1 11801 0.3
5 P_ADI_331 2 1 2 3 Dic E D 2 12117 0.3
7 P_ADI_331 0 3 2 4 Dic E D 1 13237 0.3
19 P_ADI_331 3 1 2 3 Dic E D 2 12041 0.3
... ... ... ... ... ... ... ... ... ... ... ...
9566 P_VAR_330 4 2 5 1 Dio I D 2 9526 0.8
9574 P_VAR_330 0 3 5 2 Dio I D 2 5526 0.8
9581 P_VAR_330 3 2 5 4 Dio I D 1 11327 0.8
9590 P_VAR_330 3 2 5 1 Dio I D 2 6349 0.8
9593 P_VAR_330 0 3 5 1 Dio I D 2 10139 0.8

9594 rows × 11 columns

Plusieurs problèmes se posent:

  • On a calculé une distance concrète que pour Dist_A, mais pas pour Dist_B
  • La distance est fausse si Space = E

Si on fait deux jointures (une pour Dist_A et une pour Dist_B) on aura deux colonnes Dist_I_m dans le resultat, il faut donc renommer.

dfa = pandas.merge(df, dist_i_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_I_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_i_m, left_on='Dist_B', right_index=True)
dfm = dfab.rename(columns={ 'Dist_I_m': 'Dist_B_m' })
dfm

Subject Name_A Name_B Dist_A Dist_B Mode Space Side Response RT Dist_A_m Dist_B_m
0 P_ADI_331 0 2 2 4 Dic E D 2 18865 0.3 0.6
4 P_ADI_331 1 2 2 4 Dic E D 1 11801 0.3 0.6
7 P_ADI_331 0 3 2 4 Dic E D 1 13237 0.3 0.6
36 P_ADI_331 3 1 2 4 Dic E D 1 9523 0.3 0.6
51 P_ADI_331 2 4 2 4 Dio E G 1 11331 0.3 0.6
... ... ... ... ... ... ... ... ... ... ... ... ...
9437 P_VAR_330 1 2 5 2 Dio E D 1 8802 0.8 0.3
9509 P_VAR_330 4 2 5 2 Dic I G 2 5336 0.8 0.3
9522 P_VAR_330 3 4 5 2 Dic I G 2 6058 0.8 0.3
9543 P_VAR_330 4 1 5 2 Dic I G 2 5379 0.8 0.3
9574 P_VAR_330 0 3 5 2 Dio I D 2 5526 0.8 0.3

9594 rows × 12 columns

Les valeurs sont OK pour Space = I, mais toujours fausses pour E.

df_i = df[df["Space"] == "I"]
dfa = pandas.merge(df_i, dist_i_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_I_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_i_m, left_on='Dist_B', right_index=True)
df_i_m = dfab.rename(columns={ 'Dist_I_m': 'Dist_B_m' })
df_i_m

Subject Name_A Name_B Dist_A Dist_B Mode Space Side Response RT Dist_A_m Dist_B_m
150 P_ADI_331 2 3 3 4 Dic I D 2 14608 0.4 0.6
180 P_ADI_331 4 3 3 4 Dic I D 1 9086 0.4 0.6
207 P_ADI_331 4 2 3 4 Dic I G 1 7251 0.4 0.6
213 P_ADI_331 0 4 3 4 Dic I G 1 9298 0.4 0.6
250 P_ADI_331 1 0 3 4 Dio I D 1 11246 0.4 0.6
... ... ... ... ... ... ... ... ... ... ... ... ...
9498 P_VAR_330 4 2 4 3 Dic I G 2 4811 0.6 0.4
9503 P_VAR_330 3 0 4 3 Dic I G 2 6628 0.6 0.4
9526 P_VAR_330 0 1 4 3 Dic I G 2 5458 0.6 0.4
9535 P_VAR_330 2 4 4 3 Dic I G 1 8956 0.6 0.4
9546 P_VAR_330 2 3 4 3 Dio I D 2 13525 0.6 0.4

4797 rows × 12 columns

df_e = df[df["Space"] == "E"]
dfa = pandas.merge(df_e, dist_e_m, left_on='Dist_A', right_index=True)
dfa2 = dfa.rename(columns={ 'Dist_E_m': 'Dist_A_m' })
dfab = pandas.merge(dfa2, dist_e_m, left_on='Dist_B', right_index=True)
df_e_m = dfab.rename(columns={ 'Dist_E_m': 'Dist_B_m' })
df_e_m

Subject Name_A Name_B Dist_A Dist_B Mode Space Side Response RT Dist_A_m Dist_B_m
0 P_ADI_331 0 2 2 4 Dic E D 2 18865 3 6
4 P_ADI_331 1 2 2 4 Dic E D 1 11801 3 6
7 P_ADI_331 0 3 2 4 Dic E D 1 13237 3 6
36 P_ADI_331 3 1 2 4 Dic E D 1 9523 3 6
51 P_ADI_331 2 4 2 4 Dio E G 1 11331 3 6
... ... ... ... ... ... ... ... ... ... ... ... ...
9364 P_VAR_330 1 0 5 2 Dic E D 2 8296 8 3
9374 P_VAR_330 2 1 5 2 Dic E D 2 12260 8 3
9412 P_VAR_330 4 3 5 2 Dio E D 2 9414 8 3
9433 P_VAR_330 3 1 5 2 Dio E D 2 16334 8 3
9437 P_VAR_330 1 2 5 2 Dio E D 1 8802 8 3

4797 rows × 12 columns

df_m = pandas.concat([df_i_m, df_e_m])
df_m

Subject Name_A Name_B Dist_A Dist_B Mode Space Side Response RT Dist_A_m Dist_B_m
150 P_ADI_331 2 3 3 4 Dic I D 2 14608 0.4 0.6
180 P_ADI_331 4 3 3 4 Dic I D 1 9086 0.4 0.6
207 P_ADI_331 4 2 3 4 Dic I G 1 7251 0.4 0.6
213 P_ADI_331 0 4 3 4 Dic I G 1 9298 0.4 0.6
250 P_ADI_331 1 0 3 4 Dio I D 1 11246 0.4 0.6
... ... ... ... ... ... ... ... ... ... ... ... ...
9364 P_VAR_330 1 0 5 2 Dic E D 2 8296 8.0 3.0
9374 P_VAR_330 2 1 5 2 Dic E D 2 12260 8.0 3.0
9412 P_VAR_330 4 3 5 2 Dio E D 2 9414 8.0 3.0
9433 P_VAR_330 3 1 5 2 Dio E D 2 16334 8.0 3.0
9437 P_VAR_330 1 2 5 2 Dio E D 1 8802 8.0 3.0

9594 rows × 12 columns

GroupBy

df.groupby(by='Subject').mean()

Name_A Name_B Dist_A Dist_B Response RT
Subject
P_ADI_331 2.000000 2.000000 3.000000 3.000000 1.530000 10042.732500
P_ALM_345 2.000000 2.000000 3.000000 3.000000 1.480000 13103.350000
P_AMY_346 2.000000 2.000000 3.000000 3.000000 1.467500 9561.292500
P_BAM_347 2.000000 2.000000 3.000000 3.000000 1.540000 9096.540000
P_BEH_340 2.000000 2.000000 3.000000 3.000000 1.437500 13187.195000
P_BLC_325 2.005013 2.002506 3.000000 3.005013 1.458647 11366.338346
P_BLR_321 2.005013 2.000000 3.005013 3.000000 1.581454 9657.929825
P_BOA_321 2.000000 2.005013 3.002506 2.997494 1.508772 15787.130326
P_BOC_342 2.000000 2.000000 3.000000 3.000000 1.457500 12779.680000
P_CAR_327 1.997487 2.000000 3.002513 3.007538 1.592965 12457.844221
P_CAV_333 2.000000 2.000000 3.000000 3.000000 1.502500 10551.257500
P_CON_336 2.000000 2.000000 3.000000 3.000000 1.442500 8352.742500
P_GAM_338 2.000000 2.000000 3.000000 3.000000 1.495000 11805.215000
P_GHM_334 2.000000 2.000000 3.000000 3.000000 1.505000 11220.800000
P_GRC_341 2.000000 2.000000 3.000000 3.000000 1.500000 11213.115000
P_GRF_322 1.994987 1.997494 2.997494 2.994987 1.551378 15652.556391
P_LAC_354 2.000000 2.000000 3.000000 3.000000 1.537500 15617.710000
P_LEG_335 2.000000 2.000000 3.000000 3.000000 1.467500 10904.325000
P_MOE_339 2.000000 2.000000 3.000000 3.000000 1.475000 8470.447500
P_ROS_336 2.000000 2.000000 3.000000 3.000000 1.472500 16019.425000
P_SOA_337 2.000000 2.000000 3.000000 3.000000 1.442500 10051.175000
P_TAI_343 2.000000 2.000000 3.000000 3.000000 1.570000 16413.327500
P_VAL_329 2.000000 2.000000 3.000000 3.000000 1.470000 13661.730000
P_VAR_330 2.000000 2.000000 3.000000 3.000000 1.570000 12240.965000
df.groupby(by='Subject').mean()["RT"]
Subject
P_ADI_331    10042.732500
P_ALM_345    13103.350000
P_AMY_346     9561.292500
P_BAM_347     9096.540000
P_BEH_340    13187.195000
P_BLC_325    11366.338346
P_BLR_321     9657.929825
P_BOA_321    15787.130326
P_BOC_342    12779.680000
P_CAR_327    12457.844221
P_CAV_333    10551.257500
P_CON_336     8352.742500
P_GAM_338    11805.215000
P_GHM_334    11220.800000
P_GRC_341    11213.115000
P_GRF_322    15652.556391
P_LAC_354    15617.710000
P_LEG_335    10904.325000
P_MOE_339     8470.447500
P_ROS_336    16019.425000
P_SOA_337    10051.175000
P_TAI_343    16413.327500
P_VAL_329    13661.730000
P_VAR_330    12240.965000
Name: RT, dtype: float64
df_gb = df.groupby(by='Subject')
df_mean = df_gb.mean()
mean_rt = df_mean["RT"]
mean_rt
Subject
P_ADI_331    10042.732500
P_ALM_345    13103.350000
P_AMY_346     9561.292500
P_BAM_347     9096.540000
P_BEH_340    13187.195000
P_BLC_325    11366.338346
P_BLR_321     9657.929825
P_BOA_321    15787.130326
P_BOC_342    12779.680000
P_CAR_327    12457.844221
P_CAV_333    10551.257500
P_CON_336     8352.742500
P_GAM_338    11805.215000
P_GHM_334    11220.800000
P_GRC_341    11213.115000
P_GRF_322    15652.556391
P_LAC_354    15617.710000
P_LEG_335    10904.325000
P_MOE_339     8470.447500
P_ROS_336    16019.425000
P_SOA_337    10051.175000
P_TAI_343    16413.327500
P_VAL_329    13661.730000
P_VAR_330    12240.965000
Name: RT, dtype: float64
Emmanuel Coquery
Emmanuel Coquery
Maître de conférences en Informatique