UE8: Notebook for session 2

# Import the libraries used in this notebook
import pandas

Loading the data from the file Donnees_M2_RD.xlsx

dfxls = pandas.read_excel("Donnees_M2_RD.xlsx")

Loading the M1 experimental dataframes into a list

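expes = []
for i in range(8):
    # One CSV file per subject, numbered 0 to 7
    filename = f"expe/subject-{i}.csv"
    print("Loading " + filename)
    df = pandas.read_csv(filename)
    # Record which subject each row comes from
    df['Subject'] = i
    expes.append(df)
# Stack the eight per-subject dataframes into a single one
dfm1 = pandas.concat(expes)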
Loading expe/subject-0.csv
Loading expe/subject-1.csv
Loading expe/subject-2.csv
Loading expe/subject-3.csv
Loading expe/subject-4.csv
Loading expe/subject-5.csv
Loading expe/subject-6.csv
Loading expe/subject-7.csv
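
The loading loop above can be tried without the experiment files. The sketch below is only an illustration: the two in-memory CSV texts and their column names (`response`, `response_time`) are made-up stand-ins for `expe/subject-0.csv` and `expe/subject-1.csv`, not the real data.

```python
import io
import pandas

# Hypothetical stand-ins for the per-subject CSV files (invented content).
fake_files = {
    0: io.StringIO("response,response_time\nleft,512\nright,634\n"),
    1: io.StringIO("response,response_time\nright,478\n"),
}

frames = []
for subject, handle in fake_files.items():
    # read_csv accepts a file-like object as well as a file name
    df_subject = pandas.read_csv(handle)
    # Tag every row with the subject it belongs to
    df_subject["Subject"] = subject
    frames.append(df_subject)

# Stack the per-subject frames on top of each other
combined = pandas.concat(frames)
print(combined)
```

Adding the `Subject` column before concatenating is what keeps the rows attributable to a subject once they are all in one dataframe.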
dfxls

        Subject  Name_A  Name_B  Dist_A  Dist_B  Mode  Space  Side  Response     RT
0     P_ADI_331       0       2       2       4   Dic      E     D         2  18865
1     P_ADI_331       1       4       4       1   Dic      E     D         2  13157
2     P_ADI_331       4       3       3       2   Dic      E     D         1  11628
3     P_ADI_331       2       4       4       1   Dic      E     D         1  10068
4     P_ADI_331       1       2       2       4   Dic      E     D         1  11801
...         ...     ...     ...     ...     ...   ...    ...   ...       ...    ...
9589  P_VAR_330       0       1       3       5   Dio      I     D         1   7626
9590  P_VAR_330       3       2       5       1   Dio      I     D         2   6349
9591  P_VAR_330       2       0       4       2   Dio      I     D         2   9031
9592  P_VAR_330       0       2       2       1   Dio      I     D         2  16323
9593  P_VAR_330       0       3       5       1   Dio      I     D         2  10139

9594 rows × 10 columns

dfm1

acc accuracy average_response_time avg_rt background canvas_backend clock_backend color_backend correct correct_message_debut_tirage ... time_tirage time_tirage_loop time_welcome tirage title total_correct total_response_time total_responses width Subject
0 0 0 1012 1012 #3d3846 legacy legacy legacy 0 undefined ... 54014 53974 1329 20 Nouvelle expérience 0 1012.0 1 1024 0
1 100 100 662 662 #3d3846 legacy legacy legacy 1 undefined ... 54014 53974 1329 20 Nouvelle expérience 1 662.0 1 1024 0
2 0 0 710 710 #3d3846 legacy legacy legacy 0 undefined ... 54014 53974 1329 20 Nouvelle expérience 0 710.0 1 1024 0
3 0 0 742 742 #3d3846 legacy legacy legacy 0 undefined ... 54014 53974 1329 20 Nouvelle expérience 0 742.0 1 1024 0
4 0 0 806 806 #3d3846 legacy legacy legacy 0 undefined ... 54014 53974 1329 20 Nouvelle expérience 0 806.0 1 1024 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
433 100 100 305 305 #3d3846 legacy legacy legacy 1 undefined ... 790915 7847 1443 5 Nouvelle expérience 1 305.0 1 1024 7
434 0 0 290 290 #3d3846 legacy legacy legacy 0 undefined ... 790915 7847 1443 5 Nouvelle expérience 0 290.0 1 1024 7
435 0 0 260 260 #3d3846 legacy legacy legacy 0 undefined ... 790915 7847 1443 5 Nouvelle expérience 0 260.0 1 1024 7
436 0 0 554 554 #3d3846 legacy legacy legacy 0 undefined ... 790915 7847 1443 5 Nouvelle expérience 0 554.0 1 1024 7
437 0 0 605 605 #3d3846 legacy legacy legacy 0 undefined ... 790915 7847 1443 5 Nouvelle expérience 0 605.0 1 1024 7

3504 rows × 110 columns
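
The display of dfm1 ends at row label 437 although the dataframe has 3504 rows, which suggests that `pandas.concat` kept each subject's own row labels. The following sketch, on two small made-up frames, shows that behaviour and the `ignore_index=True` option that renumbers the rows; whether renumbering is wanted for dfm1 is a choice left open here.

```python
import pandas

# Two tiny made-up frames standing in for two subjects
part_a = pandas.DataFrame({"rt": [1012, 662, 710]})
part_b = pandas.DataFrame({"rt": [305, 290]})

# Default behaviour: each part keeps its own labels, so labels repeat
stacked = pandas.concat([part_a, part_b])
print(list(stacked.index))

# ignore_index=True discards the old labels and numbers the rows from 0
renumbered = pandas.concat([part_a, part_b], ignore_index=True)
print(list(renumbered.index))
```

With repeated labels, a label-based lookup such as `stacked.loc[0]` returns one row per part rather than a single row.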

# List the column names (df is the last subject loaded; dfm1 has the same columns)
for col in df.columns:
    print(col)
acc
accuracy
average_response_time
avg_rt
background
canvas_backend
clock_backend
color_backend
correct
correct_message_debut_tirage
correct_reponse_question1
correct_reponse_question2
correct_welcome
count_boucle_images
count_choix_csv
count_choix_csv2
count_config_script
count_experiment
count_getting_started
count_image
count_image_et_questions
count_message_debut_tirage
count_question1
count_question1_feedback
count_question2
count_question2_feedback
count_reponse_question1
count_reponse_question2
count_resultats_logger
count_tirage
count_tirage_loop
count_welcome
datetime
description
disable_garbage_collection
experiment_file
experiment_path
fichier_csv
font_bold
font_family
font_italic
font_size
font_underline
foreground
form_clicks
fullscreen
height
image
keyboard_backend
live_row
live_row_boucle_images
live_row_tirage_loop
logfile
mouse_backend
nombre_tirages
num_tirage
opensesame_codename
opensesame_version
q1
q2
rep_autorisee1
rep_autorisee2
rep_ok1
rep_ok2
repeat_cycle
response
response_message_debut_tirage
response_reponse_question1
response_reponse_question2
response_time
response_time_message_debut_tirage
response_time_reponse_question1
response_time_reponse_question2
response_time_welcome
response_welcome
round_decimals
sampler_backend
sound_buf_size
sound_channels
sound_freq
sound_sample_size
start
subject_nr
subject_parity
time_boucle_images
time_choix_csv
time_choix_csv2
time_config_script
time_experiment
time_getting_started
time_image
time_image_et_questions
time_message_debut_tirage
time_question1
time_question1_feedback
time_question2
time_question2_feedback
time_reponse_question1
time_reponse_question2
time_resultats_logger
time_tirage
time_tirage_loop
time_welcome
tirage
title
total_correct
total_response_time
total_responses
width
Subject
dfxls["Subject"].drop_duplicates()
0       P_ADI_331
400     P_ALM_345
800     P_AMY_346
1200    P_BAM_347
1600    P_BEH_340
2000    P_BLC_325
2399    P_BLR_321
2798    P_BOA_321
3197    P_BOC_342
3597    P_CAR_327
3995    P_CAV_333
4395    P_CON_336
4795    P_GAM_338
5195    P_GHM_334
5595    P_GRC_341
5995    P_GRF_322
6394    P_LAC_354
6794    P_LEG_335
7194    P_MOE_339
7594    P_ROS_336
7994    P_SOA_337
8394    P_TAI_343
8794    P_VAL_329
9194    P_VAR_330
Name: Subject, dtype: object
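
In the output above, the labels on the left (0, 400, 800, ...) are the rows where each subject first appears, because `drop_duplicates()` keeps the first occurrence together with its original label. As a small illustration on invented subject codes, the sketch below compares it with `unique()` and `nunique()`, which return the distinct values and their count.

```python
import pandas

# Invented subject codes, for illustration only
toy = pandas.DataFrame({"Subject": ["P_A", "P_A", "P_B", "P_B", "P_B", "P_C"]})

# Keeps the first occurrence of each value, with its original row label
first_rows = toy["Subject"].drop_duplicates()
print(first_rows)

# Distinct values as an array, and how many there are
names = toy["Subject"].unique()
count = toy["Subject"].nunique()
print(names, count)
```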
rt = dfxls["RT"]
rt
0       18865
1       13157
2       11628
3       10068
4       11801
        ...  
9589     7626
9590     6349
9591     9031
9592    16323
9593    10139
Name: RT, Length: 9594, dtype: int64
rt.max()
103152
rt.mean()
12050.088597039816
rt.std()
7085.96882950782
rt.quantile(0.4)
9315.000000000002
dfxls.min()
Subject     P_ADI_331
Name_A              0
Name_B              0
Dist_A              1
Dist_B              1
Mode              Dic
Space               E
Side                D
Response            1
RT               2703
dtype: object
rt.quantile([0.25,0.5,0.75])
0.25     7896.25
0.50    10262.50
0.75    13696.50
Name: RT, dtype: float64
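
The value 9315.000000000002 returned by `rt.quantile(0.4)` is not a typo: by default pandas interpolates linearly between the two nearest sorted values, and the trailing digits come from floating-point arithmetic. A minimal sketch on a made-up series shows the interpolation, along with `describe()`, which gathers several of the statistics computed above in one call.

```python
import pandas

# Made-up reaction times, already sorted to make the interpolation easy to follow
toy_rt = pandas.Series([2000, 4000, 6000, 8000, 10000])

# Position 0.4 * (5 - 1) = 1.6 falls between 4000 and 6000,
# so the result is 4000 + 0.6 * 2000
q40 = toy_rt.quantile(0.4)
print(q40)

# Passing a list returns one value per requested quantile
quartiles = toy_rt.quantile([0.25, 0.5, 0.75])
print(quartiles)

# count, mean, std, min, quartiles and max in a single Series
summary = toy_rt.describe()
print(summary)
```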
condition = (rt >= 12000)
condition
0        True
1        True
2       False
3       False
4       False
        ...  
9589    False
9590    False
9591    False
9592     True
9593    False
Name: RT, Length: 9594, dtype: bool
Here, condition is a Series of booleans: one True or False per row of dfxls. Indexing dfxls with it keeps only the rows marked True.

dfxls[condition]

        Subject  Name_A  Name_B  Dist_A  Dist_B  Mode  Space  Side  Response     RT
0     P_ADI_331       0       2       2       4   Dic      E     D         2  18865
1     P_ADI_331       1       4       4       1   Dic      E     D         2  13157
5     P_ADI_331       2       1       2       3   Dic      E     D         2  12117
6     P_ADI_331       2       1       3       4   Dic      E     D         1  16347
7     P_ADI_331       0       3       2       4   Dic      E     D         1  13237
...         ...     ...     ...     ...     ...   ...    ...   ...       ...    ...
9582  P_VAR_330       1       0       3       5   Dio      I     D         2  15942
9583  P_VAR_330       0       4       4       5   Dio      I     D         2  45627
9585  P_VAR_330       0       3       2       3   Dio      I     D         2  16671
9586  P_VAR_330       0       1       2       3   Dio      I     D         1  18002
9592  P_VAR_330       0       2       2       1   Dio      I     D         2  16323

3336 rows × 10 columns
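
Because a condition is a Series of booleans, it can be counted and negated as well as used for filtering. The sketch below reuses the first five RT values shown earlier in a small stand-alone dataframe; the variable names are illustrative.

```python
import pandas

# First five RT values from the table above, in a stand-alone frame
toy = pandas.DataFrame({"RT": [18865, 13157, 11628, 10068, 11801]})

mask = toy["RT"] >= 12000

# True counts as 1 and False as 0
n_slow = mask.sum()     # number of rows satisfying the condition
share = mask.mean()     # proportion of rows satisfying the condition

slow = toy[mask]        # rows where the mask is True
fast = toy[~mask]       # ~ negates the mask: rows where it is False
print(n_slow, share)
print(fast)
```

The number of rows kept by `toy[mask]` is always equal to `mask.sum()`.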

rt = dfxls["RT"]
cond = rt >= 12000
dfxls[cond]

        Subject  Name_A  Name_B  Dist_A  Dist_B  Mode  Space  Side  Response     RT
0     P_ADI_331       0       2       2       4   Dic      E     D         2  18865
1     P_ADI_331       1       4       4       1   Dic      E     D         2  13157
5     P_ADI_331       2       1       2       3   Dic      E     D         2  12117
6     P_ADI_331       2       1       3       4   Dic      E     D         1  16347
7     P_ADI_331       0       3       2       4   Dic      E     D         1  13237
...         ...     ...     ...     ...     ...   ...    ...   ...       ...    ...
9582  P_VAR_330       1       0       3       5   Dio      I     D         2  15942
9583  P_VAR_330       0       4       4       5   Dio      I     D         2  45627
9585  P_VAR_330       0       3       2       3   Dio      I     D         2  16671
9586  P_VAR_330       0       1       2       3   Dio      I     D         1  18002
9592  P_VAR_330       0       2       2       1   Dio      I     D         2  16323

3336 rows × 10 columns

dfxls[dfxls["RT"] >= 12000]

        Subject  Name_A  Name_B  Dist_A  Dist_B  Mode  Space  Side  Response     RT
0     P_ADI_331       0       2       2       4   Dic      E     D         2  18865
1     P_ADI_331       1       4       4       1   Dic      E     D         2  13157
5     P_ADI_331       2       1       2       3   Dic      E     D         2  12117
6     P_ADI_331       2       1       3       4   Dic      E     D         1  16347
7     P_ADI_331       0       3       2       4   Dic      E     D         1  13237
...         ...     ...     ...     ...     ...   ...    ...   ...       ...    ...
9582  P_VAR_330       1       0       3       5   Dio      I     D         2  15942
9583  P_VAR_330       0       4       4       5   Dio      I     D         2  45627
9585  P_VAR_330       0       3       2       3   Dio      I     D         2  16671
9586  P_VAR_330       0       1       2       3   Dio      I     D         1  18002
9592  P_VAR_330       0       2       2       1   Dio      I     D         2  16323

3336 rows × 10 columns

dfxls["RT" >= 12000]
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Cell In[30], line 1
----> 1 dfxls["RT" >= 12000]


TypeError: '>=' not supported between instances of 'str' and 'int'
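
The error above arises before pandas is involved: Python first evaluates what is inside the brackets, and there `"RT" >= 12000` compares a plain string with an integer. The comparison has to be applied to the column itself. A minimal sketch on a three-row stand-in dataframe:

```python
import pandas

# Small stand-in for dfxls
toy = pandas.DataFrame({"RT": [18865, 13157, 11628]})

try:
    # The string "RT" is compared with 12000 before any indexing happens
    toy["RT" >= 12000]
except TypeError as error:
    message = str(error)
    print(message)

# Correct form: select the column first, then compare its values
selected = toy[toy["RT"] >= 12000]
print(selected)
```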
# recall that rt holds dfxls["RT"]
dfxls[rt >= 12000]

        Subject  Name_A  Name_B  Dist_A  Dist_B  Mode  Space  Side  Response     RT
0     P_ADI_331       0       2       2       4   Dic      E     D         2  18865
1     P_ADI_331       1       4       4       1   Dic      E     D         2  13157
5     P_ADI_331       2       1       2       3   Dic      E     D         2  12117
6     P_ADI_331       2       1       3       4   Dic      E     D         1  16347
7     P_ADI_331       0       3       2       4   Dic      E     D         1  13237
...         ...     ...     ...     ...     ...   ...    ...   ...       ...    ...
9582  P_VAR_330       1       0       3       5   Dio      I     D         2  15942
9583  P_VAR_330       0       4       4       5   Dio      I     D         2  45627
9585  P_VAR_330       0       3       2       3   Dio      I     D         2  16671
9586  P_VAR_330       0       1       2       3   Dio      I     D         1  18002
9592  P_VAR_330       0       2       2       1   Dio      I     D         2  16323

3336 rows × 10 columns

sup_8000 = (rt >= 8000)
inf_12000 = (rt <= 12000)
sup_8000
0        True
1        True
2        True
3        True
4        True
        ...  
9589    False
9590    False
9591     True
9592     True
9593     True
Name: RT, Length: 9594, dtype: bool
inf_12000
0       False
1       False
2        True
3        True
4        True
        ...  
9589     True
9590     True
9591     True
9592    False
9593     True
Name: RT, Length: 9594, dtype: bool
entre_8000_12000 = (sup_8000 & inf_12000)
entre_8000_12000
0       False
1       False
2        True
3        True
4        True
        ...  
9589    False
9590    False
9591     True
9592    False
9593     True
Name: RT, Length: 9594, dtype: bool
dfxls[entre_8000_12000]

        Subject  Name_A  Name_B  Dist_A  Dist_B  Mode  Space  Side  Response     RT
2     P_ADI_331       4       3       3       2   Dic      E     D         1  11628
3     P_ADI_331       2       4       4       1   Dic      E     D         1  10068
4     P_ADI_331       1       2       2       4   Dic      E     D         1  11801
9     P_ADI_331       2       1       4       2   Dic      E     D         2  10973
11    P_ADI_331       1       2       4       2   Dic      E     D         2  11471
...         ...     ...     ...     ...     ...   ...    ...   ...       ...    ...
9576  P_VAR_330       1       2       2       4   Dio      I     D         1   9141
9578  P_VAR_330       2       3       2       4   Dio      I     D         2  11208
9581  P_VAR_330       3       2       5       4   Dio      I     D         1  11327
9591  P_VAR_330       2       0       4       2   Dio      I     D         2   9031
9593  P_VAR_330       0       3       5       1   Dio      I     D         2  10139

3761 rows × 10 columns
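
Two boolean Series are combined element by element with `&` (and) or `|` (or). Python's own `and` keyword does not work here, and each comparison needs its own parentheses because `&` binds more tightly than `>=` and `<=`. The sketch below illustrates this on a short made-up series; the variable names are illustrative.

```python
import pandas

rt_toy = pandas.Series([18865, 13157, 11628, 10068, 7626])

# Element-wise "and": True only where both conditions hold
both = (rt_toy >= 8000) & (rt_toy <= 12000)

# Element-wise "or": True where at least one condition holds
either = (rt_toy < 8000) | (rt_toy > 12000)

try:
    # "and" asks for a single truth value of a whole Series, which pandas refuses
    (rt_toy >= 8000) and (rt_toy <= 12000)
except ValueError as error:
    and_error = str(error)
    print(and_error)

print(both)
```

Here `either` is exactly the negation of `both`, so `rt_toy[~both]` and `rt_toy[either]` select the same values.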

dfxls[(dfxls["RT"] >= 8000) & (dfxls["RT"] <= 12000)]

        Subject  Name_A  Name_B  Dist_A  Dist_B  Mode  Space  Side  Response     RT
2     P_ADI_331       4       3       3       2   Dic      E     D         1  11628
3     P_ADI_331       2       4       4       1   Dic      E     D         1  10068
4     P_ADI_331       1       2       2       4   Dic      E     D         1  11801
9     P_ADI_331       2       1       4       2   Dic      E     D         2  10973
11    P_ADI_331       1       2       4       2   Dic      E     D         2  11471
...         ...     ...     ...     ...     ...   ...    ...   ...       ...    ...
9576  P_VAR_330       1       2       2       4   Dio      I     D         1   9141
9578  P_VAR_330       2       3       2       4   Dio      I     D         2  11208
9581  P_VAR_330       3       2       5       4   Dio      I     D         1  11327
9591  P_VAR_330       2       0       4       2   Dio      I     D         2   9031
9593  P_VAR_330       0       3       5       1   Dio      I     D         2  10139

3761 rows × 10 columns
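
As an alternative to writing the two comparisons by hand, pandas offers `Series.between`, which includes both bounds by default. This is an equivalent formulation rather than the approach used in this notebook; the sketch checks on made-up values that the two forms select the same rows.

```python
import pandas

# Made-up values, including both bounds (8000 and 12000)
toy = pandas.DataFrame({"RT": [18865, 13157, 12000, 10068, 8000, 7626]})
rt_toy = toy["RT"]

# Explicit comparisons combined with &
with_operators = toy[(rt_toy >= 8000) & (rt_toy <= 12000)]

# Same selection with between(), whose bounds are inclusive by default
with_between = toy[rt_toy.between(8000, 12000)]
print(with_between)
```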

rt = dfxls["RT"]
df_8_12 = dfxls[(rt >= 8000) & (rt <= 12000)]
df_8_12["Name_A"]
2       4
3       2
4       1
9       2
11      1
       ..
9576    1
9578    2
9581    3
9591    2
9593    0
Name: Name_A, Length: 3761, dtype: int64
dfxls[dfxls["RT"] >= 80000]["Subject"].drop_duplicates()
2798    P_BOA_321
3299    P_BOC_342
6447    P_LAC_354
9091    P_VAL_329
9425    P_VAR_330
Name: Subject, dtype: object
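
The last cell chains three steps: filter the rows, select the Subject column, then remove duplicates. The sketch below, on invented data, shows the same chain, a variant using `.loc` to filter and select in one step, and a different route through `groupby` that looks at each subject's maximum RT. The last two are suggestions, not the notebook's own method.

```python
import pandas

# Invented data for illustration
toy = pandas.DataFrame({
    "Subject": ["P_A", "P_A", "P_B", "P_B", "P_C"],
    "RT": [9000, 85000, 7000, 8000, 91000],
})

# Same chain as in the cell above
chained = toy[toy["RT"] >= 80000]["Subject"].drop_duplicates()

# .loc takes the row condition and the column name together
with_loc = toy.loc[toy["RT"] >= 80000, "Subject"].drop_duplicates()

# Alternative: a subject has an RT >= 80000 exactly when its maximum RT is >= 80000
max_rt = toy.groupby("Subject")["RT"].max()
slow_subjects = max_rt[max_rt >= 80000].index.tolist()
print(slow_subjects)
```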
Emmanuel Coquery
Associate Professor (Maître de conférences) in Computer Science