The goal of this project is to apply the concepts and technologies seen previously. To this end, you have to choose a public (or personal) dataset, and your choice has to be validated.
On the chosen dataset, you have to either bring some insights for an already given task (e.g., a classification task) or define the general aims yourself and discover some knowledge from the data (produce added value from the data). To this end, you can use any data mining/machine learning method as well as any algorithm or software (KNIME, scikit-learn (Python), Web APIs (Google, Bing, Yahoo, …)).
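For a classification task, it can help to know what a trivial predictor scores before trying any real method. The sketch below is only an illustration under stated assumptions: the toy data, the helper names (`train_test_split`, `majority_baseline`, `accuracy`) and the 25% hold-out ratio are all hypothetical and not part of the project requirements. It uses only the Python standard library, so it runs without installing any of the tools listed above.

```python
import random
from collections import Counter


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline(train):
    """Return the most frequent label among the training rows."""
    labels = [label for _, label in train]
    return Counter(labels).most_common(1)[0][0]


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


# Hypothetical toy data: (features, label) pairs invented for illustration.
toy_rows = [((i, i % 3), "a" if i % 4 else "b") for i in range(40)]

train, test = train_test_split(toy_rows)
majority = majority_baseline(train)
baseline_acc = accuracy([majority] * len(test), [label for _, label in test])
print(f"majority label: {majority!r}, baseline accuracy: {baseline_acc:.2f}")
```

Any method you then apply to your own dataset should beat this kind of baseline on held-out data; if it does not, the model has probably not learned anything useful.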
Datasets | Possible Mining Task |
Datasets available on http://www.kaggle.com/ | The related aims or other ones; I have to validate your choice |
Other datasets you want | I have to validate your choice |
<note important>The dataset and the main goals must be validated on April 4th, or by email. </note>
GID | Members | Subject | Time |
G1 | P. Simonaitis & D. Lajou | Football data challenge | 13h30 |
G2 | M. Boritchev & M. Chardet | San Francisco Crimes | 13h45 |
G3 | J-Y. Franceschi, F. Lebeau & V. Mollimard | LoL | 14h |
G4 | B. Brikci-Sid, S. Tendjaoui & H. Yampa | Detecting gender from micro-reviews | 14h15 |
G5 | V. Michielini, E. Moutot & E. Oshurko | Death cause prediction | 14h30 |
G6 | R. Grünblatt, S. Mauras & X. Vu | District mapping from social media | 14h45 |
G7 | C. Lucas | | 15h |
You have to produce added value from data (answer a specific question, discover knowledge, …), using the different concepts seen during the lectures, but not exclusively. You can use any tools, technologies or algorithms.
You have to:
<note important> The report, presentation and source code must be sent by email (marc.plantevit-at-liris.cnrs.fr) before 04/26/2016 (23h59). </note>
<note important> You can work in groups of at most 3 people.
</note>