
Practical Information:

Teaching:

Bâtiment Nautibus
43, Bd du 11 Novembre 1918
69622 Villeurbanne Cedex.
☏: +33(0)472 43 16 35
email: marc.plantevit-at-univ-lyon1.fr

Research:

Bureau 501.319
Bâtiment Blaise Pascal
7, Avenue Jean Capelle
69621 Villeurbanne Cedex
☏: +33(0)472 43 84 87
Fax: +33(0)472 43 87 13
email: marc.plantevit-at-liris.cnrs.fr


M1ENS -- DBDM -- DM Project

The goal of this project is to apply the concepts and technologies seen in the lectures. To this end, you have to choose one of the following datasets 1).

On the chosen dataset, you have to either bring some insights for an already given task or combination of tasks (e.g., clustering, pattern mining), or define the general aims yourself and discover some knowledge from the data (produce added value from the data). To this end, you can use any data mining/machine learning method 2) as well as any algorithm or software (KNIME, scikit-learn (Python), web APIs (Google, Bing, Yahoo, …)).
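As an illustration of what a descriptive pattern mining task can look like, here is a minimal sketch of frequent itemset counting in plain Python. It is not a prescribed method: the function name `frequent_itemsets`, the `max_size` cap, and the toy check-in data are all assumptions made for the example, and the brute-force enumeration only suits small transactions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support count} for itemsets of up to max_size items.

    Brute-force enumeration: fine for a toy example, too slow for large data.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Keep only itemsets that appear in at least min_support transactions.
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical toy data: venue categories visited by four users.
transactions = [
    ["bakery", "cafe", "park"],
    ["bakery", "cafe"],
    ["cafe", "museum"],
    ["bakery", "cafe", "museum"],
]
patterns = frequent_itemsets(transactions, min_support=3)
```

On real data, a dedicated implementation (for example an Apriori or FP-growth variant) would likely be a better starting point than this enumeration.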

Datasets

^ Datasets ^ Possible Mining Tasks ^
| Collection of tweets in N.Y. | Detection of geolocated events |
| Foursquare datasets (several cities) | Characterization of a city. I like Croix Rousse, where should I live in N.Y. or S.F.? Characterization of food supply in a city. |
| Flickr data | Discovery and characterization of points of interest. |
| European parliament votes | What are the subjects of votes that are consensual or polarizing? |
| French presidential election candidates tweets | What are the terms (through time) that characterize the candidates? |
| Esport data (e.g. LoL Picks and Bans 3)) | What are the victorious or losing choices? |
| Telematic data | User re-identification |
| Keystroke analytics | User identification based on typing patterns (create your dataset capturing keyboard signals) |
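For the esport task, a first descriptive step could be to compute how often each pick ends up on the winning side. The sketch below assumes each game is a list of tokens where a trailing `+` marks a winner's choice and a trailing `-` a loser's choice; the actual file layout may differ, and the function name `win_rates` and the toy games are hypothetical.

```python
from collections import defaultdict


def win_rates(games):
    """Return {item: fraction of its appearances that were on the winning side}."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for game in games:
        for token in game:
            # Assumed encoding: the last character is '+' (winner) or '-' (loser).
            item, sign = token[:-1], token[-1]
            if sign not in ("+", "-"):
                raise ValueError(f"unexpected token: {token!r}")
            total[item] += 1
            if sign == "+":
                wins[item] += 1
    return {item: wins[item] / total[item] for item in total}


# Hypothetical toy data: three games with two picks each.
games = [["A+", "B-"], ["A+", "C-"], ["B+", "A-"]]
rates = win_rates(games)
```

Win rates on rarely picked items are unreliable, so a minimum number of appearances would be a sensible filter before interpreting them.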

Tentative schedule

^ Group ^ Subject ^ Time ^
| M. Philibert and P.-E. Polet | Flickr | 13h30 |
| E. Kerinec and N. Derumigny | French presidential elections | 13h50 |
| X. Badin de Montjoye, H. Menet, L. Paulin and Y. Gaziello | Keystroke analysis and user re-identification | 14h10 |
| R. Cerda, N. Levy, A. Slowik and D. Sintiari | Credit card fraud detection | 14h30 |
| E. Prebet and R. Coudert | League of Legends | 14h50 |
| A. Martin, F. Lecuyer, T. Sterin, S.-M. Mutsotso and T. Nguyen | Twitter event detection | 15h10 |
| Etienne Desbois and P. Mangold | Flickr | 15h30 |

Expectations

Using the concepts seen during the lectures (but not only those), you have to produce added value from the data (answer a specific question, discover knowledge, …). You can use any tools, technologies, or algorithms. These datasets can also support the development of your own algorithms (pattern sampling approach, interactive exploration, …).

You have to:

  • Write a report (pdf format) describing your work (10 pages max., appendices are possible);
  • Provide an archive of your code;
  • Present your work on May 11th: a 10-minute presentation followed by 5 minutes of questions.

<note important> The report, presentation, and source code must be sent by email (marc.plantevit-at-liris.cnrs.fr, cc: marc.plantevit@univ-lyon1.fr) before May 22nd, 2016 (23h59) 4). </note>

<note important> You can work in groups of at most 5 people.

  • Expected work = f(|group|) with f strictly increasing ;-).

</note>

1) If none of the proposed datasets interests you, you can propose one, but it has to be validated by the teachers.
2) You can make marginal use of classification techniques, but you have to perform a descriptive analysis.
3) item+ denotes a choice made by the player who won the game, while item- denotes a choice made by the loser.
4) If the archive is too big, provide a link to download it.
m1ens2017_project.txt · Last modified: 2017/05/10 17:59 by mplantev
