Ph.D Keziban Gunce Orman
Contribution to the interpretation of evolving communities in complex networks : Application to the study of social interactions
Download Draft here
Defended on July 16th 2014 at INSA de Lyon
Committee
Pr. Jean-François Boulicaut, INSA de
Lyon, LIRIS, co-directeur
Pr. Éric Gaussier, Université
Grenoble 1, LIG
Dr. Jean-Loup Guillaume, Université
Paris 6, LIP6
Pr. François Jacquenet, Université
de Saint-Etienne, LaHC, rapporteur
Dr. Vincent Labatut, Université
Galatasaray, Turquie, co-directeur
Pr. Céline Rouveirol, Université
Paris 13, LIPN
Dr. Maguelonne Teisseire, IRSTEA
UMR TETIS, Montpellier
Pr. Emmanuel Viennet, Université Paris 13, LE2I, rapporteur
Résumé. Les réseaux complexes constituent un outil pratique pour modéliser de nombreux systèmes réels. Une multitude d’outils existent pour étudier de tels réseaux. Parmi eux, l’étude des structures communautaires est très répandue : une communauté est un groupe de nœuds plus densément connectés entre eux qu’avec le reste du réseau. Un grand nombre de méthodes ont été proposées pour détecter de telles structures. Généralement, elles se basent sur des algorithmes qui calculent des partitions de l’ensemble des nœuds. En fait, ce ne sont pas tant les groupes que ce qu’ils peuvent nous apprendre qui peut nous intéresser. De fait, l’interprétation et la caractérisation de structures communautaires ont été peu étudiées, a fortiori dans le contexte des réseaux attribués et dynamiques. Dans cette thèse, nous voyons l’interprétation des communautés comme un problème indépendant du processus de leur détection. Nous construisons des représentations des communautés qui vont permettre la sélection objective de traits caractéristiques. L’information encodée dans les réseaux qui sont analysés est utilisée pour représenter chaque communauté sous la forme de séquences temporelles de descripteurs associés à chaque nœud individuellement. Ces descripteurs peuvent être des mesures topologiques et/ou des attributs nodaux. Nous détectons ensuite les motifs séquentiels émergents dans cette représentation afin d’identifier ceux qui sont les caractérisent le mieux chaque communauté. Nous pouvons d’ailleurs identifier certaines anomalies comme des nœuds qui ne vérifient pas les caractéristiques de la communauté à laquelle ils appartiennent. Nous proposons d’abord une validation empirique de la méthode sur des réseaux attribués dynamiques générés artificiellement. A cette occasion, nous étudions son comportement relativement à des changements structurels des communautés et à des modifications des valeurs des attributs. Nous l’appliquons également à deux contextes du monde réel : un réseau de collaborations scientifiques codé à partir des données DBLP, et un réseau d’interactions sociales et musicales obtenu via le service LastFM.
Abstract. Complex
Networks constitute a convenient tool to model real-world complex
systems. For this reason, they have become very popular in the last
decade. It is possible to encode various types of information in a
complex network, depending on the studied system and modeling
objectives and needs. In its most basic form, a complex network
contains only nodes and links (plain network). But it is also possible
to enrich it by setting link directions (directed network), associating
attributes to nodes (attributed network) or links, or introducing a
temporal dimension (dynamic networks). Many tools exist to study
complex networks. Among them, community detection is one of the most
important. A community is roughly defined as a group of nodes more
connected internally than to the rest of the network. In the
literature, this intuitive definition has been formalized in many ways,
leading to countless different methods and variants to detect
communities. In the large majority of cases, the result of these
methods is set of node groups in which each node group corresponds to a
community. From the applicative point of view, the meaning of these
groups is as important as their detection. However, although the task
of detecting communities in itself took a lot of attraction, the
problem of interpreting them has not been properly tackled until now.
In this thesis, we see the interpretation of communities as a problem
independent from the community detection process, consisting in
identifying the most characteristic features of communities. We break
it down into two sub-problems: 1) finding an appropriate way to
represent a community and 2) objectively selecting the most
characteristic parts of this representation. To solve them, we take
advantage of the information encoded in dynamic attributed networks. We
propose a new representation of communities under the form of temporal
sequences of topological measures and attribute values associated to
individual nodes. We then look for emergent sequential patterns in this
dataset, in order to identify the most characteristic community
features. Our solution to the interpretation problem takes the form of
a five-stepped framework. First, we detect the communities and process
the nodal topological measures. Second, we create a database of nodal
sequences, thanks to these measures as well as the nodal attributes
originating from the studied real-world system. Third, we identify the
sequential patterns contained in this database. Fourth, we filter them
using some objective criteria related to the notions of prevalence
(support measure), emergence (growth rate) and closeness. Fifth, we
finally select the most characteristic patterns covering each
community. Our framework produces the characteristic sequential
patterns and also allows identifying outlier nodes. We perform a
validation of our framework on artificially generated dynamic
attributed networks. At this occasion, we study its behavior relatively
to changes in the temporal evolution of the communities, and to the
distribution and evolution of nodal features. We also apply our
framework to real-world systems: a DBLP network of scientific
collaborations, and a LastFM network of social and musical
interactions. Our results show that the detected communities are not
completely homogeneous, in the sense several node topic or interests
can be identified for a given community. Some communities are composed
of smaller groups of nodes which tend to evolve together as time goes
by, be it in terms of individual (attributes, topological measures) or
relational (community migration) features. The detected anomalies
generally fit some generic profiles: nodes misplaced by the community
detection tool, nodes relatively similar to their communities, but also
significantly different on certain features and/or not synchronized
with their community evolution, and finally nodes with completely
different interests.
Publications liées à la thèse
Gunce
Keziban Orman, Vincent Labatut, Marc Plantevit, Jean-Francois
Boulicaut. Interpreting Communities based on the Evolution of a Dynamic
Attributed Network. Social Network Analysis and Mining 5(1): 20:1-20:22 (2015).