Fabien Duchateau


BMatch

BMatch (a.k.a. Approxivect) is designed to discover mappings between schemas. Its semantic aspect combines terminological and structural similarity measures. Terminological measures discover mappings between schema elements that share similar labels. Structural measures, based on the cosine measure, detect mappings between schema elements that share the same neighbourhood. BMatch's second aspect improves time performance by using an indexing structure, the B-tree, to accelerate the schema matching process: schema element labels that share the same tokens are clustered, which reduces the search space during matching.
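The two ideas above (a cosine measure over neighbourhoods, and grouping labels by shared tokens to shrink the search space) can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the BMatch implementation: the class MatchSketch and all of its methods are hypothetical names, the context vector is a plain token count, and java.util.TreeMap (a red-black tree) stands in for the B-tree described in the papers. It needs Java 8 or later because it uses lambdas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: none of these names come from the BMatch source.
final class MatchSketch {

    // Split a label such as "personName" or "person_name" into lower-case tokens.
    static List<String> tokenize(String label) {
        String spaced = label.replaceAll("([a-z0-9])([A-Z])", "$1 $2");
        List<String> tokens = new ArrayList<>();
        for (String t : spaced.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assumption: an element's context is the bag of tokens of its neighbours' labels.
    static Map<String, Integer> contextVector(List<String> neighbourLabels) {
        Map<String, Integer> vector = new HashMap<>();
        for (String label : neighbourLabels) {
            for (String token : tokenize(label)) {
                vector.merge(token, 1, Integer::sum);
            }
        }
        return vector;
    }

    // Cosine of the angle between two sparse count vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int x : b.values()) {
            normB += x * x;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Sorted token -> labels index; TreeMap is a stand-in for a B-tree here.
    static TreeMap<String, Set<String>> buildTokenIndex(List<String> labels) {
        TreeMap<String, Set<String>> index = new TreeMap<>();
        for (String label : labels) {
            for (String token : tokenize(label)) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(label);
            }
        }
        return index;
    }

    // Only labels sharing at least one token with the query are compared further.
    static Set<String> candidates(Map<String, Set<String>> index, String label) {
        Set<String> result = new TreeSet<>();
        for (String token : tokenize(label)) {
            Set<String> hit = index.get(token);
            if (hit != null) {
                result.addAll(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index =
                buildTokenIndex(Arrays.asList("personName", "person_address", "orderDate"));
        System.out.println(candidates(index, "PersonId"));
        System.out.println(cosine(
                contextVector(Arrays.asList("firstName", "lastName")),
                contextVector(Arrays.asList("first_name", "address"))));
    }
}
```

The index lookup is what reduces the search space in this sketch: a label is compared only against labels returned by candidates, rather than against every element of the other schema.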

Related publications

  1. Improving quality and performance of schema matching in large scale
    Ingénierie des Systèmes d'Information (ISI), 2008
    Fabien Duchateau and Zohra Bellahsene and Mathieu Roche

    @article {isi2008,
      author = {Fabien Duchateau and Zohra Bellahsene and Mathieu Roche},
      title = {Improving quality and performance of schema matching in large scale},
      journal = {Ingénierie des Systèmes d'Information (ISI)},
      volume = {13},
      number = {5},
      year = {2008},
      pages = {59-82},
      ee = {http://dx.doi.org/10.3166/isi.13.5.59-82},
      url = {http://dx.doi.org/10.3166/isi.13.5.59-82},
      bibsource = {DBLP, http://dblp.uni-trier.de},
    }

  2. A Context-based Measure for Discovering Approximate Semantic Matching between Schema Elements
    Research Challenges in Information Science (RCIS), 2007
    Fabien Duchateau and Zohra Bellahsene and Mathieu Roche

    @inproceedings {rcis2007,
      author = {Fabien Duchateau and Zohra Bellahsene and Mathieu Roche},
      title = {A Context-based Measure for Discovering Approximate Semantic Matching between Schema Elements},
      booktitle = {Research Challenges in Information Science (RCIS)},
      year = {2007},
      pages = {9-20},
      bibsource = {DBLP, http://dblp.uni-trier.de},
    }

  3. An Indexing Structure for Automatic Schema Matching
    International Conference on Data Engineering (ICDE) - Workshops, 2007
    Fabien Duchateau and Zohra Bellahsene and Mark Roantree and Mathieu Roche

    @inproceedings {smdb2007,
      author = {Fabien Duchateau and Zohra Bellahsene and Mark Roantree and Mathieu Roche},
      title = {An Indexing Structure for Automatic Schema Matching},
      booktitle = {International Conference on Data Engineering (ICDE) - Workshops},
      year = {2007},
      pages = {485-491},
      ee = {http://dx.doi.org/10.1109/ICDEW.2007.4401032},
      url = {http://dx.doi.org/10.1109/ICDEW.2007.4401032},
      bibsource = {DBLP, http://dblp.uni-trier.de},
    }

  4. BMatch: a Semantically Context-based Tool Enhanced by an Indexing Structure to Accelerate Schema Matching
    Base de Données Avancées (BDA), 2007
    Fabien Duchateau and Zohra Bellahsene and Mathieu Roche

    @inproceedings {bda2007,
      author = {Fabien Duchateau and Zohra Bellahsene and Mathieu Roche},
      title = {BMatch: a Semantically Context-based Tool Enhanced by an Indexing Structure to Accelerate Schema Matching},
      booktitle = {Base de Données Avancées (BDA)},
      year = {2007},
      bibsource = {DBLP, http://dblp.uni-trier.de},
    }

Appendix

Beyond the publications, an appendix with further experimental results (ROC curves) is available.

Prototype

Installation:

  • Download the BMatch tool. It contains both the source and the compiled jar. Ensure you have Java installed on your computer (JRE >= 1.4 to launch BMatch, JDK >= 1.6 to compile BMatch).
  • Unzip all files of the BMatch archive into a folder. The subdirectory schemas contains four pairs of schemas from different domains (biology, person, univ-dept and order).

Options:

  • Providing a pair of schemas: use the option -rep PATH, where PATH is the path to the directory in which the two schemas are stored.

    java Approxivect -rep PATH
    java Approxivect -rep schemas/person/

  • Tuning the parameters: you can edit the source code to tune some of the parameters (refer to the papers if you want to know more about the impact of these parameters).
    • AlgoApp.SIM_THRESHOLD = threshold to be reached to accept a similarity between 2 nodes
    • AlgoApp.REPLACE_THRESHOLD = threshold to be reached to replace strings in the Btree
    • SpecialTreeNode.NB_NIVEAUX = number of levels (both up and down) to select the neighbours
    • SpecialTreeNode.MIN_WEIGHT = minimum weight to accept a node as a neighbour
    • SpecialTreeNode.K = weight in the weight formula when considering neighbours

  • Compiling BMatch: if the application does not work, or if you want to edit the source code, you may need to compile BMatch. To do so, run:

    javac Approxivect.java

The correspondences discovered by BMatch are stored in the file approxivectMappings.txt.
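To make the SpecialTreeNode.NB_NIVEAUX option listed above more concrete, here is a minimal sketch of selecting neighbours within a bounded number of levels, both up and down, in a schema tree. It rests on assumptions: NeighbourSketch, Node and neighbours are hypothetical names, only ancestors and descendants are collected (the tool itself may also consider other nodes), and the weighting controlled by MIN_WEIGHT and K is not modelled because its formula is given only in the papers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not the SpecialTreeNode class from the BMatch source.
final class NeighbourSketch {

    // A minimal schema-tree node with a label, a parent and children.
    static final class Node {
        final String label;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        // Attach a child and return this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Labels of ancestors up to `levels` above and descendants up to `levels` below.
    static List<String> neighbours(Node node, int levels) {
        List<String> result = new ArrayList<>();
        Node up = node.parent;
        for (int i = 0; i < levels && up != null; i++) {
            result.add(up.label);
            up = up.parent;
        }
        collectDown(node, levels, result);
        return result;
    }

    // Depth-first walk that stops once the level budget is used up.
    private static void collectDown(Node node, int remaining, List<String> result) {
        if (remaining == 0) {
            return;
        }
        for (Node child : node.children) {
            result.add(child.label);
            collectDown(child, remaining - 1, result);
        }
    }

    public static void main(String[] args) {
        Node university = new Node("university");
        Node department = new Node("department");
        Node professor = new Node("professor");
        university.add(department);
        department.add(professor).add(new Node("course"));
        professor.add(new Node("name"));
        System.out.println(neighbours(department, 1));
        System.out.println(neighbours(department, 2));
    }
}
```

In this sketch, raising the level count widens the context of each element: more labels enter the comparison, at the price of more work per element.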

Screenshots

[Screenshot: execution of BMatch]