Fabien Duchateau

YAM

YAM (Yet Another Matcher) is a schema matcher factory. It is not (yet) another schema matching system: instead, it generates à la carte schema matchers according to user requirements. These requirements include a preference for recall or precision, a training data set (schemas that have already been matched) and expert correspondences provided by the user. YAM relies on a knowledge base that contains a (possibly large) set of similarity measures and classifiers. Based on the user requirements, YAM learns how to best combine these tools to achieve the best matching quality. In our demonstration, we let users apply YAM to build the best schema matcher for different user requirements.
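
To make the preference for recall or precision concrete, the following minimal sketch computes precision, recall and F-measure for the output of a matcher against a set of expert correspondences. It is not part of YAM: the function `evaluate` and the element names are hypothetical, and a correspondence is modeled simply as a pair of element names.

```python
from typing import Set, Tuple

Correspondence = Tuple[str, str]  # (source element, target element)


def evaluate(found: Set[Correspondence], expert: Set[Correspondence]):
    """Return (precision, recall, f_measure) of `found` w.r.t. `expert`."""
    correct = len(found & expert)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expert) if expert else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical hotel-booking example (names are illustrative only).
expert = {("checkin", "arrivalDate"), ("checkout", "departureDate"),
          ("city", "town"), ("rooms", "nbRooms")}
found = {("checkin", "arrivalDate"), ("city", "town"), ("rooms", "guests")}

precision, recall, f_measure = evaluate(found, expert)
print(f"precision={precision:.2f} recall={recall:.2f} f-measure={f_measure:.2f}")
```

In this example two of the three discovered correspondences are correct, and two of the four expert correspondences are found. A matcher tuned for recall would typically accept more candidate correspondences, finding more of the expert ones at the cost of more incorrect ones.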

Related publications

  1. YAM: A Step Forward for Generating a Dedicated Schema Matcher
    Trans. Large-Scale Data- and Knowledge-Centered Systems (TLDKS), 2016
    Fabien Duchateau and Zohra Bellahsene

    @article {tldks2016,
      author = {Fabien Duchateau and Zohra Bellahsene},
      title = {YAM: A Step Forward for Generating a Dedicated Schema Matcher},
      journal = {Trans. Large-Scale Data- and Knowledge-Centered Systems (TLDKS)},
      volume = {25},
      pages = {150--185},
      year = {2016},
      url = {http://dx.doi.org/10.1007/978-3-662-49534-6_5},
      doi = {10.1007/978-3-662-49534-6_5},
    }

  2. (Not) Yet Another Matcher
    Conference on Information and Knowledge Management (CIKM), 2009
    Fabien Duchateau and Remi Coletta and Zohra Bellahsene and Renée J. Miller

    @inproceedings {cikm09,
      author = {Fabien Duchateau and Remi Coletta and Zohra Bellahsene and Renée J. Miller},
      title = {(Not) Yet Another Matcher},
      booktitle = {Conference on Information and Knowledge Management (CIKM)},
      year = {2009},
      pages = {1537--1540},
      ee = {http://doi.acm.org/10.1145/1645953.1646165},
      url = {http://doi.acm.org/10.1145/1645953.1646165},
      bibsource = {DBLP, http://dblp.uni-trier.de},
    }

  3. YAM: a Schema Matcher Factory
    Conference on Information and Knowledge Management (CIKM), 2009
    Fabien Duchateau and Remi Coletta and Zohra Bellahsene and Renée J. Miller

    @inproceedings {cikm09demo,
      author = {Fabien Duchateau and Remi Coletta and Zohra Bellahsene and Renée J. Miller},
      title = {YAM: a Schema Matcher Factory},
      booktitle = {Conference on Information and Knowledge Management (CIKM)},
      year = {2009},
      pages = {2079--2080},
      ee = {http://doi.acm.org/10.1145/1645953.1646311},
      url = {http://doi.acm.org/10.1145/1645953.1646311},
    }

  4. Encore un outil de découverte de correspondances entre schémas XML?
    Bases de Données Avancées (BDA), 2009
    Fabien Duchateau and Remi Coletta and Zohra Bellahsene and Renée J. Miller

    @inproceedings {bda09demo,
      author = {Fabien Duchateau and Remi Coletta and Zohra Bellahsene and Renée J. Miller},
      title = {Encore un outil de découverte de correspondances entre schémas XML?},
      booktitle = {Bases de Données Avancées (BDA)},
      year = {2009},
    }

Prototype

Installation:

  • Download the YAM tool (command line version).
  • Unzip the archive. The data subdirectory contains the scenarios, expert mappings, etc.

We illustrate the four scenarios detailed in our publications. Please note that the experiments reported in our articles are the results of tens or hundreds of runs. To speed up the process in this demo, we limit each experiment to a single run. Consequently, because the training process involves randomness, you might have to run the same experiment several times to observe the expected behavior. For the same reason, we have restricted the number of Weka classifiers to 6 (instead of the 20 that we have tested so far).
Example commands are given for our running scenario (hotel booking, webForm85):

  • Robust matcher: without any user preferences, YAM uses the whole knowledge base (KB) to learn the most robust schema matcher. The only input is the schema matching scenario.

    java -jar YAMdemo.jar matching_scenario
    java -jar YAMdemo.jar webForm85

  • Promoting recall: to generate a schema matcher that promotes recall, use the option -w X, where X is a positive number (typically between 2 and 5). The higher the number, the more recall is promoted.

    java -jar YAMdemo.jar -w X matching_scenario
    java -jar YAMdemo.jar -w 5 webForm85

  • Training on similar schemas: you can choose to train on schemas from the same domain as those to be matched. For instance, if you have other hotel booking schemas (that have already been matched), you can provide them to YAM to generate a better matcher for scenarios from the hotel booking domain. Use the -s option, followed by the similar training scenarios.

    java -jar YAMdemo.jar -s similar_scenarios matching_scenario
    java -jar YAMdemo.jar -s hotel-domain webForm85

  • Providing expert feedback: finally, the user can provide some expert correspondences between the schemas to be matched. YAM uses this knowledge to improve the dedicated schema matcher that it generates. This option is enabled with the -f switch, followed by the percentage X (a value between 1 and 100) of expert correspondences to provide. In the command line version, it is easier to supply a percentage of randomly selected expert correspondences than a list chosen by the user.

    java -jar YAMdemo.jar -f X matching_scenario
    java -jar YAMdemo.jar -f 10 webForm85
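
The four invocations above can also be assembled programmatically, which is convenient when the same experiment has to be repeated several times. The sketch below is a hypothetical Python wrapper, not part of the YAM distribution: the function names `build_command` and `run_repeated` are ours, it assumes that `YAMdemo.jar` is in the working directory, and it accepts only one option at a time because this page does not state whether -w, -s and -f can be combined.

```python
import subprocess
from typing import List, Optional


def build_command(scenario: str,
                  recall_weight: Optional[int] = None,
                  similar_scenarios: Optional[str] = None,
                  feedback_percent: Optional[int] = None,
                  jar: str = "YAMdemo.jar") -> List[str]:
    """Build one YAM command line; at most one option may be set."""
    options = [recall_weight, similar_scenarios, feedback_percent]
    if sum(opt is not None for opt in options) > 1:
        raise ValueError("combining options is not documented; set at most one")
    cmd = ["java", "-jar", jar]
    if recall_weight is not None:
        if recall_weight <= 0:
            raise ValueError("-w expects a positive number")
        cmd += ["-w", str(recall_weight)]
    elif similar_scenarios is not None:
        cmd += ["-s", similar_scenarios]
    elif feedback_percent is not None:
        if not 1 <= feedback_percent <= 100:
            raise ValueError("-f expects a percentage between 1 and 100")
        cmd += ["-f", str(feedback_percent)]
    cmd.append(scenario)
    return cmd


def run_repeated(cmd: List[str], runs: int = 3) -> List[str]:
    """Run the same command several times and collect its standard output."""
    outputs = []
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Print the four example commands without executing them.
    for cmd in (build_command("webForm85"),
                build_command("webForm85", recall_weight=5),
                build_command("webForm85", similar_scenarios="hotel-domain"),
                build_command("webForm85", feedback_percent=10)):
        print(" ".join(cmd))
```

Calling `run_repeated(build_command("webForm85", recall_weight=5), runs=5)` would launch the recall-promoting experiment five times, so that the effect of randomness during training can be compared across runs.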

Screenshots

Screenshot: execution of YAM