Fabien Duchateau


XBenchMatch

XBenchMatch is a benchmark that defines a set of criteria for testing and evaluating schema matching tools. It assesses these tools in terms of matching quality (precision, recall, F-measure, overall, post-match effort) and time performance. We also provide a testbed with a large schema corpus that anyone can use to quickly benchmark new schema matching algorithms. Finally, XBenchMatch includes new metrics for evaluating the quality of an integrated schema.
Keywords: XBenchMatch, schema matching, benchmark, schema matching evaluation, XML datasets, data integration.
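
To make the matching-quality criteria concrete, here is a minimal sketch of how precision, recall, F-measure and overall are commonly computed in the schema matching literature, by comparing the correspondences found by a tool with the expert ones. It is an illustration only and not the XBenchMatch implementation: the function name, the representation of a correspondence as a pair of element paths, and the sample element names are assumptions. Post-match effort is left out because its definition is specific to the CoopIS 2011 paper.

```python
from dataclasses import dataclass

# Assumption: a correspondence is a (source element, target element) pair.
Correspondence = tuple[str, str]


@dataclass(frozen=True)
class MatchingQuality:
    precision: float
    recall: float
    f_measure: float
    overall: float


def matching_quality(found: set[Correspondence],
                     expert: set[Correspondence]) -> MatchingQuality:
    """Compare the correspondences found by a tool with the expert ones."""
    correct = found & expert  # correspondences the tool got right

    # Precision: share of found correspondences that are correct.
    precision = len(correct) / len(found) if found else 0.0
    # Recall: share of expert correspondences that were found.
    recall = len(correct) / len(expert) if expert else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    # Overall: recall * (2 - 1 / precision), written here in the equivalent
    # form (2 * |correct| - |found|) / |expert| so that it stays defined when
    # precision is 0. It becomes negative when precision drops below 0.5.
    overall = (2 * len(correct) - len(found)) / len(expert) if expert else 0.0

    return MatchingQuality(precision, recall, f_measure, overall)


# Hypothetical example: 4 expert correspondences, 3 found, 2 of them correct.
expert = {
    ("person/name", "individual/fullName"),
    ("person/address", "individual/location"),
    ("person/phone", "individual/telephone"),
    ("person/email", "individual/mail"),
}
found = {
    ("person/name", "individual/fullName"),
    ("person/phone", "individual/telephone"),
    ("person/email", "individual/location"),  # wrong correspondence
}

q = matching_quality(found, expert)
print(f"precision={q.precision:.2f} recall={q.recall:.2f} "
      f"F-measure={q.f_measure:.2f} overall={q.overall:.2f}")
# prints: precision=0.67 recall=0.50 F-measure=0.57 overall=0.25
```

Unlike the other three measures, overall can be negative: it penalises every wrong correspondence, on the assumption that removing a wrong correspondence costs the user as much effort as adding a missing one.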

Related publications

  1. Designing a Benchmark for the Assessment of Schema Matching Tools
    Open Journal of Databases (OJDB), 2014
    Fabien Duchateau and Zohra Bellahsene

    @article {ojdb14,
      author = {Fabien Duchateau and Zohra Bellahsene},
      title = {Designing a Benchmark for the Assessment of Schema Matching Tools},
      journal = {Open Journal of Databases (OJDB)},
      year = {2014},
      issn = {2199-3459},
      volume = {1},
      number = {1},
      pages = {3-25},
      url = {https://www.ronpub.com/OJDB-v1i1n02_Duchateau.pdf},
      publisher = {RonPub, Germany},
    }

  2. Matching and Alignment: What is the Cost of User Post-match Effort?
    OTM Conferences, Cooperative Information Systems (CoopIS), 2011
    Fabien Duchateau and Zohra Bellahsene and Remi Coletta

    @inproceedings {coopis11,
      author = {Fabien Duchateau and Zohra Bellahsene and Remi Coletta},
      title = {Matching and Alignment: What is the Cost of User Post-match Effort?},
      booktitle = {OTM Conferences, Cooperative Information Systems (CoopIS)},
      year = {2011},
      pages = {421-428},
      publisher = {Springer},
    }

  3. On Evaluating Schema Matching and Mapping
    Schema Matching and Mapping, 2011
    Angela Bonifati and Zohra Bellahsene and Fabien Duchateau and Yannis Velegrakis

    @incollection {BonifatiBFV2011,
      title = {On Evaluating Schema Matching and Mapping},
      booktitle = {Schema Matching and Mapping},
      series = {Data-Centric Systems and Applications},
      publisher = {Springer},
      year = {2011},
      chapter = {9},
      pages = {253-291},
      url = {http://www.springer.com/computer/book/978-3-642-16517-7},
      author = {Angela Bonifati and Zohra Bellahsene and Fabien Duchateau and Yannis Velegrakis},
    }

  4. Measuring the Quality of an Integrated Schema
    Conference on Conceptual Modelling (ER), 2010
    Fabien Duchateau and Zohra Bellahsene

    @inproceedings {er2010,
      author = {Fabien Duchateau and Zohra Bellahsene},
      title = {Measuring the Quality of an Integrated Schema},
      booktitle = {Conference on Conceptual Modelling (ER)},
      year = {2010},
      pages = {261-273},
      ee = {http://dx.doi.org/10.1007/978-3-642-16373-9_19},
      url = {http://dx.doi.org/10.1007/978-3-642-16373-9_19},
      bibsource = {DBLP, http://dblp.uni-trier.de},
    }

  5. XBenchMatch: a Benchmark for XML Schema Matching Tools
    Very Large DataBases (VLDB), 2007
    Fabien Duchateau and Zohra Bellahsene and Ela Hunt

    @inproceedings {vldb2007demo,
      author = {Fabien Duchateau and Zohra Bellahsene and Ela Hunt},
      title = {XBenchMatch: a Benchmark for XML Schema Matching Tools},
      booktitle = {Very Large DataBases (VLDB)},
      year = {2007},
      pages = {1318-1321},
      ee = {http://www.vldb.org/conf/2007/papers/demo/p1318-duchateau.pdf},
      url = {http://www.vldb.org/conf/2007/papers/demo/p1318-duchateau.pdf},
    }

Appendix

  • The datasets are described in our VLDB 2007 and CoopIS 2011 papers. Each of the ten datasets contains a set of XML schemas to be matched, a set of expert correspondences (ground truth) and a reference integrated schema.
  • Download the archive with the datasets (100 KB).

Prototype

  • The XBenchMatch tool is described in the VLDB 2007 paper.
  • Download the XBenchMatch tool (14 MB). The current version only works on Linux.
  • Unzip the content of this archive. It includes four datasets (person, order, biology, university) in the subdirectory defaultbenchmark. Each dataset contains the schemas to be matched, the expert set of correspondences and one possible integrated schema. The subdirectory matchers_input contains examples of sets of correspondences and integrated schemas generated by some matching tools (COMA++, Porsche, Approxivect/BMatch, Similarity Flooding) for the default datasets.
  • To run XBenchMatch, either execute the script XBenchMatch_unix.sh or run the jar directly with Java:
    sh XBenchMatch_unix.sh
    OR
    java -jar XBenchMatch_fat.jar
  • Example scenario: in the GUI, you can choose to run one of the default datasets from the menu bar. The application then asks for the schemas and the set of correspondences generated by a matching tool for this dataset. XBenchMatch then computes the matching quality and generates various plots to assess it.

Screenshots

  • Selecting a scenario
  • Tuning parameters
  • Benchmarking results in XBenchMatch (old GUI)