Fabien Duchateau, UCBL

XBenchMatch

XBenchMatch is a benchmark that defines a set of criteria for testing and evaluating schema matching tools. It assesses these tools in terms of matching quality (precision, recall, F-measure, post-match effort, overall) and time performance. We also provide a testbed with a large schema corpus that anyone can use to quickly benchmark new schema matching algorithms. Finally, XBenchMatch includes new metrics for evaluating the quality of an integrated schema.
Keywords: XBenchMatch, schema matching, benchmark, schema matching evaluation, XML datasets, data integration.
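
The matching quality measures named in the description can be illustrated with a minimal Python sketch. It assumes the usual set-based definitions from the schema matching literature (precision, recall, F-measure as their harmonic mean, and overall as defined by Melnik et al.), which may differ in detail from what the tool computes; post-match effort is left out. The function name matching_quality and the element paths are hypothetical and serve only as an illustration.

```python
from typing import Dict, Set, Tuple

# A correspondence pairs an element of the source schema with one of the target schema.
Correspondence = Tuple[str, str]


def matching_quality(found: Set[Correspondence],
                     expert: Set[Correspondence]) -> Dict[str, float]:
    """Compare a tool's correspondences with the expert ones (ground truth).

    Hypothetical helper: the definitions below are the common textbook ones,
    not necessarily the exact formulas implemented in XBenchMatch.
    """
    if not expert:
        raise ValueError("the expert set of correspondences must not be empty")

    true_pos = len(found & expert)   # correct correspondences found by the tool
    false_pos = len(found - expert)  # found by the tool but not in the ground truth

    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(expert)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)
    # Overall = recall * (2 - 1/precision), rewritten as (TP - FP) / |expert|
    # so that it stays defined when precision is 0.
    overall = (true_pos - false_pos) / len(expert)

    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "overall": overall}


# Toy data (invented for this example): 4 expert correspondences,
# 4 found by a matcher, 3 of which are correct.
expert = {("person/name", "individual/fullname"),
          ("person/email", "individual/mail"),
          ("person/phone", "individual/tel"),
          ("person/address", "individual/addr")}
found = {("person/name", "individual/fullname"),
         ("person/email", "individual/mail"),
         ("person/phone", "individual/tel"),
         ("person/address", "individual/mail")}

print(matching_quality(found, expert))
```

In this toy example three of the four found correspondences are correct, so precision, recall and F-measure are all 0.75 and overall is 0.5. Overall becomes negative as soon as more than half of the found correspondences are wrong.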

Related publications

Designing a Benchmark for the Assessment of Schema Matching Tools
Open Journal of Databases (OJDB), 2014
Fabien Duchateau and Zohra Bellahsene

@article {ojdb14,
  author = {Fabien Duchateau and Zohra Bellahsene},
  title = {Designing a Benchmark for the Assessment of Schema Matching Tools},
  journal = {Open Journal of Databases (OJDB)},
  year = {2014},
  issn = {2199-3459},
  volume = {1},
  number = {1},
  pages = {3-25},
  url = {https://www.ronpub.com/OJDB-v1i1n02_Duchateau.pdf},
  publisher = {RonPub, Germany},
}
Matching and Alignment: What is the Cost of User Post-match Effort?
OTM Conferences, Cooperative Information Systems (CoopIS), 2011
Fabien Duchateau and Zohra Bellahsene and Remi Coletta

@inproceedings {coopis11,
  author = {Fabien Duchateau and Zohra Bellahsene and Remi Coletta},
  title = {Matching and Alignment: What is the Cost of User Post-match Effort?},
  booktitle = {OTM Conferences, Cooperative Information Systems (CoopIS)},
  year = {2011},
  pages = {421-428},
  publisher = {Springer},
}
On Evaluating Schema Matching and Mapping
Schema Matching and Mapping, 2011
Angela Bonifati and Zohra Bellahsene and Fabien Duchateau and Yannis Velegrakis

@inbook {BonifatiBFV2011,
  title = {On Evaluating Schema Matching and Mapping},
  booktitle = {Schema Matching and Mapping},
  publisher = {Data-Centric Systems and Applications, Springer},
  year = {2011},
  chapter = {9},
  pages = {253-291},
  url = {http://www.springer.com/computer/book/978-3-642-16517-7},
  author = {Angela Bonifati and Zohra Bellahsene and Fabien Duchateau and Yannis Velegrakis},
}
Measuring the Quality of an Integrated Schema
Conference on Conceptual Modelling (ER), 2010
Fabien Duchateau and Zohra Bellahsene

@inproceedings {er2010,
  author = {Fabien Duchateau and Zohra Bellahsene},
  title = {Measuring the Quality of an Integrated Schema},
  booktitle = {Conference on Conceptual Modelling (ER)},
  year = {2010},
  pages = {261-273},
  ee = {http://dx.doi.org/10.1007/978-3-642-16373-9_19},
  url = {http://dx.doi.org/10.1007/978-3-642-16373-9_19},
  bibsource = {DBLP, http://dblp.uni-trier.de},
}
XBenchMatch: a Benchmark for XML Schema Matching Tools
Very Large DataBases (VLDB), 2007
Fabien Duchateau and Zohra Bellahsene and Ela Hunt

@inproceedings {vldb2007demo,
  author = {Fabien Duchateau and Zohra Bellahsene and Ela Hunt},
  title = {XBenchMatch: a Benchmark for XML Schema Matching Tools},
  booktitle = {Very Large DataBases (VLDB)},
  year = {2007},
  pages = {1318-1321},
  ee = {http://www.vldb.org/conf/2007/papers/demo/p1318-duchateau.pdf},
  url = {http://www.vldb.org/conf/2007/papers/demo/p1318-duchateau.pdf},
}

Appendix

The datasets are described in our VLDB 2007 and CoopIS 2011 papers. Each of the ten datasets contains a set of XML schemas to be matched, a set of expert correspondences (ground truth) and a reference integrated schema.
Download the archive with the datasets (100 KB).

Prototype

The XBenchMatch tool is described in the VLDB 2007 paper.
Download the XBenchMatch tool (14 MB). The current version only works on Linux.
Unzip the content of this archive. It includes four datasets (person, order, biology, university) in the subdirectory defaultbenchmark. Each dataset contains the schemas to be matched, their expert set of correspondences and one possible integrated schema. The subdirectory matchers_input contains examples of sets of correspondences and integrated schemas generated by some matching tools (COMA++, Porsche, Approxivect/BMatch, Similarity Flooding) for the default datasets.
To run XBenchMatch, either execute the script XBenchMatch_unix.sh or run the JAR file directly with java:
sh XBenchMatch_unix.sh
OR
java -jar XBenchMatch_fat.jar
Example scenario: in the GUI, you can choose to run one of the default datasets from the menu bar. The application then asks for the schemas and the set of correspondences that a matching tool generated for this dataset. XBenchMatch then computes the matching quality and generates various plots to assess it.
