Modified EUSES Corpus

The original EUSES spreadsheet corpus is a collection of spreadsheets obtained from an internet search. The spreadsheets come from different domains, e.g. data management and financial computations, and they vary in size and structure.

Since the original corpus does not come with documented faults, we have automatically created 576 single-fault versions of the spreadsheets by randomly selecting a formula cell and applying a mutation operator to it. The testing decisions for this corpus were created by comparing the mutated version of a spreadsheet with the original spreadsheet. All result cells whose values differ from the original values were marked as erroneous, while all result cells whose values are identical to the original were marked as correct. More details about the modified corpus can be found in our paper “On the Empirical Evaluation of Fault Localization Techniques for Spreadsheets”.
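
The labeling step described above can be sketched as follows. This is only an illustration, not the tooling used to build the corpus: the function name `label_result_cells`, the dictionary representation of a spreadsheet, and the example cell values are all assumptions made for this sketch.

```python
def label_result_cells(original, mutated, result_cells):
    """Compare a mutated spreadsheet with its original.

    `original` and `mutated` are hypothetical dicts that map a cell
    address to its computed value; `result_cells` lists the addresses
    of the result cells to be judged.
    """
    labels = {}
    for cell in result_cells:
        if mutated[cell] != original[cell]:
            labels[cell] = "erroneous"  # value changed by the injected fault
        else:
            labels[cell] = "correct"    # value unaffected by the fault
    return labels


# Made-up example: a fault injected into a formula changes C2, and C3 depends on C2.
original = {"C1": 30, "C2": 12, "C3": 42}
mutated = {"C1": 30, "C2": 15, "C3": 45}
labels = label_result_cells(original, mutated, ["C1", "C2", "C3"])
```

In this made-up example, C2 and C3 would be marked as erroneous and C1 as correct.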

This modified version of the EUSES corpus can be downloaded here.
If you want to cite this corpus, please refer to the initial paper using the following BibTeX entry.

@inproceedings{Hofer2013,
author = {Birgit Hofer and Andr{\'{e}} Riboira and Franz Wotawa and Rui Abreu and Elisabeth Getzner},
title = {On the Empirical Evaluation of Fault Localization Techniques for Spreadsheets},
booktitle = {Proceedings of the 16th International Conference on Fundamental Approaches to Software Engineering (FASE)},
series = {Lecture Notes in Computer Science},
volume = {7793},
publisher = {Springer},
pages = {68--82},
year = {2013},
doi = {10.1007/978-3-642-37057-1_6},
}
Additionally, we offer a version containing double and triple faults, which can be downloaded here.