This page serves as an overview (and link collection) of existing spreadsheet corpora and specific subsets of them (e.g., known faults, smell annotations, version histories).
Enron
- Original ENRON corpus
- VENRON (Enron enhanced with version information)
- ENRON errors corpus (Enron spreadsheets containing faults, PDF)
- Subset of ENRON enhanced with type annotations (Meta-data, headers, attributes, data, derived data)
EUSES
- EUSES corpus
- Subset of EUSES with annotated smells (CUSTODES)
- Modified EUSES (Subset of EUSES with inserted faults and test verdicts)
- Subset of EUSES enhanced with type annotations (Meta-data, headers, attributes, data, derived data)
FUSE
- FUSE corpus
- Subset of FUSE enhanced with type annotations (Meta-data, headers, attributes, data, derived data)
Payroll/Gradebook
- Original Forms3 spreadsheets with inserted faults and test verdicts in a log file (PDF, authors send corpus on request)
- Excel version
Info1
- Corpus with real faults and simulated test verdicts
Integer
- Collection of spreadsheets with inserted faults and test verdicts. All spreadsheets in this corpus contain only integer values.
Hawaii Kooker
- Collection of faulty spreadsheets created by undergraduate business students (PDF, authors send corpus on request)