Versioned Spreadsheet Corpora for Enron, EUSES and FUSE

About Versioned Spreadsheets

End users often create new spreadsheets from existing ones, reusing their data layout and computational logic. These newly created spreadsheets share the same or similar data layout and computational logic with the originals, and can be regarded as updated versions of them. However, spreadsheets are rarely maintained with version control tools. As a result, the version information between spreadsheets is usually missing, and different versions of a spreadsheet coexist as separate, similar spreadsheets.

The version information of spreadsheets can support research on topics such as spreadsheet evolution and error detection. It is therefore important to recover this version information and to build versioned spreadsheet corpora for future research. In this project, we propose approaches (ICSE SEIP 2016, MSR 2017) to recover the version information of spreadsheets. Based on these approaches, we provide several versioned spreadsheet corpora.

Please use the data only for teaching and research purposes. Should you have any questions, please contact Wensheng Dou and Liang Xu.

From this website, you can download the following versioned spreadsheet corpora.

Obtain Versioned Spreadsheet Corpora

