
Characteristics of Open Data CSV Files

2016, pp. 72–79


Johann Mitlöhner, Sebastian Neumaier, Jürgen Umbrich, Axel Polleres

Abstract

This work analyzes an Open Data corpus containing 200K tabular resources with a total file size of 413 GB from a data consumer perspective. Our study shows that ~10% of the resources in Open Data portals are labelled as tabular data, of which only 50% can be considered CSV files. The study inspects the general shape of these tabular data, reports on the distribution of columns and rows, and analyzes the availability of (multiple) header rows and whether a file contains multiple tables. In addition, we inspect and analyze the table column types, detect missing values, and report on the distribution of the values.
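
As a rough illustration of the kind of per-file profiling the abstract describes (row and column counts, column types, missing values), the following minimal Python sketch may help. It is not the authors' method: the function names `profile_csv` and `infer_type`, the set of missing-value markers, and the assumption that the first row is a single header row are all hypothetical simplifications chosen for this example.

```python
import csv
import io

# Assumed set of missing-value markers; real portals use many more variants.
MISSING = {"", "na", "n/a", "null", "-"}


def infer_type(values):
    """Return 'empty', 'integer', 'float', or 'string' for a list of cells."""
    present = [v for v in values if v.strip().lower() not in MISSING]
    if not present:
        return "empty"
    # Try the narrowest type first and fall back to wider ones.
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text, delimiter=","):
    """Profile CSV text, assuming the first row is the only header row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {"rows": 0, "columns": 0, "header": [],
                "column_types": [], "missing": []}
    header, body = rows[0], rows[1:]
    width = len(header)
    # Pad or truncate ragged rows so every row has the header's width.
    body = [(r + [""] * width)[:width] for r in body]
    columns = [[r[i] for r in body] for i in range(width)]
    return {
        "rows": len(body),
        "columns": width,
        "header": header,
        "column_types": [infer_type(c) for c in columns],
        "missing": [sum(v.strip().lower() in MISSING for v in c)
                    for c in columns],
    }
```

Calling `profile_csv` on the text of a small file returns a dictionary with the data row count, the column count, one inferred type per column, and one missing-value count per column. Detecting multiple header rows or multiple tables in one file, both of which the study examines, would need additional heuristics that this sketch does not attempt.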
