csv,conf 2014 - July 15, 2014
11:55 session CSV Validation by Adam Retter @adamretter
Video available at https://www.youtube.com/watch?v=9pIBG5iv5XI
- has been working at the UK National Archives
- there is a new digital archive for the UK - the Digital Records Infrastructure (DRI)
- need to ingest metadata from many sources
- CSV was chosen as the file format for metadata
- many possible errors in the CSV provided (for example, some people just rename Excel files to .csv rather than exporting as CSV)
- approach: fail fast and repeat
- it was decided to create a CSV Validator
- providing code to suppliers - now the suppliers can do up-front data validation
- want to be able to write rules that go beyond basic checks, and to make this easy enough that domain experts can write the rules themselves
- CSV Schema language as a way to write rules
- https://github.com/digital-preservation/
- looking for developers to help advance this work
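
To make the fail-fast, rule-based idea above concrete, here is a minimal Python sketch. It is not the CSV Validator from the talk and does not use CSV Schema syntax. The function names (looks_like_excel, not_empty, matches, validate), the rule format (a column name mapped to a predicate), and the UTF-8 assumption are all hypothetical, chosen only for illustration. It first rejects a renamed Excel file by its leading bytes, then applies per-column rules to each row.

```python
import csv
import io
import re

# Leading bytes of common Excel containers: .xlsx is a ZIP archive,
# legacy .xls is an OLE2 compound file.
EXCEL_SIGNATURES = (b"PK\x03\x04", b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1")


def looks_like_excel(raw: bytes) -> bool:
    """Return True if the bytes start with an Excel container signature."""
    return raw.startswith(EXCEL_SIGNATURES)


def not_empty(value: str) -> bool:
    """Hypothetical rule: the cell must contain non-whitespace text."""
    return value.strip() != ""


def matches(pattern: str):
    """Hypothetical rule factory: the whole cell must match the regex."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


def validate(raw: bytes, rules: dict) -> list:
    """Return a list of error messages; an empty list means the data passed."""
    if looks_like_excel(raw):
        # Fail fast: row-level checks are pointless on a renamed spreadsheet.
        return ["file is an Excel workbook, not CSV"]
    try:
        text = raw.decode("utf-8")  # assumption: suppliers deliver UTF-8
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]

    reader = csv.DictReader(io.StringIO(text, newline=""))
    header = reader.fieldnames or []
    missing = [column for column in rules if column not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    errors = []
    for row in reader:
        for column, rule in rules.items():
            value = row.get(column) or ""  # short rows yield None
            if not rule(value):
                errors.append(
                    f"line {reader.line_num}, column {column!r}: "
                    f"invalid value {value!r}"
                )
    return errors
```

A supplier could call validate(raw_bytes, {"id": matches(r"\d+"), "title": not_empty}) before delivery and fix every reported line before resubmitting. Keeping each rule a small named predicate is one way to keep the rule set readable for domain experts, although a dedicated schema language, as the talk proposes, goes further.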
See also UK National Archives blog: CSV validator – a new digital preservation tool.