Step 7: Add metadata
While following the roadmap, and especially in the last step, organizing governance, you will realize that metadata about your dataset is crucial. In this step we introduce three levels of metadata that you can use to describe your dataset.
To make the dataset self-describing, and thus support reuse of the data, the data supplier needs to add extra information about the data. Self-describing means that information about the encodings used for each representation is provided explicitly within the representation. Such data about data is called metadata; it includes information about the origin of the data, the date it was produced and the applications it can be used for. Metadata that describes how the data was produced is also referred to as provenance [Freire et al. 2008]. Provenance gives an indication of the reliability of the data. Another aspect of metadata that matters for reuse is information about the usability of the data: data users may want to learn about successful applications built by other data users. Such usability information is also very valuable for Linked Data, because it gives a good indication of how likely similar applications are to succeed in the future. Metadata can be added simply by adding triples that describe facts about the dataset to the RDF version of the dataset obtained in Step 5.
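As a minimal sketch of what such metadata triples might look like, assuming Python and N-Triples output, the function below describes a dataset with Dublin Core terms (dct:title, dct:issued, dct:publisher) and records its origin with the PROV-O property prov:wasDerivedFrom. The choice of these vocabularies, the helper names, the dataset URI and all values are illustrative assumptions, not part of the roadmap.

```python
# Minimal sketch: stating facts about a dataset as N-Triples.
# Vocabulary choice (Dublin Core terms, PROV-O) and all values are assumptions.
from typing import Optional

DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value: str) -> str:
    """Render an IRI in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Render a literal, escaping characters that N-Triples does not allow raw."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def metadata_triples(dataset: str, meta: dict) -> list:
    """Build one N-Triples line per metadata fact about the dataset."""
    subject = iri(dataset)
    facts = [
        (DCT + "title", literal(meta["title"])),
        (DCT + "issued", literal(meta["issued"], XSD + "date")),  # production date
        (DCT + "publisher", iri(meta["publisher"])),              # data supplier
        (PROV + "wasDerivedFrom", iri(meta["source"])),           # origin (provenance)
    ]
    return [f"{subject} {iri(p)} {o} ." for p, o in facts]


# Hypothetical dataset and source URIs, for illustration only.
lines = metadata_triples(
    "http://example.org/dataset/parking",
    {
        "title": "Parking facilities",
        "issued": "2012-05-01",
        "publisher": "http://example.org/org/city-council",
        "source": "http://example.org/source/parking.csv",
    },
)
print("\n".join(lines))
```

Appending these lines to the N-Triples serialization of the Step 5 dataset keeps the data and its description together; usability information could be added in the same way, as further triples about the same dataset URI.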
Linked Data published on the Web should be as self-describing as possible so that clients can more easily understand and use the data. Important aspects of self-descriptiveness are making vocabulary terms dereferenceable according to the best practices described in Publishing RDF Vocabularies, using terms from common vocabularies, and providing vocabulary mappings for proprietary vocabulary terms.
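The last aspect, vocabulary mappings, can be illustrated with a small sketch. It assumes a hypothetical proprietary property ex:naam that the publisher declares an rdfs:subPropertyOf the common term foaf:name; a client that understands the mapping can then derive triples in the common vocabulary by applying the RDFS subproperty rule (rdfs7) until nothing new is produced. Triples are plain Python tuples here, and the function name is hypothetical; a real deployment would more likely rely on an RDF library or reasoner. Dereferencing terms over HTTP is not shown.

```python
# Minimal sketch: applying rdfs:subPropertyOf mappings so that triples using a
# proprietary term are also available under a common term.
# The "ex:" vocabulary and the example data are hypothetical.

RDFS_SUBPROPERTYOF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
EX = "http://example.org/vocab#"

Triple = tuple  # (subject, predicate, object), all plain strings here


def apply_subproperty_mappings(data: set, mappings: set) -> set:
    """Return data plus the triples entailed by rdfs:subPropertyOf (rule rdfs7).

    Mappings are followed transitively until no new triples appear, so chains
    such as a -> b -> c are handled and cycles still terminate.
    """
    parents: dict = {}
    for sub, pred, sup in mappings:
        if pred == RDFS_SUBPROPERTYOF:
            parents.setdefault(sub, set()).add(sup)

    result = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):  # iterate over a snapshot while adding
            for sup in parents.get(p, ()):
                if (s, sup, o) not in result:
                    result.add((s, sup, o))
                    changed = True
    return result


mappings = {(EX + "naam", RDFS_SUBPROPERTYOF, FOAF_NAME)}
data = {("http://example.org/org/1", EX + "naam", "City Council")}
expanded = apply_subproperty_mappings(data, mappings)
```

Publishing the mapping triple alongside the data lets clients that only know foaf:name still find the name, while the proprietary term stays available for clients that understand it.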
We structure this section using the three levels of metadata described by CKAN:
We extend the aspects in that classification with aspects from the quality model we developed in earlier research. The Dutch government has published a list of elements that the metadata of datasets published at data.overheid.nl should include; most of these elements are compulsory. The elements fall into four categories: context, data source, characteristics and involved organizations. We add these elements to the tables for the three levels of metadata, using their original identifiers from data.overheid.nl.
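A metadata record can be checked against such a list before publication. The sketch below assumes Python and groups compulsory elements under the four categories named above; the element names in COMPULSORY and the function name are hypothetical placeholders, not the actual data.overheid.nl identifiers, and which elements are compulsory must be taken from the official list.

```python
# Minimal sketch: reporting compulsory metadata elements that are missing.
# Categories follow the text; element names are HYPOTHETICAL placeholders,
# not the data.overheid.nl identifiers.

COMPULSORY = {
    "context": ["title", "description"],
    "data source": ["source_url"],
    "characteristics": ["format", "license"],
    "involved organizations": ["publisher"],
}


def missing_elements(record: dict) -> dict:
    """Return, per category, the compulsory elements absent or empty in record."""
    missing = {}
    for category, elements in COMPULSORY.items():
        absent = [e for e in elements if not record.get(e, "").strip()]
        if absent:
            missing[category] = absent
    return missing


# Hypothetical record: "description" is empty and "source_url" is absent.
record = {
    "title": "Parking facilities",
    "description": "",
    "format": "text/turtle",
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "publisher": "http://example.org/org/city-council",
}
print(missing_elements(record))
```

Grouping the report by category mirrors the structure of the official list, which makes it easier to see which part of the description still needs work.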
[Freire et al. 2008] Freire, J., Koop, D., & Moreau, L. (2008). Second International Provenance and Annotation Workshop. Paper presented at IPAW 2008, Salt Lake City, Utah.