Step 2: Prepare the data
Once the data has been selected, the next step is to prepare it for publication. This involves the following sub-steps:
- a) Obtain access to the data source and/or data extracts, or create a new dataset in a way that can be replicated.
- b) Obtain a copy of the logical and physical models of the database; these are needed for data modelling in Step 3.
- c) Perform a data quality assessment to gain insight into the quality of the dataset.
- d) Apply data cleansing where needed to improve data quality, e.g., by removing outdated, obsolete and irrelevant data.
- e) Implement technical interventions, such as anonymizing sensitive data elements or integrating the datasets identified during data selection.
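To make sub-steps c and d more concrete, here is a minimal Python sketch that uses only the standard library. It assumes the dataset has already been extracted as a list of records with hypothetical fields `id`, `name` and `updated` (an ISO date). It profiles completeness and duplicates, then drops repeated keys and records last updated before a chosen cut-off date. The function names, field names, sample records and cut-off are illustrative assumptions, not part of any particular tool.

```python
from datetime import date

# Hypothetical sample records; field names and values are illustrative only.
ROWS = [
    {"id": "1", "name": "Library A", "updated": "2023-05-01"},
    {"id": "2", "name": "", "updated": "2015-02-10"},
    {"id": "1", "name": "Library A", "updated": "2023-05-01"},
    {"id": "3", "name": "Library C", "updated": "2022-11-30"},
]


def assess_quality(rows, fields):
    """Sub-step c (sketch): share of non-empty values per field and exact duplicates."""
    total = len(rows)
    completeness = {
        f: (sum(1 for r in rows if (r.get(f) or "").strip()) / total) if total else 0.0
        for f in fields
    }
    seen, duplicates = set(), 0
    for r in rows:
        signature = tuple(r.get(f) for f in fields)
        if signature in seen:
            duplicates += 1  # identical to an earlier record on all given fields
        seen.add(signature)
    return {"rows": total, "completeness": completeness, "duplicates": duplicates}


def cleanse(rows, key_field, date_field, cutoff):
    """Sub-step d (sketch): keep the first current record per key, drop outdated ones."""
    kept, seen_keys = [], set()
    for r in rows:
        if r[key_field] in seen_keys:
            continue  # a record with this key was already kept
        if date.fromisoformat(r[date_field]) < cutoff:
            continue  # last updated before the cut-off: treated as outdated
        seen_keys.add(r[key_field])
        kept.append(r)
    return kept


report = assess_quality(ROWS, ["id", "name", "updated"])
print(report)
clean = cleanse(ROWS, "id", "updated", date(2020, 1, 1))
print([r["id"] for r in clean])  # ['1', '3']
```

What counts as "outdated" or "irrelevant" is a business decision, so the cut-off date is passed in explicitly rather than hard-coded.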
Different tools can be used for these steps, ranging from general-purpose spreadsheet and database tools to dedicated data cleansing tools. We will now describe sub-steps c, d and e, namely data quality assessment, data cleansing and data integration, in more detail.
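As an equally hedged illustration of sub-step e, the sketch below masks a sensitive field and enriches each record with a value taken from a second dataset. The datasets, field names, the `pseudonymize` and `prepare` helpers and the salt are all hypothetical. A salted hash is generally regarded as pseudonymization rather than full anonymization; where re-identification must be ruled out, removing or aggregating the field is the safer choice.

```python
import hashlib

# Hypothetical inputs: records with a personal field, plus a lookup from a second dataset.
PERMITS = [
    {"permit_id": "P1", "holder": "Jane Doe", "district": "D01"},
    {"permit_id": "P2", "holder": "John Roe", "district": "D02"},
]
DISTRICTS = {"D01": "North", "D02": "South"}


def pseudonymize(value, salt):
    """Replace an identifying value with a truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def prepare(rows, lookup, salt):
    """Mask the sensitive field and integrate the district name from the lookup."""
    out = []
    for r in rows:
        out.append({
            "permit_id": r["permit_id"],
            "holder": pseudonymize(r["holder"], salt),  # sensitive element masked
            "district_name": lookup.get(r["district"], ""),  # joined from second dataset
        })
    return out


# The salt is an assumption: keep it secret and never publish it with the data.
for row in prepare(PERMITS, DISTRICTS, salt="replace-with-secret"):
    print(row)
```

Using a dictionary lookup keeps the join explicit; with larger datasets the same step is usually done in a database or a dataframe library.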