Step 1: Select data
The first step of our roadmap is to select the data that you want to publish and to determine whether any restrictions apply that prevent you from publishing it. The reasons for publishing data as open and/or linked open data can be very diverse: from compliance with data laws, to following competitors, to realizing new and unexpected value from data. Once an organization has decided to open up some of its datasets, either to a specified community or to the general public, a data manager or other responsible person needs to decide which datasets to actually publish. This can be done by setting up a data strategy, or by inventorying the organization's datasets and deciding, based on the goals to be reached with open data, which datasets are worth publishing. It is important not to be too selective here, as others might be able to use the data for innovative applications that the publisher did not think of in the first place.
Once datasets have been selected for publication, one needs to analyze if and how the datasets can actually be opened up, or whether publication restrictions apply to (parts of) the data. The following aspects should be taken into account when making a decision about opening data: ownership, privacy, economics, data quality and technical format. The open data decision tree shown in Figure 3 can be used to analyze datasets for possible constraints in a structured way. The decision model works as follows. If a certain constraint on data sharing is present in a given situation, the next step is to analyze whether the constraint can be overcome by an intervention (the light green curved arrow in Figure 3). For example, when a privacy constraint occurs, anonymization by filtering, or aggregation by combining a dataset into a single record, are potential interventions. Interventions are usually of a technical nature, but can also include organizational mechanisms. When no suitable intervention can be identified, the dataset cannot be shared. This means that the five constraints can be interpreted as knock-out criteria: the data can only be opened if all identified constraints in all categories can be overcome by interventions. This is shown by the arrow on the right-hand side of Figure 3.
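The knock-out logic of the decision model can be illustrated with a short sketch. This is a minimal illustration and not a reproduction of Figure 3: the names Constraint and can_be_opened, and the example constraints, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The five constraint categories named in the text.
CATEGORIES = ("ownership", "privacy", "economic", "data quality", "technical")


@dataclass
class Constraint:
    """One identified constraint on opening a dataset (hypothetical structure)."""
    category: str
    description: str
    # None means that no suitable intervention could be identified.
    intervention: Optional[str] = None


def can_be_opened(constraints: List[Constraint]) -> Tuple[bool, List[Constraint]]:
    """Return (decision, blocking constraints).

    Every constraint is a knock-out criterion: the dataset can be opened
    only if each identified constraint has an intervention.
    """
    for c in constraints:
        if c.category not in CATEGORIES:
            raise ValueError(f"unknown category: {c.category}")
    blocking = [c for c in constraints if c.intervention is None]
    return (not blocking, blocking)


# Illustrative input: one constraint with an intervention, one without.
constraints = [
    Constraint("privacy", "records can be traced to individual persons",
               intervention="anonymize by filtering identifying fields"),
    Constraint("ownership", "data owner has not agreed to publication"),
]
ok, blocking = can_be_opened(constraints)
```

In this example the dataset cannot be opened, because the ownership constraint has no intervention. Once an intervention is found for it (for instance an organizational one), the decision becomes positive.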
We will now provide some example questions per category to indicate the level of detail of the analysis:
- Ownership: Is the person who is entitled to decide about opening the data in favor of doing so?
- Privacy: Does the data source contain information that can be traced to individual persons or companies?
- Economic: Is the business case of opening the data positive? (Several business case options can be compared here, e.g. the costs and benefits of several technical opening formats.)
- Data Quality: Is the data validated to be correct?
- Technical: Is the data published as raw data? Can the data be published in an open format (minimum 2 stars)?
In many cases raw data is appreciated by re-users, and publishing it might also overcome some responsibility issues.
The decision model should be applied at the level of the dataset as a whole, as well as to individual data properties and even individual data values of a dataset. It should be noted that the decision model presented in this section serves as an example rather than a definitive set of issues that need to be addressed. While the categories remain more or less the same, new issues can be added to the categories for every use case.
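Applying the decision at the level of properties and values can be sketched as follows. This is only one possible approach, under the assumption that a dataset is a list of records: the function publishable_view, the property names and the threshold of 10 are hypothetical and serve purely as illustration.

```python
def publishable_view(records, blocked_properties, value_filters=None):
    """Build a publishable copy of a dataset.

    records            -- list of dicts, one per record
    blocked_properties -- property names with a constraint that no
                          intervention can overcome (dropped entirely)
    value_filters      -- optional map from property name to a function
                          returning True if a value may be published
    """
    value_filters = value_filters or {}
    view = []
    for record in records:
        clean = {}
        for name, value in record.items():
            if name in blocked_properties:
                continue  # property-level knock-out
            check = value_filters.get(name)
            if check is not None and not check(value):
                clean[name] = None  # value-level suppression
            else:
                clean[name] = value
        view.append(clean)
    return view


# Illustrative data: names are withheld, and small counts are suppressed
# because they might be traceable to individual households.
records = [
    {"municipality": "A", "resident_name": "J. Jansen", "households": 1200},
    {"municipality": "B", "resident_name": "P. de Vries", "households": 3},
]
view = publishable_view(
    records,
    blocked_properties={"resident_name"},
    value_filters={"households": lambda n: n >= 10},
)
```

Here the dataset as a whole is still published, while one property is removed and one individual value is replaced by an empty value.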
Once the datasets to be published and the necessary interventions have been identified, the data publisher can use this information to formulate a data publication strategy and continue with preparing the data for publication (Step 2 of this roadmap).