Step 3: Model the data
Once access to the data has been secured and its quality has been described and, where necessary, improved, the next step is to model the data. Modeling linked data is often time-consuming, but it makes the data more widely understandable and usable, both within and across organizations. When creating linked data, apply sound engineering practices: reuse existing resources on the Web where possible rather than creating them from scratch, and express the intended semantics correctly so that others (both machines and humans) can understand and reuse the datasets to extend the web of data. The following process helps to produce high-quality linked datasets.
The term “linking data” can be confusing, because “links” can be created in several ways and at several steps in the data modeling process. Two types of links can be made: ontology links and data links. The process below marks three options for linking datasets during modeling as Option 1, Option 2 and Option 3.
- Make a conceptual model of the data by defining concepts and their relationships and properties. You can use the logical data model obtained when preparing the data as input for this step.
- Sketch the objects on a whiteboard (or similar) and draw lines to express how they are related to each other. Assign one or more data elements to each object. This kind of data element linking (Option 1) is discussed in more detail in Step 9.
- Look for real world objects of interest such as people, places, things and locations.
- Use common sense to decide whether or not to make links.
- Investigate how others are already describing similar or related data in vocabularies.
- Reuse existing, standardized and widely adopted vocabularies (Option 2) as much as possible to facilitate data merging and reuse. Because others use the same vocabularies, your dataset is linked to theirs, with the vocabulary as the bridge. This greatly increases the usability of the dataset.
- If reuse is not possible, use your own vocabulary or create a new one (Option 3) according to the best practices for modeling linked data. The data becomes linked data when you connect your own vocabulary to existing vocabularies via ontology links.
- Formalize the model and your vocabulary, preferably in the Web Ontology Language (OWL), or alternatively in RDFS or SKOS.
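The vocabulary steps above can be illustrated with a minimal sketch. It uses only the Python standard library, represents triples as plain tuples and prints them as N-Triples. The `example.org` namespaces, the `Museum` class, the "City Museum" instance and the helper names are hypothetical and chosen purely for illustration; the FOAF, RDF, RDFS and OWL terms are existing vocabulary terms. In practice an RDF library or one of the modeling tools mentioned in this step would normally do this work.

```python
# Minimal sketch: triples as plain tuples, serialized as N-Triples.
# The example.org namespaces and the Museum class are hypothetical.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"   # existing, widely adopted vocabulary
EX = "http://example.org/vocab#"      # hypothetical own vocabulary
DATA = "http://example.org/id/"       # hypothetical instance namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_model():
    """Return a small model that combines Option 2 and Option 3."""
    museum_class = iri(EX + "Museum")
    museum_1 = iri(DATA + "museum/1")
    return [
        # Option 3: a new class in our own vocabulary, declared as an OWL class ...
        (museum_class, iri(RDF + "type"), iri(OWL + "Class")),
        (museum_class, iri(RDFS + "label"), literal("Museum")),
        # ... and tied to an existing vocabulary through an ontology link.
        (museum_class, iri(RDFS + "subClassOf"), iri(FOAF + "Organization")),
        # Option 2: describe the instance with a term reused from FOAF.
        (museum_1, iri(RDF + "type"), museum_class),
        (museum_1, iri(FOAF + "name"), literal("City Museum")),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(build_model()))
```

The `rdfs:subClassOf` statement is the ontology link here: any consumer that already understands `foaf:Organization` can treat every `Museum` as an organization without knowing the hypothetical vocabulary in advance.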
While modeling, put aside the immediate needs of any application and test the assumptions in the schema with subject matter experts familiar with the data. It is not necessary to define the ultimate model of the data at once. On the contrary, the philosophy of linked data lets you start without modeling the data and model it later (or not at all), or take a step-by-step approach. Tools that help you model the data include TopBraid Composer and Protégé.
We will now elaborate on two types of ontology linking: the reuse of standard vocabularies and the creation of new vocabularies.