Below you can find the main steps of the ODM processing workflow:

Step 1: Catalogue registration. The first step of the process is to register a new open data catalogue for monitoring. This is done via a Web-based UI, which presents a form requesting several attributes that must be filled in to define the catalogue's profile and to guide the metadata extraction process.
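As an illustration, the registration profile can be modelled as a simple record of configuration attributes. The field names below are assumptions for the sketch, not the actual fields of the ODM registration form:

```python
from dataclasses import dataclass

@dataclass
class CatalogueProfile:
    """Illustrative registration record; the real form fields are defined by the ODM UI."""
    name: str
    url: str
    software: str            # catalogue platform, e.g. "CKAN" or "Socrata"
    country: str
    harvest_frequency: str   # how often the harvesting job should run

profile = CatalogueProfile(
    name="Example Open Data Portal",
    url="https://data.example.org",
    software="CKAN",
    country="GR",
    harvest_frequency="weekly",
)
```

Attributes such as the platform and the harvesting frequency are exactly what the later steps need in order to pick a harvester and schedule the job.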

Step 2: Creation of harvesting job. Once a new catalogue is registered for monitoring and its profile is filled in, a corresponding harvesting job is created, configured and submitted to the Job Manager. The Job Manager inserts the job in the queue and schedules it for execution.
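A minimal sketch of the Job Manager's queue, assuming jobs are ordered by their scheduled run time (the actual scheduling policy of the ODM Job Manager may differ):

```python
import heapq
from datetime import datetime, timedelta

class JobManager:
    """Minimal sketch: a priority queue of harvesting jobs keyed by run time."""
    def __init__(self):
        self._queue = []      # heap of (run_at, job_id, config) tuples
        self._counter = 0     # monotonically increasing job id

    def submit(self, config, run_at):
        """Insert a harvesting job and schedule it for execution."""
        self._counter += 1
        heapq.heappush(self._queue, (run_at, self._counter, config))
        return self._counter

    def due_jobs(self, now):
        """Pop and return every job whose scheduled time has arrived."""
        due = []
        while self._queue and self._queue[0][0] <= now:
            due.append(heapq.heappop(self._queue))
        return due

jm = JobManager()
now = datetime(2015, 6, 1, 12, 0)
jm.submit({"catalogue": "data.example.org"}, run_at=now)
jm.submit({"catalogue": "data.other.org"}, run_at=now + timedelta(days=7))
ready = jm.due_jobs(now)   # only the first job is due
```

The counter in the heap tuple breaks ties between jobs scheduled for the same instant, so job configurations never need to be comparable themselves.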

Step 3: Triggering of harvesting job. Periodically and/or on demand (as specified during a catalogue’s registration), the Job Manager de-queues a harvesting job and initiates its execution. This is done by invoking the appropriate Metadata Harvester and using the configuration properties specified in the description of the job.
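Invoking "the appropriate Metadata Harvester" can be sketched as a registry lookup keyed by the catalogue's platform. The harvester class, registry and job fields below are illustrative assumptions:

```python
class CKANHarvester:
    """Illustrative harvester stub; a real one would call the catalogue's API."""
    def __init__(self, config):
        self.config = config

    def harvest(self):
        # A real harvester would page through the remote catalogue here.
        return [{"source": self.config["url"], "title": "dataset-1"}]

# Assumed registry mapping a catalogue's platform to its harvester class.
HARVESTERS = {"CKAN": CKANHarvester}

def trigger(job):
    """After de-queuing, pick the matching Metadata Harvester and run it
    with the configuration stored in the job description."""
    harvester_cls = HARVESTERS[job["software"]]
    return harvester_cls(job["config"]).harvest()

records = trigger({"software": "CKAN",
                   "config": {"url": "https://data.example.org"}})
```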

Step 4: Metadata extraction. The invoked Metadata Harvester applies the configured extraction rules to retrieve the relevant metadata. The extracted metadata are stored in the Raw Metadata Repository. During this step, some preliminary actions for cleaning and integrating the metadata also take place. For example, by applying the specified extraction rules, some of the collected metadata are mapped to the internal representation.
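The mapping to the internal representation can be sketched as a rule table from source attribute names to internal ones; the rule entries below are hypothetical, not ODM's actual extraction rules:

```python
# Hypothetical extraction rules: source attribute -> internal attribute.
RULES = {
    "title": "title",
    "notes": "description",
    "metadata_modified": "modified",
}

def apply_rules(raw_record, rules):
    """Map collected attributes to the internal representation,
    dropping attributes for which no rule is defined."""
    return {internal: raw_record[source]
            for source, internal in rules.items()
            if source in raw_record}

raw = {"title": "Air quality 2014", "notes": "Hourly measurements", "extras": []}
mapped = apply_rules(raw, RULES)
```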

Step 5: Staging of collected metadata. The raw collected metadata are heterogeneous and hence need to undergo a series of cleaning and harmonisation operations before they become available for further analysis and use. Nevertheless, for provenance reasons, it is desirable to also keep the original metadata. For example, this is useful for tracing back the initial form of a processed item, or when some steps of the cleaning and harmonisation need to be re-executed (e.g. because new or improved cleaning or harmonisation rules have been configured). Thus, before further processing takes place, the collected metadata are moved to the Staging Area.
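The provenance idea can be sketched by staging each record as an untouched original plus a working copy; the record layout and timestamp field are assumptions of this sketch:

```python
import copy
from datetime import datetime, timezone

raw_repository = [{"id": "d1", "format": "text/csv"}]
staging_area = []

def stage(records):
    """Copy raw records into the staging area, keeping the original
    alongside a working copy so processing can be traced or re-run."""
    for record in records:
        staging_area.append({
            "original": copy.deepcopy(record),   # kept untouched for provenance
            "working": copy.deepcopy(record),    # the copy that gets cleaned
            "staged_at": datetime.now(timezone.utc).isoformat(),
        })

stage(raw_repository)
# Later cleaning mutates only the working copy:
staging_area[0]["working"]["format"] = "CSV"
```

Because the original is deep-copied, re-running a cleaning rule always starts from the exact form in which the item was harvested.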

Step 6: Metadata cleaning and harmonisation. Once moved to the staging area, a series of cleaning and harmonisation operations is executed in order to transform the initial metadata to a consistent, internal representation. This applies to both attribute names and values, and involves tasks such as mapping attribute names from other schemas to the internal one, validating and normalising different date formats, normalising names of file formats, licence titles, etc. The final results are stored in the Processed Metadata Repository.
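Two of the harmonisation tasks mentioned above, date normalisation and file-format normalisation, can be sketched as follows; the candidate date formats and alias table are illustrative, not ODM's actual rules:

```python
from datetime import datetime

# Assumed candidate date formats seen across catalogues.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%dT%H:%M:%S")

def normalise_date(value):
    """Validate a date string and normalise it to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            pass
    return None  # unparseable dates are flagged rather than guessed

# Illustrative aliases; a real harmonisation table would be far larger.
FORMAT_ALIASES = {"text/csv": "CSV", "csv": "CSV", "comma separated values": "CSV"}

def normalise_format(value):
    """Map known spellings of a file format to one canonical name."""
    return FORMAT_ALIASES.get(value.strip().lower(), value.strip().upper())
```

For example, `normalise_date("31/12/2014")` yields `"2014-12-31"`, and `normalise_format("Text/CSV")` yields `"CSV"`.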

Step 7: Metadata analysis. After the cleaning and integration steps have been performed, the metadata become available to the Analysis Engine. The engine applies the necessary aggregations or other computations to calculate the key metrics that have been defined for monitoring.
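Two simple aggregations of the kind the Analysis Engine might compute, over illustrative processed records (the record fields and metric definitions are assumptions of this sketch):

```python
from collections import Counter
from datetime import date

# Illustrative processed metadata records.
processed = [
    {"catalogue": "data.example.org", "format": "CSV", "issued": date(2015, 5, 3)},
    {"catalogue": "data.example.org", "format": "PDF", "issued": date(2015, 5, 20)},
    {"catalogue": "data.other.org",   "format": "CSV", "issued": date(2015, 4, 28)},
]

def datasets_per_catalogue(records):
    """One of the simplest monitoring metrics: dataset counts by catalogue."""
    return Counter(r["catalogue"] for r in records)

def uploaded_in_month(records, year, month):
    """Number of datasets issued in a given month."""
    return sum(1 for r in records
               if r["issued"].year == year and r["issued"].month == month)

counts = datasets_per_catalogue(processed)
```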

Step 8: Accessing the results. Finally, the results are made available through an API to other components, in particular to the Demonstration Site, which produces various charts, visualisations and reports for the end user. The API provides both the metadata records themselves (e.g. all the metadata of the datasets in a given catalogue) and aggregate results for various metrics (e.g. the number of datasets uploaded in the previous month).
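The two kinds of API access, record-level and aggregate, can be sketched as two functions over an in-memory store; the function names, metric name and store layout are assumptions, not the actual ODM API:

```python
# Toy in-memory store standing in for the Processed Metadata Repository.
STORE = {
    "data.example.org": [{"id": "d1", "format": "CSV"},
                         {"id": "d2", "format": "PDF"}],
}

def get_datasets(catalogue):
    """Record-level access: all metadata of the datasets in a catalogue."""
    return STORE.get(catalogue, [])

def get_metric(name, catalogue):
    """Aggregate access: metric values computed over the stored records."""
    if name == "dataset_count":
        return len(STORE.get(catalogue, []))
    raise KeyError(f"unknown metric: {name}")
```

A consumer such as the Demonstration Site would call the record-level function to render dataset listings and the aggregate function to drive its charts.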


For further details see Deliverable D3.6.