The figure below shows an overview of the ODM platform architecture, presenting the main components and illustrating the metadata processing workflow, as initially designed at the beginning of the project and presented in Deliverable D3.3. Work during the second year of the project has adhered to this architecture, without any major changes or deviations.
Below, we briefly outline the main modules and components comprising the ODM platform:
- Metadata collection module. This module is responsible for collecting metadata from a list of registered open data catalogues. It consists of the following main components:
  - Catalogue registry. It allows the registration of catalogues for harvesting and monitoring. Registration is done via a Web-based User Interface (UI), where a form is completed with basic information about the catalogue, together with the additional information needed to set up and configure the harvesting process for that catalogue (see Section 2 for detailed information). The information provided during registration forms the catalogue profile and is stored in the catalogue registry.
  - Job manager. The job manager schedules the execution of harvesting jobs, periodically or on demand, and is responsible for monitoring their progress and reporting their execution status. A harvesting job is a task that collects metadata from a registered open data catalogue. Each job carries the configuration that drives the harvesting process (e.g., which harvester to use and a set of metadata extraction rules to be applied). Harvesting jobs are maintained in a queue and are scheduled for processing.
  - Metadata harvesters. These are scripts executed by harvesting jobs to perform the actual extraction of metadata from the respective catalogue. Different harvesters are implemented to address the different open data platforms and APIs that exist. The configuration included in the harvesting job specifies which harvester should be used and how.
- Metadata processing module. This module performs the cleaning, integration and analysis of the metadata that are extracted from the various catalogues that are being monitored. It consists of the following main components:
  - Harmonisation engine. It processes the raw, original metadata retrieved by the harvesters and performs the cleaning and integration tasks required to obtain a dataset that is homogenised in terms of both attribute names and attribute values.
  - Analysis engine. Once the collected metadata have been mapped to a consistent internal schema and representation, the analysis engine performs the required operations (e.g., aggregations) to compute the metrics that have been defined for monitoring (refer to D3.7 for detailed information). It also makes these results available to the demonstration site for visualisation and presentation to the end users.
- Demonstration site. This module comprises several components for generating intuitive visualisations and reports that are presented to the end users, allowing them to obtain a comprehensive overview of trends in the evolving open data landscape, based on the monitored open data catalogues.
- Administration panel. This module comprises a set of dashboards that allow the ODM system administrator to monitor, control and configure various aspects of the system’s operation (e.g., configure options for metadata collection, monitor the status of harvesting jobs, define rules for metadata harmonisation, specify templates for visualisations).
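To make the interplay between the catalogue registry, the job manager and the harvesters more concrete, the following is a minimal Python sketch of the metadata collection module. It is an illustration only, not the ODM implementation: the class names (`CatalogueProfile`, `HarvestingJob`, `JobManager`), the `HARVESTERS` lookup table, the status strings and the `dummy_harvester` function are all assumptions introduced for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class CatalogueProfile:
    """Hypothetical catalogue profile, as it might be stored in the registry."""
    name: str
    url: str
    harvester: str  # key naming which harvester to use (assumed field)
    rules: Dict[str, str] = field(default_factory=dict)  # extraction rules


@dataclass
class HarvestingJob:
    """A task that collects metadata from one registered catalogue."""
    profile: CatalogueProfile
    status: str = "queued"
    records: List[dict] = field(default_factory=list)


def dummy_harvester(profile: CatalogueProfile) -> List[dict]:
    """Stand-in harvester; a real one would query the catalogue's API."""
    return [{"title": f"Dataset from {profile.name}", "source": profile.url}]


# Hypothetical lookup table: one harvester per supported platform or API.
HARVESTERS: Dict[str, Callable[[CatalogueProfile], List[dict]]] = {
    "dummy": dummy_harvester,
}


class JobManager:
    """Keeps harvesting jobs in a FIFO queue and tracks their status."""

    def __init__(self) -> None:
        self.queue: Deque[HarvestingJob] = deque()

    def schedule(self, profile: CatalogueProfile) -> HarvestingJob:
        job = HarvestingJob(profile)
        self.queue.append(job)
        return job

    def run_next(self) -> HarvestingJob:
        job = self.queue.popleft()
        job.status = "running"
        try:
            # The job's configuration decides which harvester is executed.
            harvester = HARVESTERS[job.profile.harvester]
            job.records = harvester(job.profile)
            job.status = "finished"
        except Exception:
            # Unknown harvester or extraction error: report failure.
            job.status = "failed"
        return job
```

In this sketch, registering a catalogue amounts to creating a `CatalogueProfile`, and an on-demand harvest is `JobManager().schedule(profile)` followed by `run_next()`. Periodic scheduling is omitted for brevity.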
This report focuses in more detail on the first two modules, i.e., the metadata collection and metadata processing modules. For a more detailed description of the demonstration site, and a report on the status of its implementation, see Deliverable D3.4 and the forthcoming Deliverable D3.7.
Figure: Overview of architecture and processing workflow.
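As a rough illustration of the metadata processing module described above, the sketch below harmonises attribute names and values and then computes a simple aggregate of the kind the analysis engine might produce. The mapping tables (`ATTRIBUTE_MAP`, `VALUE_MAP`), the function names and the example licence labels are hypothetical and chosen only for this example; the actual harmonisation rules and metrics are defined elsewhere in the project deliverables.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical rules: map source attribute names to one internal name.
ATTRIBUTE_MAP: Dict[str, str] = {
    "licence": "license",
    "license_id": "license",
    "name": "title",
}

# Hypothetical rules: map variant values to one canonical value per attribute.
VALUE_MAP: Dict[str, Dict[str, str]] = {
    "license": {"cc-by 4.0": "CC-BY-4.0", "cc by": "CC-BY-4.0"},
}


def harmonise(record: dict) -> dict:
    """Return a copy of the record with harmonised names and values."""
    out = {}
    for key, value in record.items():
        name = ATTRIBUTE_MAP.get(key.lower(), key.lower())
        if isinstance(value, str):
            value = value.strip()
            # Replace known variants; leave unknown values unchanged.
            value = VALUE_MAP.get(name, {}).get(value.lower(), value)
        out[name] = value
    return out


def count_by(records: Iterable[dict], attribute: str) -> Counter:
    """Simple aggregation: number of records per value of an attribute."""
    return Counter(r.get(attribute, "unknown") for r in records)


raw = [{"Licence": " cc-by 4.0 "}, {"license_id": "CC BY"}, {"name": "x"}]
harmonised = [harmonise(r) for r in raw]
licence_counts = count_by(harmonised, "license")
```

Here the two differently named and differently spelled licence fields end up under a single attribute with a single value, so the aggregation counts them together, while the record without licence information is counted as `"unknown"`.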