Data extraction and preparation comprises the activities of extracting data from existing databases and preparing the data to be published as open data. In general, these activities can be carried out automatically by ICT systems or manually by an administrative clerk. Here, the variety of technical systems employed by the different parts of the public administration plays a significant role. Some administrations have developed tools or updated existing applications and added functionalities to extract, clean, prepare and publish data automatically. For this, however, ICT vendors need to be commissioned, because most applications appear not to be open data-ready. Instead, they are programmed to keep data protected and secured within their boundaries and only allow its release through predefined interfaces. Adjusting existing ICT systems therefore incurs costs, and as a result opening data relies to a large extent on manual extraction.
Whether data is extracted automatically or manually, the data owners are involved not only in decisions about what data is published, which parts of the data and which formats are on offer; they also define the process and assign responsibilities for the tasks. Here, they are assisted by ICT departments and ICT strategy units, which often provide guidance and make recommendations.
In the observed cases where data extraction is done manually, administrative clerks from the administrative departments themselves are responsible for extracting and preparing the data, e.g. separating parts that violate privacy rights and adding meta data. However, these persons often have a limited understanding of open data use and of the role that formats, structure, vocabulary and meta data play in using open data. Also, these tasks are generally not high on the agenda of the departments' or organisations' executives and thus do not justify significant effort. Because open data has often been adopted in a labour-intensive way in these cases, arguments about resources and priorities are regularly raised.
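The manual preparation step described above (separating privacy-relevant parts and adding meta data) can be illustrated with a minimal sketch. The column names, the `SENSITIVE_COLUMNS` set, the function `prepare_for_publication` and the meta data keys are all hypothetical and chosen only for illustration; they do not reflect any tool used in the observed cases.

```python
import csv
import io

# Hypothetical column names; a real export would define its own.
SENSITIVE_COLUMNS = {"name", "address"}


def prepare_for_publication(raw_csv: str, metadata: dict) -> dict:
    """Drop sensitive columns and bundle the remaining rows with meta data."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    # Keep only the columns that are not marked as sensitive.
    public_fields = [f for f in reader.fieldnames if f not in SENSITIVE_COLUMNS]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=public_fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({f: row[f] for f in public_fields})

    return {"metadata": metadata, "data": out.getvalue()}


# Made-up example record, not real data.
raw = "name,address,district,permit_type\nA. Example,1 Main St,North,building\n"
package = prepare_for_publication(
    raw,
    {"title": "Permits by district", "license": "unspecified", "format": "CSV"},
)
print(package["data"])
```

Even in this simple form, the sketch shows why the step requires some understanding of formats and meta data: someone has to decide which columns count as sensitive and which meta data values accompany the file.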
Finally, the prepared data need to be submitted to the open data platform. For security and performance reasons, the open database is kept separate from the internal database, so that attacks and external use do not reduce internal performance.
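The separation between the internal database and the open database can be sketched as a one-way snapshot copy. This is only an assumption about how such a separation might be implemented: the two in-memory SQLite databases, the table names `permits` and `permits_open`, and the function `publish_snapshot` are hypothetical stand-ins.

```python
import sqlite3

# Two independent in-memory databases stand in for the internal system
# and the separately hosted open database (both hypothetical).
internal = sqlite3.connect(":memory:")
public = sqlite3.connect(":memory:")

internal.execute(
    "CREATE TABLE permits (applicant TEXT, district TEXT, permit_type TEXT)"
)
internal.executemany(
    "INSERT INTO permits VALUES (?, ?, ?)",
    [("A. Example", "North", "building"), ("B. Example", "South", "event")],
)
internal.commit()

public.execute("CREATE TABLE permits_open (district TEXT, permit_type TEXT)")


def publish_snapshot(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy only the publishable columns into the open database."""
    rows = src.execute("SELECT district, permit_type FROM permits").fetchall()
    with dst:  # commit the replacement as a single transaction
        dst.execute("DELETE FROM permits_open")
        dst.executemany("INSERT INTO permits_open VALUES (?, ?)", rows)
    return len(rows)


copied = publish_snapshot(internal, public)
print(f"published {copied} rows")
```

Because external users only ever query the copy, load or attacks on the open database do not touch the internal system; the cost is that the published data is only as current as the last snapshot.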
According to Spanish regulation, every public administration at the national level must maintain a section of its website where it publishes open data, which is also federated to datos.gob.es. This ensures that even though data is offered in dispersed locations and only subsequently federated, it can be found and retrieved easily. However, the regulation does not cover all autonomous regions and the local level, which will nevertheless federate voluntarily in the upcoming months. In addition, federation does not enable the open data platform provider, or the ICT/open data unit that commissions the platform, to impose quality or format standards. Thus, the central open data catalogue contains data in various formats, structures, vocabularies etc. and of unknown quality, in whichever way it is provided by the data owner.
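The effect of federation without imposed standards can be illustrated with a small sketch. It does not describe how datos.gob.es actually harvests catalogues; the `CatalogueEntry` type, the `federate` function, the source names and the record keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogueEntry:
    source: str   # which administration supplied the record
    record: dict  # the record exactly as the data owner provided it


def federate(catalogues: dict[str, list[dict]]) -> list[CatalogueEntry]:
    """Collect records from all sources without validating or normalising them."""
    central = []
    for source, records in catalogues.items():
        for record in records:
            central.append(CatalogueEntry(source=source, record=record))
    return central


# Made-up catalogues that use different vocabularies for the same concept.
catalogues = {
    "ministry-a": [{"title": "Budget", "format": "CSV"}],
    "ministry-b": [
        {"name": "Road works", "fileType": "xls"},
        {"name": "Air quality", "fileType": "PDF"},
    ],
}

central = federate(catalogues)
keys_in_use = sorted({key for entry in central for key in entry.record})
print(keys_in_use)
```

The central catalogue makes all three data sets findable in one place, but it inherits both key vocabularies (`title`/`format` and `name`/`fileType`) because nothing in the federation step harmonises them.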
In the German case, no regulation exists so far that mandates anyone to publish on the national platform. Here, even fewer standards exist regarding licensing, as outlined above. Thus, the roughly 7,200 data sets available as of August 2014 are published under twelve different standardised licenses, and for approximately 1,000 data sets a generic license that is not further specified is used. However, some general meta data values are required.
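A license overview of this kind can be derived from catalogue meta data with a simple tally. The following sketch assumes a hypothetical record layout; the field names in `REQUIRED_FIELDS`, the license identifiers and the sample records are invented for illustration and are not taken from the German platform.

```python
from collections import Counter

# Hypothetical set of mandatory meta data fields.
REQUIRED_FIELDS = ("title", "publisher")


def summarise(datasets: list[dict]) -> tuple[Counter, list[dict]]:
    """Count data sets per license and list those missing required meta data."""
    licenses = Counter(d.get("license") or "unspecified" for d in datasets)
    incomplete = [
        d for d in datasets if any(not d.get(field) for field in REQUIRED_FIELDS)
    ]
    return licenses, incomplete


# Made-up records; "license-a" and "license-b" are placeholder identifiers.
datasets = [
    {"title": "Budget", "publisher": "City A", "license": "license-a"},
    {"title": "Trees", "publisher": "City B", "license": "license-b"},
    {"title": "Roads", "publisher": "City A", "license": "license-a"},
    {"title": "Noise map", "publisher": "", "license": None},
]

licenses, incomplete = summarise(datasets)
print(licenses.most_common())
print(len(incomplete), "data set(s) missing required meta data")
```

Run against a real catalogue export, such a tally would show how many distinct licenses are in use and how many data sets fall back to an unspecified one.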
Stakeholders and their exemplary interests in data extraction, preparation and publication
|Stakeholder|Exemplary interests|
|---|---|
|Data owners – administrative clerk|Conform with open data-related obligations with as little effort as possible, when the purpose is little understood and praise by superiors is unlikely|
|Data owners – executive level|Provide as few resources as possible for open data, as long as the risks posed remain higher than the potential benefits|
|ICT vendors|Build secure, reliable ICT products; only provide open formats or open data interfaces when these are mandatory, provide a USP or enhance the product portfolio in another manner|
|ICT departments|Recommend, train and advise specialist departments on open data standards and processes|
|Open data platform provider|Provide an easy-to-use, low-threshold platform, while ensuring performance and security of running applications|
|ICT/open data unit|Maximise the number of open data sets on the central open data portal; treat coherent standards as subordinate|
References can be found here: OpenDataMonitor Project – Shared References