General distinctions between user groups can be drawn based on sector (private/public) or degree of organisation (individual/collective/corporate). However, for the questions at stake here, it remains paramount to translate user characteristics and requirements into the necessary functionalities of the OpenDataMonitor platform. Therefore, we focus on each stakeholder group's interests, requirements, understanding of the topic and level of technical expertise with regard to open data.
Stakeholder Requirements and OpenDataMonitor Potential
| Stakeholder group | Interests and requirements | OpenDataMonitor potential |
| --- | --- | --- |
| Policy makers: parliament, ministries pushing open data, coordination bodies for e-government and ICT, governance structures for cross-level collaboration in e-government and ICT | Understand barriers to open data publication and use; understand, develop and enforce widely used standards (formats, structure, licenses etc.) | Benchmark volume and sophistication of the published data as well as its use; highlight coverage of used standards; present usage of open data; metrics per geography (see D2.3) |
| Commercial users (associations): corporate advocacy groups, business associations, media outlets | Detect high-value datasets with minimal and transparent strings attached; detect mashable, harmonised datasets on a large scale | Highlight high-value datasets (e.g. update frequency of a dataset); map mashable content (congruent licenses, harmonised structure and vocabulary); highlight coverage of used standards (esp. licenses); metrics per geography and per dataset |
| Civic advocacy groups | Advocate the publication of, and detect already published, politically sensitive (politico-administrative) datasets | Highlight and compare sensitive datasets to advocate their publication in other locations; map mashable content; metrics per dataset and per geography |
| Government bodies and associations: inter- and supra-national bodies and associations, coordinating bodies around ICT and e-government, public enterprises in charge of furthering the information society, networks of smart cities, standardisation bodies | Advocate the publication of high-value datasets; benchmark volume and sophistication of the published data as well as its use to name and shame; understand coverage of used standards to align with these; understand what constitutes high-value datasets to advocate their publication | Benchmark volume and sophistication of the published data as well as its use; highlight coverage of used standards; highlight high-value datasets; metrics per geography, catalogue and dataset |
| Data-generating and (potentially) providing government bodies | Understand what constitutes a high-value dataset in their professional domain; learn about standards in open data in general and in their professional domain; understand how open data in their professional domain can be used | Highlight high-value datasets by domain or topic; highlight coverage of used standards (licenses, structure and vocabulary) by domain or topic; highlight applications of open data by domain or topic |
| Technology providers: private technology consultancies, ICT vendors, (public) ICT service providers, open data platform providers, applied research centres | Understand widely adopted technologies and standards to align with these | Highlight coverage of used infrastructure, technology and standards (formats, licenses); metrics per geography and overall |
Groups for which generally little technical expertise has to be presumed are policy-makers, data generators and some of the support units. Nevertheless, these groups are involved in major decisions about open data and shape its conceptualisation and implementation. Policy-makers (parliamentarians, high-level executives) engage with open data at a rather abstract level. However, their commitment to and interest in the topic in general has a significant impact on how the machinery of government approaches and implements open data. For policy-makers, it is insightful to see how their sphere of responsibility (jurisdiction, organisation) compares to others in terms of the volume and sophistication of the published data as well as its use. This serves as a basis to benchmark their performance and identify fields of strategic interest. Therefore, they need to see what data is published by other public sector organisations and how frequently this data is used; this gives them a better understanding of high-value datasets. At the moment, administrations often pursue an “availability approach” to publishing data: they publish data that is available in a structured format, at a fairly good quality level and not obviously sensitive, because they lack a profound understanding of what data might be useful. At a more specific level, policy-makers pass laws and issue executive orders or policies about open data that shape how open data is published (e.g. prescribing licenses, formats, meta data standards or even paradigmatic shifts to consider everything open by default) (see Zuiderwijk & Janssen, 2014). However, these decisions are mostly prepared by ministries or other governmental departments, considered as support units further below.
Another group that approaches open data from a rather thematic and legal perspective are data generators, who typically hold the data and often consider themselves its owners. They generate data in the course of their regular work and are predominantly responsible for the decisions whether and which of this data to publish as open data. Besides information about which data from their subject area is published by other organisations (see above), more detailed thematic and technical aspects are relevant for the decisions they make in terms of open data. Information about data structure, vocabulary and measurement scales could provide guidance for data generators on how to publish their data, although they often seem to be unaware of its significance. Here, various European, supra-national and national conventions exist – some codified, others not – in various policy fields which could be built upon, as has been demonstrated with the INSPIRE directive. At a basic level, insight into which meta data schemas are used could be helpful. At a far more sophisticated level, patterns in data structures and vocabulary might assist. Furthermore, data generators appear largely unaware of how open data is used and often seem to lack imagination of its possible uses. In this respect, successful use cases of open data could prove insightful for them. In addition, the legal perspective is especially significant in the public sector. This is particularly true for data generators and for support units (legal department, data protection officer) who are involved in decisions about which data to publish, with which level of detail and under which license. Thus, such information could assist their decisions about licensing, liabilities and privacy protection. On the whole, data generators are not fully aware of the topic of open data, do not initially endorse the idea of publishing data and have not yet integrated open data processes into their routine activities. It therefore poses a challenge to even attract this group to information about open data.
IT strategy units, platform providers and private consultancies often have a higher level of technical expertise, although not necessarily with regard to open data and how it is used. They are involved in decisions about portal architecture and publishing processes, and to a varying extent can set standards for data published in a catalogue (data format, meta data standards, quality). For these decisions, information about the spread of platforms (e.g. CKAN), meta data schemas and data formats could help them establish a state-of-the-art open data portal.
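The catalogue-level statistics mentioned here, such as the spread of data formats, can be sketched as a simple tally over harvested package meta data. The record structure below (a "resources" list with a "format" field) is loosely modelled on CKAN's package metadata and is an assumption for illustration, not the actual OpenDataMonitor harvesting schema:

```python
from collections import Counter

def format_coverage(packages):
    """Tally the share of each data format across a catalogue's
    package meta data. Each package is a dict with a "resources"
    list, loosely modelled on CKAN packages (an assumption here)."""
    counts = Counter()
    for pkg in packages:
        for res in pkg.get("resources", []):
            # Normalise case so "CSV" and "csv" count together.
            fmt = (res.get("format") or "unknown").strip().lower()
            counts[fmt] += 1
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.most_common()} if total else {}

# Illustrative mini-catalogue of three packages with five resources.
catalogue = [
    {"resources": [{"format": "CSV"}, {"format": "csv"}]},
    {"resources": [{"format": "PDF"}]},
    {"resources": [{"format": "CSV"}, {"format": "JSON"}]},
]
print(format_coverage(catalogue))
```

Aggregated per geography or per catalogue, such shares would directly feed the "coverage of used standards" metrics listed in the table above.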
Among the intermediary and end-users, advocacy groups stand out as a group which does not necessarily use open data itself, but gathers and publishes information about open data to further their cause (see Davies, 2013; Open Knowledge Foundation, 2013). Thus, they require a breadth of detailed information about open data, in particular for benchmarking purposes. Advocacy groups in general have a sophisticated technical understanding of open data, so it is not necessary to reduce complexity for them in this respect. Quite the contrary: in order to illustrate which catalogue hosts the most exhaustive meta data and points to the most comprehensive and sophisticated datasets, advocacy groups need to look at technical details. For lobbying efforts, it is necessary to trace datasets back to specific territories, policy fields and organisations. The thematic relation (policy field) seems especially relevant, since under the various institutional arrangements in European countries, different organisations are responsible for and accommodate the same thematic data. With several catalogues by now federating data from numerous organisations, jurisdictions and even countries, this becomes ever more important for comparisons.
Among the immediate users of open data (esp. application developers, researchers, data journalists), further differentiation appears necessary. Two different approaches to data detection can be distinguished, which can be termed “data-driven” versus “issue-driven”. Issue-driven users search for datasets in the context of a specific topic: they have a certain interest and know in advance which data they need. They search directly on an open data platform via search terms and specific keywords. For these users, portals/catalogues provide fairly appropriate search masks. Thus far, however, they can only search in a specific catalogue and find the data referenced there. Since portals often contain only meta data about data from a specific jurisdiction or even organisation, users might have to search in different catalogues instead of looking into one meta-catalogue. Furthermore, poor meta data quality often inhibits or restricts these users' ability to find relevant datasets. A meta-catalogue would thus be even more powerful if it provided a search mask not only for the meta data in the catalogues but also for the datasets themselves, which are hosted in the repositories.
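A minimal sketch of such a meta-catalogue search over harvested meta data from several catalogues might look as follows. The record fields ("title", "notes") and the two catalogue names are illustrative assumptions, not the OpenDataMonitor harvesting model:

```python
def search_meta_catalogue(catalogues, query):
    """Keyword search across meta data harvested from several
    catalogues at once. `catalogues` maps a catalogue name to a
    list of dataset records; the "title"/"notes" fields are an
    illustrative assumption."""
    terms = [t.lower() for t in query.split()]
    hits = []
    for name, records in catalogues.items():
        for rec in records:
            text = f"{rec.get('title', '')} {rec.get('notes', '')}".lower()
            if all(t in text for t in terms):
                # Record the source catalogue so hits can be traced
                # back to a jurisdiction or organisation.
                hits.append((name, rec.get("title")))
    return hits

harvested = {
    "data.gov.uk": [{"title": "Air quality measurements", "notes": "hourly NO2"}],
    "govdata.de": [{"title": "Luftqualitaet", "notes": "air quality, Berlin"}],
}
print(search_meta_catalogue(harvested, "air quality"))
```

The point of the sketch is the cross-catalogue loop: one query finds thematically matching datasets in both catalogues, which a search mask on either single portal could not do.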
Data-driven users, on the other hand, look for complex, comprehensive and large datasets, irrespective of their specific topical content. Their basic assumption seems to be that a complex dataset can be put to a purposeful use, even without a prior idea. Until now, they have found scant support on existing catalogues for identifying relevant, sensitive, high-value datasets. Since a number of datasets are published simply because they are at hand, catalogues are stacked with data of little use for these users, and search term queries are of little help. More relevant would be algorithms that analyse the size of a dataset (columns, data points, whether it contains string data and numeric data or structured and unstructured data), its update frequency, or whether it contains linked or non-linked data.
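A crude sketch of such an algorithm, scoring a CSV dataset on the signals named above (table size, mix of numeric and string columns, update frequency), could look like this. The weights are purely illustrative assumptions, not an OpenDataMonitor metric:

```python
import csv
import io

def complexity_score(csv_text, updates_per_year=0):
    """Heuristic score for data-driven users: larger tables, a mix
    of numeric and string columns, and frequent updates all raise
    the score. Weights are illustrative assumptions."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:  # header only, or empty: nothing to score
        return 0.0
    body = rows[1:]
    cols = list(zip(*body))

    def is_numeric(col):
        try:
            [float(v) for v in col]
            return True
        except ValueError:
            return False

    numeric = sum(1 for c in cols if is_numeric(c))
    # Bonus when the table mixes string and numeric columns.
    mixed_bonus = 1.0 if 0 < numeric < len(cols) else 0.0
    return len(cols) + len(body) / 100 + mixed_bonus + updates_per_year / 12

sample = "city,no2\nBerlin,42\nParis,38\n"
score = complexity_score(sample, updates_per_year=12)
```

Run over all datasets in a catalogue, such a score would let a data-driven user rank datasets by likely analytical value instead of relying on keyword search.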
References can be found here: OpenDataMonitor Project – Shared References.