Data detection refers to the initial discovery of a data set by a potential user. It should be kept in mind that the users interviewed for this report were predominantly small-scale users of open data, often building applications independently or in their free time alongside their professional occupation. That occupation was, however, related to information technology in all cases: all interviewees were technology professionals, mostly programmers, but also web designers.

The approaches and strategies appear to differ considerably. One approach can be labelled data-driven, the other issue-driven. Issue-driven users typically have a certain interest in mind and know in advance which kind of data they need. They often search open data catalogues by search terms and keywords, or use general search engines. For them, open data catalogues and portals provide fairly helpful search forms. However, the meta data provided by the catalogues generally does not give them all the information they need to decide whether they can use a data set retrieved from the search results. This cannot be attributed solely to poor meta data quality or a lack of standardised meta data; what is included in common standards is also not seen as sufficiently comprehensive and meaningful. Furthermore, they frequently complained about the scattered portal landscape, which seems to them still barely integrated. As a result, they have to conduct similar searches in various catalogues. Another strategy followed by several issue-driven users is to harvest data from governmental websites and subsequently request and negotiate terms of use, or even to directly request a specific data set they need that is not made available as open data and thus not listed in any catalogue. This latter strategy is frequently successful. However, with both strategies (informal requesting and harvesting), questions about the license and what the data can be used for remain and prove difficult to resolve.

On the other hand, data-driven users look for complex, comprehensive, and large datasets, largely without regard for the specific content of the data itself. Often, they do not have a specific purpose in mind; their presumption is that an interesting data set can be put to a purposeful use. Currently, they feel little supported by the open data portals, since these rarely support search queries that meet data-driven users' needs. What seems more helpful to them are algorithms that analyse the size of a dataset (columns, data points, whether a dataset contains string and numeric data, or structured and unstructured data), its update frequency, and whether it is linked or non-linked data. However, the question remains how to identify relevant, useful datasets, because too many datasets are published simply because they are at hand, yet are of little use.
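The kind of structural analysis data-driven users ask for could look roughly like the following Python sketch. It is an illustration only, not a method described by the interviewees or implemented by any portal: the function name `profile_csv`, the choice of CSV as input format, and the rule that a column counts as numeric when every non-empty cell parses as a number are all assumptions. Update frequency and linked-data status are left out because they would have to come from catalogue meta data rather than from the file itself.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if the cell text parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_csv(text: str) -> dict:
    """Compute simple size and type indicators for a CSV dataset (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {"columns": 0, "rows": 0, "data_points": 0,
                "numeric_columns": [], "string_columns": []}

    header, body = rows[0], rows[1:]
    numeric, strings = [], []
    for index, name in enumerate(header):
        # Collect the non-empty cells of this column; short rows are tolerated.
        cells = [row[index] for row in body
                 if index < len(row) and row[index] != ""]
        # Assumption: a column is numeric only if all non-empty cells parse as numbers.
        if cells and all(_is_number(cell) for cell in cells):
            numeric.append(name)
        else:
            strings.append(name)

    return {
        "columns": len(header),
        "rows": len(body),
        "data_points": sum(len(row) for row in body),
        "numeric_columns": numeric,
        "string_columns": strings,
    }


# Made-up sample data, used only to exercise the function.
sample = "city,population,area_km2\nBerlin,3645000,891.7\nHamburg,1841000,755.2\n"
profile = profile_csv(sample)
print(profile)
```

A portal could, under these assumptions, expose such indicators as search filters, so that users can look for large datasets with mixed string and numeric columns without knowing the topic in advance.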

Overall, there seems to be a relative indifference to meta data standards and even to meta data in general. This might be attributable to the scarcity of meta data, its low quality, and the lack of content-related meaning (as opposed to formal characteristics of the data set) that much of the available meta data conveys. User interests, especially those of issue-driven users, appear to point more strongly to the vocabulary and content of the data, features that remain largely unharmonised and undescribed as of today.

 

Stakeholders and their exemplary interests in data detection

Stakeholder | Exemplary interests
Issue-driven users | Detect data sets with a specific content or related to a certain topic
Data-driven users | Detect large, complex data sets

 

References can be found here: OpenDataMonitor Project – Shared References.