Identify Library Datasets

Overview

Teaching: 30 min
Exercises: 0 min

Questions

What library-related data can you publish?

Objectives

Identify library datasets to publish openly.

Identifying Library Datasets

Does your open data project match community needs? A quiz for finding a community-driven focus from the Sunlight Foundation.

Quote from Ayre, L., & Craner, J. (2017). Open Data: What It Is and Why You Should Care. Public Library Quarterly, 36(2), 173-184.

Your library also has data that would be useful to publish including as follows: (1) Statistics about visitors, computer usage, materials borrowed, etc., which illustrate the value of the services you provide (see Figure 1); (2) Budget data that can be accessed by journalists, academics, good-government groups, and citizens interested in how taxpayer dollars are spent; (3) Information published on your existing website—such as event schedules or operating hours—when offered in a “machine-readable” format become available for “mash-ups” with other data or local applications (see Figure 2).
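
To make point (3) of the quote concrete, here is a minimal sketch of what a "machine-readable" version of operating hours could look like. The branch name, hours, and field names are hypothetical placeholders, not data from any real library, and the sketch assumes nothing beyond Python's standard library. It serializes the same records as CSV and as JSON, two formats that other applications can read without scraping a web page.

```python
import csv
import io
import json

# Hypothetical operating hours; replace with your library's real schedule.
HOURS = [
    {"branch": "Main", "day": "Monday", "opens": "09:00", "closes": "20:00"},
    {"branch": "Main", "day": "Tuesday", "opens": "09:00", "closes": "20:00"},
    {"branch": "Main", "day": "Sunday", "opens": "", "closes": ""},  # closed
]

# Column order for the CSV output (an assumed, illustrative schema).
FIELDS = ["branch", "day", "opens", "closes"]


def hours_to_csv(rows):
    """Serialize the rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def hours_to_json(rows):
    """Serialize the same rows to indented JSON text."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(hours_to_csv(HOURS))
    print(hours_to_json(HOURS))
```

Either output could be posted alongside the human-readable web page, so a local app could combine the hours with other data.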

See Appendix in Carruthers, A. (2014). Open data day hackathon 2014 at Edmonton Public Library. Partnership: The Canadian Journal of Library and Information Practice and Research, 9(2), 1-13. DOI: 10.21083/partnership.v9i2.3121.

Identifying Open Data to Publish

Adapted with permission from a document prepared by Kathleen Sullivan, open data literacy consultant, Washington State Library

The following text synthesizes recommendations for getting started with open government data publishing. The recommendations reflect reliable, commonly recommended sources (listed at the end), as well as interviews conducted by Kathleen Sullivan with other open data portal creators and hosts.

The Mantras
People, issues, data
As the Sunlight Foundation hammers home in its “Does your open data project match community needs?” quiz, the data you choose to publish should connect to issues that people care about. Begin your publishing process by identifying the key people, the key issues, and the available data.

Keep it Simple
Many guides have some version of this advice, along with suggestions to do things in small, manageable batches and avoid perfectionism. Here is the Open Data Handbook’s take:

Keep it simple. Start out small, simple and fast. There is no requirement that every dataset must be made open right now. Starting out by opening up just one dataset, or even one part of a large dataset, is fine – of course, the more datasets you can open up the better.

As Will Saunders, Washington State’s OCIO Open Data Guy, has often said (advice repeated in sources like OKI’s Open Data Handbook), overpublishing and undermanaging is the usual scenario, and it leads to disillusionment and low participation. Start small with good-quality data that’s connected to what the community cares about. Keep talking to users, improve, expand as you can, repeat.

Figuring out who to talk to, what to look at

Identify your users
Who are the users in a user-centered data selection process? Below are parties suggested by various sources. How will you reach each of them?

The general public
Other libraries
Community groups with particular strategic goals
Local app developers and data activists
Outside groups interested in libraries: State and federal agencies, non-profits, researchers, grant funders

Identify the issues (which will lead you to the data people care about)
Helping people solve problems is the wave that carries any open data effort. The data published should help people do things they couldn’t do before – assess the budget impact of enacting a new policy such as fine elimination, understand hyper-local reading trends, analyze wifi speeds and usage.

The Sunlight Foundation quiz, Priority Spokane’s guidelines for choosing indicators and Socrata’s goals-people-data-outcomes examples provide good jumping-off points for your conversations with government officials and members of the community.

Assignment 1: User Stories + Use Cases

Throughout class we will read examples of data curation work that are “user centered.” Two broad techniques for developing requirements of systems that are “human centered” are use cases and user stories. The following exercises may be helpful in the work you plan to do this quarter in building a protocol to satisfy a community of data users.

User Stories

User stories are a simple but powerful way to capture requirements when doing systems analysis and development.

The level of detail and the specific narrative style will vary greatly depending on the intended audience. For example, if we want to create a user story for accessing data stored in our repository we will need to think closely about different types of users, their skills, the ways they could access data, the different data they may want, etc.

This could become a very complicated user story that could be overwhelming to translate into actionable policy or features of our data repository.

Instead of trying to think through each complicated aspect of a user’s need we can try to atomize a user experience. In doing so we come up with a simple story about the user that tells us the “who, what, and why” - so that we can figure out “how” …

Elements

The three most basic elements of a user story are:

Stakeholder - the type of stakeholder might be a user, curator, administrator, etc.
Feature - this could be a feature of a dataset, a system, or service
Goal - what is the stakeholder trying to achieve with the feature?

Templates

A template for a user story goes something like this:

As a user I want some feature so that I can achieve some goal.

As a data user of the QDR I want to find all datasets about recycling for cities in the Pacific Northwest so that I can reliably compare recycling program policies and outcomes for this region.

Some variations on this theme: More recent work in human centered design places the goals first - that is, it attempts to say what is trying to be done, before even saying who is trying to do it…

This is helpful for thinking about goals that may be shared across different user or stakeholder types.

This variation of a user story goes something like:

In order to achieve some business value, as a stakeholder type, I want some new system feature.
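
As a rough sketch of how the three elements fit the two templates, the following Python snippet stores a story's stakeholder, feature, and goal once and renders it in either order. The `UserStory` class and its method names are hypothetical, introduced only for illustration; the example values reuse the QDR recycling story from this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """Hypothetical container for the three basic elements of a user story."""

    stakeholder: str  # who: the type of user, curator, administrator, etc.
    feature: str      # what: a feature of a dataset, system, or service
    goal: str         # why: what the stakeholder is trying to achieve

    def classic(self) -> str:
        """Render in stakeholder-first order: who, what, why."""
        return (
            f"As a {self.stakeholder} I want {self.feature} "
            f"so that I can {self.goal}."
        )

    def goal_first(self) -> str:
        """Render in goal-first order: why, who, what."""
        return (
            f"In order to {self.goal}, as a {self.stakeholder}, "
            f"I want {self.feature}."
        )


story = UserStory(
    stakeholder="data user of the QDR",
    feature="to find all datasets about recycling for cities in the Pacific Northwest",
    goal="reliably compare recycling program policies and outcomes for this region",
)

print(story.classic())
print(story.goal_first())
```

Keeping the elements separate from the wording makes it easy to see when two stakeholder types share the same goal, which is the point of the goal-first variation.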

Here is your task

Create user stories that help you understand the potential wants, needs, and desires of your designated community of users. For this first attempt, focus on user stories around data sharing. How might you satisfy these user stories? What choices do these stories help clarify or obscure about your protocol, and your designated community?

Here is an example that can help clarify the way that user stories are assembled for the protocol: https://rochellelundy.gitbooks.io/r3-recycling-repository/content/r3Recycling/protocolReport/userCommunity.html

Use Cases (Best Practices)

Another way to begin gathering information about users and their expectations for data is through use cases. As you read in week 5 for class, the W3C has developed a template for writing use cases for its Data on the Web Best Practices. This exercise asks you to use that template to investigate best practices for data and repositories that are relevant to your quarter protocols.

The simple steps to follow:

Read the Data on the Web Use Cases & Requirements (as part of your weekly reading) https://www.w3.org/TR/dwbp-ucr
Identify a data collection published on the web that is relevant to your protocol.
Develop a Use Case following the DWBP outline.

Public Library Data

Key Points

Library Datasets
