The Social Data Ecosystem in Public Health
A decade ago, public health media campaigns measured their impact by evaluating the relationship between message exposure and the targeted behavior. Given the advent and wide diffusion of social media, measuring simple message exposure doesn’t begin to capture the information environment influencing people’s health behavior. People are not only passively exposed to health-related messages; they also actively search for and interactively share messages, news, product promotions, and other health-related content across multiple social media platforms. In order to capture, measure, and evaluate this new media paradigm, we must understand the social data ecosystem, from source to application. This panel opens a lively discussion among leaders from key communities in the social data universe. Three case studies from tobacco control research will provide the context for describing the evolution of the social data fire hose, its management, and its application to public health problems.
Additional Supporting Materials
- What is a fire hose of social data? Charles Ince will discuss the evolution of the social data ecosystem: the diffusion of social platforms, the massive volume of data they generate, the recognition of their value, and the engineering challenges of making these data available in a usable format. Managing the flow is key to using social data effectively. Sherry Emery will describe examples of when the fire hose makes sense and when API data are sufficient to answer questions about media campaigns.
- What’s inside the API black box? The Twitter Streaming API offers a 1% sample that can provide great insight. However, the validity of conclusions drawn from a sample depends critically upon how it’s drawn. Documentation for social data APIs does not describe their sampling methodology in detail. Hyun Suk Kim will discuss analyses that used tobacco-related keywords to compare the Twitter fire hose to the API over four months. This work informs decisions about using the fire hose vs. the API.
- What do I do with all this data? Regardless of whether you use a fire hose or the API, you still have an unprecedented amount of data to manage. Sherry Emery and Stu Shulman will discuss the search for tobacco-relevant tweets within the universe of tweets about smoking ribs, smoking weed, smoking squares, or smoking hot girls. A combination of human and machine coding and analysis provides a model of a rigorous and replicable methodology to identify and analyze relevant content.
- How do you analyze something so organic? Prospective, methodical design is the stock-in-trade of academic and policy research. Social media, however, are inherently spontaneous and creative. Without the ability to go back and learn as you go, it is impossible to really capture the conversation on social media. The social conversation about California’s defeated Proposition 29 illustrates the need for a flexible data collection process that captures spontaneous conversations and hashtags.
- Is it cool to smoke a square? Social conversation relies on slang and abbreviations. In order to make valid policy inferences, it is critical to understand this informal and constantly evolving language used on social media. Who knew ‘squares’ referred to cigarettes? A lot of people, actually, but not the academic researchers looking for messages about smoking. Sherry Emery will describe how iterative content analyses were used to unlock the language of smoking in the urban community.
- Charles Ince, Gnip
- Stuart Shulman, Texifter
- Sherry Emery, Health Media Collaboratory at the Institute for Health Research and Policy
- Hyun Suk Kim, Annenberg School for Communication, University of Pennsylvania
Eman Aly, Health Media Collaboratory
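
To make the "What do I do with all this data?" item concrete, here is a minimal sketch of a machine first pass that narrows a stream of tweets before human coders review them. It is an illustration only, not the panelists' method: the term lists, the `triage` function, and the sample tweets are hypothetical, with the off-topic terms taken from the examples named above (ribs, weed, hot girls).

```python
import re

# Hypothetical term lists for illustration; a real codebook would be built
# iteratively with human coders and would be far larger.
SMOKING_TERMS = re.compile(
    r"\b(smok(?:e|es|ed|ing)|cigarettes?|squares?)\b", re.IGNORECASE
)
OFF_TOPIC_TERMS = re.compile(r"\b(ribs?|weed|hot girls?)\b", re.IGNORECASE)


def triage(tweet: str) -> str:
    """Sort one tweet into 'irrelevant', 'off_topic', or 'candidate'."""
    if not SMOKING_TERMS.search(tweet):
        return "irrelevant"   # no smoking-related keyword at all
    if OFF_TOPIC_TERMS.search(tweet):
        return "off_topic"    # smoking keyword, but a known non-tobacco sense
    return "candidate"        # passed to human coders for a final decision


# Made-up sample tweets, not real data.
sample = [
    "Smoking ribs all afternoon",
    "Trying to quit smoking cigarettes this year",
    "about to smoke a square",
    "Lovely weather today",
]
labels = [triage(t) for t in sample]
```

A keyword filter like this can only narrow the pool. Slang such as "squares" is also an ordinary word, which is why the human coding step remains necessary to decide what is actually about tobacco.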