Data Visualization: Statistics as Storytelling

Project Summary

A large and growing trove of patient, clinical, and organizational data is collected as part of the “Help Desk” program at Durham’s Lincoln Community Health Center. Help Desk is a group of student volunteers who connect with patients over the phone and help them navigate to community resources such as food assistance programs, legal aid, and employment centers. Data-driven approaches to identifying service gaps, understanding the patient population, and uncovering unseen trends are important both for improving patient health and for advocating for the necessity of these resources. Food security, economic stability, education, neighborhood and physical environment, community and social context, and access to the healthcare system are crucial social determinants of health, and disparities in them matter because studies indicate that social determinants account for nearly 70% of all health outcomes.

Themes and Categories

This project introduced techniques and best practices of data visualization, with a focus on clear, thoughtful, and impactful presentation, a crucial part of any project that works with data. Discussions and activities advanced a view of data visualization as a form of visual storytelling and meaning-making, as opposed to just “making numbers pretty.” Participants were then asked to comment on the effectiveness of several prepared examples. In the second part of the Expedition, students were given Help Desk data and walked through the visualization process, from conception to design, to answer a research question of their own about Help Desk and the social determinants of health. Visualizations were created in Tableau, a widely used tool in the health and public policy domains.

Guiding Questions

  • Why visualize? What is the point of visualization?
  • What are the implicit and explicit narratives of real-world data visualizations?
  • What contexts are important for understanding our Help Desk data?
  • What are necessary, appropriate, and responsible questions to ask given our data?
  • What type of visualization would best convey the information we have (e.g., bar graph, histogram, scatterplot)?

The Dataset

Data were obtained from PRAPARE (Protocol for Responding to and Assessing Patients’ Assets, Risks, and Experiences) screenings conducted by case managers at Lincoln, as well as from Help Desk volunteers speaking with patients over the phone. The data support questions about patient needs and demographics, call outreach and outcomes, and the services and utility of community-based organizations (CBOs). Participants were given a dataset of 700 de-identified patient records with 1,122 variables, slightly pared down from the original dataset, along with the full codebook for reference.

Lab Sessions

In the first session, participants were introduced to the fundamentals of data visualization as storytelling and sense-making. They watched Hans Rosling’s “200 Countries, 200 Years, 4 Minutes” video and discussed the implicit and explicit narratives that visualizations convey. Then, after Connor gave an overview of HIPAA and data privacy, Tyler gave a live demonstration of Tableau, building a basic bar graph, and led participants through a critique of five visualizations prepared ahead of time from Help Desk data. As homework, participants were asked to download Tableau, load the dataset, and familiarize themselves with the interface and the codebook.

In the second session, participants split into two groups, with Connor and Tyler each guiding one group through the data visualization process from start to finish.

Each group focused on a different question:

  • Which food-related CBOs were the most referred to? Which food-related CBOs were the most “successful”?
  • What demographic factors are most associated with self-reported food need?

From there, participants discussed which variables were most pertinent to their group’s question, which type of visualization would best convey the information, and whether other techniques (color, order, text) could help.
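Although the session worked in Tableau, both group questions reduce to simple aggregations that a chart then encodes. The minimal Python sketch below is illustrative only: the records and the field names (`cbo`, `category`, `connected`, `age_group`, `food_need`) are hypothetical and not taken from the Help Desk codebook, and counting a confirmed connection as a “successful” referral is just one assumed definition. Results are sorted in descending order, mirroring the use of order as a design choice in a bar graph.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; field names are illustrative, NOT from the Help Desk codebook.
# Each referral: (community-based organization, resource category, patient connected?)
RAW_REFERRALS = [
    ("Food Bank A", "food", True),
    ("Food Bank A", "food", False),
    ("Food Bank A", "food", True),
    ("Pantry B", "food", True),
    ("Pantry B", "food", True),
    ("Legal Aid C", "legal", False),
]
referrals = [{"cbo": c, "category": cat, "connected": ok} for c, cat, ok in RAW_REFERRALS]

# Each patient: (age group, self-reported food need?)
RAW_PATIENTS = [
    ("18-34", True), ("18-34", False),
    ("35-64", True), ("35-64", True),
    ("65+", False), ("65+", False),
]
patients = [{"age_group": g, "food_need": need} for g, need in RAW_PATIENTS]


def referral_counts(records, category="food"):
    """Referrals per CBO within one category, most-referred first."""
    counts = Counter(r["cbo"] for r in records if r["category"] == category)
    return counts.most_common()


def success_rates(records, category="food"):
    """Share of each CBO's referrals that ended in a connection (assumed meaning of 'success')."""
    totals, successes = defaultdict(int), defaultdict(int)
    for r in records:
        if r["category"] == category:
            totals[r["cbo"]] += 1
            successes[r["cbo"]] += int(r["connected"])
    rates = {cbo: successes[cbo] / totals[cbo] for cbo in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def need_rate_by_group(people, group_field, need_field="food_need"):
    """Proportion reporting a need within each level of a demographic variable, highest first."""
    totals, needs = defaultdict(int), defaultdict(int)
    for p in people:
        totals[p[group_field]] += 1
        needs[p[group_field]] += int(p[need_field])
    rates = {g: needs[g] / totals[g] for g in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


print(referral_counts(referrals))  # [('Food Bank A', 3), ('Pantry B', 2)]
print(success_rates(referrals))    # Pantry B first (1.0), then Food Bank A (~0.67)
print(need_rate_by_group(patients, "age_group"))  # [('35-64', 1.0), ('18-34', 0.5), ('65+', 0.0)]
```

In this toy data the most-referred CBO is not the one with the highest success rate, which is the kind of distinction two well-ordered bar graphs can make visible. Using proportions rather than raw counts for food need keeps groups of different sizes comparable.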

The groups then reconvened to share their preliminary visualizations and give each other feedback. Because of time constraints, the visualizations are presented here without some of the context, labels, and explanations needed to make them fully interpretable.

Food graph



Data Expedition Lesson Plan.docx

Data Expedition Slides.pptx




Related Projects

This data expedition focused on the mechanisms animals use to orient to environmental stimuli, the methods scientists use to test hypotheses about orientation, and the statistical methods used with circular orientation data. Students collected their own dataset during the class period, tested hypotheses on their data using circular statistics in R, and aggregated their data to formally test the hypothesis that isopods orient with respect to light using an RShiny online application.
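The core of a circular hypothesis test fits in a few lines. The expedition itself used R and an RShiny app whose code is not reproduced here; the Python below is an independent sketch of the Rayleigh test of uniformity, one common test of whether headings cluster around a single direction, using Zar’s approximation for the p-value. The headings are made-up example values, not data collected in class.

```python
import math


def mean_direction(angles_deg):
    """Return (mean direction in degrees, 0 to 360; mean resultant length R-bar in [0, 1])."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.degrees(math.atan2(s, c)) % 360, math.hypot(c, s)


def rayleigh_test(angles_deg):
    """Rayleigh test of circular uniformity: returns (z, approximate p-value).

    A small p-value suggests the headings are concentrated around one direction.
    """
    n = len(angles_deg)
    _, r_bar = mean_direction(angles_deg)
    big_r = n * r_bar       # length of the summed unit vectors
    z = big_r ** 2 / n      # Rayleigh's z statistic
    # Zar's approximation to the p-value
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - big_r ** 2)) - (1 + 2 * n))
    return z, min(p, 1.0)


# Made-up isopod headings (degrees), clustered near 0
clustered = [350, 5, 10, 355, 0, 15, 345, 20]
# Evenly spread headings with no preferred direction
spread = [0, 90, 180, 270]

print(mean_direction(clustered))
print(rayleigh_test(clustered))  # small p: evidence of orientation
print(rayleigh_test(spread))     # p near 1: no evidence of orientation
```

The Rayleigh test assumes a single preferred direction as the alternative; headings split between two opposite directions can produce a small resultant and a large p-value even when animals are clearly oriented, so other tests are used for that case.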

This exercise served as a capstone to a series of four class sessions on orientation and navigation, in which students read primary scientific literature that used circular statistics in its methods. The exercise gave students the opportunity to collect their own data, discover why linear statistics would not be sufficient to analyze them, and then implement their own analysis. The goal was to give students a better understanding of circular statistics through hands-on experience forming and testing a hypothesis.
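A two-heading example shows why linear statistics break down on circular data. The Python sketch below, with illustrative values rather than class data, compares the ordinary arithmetic mean with the circular mean and computes the shortest angular distance between two headings.

```python
import math


def arithmetic_mean(values):
    """Ordinary (linear) mean: treats 0 and 360 degrees as far apart."""
    return sum(values) / len(values)


def circular_mean(angles_deg):
    """Mean direction from averaging unit vectors, in degrees (0 to 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360


def angular_distance(a_deg, b_deg):
    """Shortest distance around the circle between two headings, in [0, 180]."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


headings = [350, 10]  # two animals facing just either side of 0 degrees
print(arithmetic_mean(headings))  # 180.0, which points the opposite way
print(circular_mean(headings))    # approximately 0 (may display near 360 due to rounding)
print(angular_distance(350, 10))  # 20
```

The linear mean lands on the one direction neither animal faced, while averaging unit vectors recovers the shared heading; the same wrap-around problem affects variances, differences, and standard linear tests.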

In this two-day virtual data expedition, students were introduced to the Actor-Partner Interdependence Model (APIM) in the context of stress proliferation, linked lives, the spousal relationship, and mental and physical health outcomes.

Stress proliferation is a concept within the stress process paradigm that explains how one person’s stressors can influence others (Thoits 2010). Combined with the life course principle of linked lives, it suggests that because people are embedded in social networks, stress can affect not only the individual but also the people close to them (Elder Jr, Shanahan and Jennings 2015). For example, one spouse’s chronic health condition may strain the marital relationship, and that strain may eventually spill over to affect the other spouse’s mental health. Additionally, because partners share an environment, experiences, and resources (e.g., money and information), and exert social control over each other, they can monitor and influence each other’s health and health behaviors. This often leads to health concordance within couples: because partners influence each other’s health and well-being, their health tends to become more similar (Kiecolt-Glaser and Wilson 2017; Polenick, Renn and Birditt 2018). Thus, a spouse’s current health condition may influence their partner’s future health, and spouses may exhibit similar health conditions or behaviors at the same time.

However, how spouses influence each other may depend on the gender of the spouse who has the health condition or exhibits the health behavior. Recent evidence suggests that a wife’s health condition may have little influence on her husband’s future health, whereas a husband’s health condition is likely to influence his wife’s future health (Kiecolt-Glaser and Wilson 2017).
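The APIM separates an “actor effect” (a person’s own predictor on their own outcome) from a “partner effect” (their partner’s predictor on their outcome). For distinguishable dyads such as wives and husbands, in the simplest case the point estimates match those of two ordinary regressions, one per spouse, each using both spouses’ predictors. The Python below is a minimal sketch on synthetic, noise-free numbers chosen to mimic the gendered pattern described above; it is not the expedition’s code or data. A full APIM analysis would typically use structural equation or multilevel models, which also account for the correlated residuals of partners and yield appropriate standard errors.

```python
def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2, solved from centered sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * d for a, d in zip(c1, cy))
    s2y = sum(b * d for b, d in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def apim_effects(wife_x, husband_x, wife_y, husband_y):
    """Actor and partner effects for distinguishable (wife/husband) dyads."""
    _, wife_actor, husband_partner = fit_two_predictors(wife_x, husband_x, wife_y)
    _, husband_actor, wife_partner = fit_two_predictors(husband_x, wife_x, husband_y)
    return {
        "wife_actor": wife_actor,            # wife's health -> her own later health
        "husband_to_wife": husband_partner,  # husband's health -> wife's later health
        "husband_actor": husband_actor,      # husband's health -> his own later health
        "wife_to_husband": wife_partner,     # wife's health -> husband's later health
    }


# Synthetic baseline health scores for six hypothetical couples
wife_t1 = [1, 2, 3, 4, 5, 6]
husband_t1 = [2, 1, 4, 3, 6, 5]
# Synthetic follow-up scores built with known effects and no noise:
# the husband's baseline affects the wife, but not the reverse.
wife_t2 = [1.0 + 0.5 * w + 0.3 * h for w, h in zip(wife_t1, husband_t1)]
husband_t2 = [0.8 + 0.6 * h + 0.0 * w for w, h in zip(wife_t1, husband_t1)]

for name, value in apim_effects(wife_t1, husband_t1, wife_t2, husband_t2).items():
    print(f"{name}: {value:.3f}")
```

Because the synthetic outcomes were built from known coefficients, the fit recovers them exactly; with real couple data the interest lies in whether the two partner effects differ, as the gendered pattern above would predict.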

Stats/Sociology major Mitchelle Mojekwu joined Neuroscience majors Kassie Hamilton and Zineb Jaidi in a ten-week exploration of data relevant to an upcoming public school redistricting in Durham County. Using data from the General Social Survey and the US Census, the team applied modern mathematical and statistical methods to generate proposed redistricting plans, with the aim of giving decision-makers information they can use to draw school districts that are equitable and reflective of the Durham County student population.


Faculty Lead: Greg Herschlag

Project Manager: Bernard Coles