Social Network Analysis Basics: Case Studies from Game of Thrones and the National Hockey League

Project Summary

The goal of this Data Expedition was to introduce students to the exploration of social network data using R. Students learned to load and plot a social network in R and then performed some basic analyses on two different networks: fights in the National Hockey League during the 2018-2019 season and characters in Game of Thrones Season 3. Students used social network analysis to better understand who is connected to whom, how frequently they interact, and how they interact.

Year: 2019

Contact: Claire Le Barbenchon (claire.lebarbenchon@duke.edu) and Liann Tucker (liann.tucker@duke.edu)

Graduate Students: Claire Le Barbenchon and Liann Tucker

Faculty: Craig Rawlings

Course: Soc 110D: Sociological Inquiry

Overview

First, we introduced students to RStudio and the basics of R programming. Next, we covered loading data into RStudio and transforming the Hockey Fights network from a “hairball” into a legible, labelled network of fights between players. Students then reorganized the plot to show fights between teams and to identify which teams were more prone to fighting than others.
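As a rough illustration of the step from player-level fights to team-level fights, the sketch below uses a small made-up example rather than the course data. The column names (player, team, player1, player2) and all values are assumptions for illustration only, and this is one possible way to do the aggregation with igraph, not necessarily the code used in class.

```r
library(igraph)

# Hypothetical node list: one row per player, with the player's team.
players <- data.frame(
  player = c("P1", "P2", "P3", "P4", "P5"),
  team   = c("Team A", "Team A", "Team B", "Team B", "Team C"),
  stringsAsFactors = FALSE
)

# Hypothetical edge list: one row per fight between two players.
fights <- data.frame(
  player1 = c("P1", "P2", "P1", "P3"),
  player2 = c("P3", "P4", "P5", "P5"),
  stringsAsFactors = FALSE
)

# Player-level network: nodes are players, edges are fights.
g_players <- graph_from_data_frame(fights, vertices = players, directed = FALSE)

# Replace each player with their team to get team-level fights.
team_fights <- data.frame(
  team1 = players$team[match(fights$player1, players$player)],
  team2 = players$team[match(fights$player2, players$player)],
  stringsAsFactors = FALSE
)

# Team-level network: repeated fights between two teams stay as parallel edges,
# so a team's degree equals the number of fights it was involved in.
g_teams <- graph_from_data_frame(team_fights, directed = FALSE)
fights_per_team <- sort(degree(g_teams), decreasing = TRUE)
print(fights_per_team)
```

Sorting the team degrees gives a simple ranking of which teams fought most often in this toy example.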

Next, students dug into the Game of Thrones Season 3 network and created plots that show the links between houses in the show. They explored some of the different descriptive and analytical tools commonly used in social network analysis: different forms of centrality to understand popularity and influence.

Guiding Questions

Social network analysis can help explain social dynamics at both the group and individual level. For example, certain network structures are better for spreading information, others for maintaining power hierarchies.

This project guides students through plotting a network in an elegant, intuitive way, which helps to answer descriptive questions like:

  • What does this network look like?
  • How is it organized?
  • Who are the members and how do you think they are connected?

Next, we explored structural features of the network, guided by the following questions:

  1. Who are the most “popular” characters?
  2. Which characters and houses have the most influence?

To do so, students were introduced to two centrality measures that can be calculated with igraph: degree centrality and betweenness centrality.
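The two centrality measures can be illustrated on a toy network. The five nodes and their ties below are made up for illustration and are not taken from either dataset; the sketch only shows how igraph's degree() and betweenness() functions might be applied.

```r
library(igraph)

# Hypothetical toy edge list: each row is one undirected tie.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, directed = FALSE)

# Degree centrality: the number of ties each node has ("popularity").
deg <- degree(g)

# Betweenness centrality: how often a node lies on the shortest paths
# between other pairs of nodes (a common proxy for brokerage or influence).
btw <- betweenness(g, directed = FALSE)

print(deg)
print(btw)
```

In this toy network, node C has the most ties and also sits on the most shortest paths, while node D has only two ties but still scores high on betweenness because every path to E runs through it. This is the kind of contrast between "popularity" and "influence" that the guiding questions point to.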

Ultimately, students learned how to compare and contrast two different network structures in order to learn about social phenomena, and were given the basic tools to do so.

The plots below show the hockey fights network organized by players and teams. The Game of Thrones network is shown both mapped by character, and with nodes scaled to character degree centrality.

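[Figure: Hockey fights network, organized by players and by teams]

[Figure: Game of Thrones network, mapped by character and with nodes scaled to degree centrality]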

Data

Dataset #1: Game of Thrones Season 3

Our first dataset was collected by Andrew Beveridge of Macalester College for an exploratory network project. It is based on Season 3 of the HBO series “Game of Thrones.” The data consist of an edge list (a file of interactions between characters) and a node list (a second file that lists each character mentioned), to which a “House” variable was added for each character. We end up with two data files that we can teach students to turn into a network. The file of characters contains 123 observations of 3 variables, and the file of ties between characters contains 500 observations of 3 variables. A tie is created when two characters are mentioned together, one speaks about the other, they appear in a scene together, they appear in the same stage direction, or they speak one after the other.
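Turning an edge list and a node list of this kind into a network can be sketched in R with igraph as follows. The column names (id, label, house, source, target, weight) and the rows are assumptions made for illustration; the actual files may be laid out differently, and in practice the two data frames would be read from the CSV files (for example with read.csv) rather than typed in.

```r
library(igraph)

# Hypothetical node list: one row per character, with a house attribute.
nodes <- data.frame(
  id    = c("n1", "n2", "n3", "n4"),
  label = c("Character 1", "Character 2", "Character 3", "Character 4"),
  house = c("House X", "House X", "House Y", "House Y"),
  stringsAsFactors = FALSE
)

# Hypothetical edge list: one row per pair of characters, with a tie weight.
edges <- data.frame(
  source = c("n1", "n1", "n2", "n3"),
  target = c("n2", "n3", "n3", "n4"),
  weight = c(5, 2, 1, 3),
  stringsAsFactors = FALSE
)

# The first two columns of `edges` name the endpoints of each tie; the first
# column of `nodes` must contain every name that appears in those columns.
# Remaining columns become edge and vertex attributes.
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)

print(vcount(g))    # number of characters
print(ecount(g))    # number of ties
print(V(g)$house)   # house attribute carried over from the node list
print(E(g)$weight)  # tie weights carried over from the edge list
```

Once the network object exists, a call such as plot(g, vertex.label = V(g)$label, edge.width = E(g)$weight) draws it with labelled nodes and edges scaled by weight.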

Dataset #2: Hockey Fights (2018-2019)

Our second dataset covers all NHL players in the 2018-2019 season. The data were collected and compiled by Liann Tucker (co-writer of this proposal) as part of her own research interests in network analysis, namely aggression and roles in networks. The data were collected from the National Hockey League official website and hockeyfights.com. The hockey fights website lists all the fights in each season, but it was not set up as a usable dataset, so one was created. The data are separated into two files: players (nodes) and their attributes in the first file, and fights (edges) in the second. The planned exercises include teaching students how to put these two files together to make a network. Since the players are grouped into teams, we can also create a bipartite (two-mode) network out of this dataset. There are 786 observations in the nodes file and 226 observations in the edges file. The nodes file contains information on each player, such as their team, when they were acquired by their team, and their position. The edges file contains each observed fight between two players and who won the fight (according to voting on the hockey fights website). The data are open access, since they were collected from two public websites and compiled by a co-writer of this proposal.
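A bipartite network of players and teams could be set up along the following lines. The membership table is invented for illustration (the column names player and team are assumptions), and this is only one way to build a two-mode network in igraph, not necessarily the approach taken in the exercises.

```r
library(igraph)

# Hypothetical player-to-team memberships: one row per player.
membership <- data.frame(
  player = c("P1", "P2", "P3", "P4"),
  team   = c("Team A", "Team A", "Team B", "Team C"),
  stringsAsFactors = FALSE
)

# Each row becomes an edge between a player node and a team node.
g <- graph_from_data_frame(membership, directed = FALSE)

# igraph marks a network as bipartite through a logical `type` vertex
# attribute: here TRUE for teams and FALSE for players.
V(g)$type <- V(g)$name %in% membership$team

print(is_bipartite(g))
print(table(V(g)$type))

# Projecting onto the players links two players whenever they share a team.
proj <- bipartite_projection(g)
teammates <- proj$proj1
print(ecount(teammates))
```

In this toy example only P1 and P2 share a team, so the player projection contains a single tie.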

Bibliography

Beveridge, A., & Shan, J. (2016). Network of Thrones: A Song of Math and Westeros (Season 3) [got-s3-edges.csv, got-s3-nodes.csv]. Retrieved from https://github.com/mathbeveridge/gameofthrones/tree/master/data

Tucker, L. (2019). Hockey Fights Network 2018-2019 [nhl_nodes.csv, nhl_edges.csv]. Retrieved from https://www.hockeyfights.com/fightlog/1/reg2019/8
