Social Network Analysis Basics: Case Studies from Game of Thrones and the National Hockey League

Project Summary

The goal of this Data Expedition was to introduce students to exploring social network data using R. Students learned to load and plot a social network in R and then performed basic analyses on two different networks: hockey fights in the National Hockey League in 2018-2019 and characters in Game of Thrones Season 3. Students used social network analysis to better understand who is connected to whom, how frequently they interact, and how they interact.

Year
2019
Contact
Claire Le Barbenchon and Liann Tucker
claire.lebarbenchon@duke.edu and liann.tucker@duke.edu

Graduate Students: Claire Le Barbenchon and Liann Tucker

Faculty: Craig Rawlings

Course: Soc 110D: Sociological Inquiry

Overview

First, we introduced students to RStudio and the basics of R programming. Next, we covered loading data into RStudio and transforming the Hockey Fights network from a “hairball” into a legible, labelled network of fights between players. Students then reorganized the plot to show fights between teams and to identify which teams are more prone to fighting than others.
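The loading and plotting steps above can be sketched roughly as follows. This is a minimal illustration, not the course's actual script: the two small data frames stand in for the node and edge files, and the column names (`id`, `team`, `from`, `to`), the player labels, and the colours are all hypothetical.

```r
library(igraph)

# Toy stand-ins for the node and edge files (hypothetical values, not the real data).
nodes <- data.frame(
  id   = c("P1", "P2", "P3", "P4"),
  team = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("P1", "P2", "P1"),
  to   = c("P3", "P4", "P4"),
  stringsAsFactors = FALSE
)

# Build an undirected network. The first two columns of `edges` are the
# endpoints of each tie; the first column of `nodes` holds the matching ids,
# and the remaining columns become vertex attributes.
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)

vcount(g)  # number of players
ecount(g)  # number of fights

# A labelled plot with one colour per team (assumed palette).
team_colours <- c(A = "tomato", B = "steelblue", C = "gold")
set.seed(1)  # force-directed layouts are random; fix the seed for repeatability
plot(
  g,
  vertex.label = V(g)$name,
  vertex.color = unname(team_colours[V(g)$team]),
  layout       = layout_with_fr(g)
)
```

With real files, the two data frames would typically come from `read.csv()` instead of being typed in by hand.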

Next, students dug into the Game of Thrones Season 3 network and created plots that show the links between houses in the show. They also explored descriptive and analytical tools commonly used in social network analysis, in particular different forms of centrality for understanding popularity and influence.

Guiding Questions

Social network analysis can help explain social dynamics at both the group and individual level. For example, certain network structures are better for spreading information, others for maintaining power hierarchies.

This project guides students through plotting a network in an elegant, intuitive way, which helps to answer descriptive questions like:

  • What does this network look like?
  • How is it organized?
  • Who are the members and how do you think they are connected?

Next, we explored structural features of the network, guided by the following questions:

  1. Who are the most “popular” characters?
  2. Which characters and houses have the most influence?

To do so, students were introduced to ways of calculating different types of centrality using igraph (degree centrality and betweenness centrality).

Ultimately, students learned how to compare and contrast two different network structures in order to learn about social phenomena, and were given the basic tools to do so.
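As a small illustration of the two centrality measures named above, the sketch below computes both on a made-up five-node network. The network and its letter labels are hypothetical. Reading degree as "popularity" and betweenness as "influence" is one common interpretation, assumed here rather than taken from the course materials.

```r
library(igraph)

# Hypothetical toy network: A is tied to B, C and D, and D is also tied to E.
g <- graph_from_literal(A - B, A - C, A - D, D - E)

# Degree centrality: how many ties each node has.
deg <- degree(g)

# Betweenness centrality: how many shortest paths between other pairs
# of nodes pass through each node.
btw <- betweenness(g)

sort(deg, decreasing = TRUE)
sort(btw, decreasing = TRUE)
```

In this toy network A ranks first on both measures, but D, with only two ties, has a higher betweenness than the other nodes because every path to E runs through it.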

The plots below show the hockey fights network organized by players and teams. The Game of Thrones network is shown both mapped by character, and with nodes scaled to character degree centrality.

Hockey fights

Game of Thrones
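A plot with nodes scaled to degree centrality, like the one described above, could be produced along these lines. This is only a sketch on a hypothetical toy network. The base size and multiplier are arbitrary choices, not values from the course.

```r
library(igraph)

# Hypothetical toy network (letters stand in for characters).
g <- graph_from_literal(A - B, A - C, A - D, D - E)

# Store a size on each vertex that grows with its degree.
# plot() picks up the `size` vertex attribute automatically.
V(g)$size <- 8 + 6 * degree(g)

set.seed(1)  # fix the random layout so the picture is reproducible
plot(g, vertex.label.cex = 0.8, layout = layout_with_fr(g))
```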

Data

Dataset #1: Game of Thrones Season 3

Our first dataset was collected by Andrew Beveridge of Macalester College for an exploratory network project. It is based on Season 3 of the HBO series “Game of Thrones.” The data consist of an edge list (a file of interactions between characters) and a node list (a second file that contains each character mentioned), with “House” variables added to each observation. We end up with two data files that we can teach students to turn into a network. The file of characters contains 123 observations of 3 variables, and the file of ties between characters contains 500 observations of 3 variables. A tie is created when two characters are mentioned together, one speaks about the other, they appear in a scene together, they appear in the same stage direction, or they speak one after the other.

Dataset #2: Hockey Fights (2018-2019)

Our second dataset covers all NHL players for the 2018-2019 season. The data were collected and compiled by Liann Tucker (co-writer of this proposal) for her own network analysis interests: aggression and roles in networks. They come from the National Hockey League official website and hockeyfights.com. The hockey fights website lists all the fights in each season, but not in the form of a usable dataset, so one was created. The data are separated into two files: the first contains players (nodes) and their attributes, and the second contains fights (edges). The planned exercises include teaching students how to put these two files together to make a network. Since the players are grouped into teams, we can also create a bipartite (two-level) network from this dataset. There are 786 observations in the nodes file and 226 observations in the edges file. The nodes file contains information on each player, such as their team, when they were acquired by their team, and their position. The edges file contains each observed fight between two players and who won the fight (according to voting on the hockey fights website). These data are open access, since they were collected from two public websites and compiled by a co-writer of this proposal.
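One possible way to move from fights between players to fights between teams is sketched below. It is not necessarily the approach used in the exercises. Each fighter is replaced by their team, and repeated team pairings are merged into a single weighted tie. The data frames and column names (`id`, `team`, `from`, `to`) are hypothetical stand-ins for the two files.

```r
library(igraph)

# Toy stand-ins for the players and fights files (hypothetical values).
nodes <- data.frame(
  id   = c("P1", "P2", "P3", "P4"),
  team = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("P1", "P2", "P1"),
  to   = c("P3", "P3", "P4"),
  stringsAsFactors = FALSE
)

# Look up each fighter's team and rewrite the edge list in terms of teams.
team_of <- setNames(nodes$team, nodes$id)
team_edges <- data.frame(
  from = unname(team_of[edges$from]),
  to   = unname(team_of[edges$to]),
  stringsAsFactors = FALSE
)

# Build the team-level network, giving every fight a weight of 1.
team_g <- graph_from_data_frame(
  team_edges,
  directed = FALSE,
  vertices = data.frame(name = unique(nodes$team))
)
E(team_g)$weight <- 1

# Merge repeated team pairings, summing their weights into a fight count.
team_g <- simplify(team_g, edge.attr.comb = list(weight = "sum"))

# Total number of fights each team was involved in.
strength(team_g)
```

Sorting the result of `strength()` gives a quick ranking of which teams fight most often in the toy data.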

Bibliography

Beveridge, A., & Shan, J. (2016). Network of Thrones: A Song of Math and Westeros (Season 3) [got-s3-edges.csv, got-s3-nodes.csv]. Retrieved from https://github.com/mathbeveridge/gameofthrones/tree/master/data

Tucker, L. (2019). Hockey Fights Network 2018-2019 [nhl_nodes.csv, nhl_edges.csv]. Retrieved from https://www.hockeyfights.com/fightlog/1/reg2019/8
