Social Network Analysis Basics: Case Studies from Game of Thrones and the National Hockey League

Project Summary

The goal of this Data Expedition was to introduce students to the exploration of social networks data using R. Students learned to load and plot a social network in R and then perform some basic analyses on two different networks: Hockey Fights in the National Hockey League in 2018-2019 and characters in Game of Thrones Season 3. Students used social network analysis to better understand who is connected to whom, how frequently they interact, and how they are interacting.

Themes and Categories
Year
2019
Contact
Claire Le Barbenchon and Liann Tucker
claire.lebarbenchon@duke.edu and liann.tucker@duke.edu

Graduate Students: Claire Le Barbenchon and Liann Tucker

Faculty: Craig Rawlings

Course: Soc 110D: Sociological Inquiry

Overview

First, we introduced students to Rstudio and the basics of R programming. Next, we covered loading data into RStudio and transforming the Hockey Fights network from a “hairball” network to a legible and labelled network of fights between players. Students then re-organized the plot to show fights between teams and identify teams more prone to fighting than others.

Next, students dug into the Game of Thrones Season 3 network and created plots that show the links between houses in the show. They explored some of the different descriptive and analytical tools commonly used in social network analysis: different forms of centrality to understand popularity and influence.

Guiding Questions

Social network analysis can help explain social dynamics at both the group and individual level. For example, certain network structures are better for spreading information, others for maintaining power hierarchies.

This project guides students through plotting a network in an elegant, intuitive way, which helps to answer descriptive questions like:

  • What does this network look like?
  • How is it organized?
  • Who are the members and how do you think they are connected?

Next, we explore structural features in the network and will be guided by the following questions :

  1. Who are the most “popular” characters?
  2. Which characters and houses have the most influence?

In order to do so, students were introduced to ways of calculating different centrality types using igraph (degree centrality and betweenness centrality).

Ultimately, learned how to compare and contrast two different network structures in order to learn about social phenomena and be given the basic tools to do so.

The plots below show the hockey fights network organized by players and teams. The Game of Thrones network is shown both mapped by character, and with nodes scaled to character degree centrality.

Hockey fights

Game of Thrones

Data

Dataset #1: Game of Thrones Season 3

Our first dataset was collected by Andrew Beveridge of Macalester University for an exploratory network project. It is based on Season 3 of the HBO series “Game of Thrones.” The data contains an edge-list (file of interactions between characters) and a node list (a second data file that contains each character mentioned) and added “House” variables to each observation. We end up with two data files we can teach students how to turn into a network. The file of characters contains 123 observations for 3 variables, and the file of ties between the characters contains 500 observations for 3 variables. Ties are created by characters being mentioned together, speaking about the other, in a scene together, in the same stage direction, or speaking one after the other.

Dataset #2: Hockey Fights (2018-2019)

Our second dataset includes all of the NHL players for the 2018-2019 season. The data was collected and compiled by Liann Tucker (co-writer of this proposal) for her personal network analysis interests, aggression and roles in networks. The data were collected from the National Hockey League official website and hockeyfights.com. The hockey fights website includes all the fights in each season, though it was not set up as a useable dataset, so one was created. These data are separated into two files, players (nodes) and their attributes being the first file and fights (edges) in the second file. The planned exercises include teaching students how to put these two files together to make a network. Since the players are separated into teams, we can create a bipartite network (2 level) out of this dataset. There are 786 observations in the nodes file and 226 observation in the edges file. The nodes file contains information on players such as their team, when they were acquired by their team, and position. The edges file contains each observed fight between two players, and who won the fight (according to voting on the hockey fights website). This data is open access, since it was collected from two public websites and created by a co-writer of this proposal.

Bibliography

A. Beveridge & Shan, J. (2016). Network of Thrones: A Song of Math and Westeros (Season 3), [got-s3- edges.csv, got-s3-nodes.csv] Retrieved from https://github.com/mathbeveridge/gameofthrones/tree/master/ data

Tucker, Liann (2019). Hockey Fights Network 2018-2019,[nhl_nodes.csv, nhl_edges.csv] Retrieved from https://www.hockeyfights.com/fightlog/1/reg2019/8

Related Projects

A large and growing trove of patient, clinical, and organizational data is collected as a part of the “Help Desk” program at Durham’s Lincoln Community Health Center. Help Desk is a group of student volunteers who connect with patients over the phone and help them navigate to community resources (like food assistance programs, legal aid, or employment centers). Data-driven approaches to identifying service gaps, understanding the patient population, and uncovering unseen trends are important for improving patient health and advocating for the necessity of these resources. Disparities in food security, economic stability, education, neighborhood and physical environment, community and social context, and access to the healthcare system are crucial social determinants of health, which studies indicate account for nearly 70% of all health outcomes.

We led a 75-minute class session for the Marine Mammals course at the Duke University Marine Lab that introduced students to strengths and challenges of using aerial imagery to survey wildlife populations, and the growing use of machine learning to address these "big data" tasks.

Most phenomena that data scientists seek to analyze are either spatially or temporally correlated. Examples of spatial and temporal correlation include political elections, contaminant transfer, disease spread, housing market, and the weather. A question of interest is how to incorporate the spatial correlation information into modeling such phenomena.

 

In this project, we focus on the impact of environmental attributes (such as greenness, tree cover, temperature, etc.) along with other socio-demographics and home characteristics on housing prices by developing a model that takes into account the spatial autocorrelation of the response variable. To this aim, we introduce a test to diagnose spatial autocorrelation and explain how to integrate spatial autocorrelation into a regression model

 

 

In this data exploration, students are provided with data collected from remote sensing, census, and Zillow sources. Students are tasked with conducting a regression analysis of real-estate estimates against environmental amenities and other control variables which may or may not include the spatial autocorrelation information.