Social Network Analysis Basics: Case Studies from Game of Thrones and the National Hockey League

Project Summary

The goal of this Data Expedition was to introduce students to exploring social network data in R. Students learned to load and plot a social network in R and then performed basic analyses on two networks: hockey fights in the National Hockey League during the 2018-2019 season, and characters in Game of Thrones Season 3. Students used social network analysis to better understand who is connected to whom, how frequently they interact, and how they interact.


Graduate Students: Claire Le Barbenchon and Liann Tucker

Faculty: Craig Rawlings

Course: Soc 110D: Sociological Inquiry


First, we introduced students to RStudio and the basics of R programming. Next, we covered loading data into RStudio and transforming the Hockey Fights network from a “hairball” into a legible, labelled network of fights between players. Students then reorganized the plot to show fights between teams and to identify which teams were more prone to fighting than others.
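The steps above can be sketched in R with the igraph package. This is a minimal illustration, not the course code: the toy data frames and every object and column name in it (`players`, `fights`, `team`, `team_of`) are made up and stand in for the real files.

```r
library(igraph)

# Toy stand-ins for the players (nodes) and fights (edges) files.
# All names and values here are hypothetical.
players <- data.frame(
  name = c("P1", "P2", "P3", "P4", "P5"),
  team = c("A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)
fights <- data.frame(
  from = c("P1", "P2", "P1", "P3"),
  to   = c("P3", "P4", "P5", "P5"),
  stringsAsFactors = FALSE
)

# Player-level network: the first column of `players` supplies the vertex names.
g_players <- graph_from_data_frame(fights, directed = FALSE, vertices = players)

# A more legible plot: small nodes, name labels, colour by team.
set.seed(1)
plot(
  g_players,
  vertex.size  = 8,
  vertex.label = V(g_players)$name,
  vertex.color = as.integer(factor(V(g_players)$team)),
  layout       = layout_with_fr(g_players)
)

# Team-level network: replace each player with their team, then merge
# repeated team pairs into one edge whose weight counts the fights.
team_of <- setNames(players$team, players$name)
team_fights <- data.frame(
  from   = unname(team_of[fights$from]),
  to     = unname(team_of[fights$to]),
  weight = 1
)
g_teams <- graph_from_data_frame(team_fights, directed = FALSE)
g_teams <- simplify(g_teams, remove.loops = TRUE,
                    edge.attr.comb = list(weight = "sum"))

# Fights per team (sum of edge weights), largest first.
sort(strength(g_teams), decreasing = TRUE)
plot(g_teams, edge.width = E(g_teams)$weight)
```

Note that `remove.loops = TRUE` drops any fight between two players on the same team, which is an assumption of this sketch rather than something the source specifies.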

Next, students dug into the Game of Thrones Season 3 network and created plots that show the links between houses in the show. They explored some of the descriptive and analytical tools commonly used in social network analysis, in particular different forms of centrality for understanding popularity and influence.

Guiding Questions

Social network analysis can help explain social dynamics at both the group and individual level. For example, certain network structures are better for spreading information, others for maintaining power hierarchies.

This project guides students through plotting a network in an elegant, intuitive way, which helps to answer descriptive questions like:

  • What does this network look like?
  • How is it organized?
  • Who are the members and how do you think they are connected?

Next, we explored structural features of the network, guided by the following questions:

  1. Who are the most “popular” characters?
  2. Which characters and houses have the most influence?

To answer these questions, students learned to calculate two types of centrality with igraph: degree centrality (how many ties a node has) and betweenness centrality (how often a node lies on the shortest paths between other nodes).
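As a small illustration of the two measures, the sketch below computes them with igraph's `degree()` and `betweenness()` functions. The five-person network and its names are invented for the example and are not taken from either dataset.

```r
library(igraph)

# Hypothetical toy network: Ann is tied to Bob, Cat and Dee; Dee is also tied to Eve.
ties <- data.frame(
  from = c("Ann", "Ann", "Ann", "Dee"),
  to   = c("Bob", "Cat", "Dee", "Eve"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(ties, directed = FALSE)

# Degree centrality: number of ties per node ("popularity").
deg <- degree(g)

# Betweenness centrality: number of shortest paths between other
# nodes that pass through each node ("brokerage" or influence).
btw <- betweenness(g, directed = FALSE)

# Side-by-side table, most connected first.
centrality <- data.frame(name = V(g)$name, degree = deg, betweenness = btw)
centrality[order(-centrality$degree), ]

# Scale node size by degree so central nodes stand out in the plot.
set.seed(1)
plot(g, vertex.size = 5 + 4 * deg, vertex.label = V(g)$name)
```

In this toy network Ann has the highest degree (3) and the highest betweenness (5), while Dee has only two ties but a betweenness of 3 because she is Eve's only link to everyone else.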

Ultimately, students learned how to compare and contrast two different network structures in order to learn about social phenomena, and were given the basic tools to do so.

The plots below show the hockey fights network organized by players and teams. The Game of Thrones network is shown both mapped by character, and with nodes scaled to character degree centrality.

Hockey fights

Game of Thrones


Dataset #1: Game of Thrones Season 3

Our first dataset was collected by Andrew Beveridge of Macalester College for an exploratory network project. It is based on Season 3 of the HBO series “Game of Thrones.” The data consist of an edge list (a file of interactions between characters) and a node list (a second file that lists each character mentioned), with a “House” variable added to each observation. This gives us two data files that we can teach students to turn into a network. The file of characters contains 123 observations of 3 variables, and the file of ties between characters contains 500 observations of 3 variables. A tie is created when two characters are mentioned together, one speaks about the other, they appear in a scene together, they appear in the same stage direction, or they speak one after the other.
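One way to turn a node file and an edge file into a network is sketched below. To keep the example self-contained it writes two tiny stand-in CSV files to a temporary folder first. The file names, the column names (`Id`, `House`, `Source`, `Target`, `Weight`) and the values are all assumptions made for illustration, so check the headers of the real files before adapting it.

```r
library(igraph)

# Write two tiny stand-in CSVs so the sketch runs on its own.
# File names, column names and values are hypothetical.
tmp <- tempdir()
nodes_path <- file.path(tmp, "toy-nodes.csv")
edges_path <- file.path(tmp, "toy-edges.csv")
write.csv(
  data.frame(Id = c("C1", "C2", "C3", "C4"),
             House = c("H1", "H1", "H1", "H2")),
  nodes_path, row.names = FALSE
)
write.csv(
  data.frame(Source = c("C1", "C2", "C3"),
             Target = c("C3", "C1", "C4"),
             Weight = c(3, 5, 8)),
  edges_path, row.names = FALSE
)

# Load both files as plain data frames.
nodes <- read.csv(nodes_path, stringsAsFactors = FALSE)
edges <- read.csv(edges_path, stringsAsFactors = FALSE)

# Sanity check: every character in the edge list must appear in the node list.
stopifnot(all(c(edges$Source, edges$Target) %in% nodes$Id))

# The first two edge columns are the endpoints; remaining columns become
# edge attributes. The first node column is the vertex name; remaining
# columns (here House) become vertex attributes.
got <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

vcount(got)            # number of characters
ecount(got)            # number of ties
table(V(got)$House)    # characters per house
```

Functions such as `strength()` look for an edge attribute named `weight` in lower case, so a column called `Weight` would need to be renamed (or passed explicitly) before using weighted measures.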

Dataset #2: Hockey Fights (2018-2019)

Our second dataset includes all of the NHL players for the 2018-2019 season. The data were collected and compiled by Liann Tucker (co-writer of this proposal) to pursue her own network analysis interests in aggression and roles in networks. The data come from the official National Hockey League website and the hockey fights website. The latter lists all the fights in each season but is not set up as a usable dataset, so one was created. The data are separated into two files: players (nodes) and their attributes in the first, and fights (edges) in the second. The planned exercises include teaching students how to put these two files together to make a network. Since the players are grouped into teams, we can also create a bipartite (two-level) network from this dataset. There are 786 observations in the nodes file and 226 observations in the edges file. The nodes file contains information on each player, such as their team, when they were acquired by their team, and their position. The edges file contains each observed fight between two players and who won the fight (according to voting on the hockey fights website). These data are open access, since they were collected from two public websites and compiled by a co-writer of this proposal.
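The bipartite idea can be sketched as follows, under one possible reading in which the two levels are players and teams and each player is tied to their team. The toy data and all names are hypothetical, and the sketch assumes that no player shares a name with a team.

```r
library(igraph)

# Hypothetical toy roster: two players on team A, one on team B.
players <- data.frame(
  name = c("P1", "P2", "P3"),
  team = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Two-mode edge list: one tie from each player to their team.
membership <- data.frame(from = players$name, to = players$team,
                         stringsAsFactors = FALSE)
g2 <- graph_from_data_frame(membership, directed = FALSE)

# igraph marks the two levels with a logical `type` vertex attribute:
# FALSE for players, TRUE for teams.
V(g2)$type <- V(g2)$name %in% players$team

# Draw players in one row and teams in the other.
plot(g2, layout = layout_as_bipartite(g2),
     vertex.shape = ifelse(V(g2)$type, "square", "circle"))

# One-mode projections: proj1 links players who share a team,
# proj2 links teams that share a player.
proj <- bipartite_projection(g2)
ecount(proj$proj1)
ecount(proj$proj2)
```

The fights file is a separate, one-mode edge list between players, so it would be combined with this structure through the shared player names rather than added to the bipartite graph itself.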


Beveridge, A., & Shan, J. (2016). Network of Thrones: A Song of Math and Westeros (Season 3) [got-s3-edges.csv, got-s3-nodes.csv]. Retrieved from data

Tucker, L. (2019). Hockey Fights Network 2018-2019 [nhl_nodes.csv, nhl_edges.csv]. Retrieved from
