Research

Research projects at Rhodes iiD focus on building connections. We encourage cross-pollination of ideas across disciplines and the development of new forms of collaboration that will advance research and education across the full spectrum of disciplines at Duke. The topics below show areas of research focus at Rhodes iiD. See all of our research.

Social and environmental contexts are increasingly recognized as factors that affect patients' health outcomes. This team will have the opportunity to collaborate directly with clinicians and work with medical data in a real-world setting. They will examine the association between social determinants and risk prediction for hospital admissions, and assess whether social determinants bias that risk in a systematic way. Applied methods will include machine learning, risk prediction, and assessment of bias. This Data+ project is sponsored by the Forge, Duke's center for actionable data science.

Project Leads: Shelly Rusincovitch, Ricardo Henao, Azalea Kim

Project Manager: Austin Talbot

Aaron Chai (Computer Science, Math) and Victoria Worsham (Economics, Math) spent ten weeks building tools to understand characteristics of successful oil and gas licenses in the North Sea. The team used data-scraping, merging, and OCR methods to create a dataset containing license information and work obligations, and they also produced ArcGIS visualizations of license and well locations. They had the chance to consult frequently with analytics professionals at ExxonMobil.

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Artem Streltsov

Yueru Li (Math) and Jiacheng Fan (Economics, Finance) spent ten weeks investigating abnormal behavior by companies bidding for oil and gas rights in the Gulf of Mexico. Working with data provided by the Bureau of Ocean Energy Management and ExxonMobil, the team used outlier detection methods to automate the flagging of abnormal behavior, and then used statistical methods to examine various factors that might predict such behavior. They had the chance to consult frequently with analytics professionals at ExxonMobil.
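The outlier-detection step described above can be sketched with a simple interquartile-range (Tukey fence) rule; the function name, the `k` parameter, and the bid values below are illustrative assumptions, not the team's actual method or BOEM data.

```python
# Sketch: flag abnormal bids using a Tukey fence (interquartile-range rule).
# Bid amounts are synthetic, for illustration only.

def flag_outliers(bids, k=1.5):
    """Return indices of bids outside the fence [Q1 - k*IQR, Q3 + k*IQR]."""
    ordered = sorted(bids)
    n = len(ordered)
    q1 = ordered[n // 4]
    q3 = ordered[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, b in enumerate(bids) if b < lo or b > hi]

bids = [1.2, 1.4, 1.1, 1.3, 9.8, 1.2, 1.5]  # one extreme bid (in $M)
print(flag_outliers(bids))  # index of the extreme bid
```

In practice a flagged bid would be passed on to the statistical stage, which examines factors that might predict such behavior.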

 

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Hyeongyul Roh

Team A: Video data extraction

Alexander Bendeck (Computer Science, Statistics) and Niyaz Nurbhasha (Economics) spent ten weeks building tools to extract player and ball movement from basketball games. Using freely available broadcast-angle video footage, which required extensive cleaning and pre-processing, the team applied OpenPose software and neural-network methodologies. Their pipeline fed into the predictive models of Team C.

Click here to read the Executive Summary

 

Team B: Modeling basketball data: offense

Anshul Shah (Computer Science, Statistics), Jack Lichtenstein (Statistics), and Will Schmidt (Mechanical Engineering) spent ten weeks building tools to analyze offensive play in basketball. Using 2014-15 Duke Men’s Basketball player-tracking data provided by SportVU, the team constructed statistical models that explored the relationship between different metrics of offensive productivity, and used computational geometry methods to analyze the off-ball “gravity” of an offensive player.

Click here to read the Executive Summary

 

Team C: Modeling basketball data: defense

Lukengu Tshiteya (Statistics), Wenge Xie (ECE), and Joe Zuo (Computer Science, Statistics) spent ten weeks building tools to predict player movement in basketball games. Using SportVU data, including some pre-processed by Team A, the team built predictive RNN models that distinguish between six typical movement types, and created interactive visualizations of their findings in R Shiny.

Click here to read the Executive Summary

 

Team D: Visualizing basketball data

Shixing Cao (ECE) and Jackson Hubbard (Computer Science, Statistics) spent ten weeks building visualizations to help analyze basketball games. Using player-tracking data from Duke basketball games, the team created visualizations of game flow and networks of points and assists, and integrated all of their tools into an R Shiny app.

Click here to read the Executive Summary

 

Faculty Leads: Alexander Volfovsky, James Moody, Katherine Heller

Project Managers: Fan Bu, Heather Matthews, Harsh Parikh, Joe Zuo

Yanchen Ou (Computer Science) and Jiwoo Song (Chemistry, Mechanical Engineering) spent ten weeks building tools to assist in the analysis of smart meter data. Working with a large dataset of transformer and household data from the Kyrgyz Republic, the team built a data preprocessing pipeline and then used unsupervised machine-learning techniques to assess energy quality and construct typical user profiles.

 

Click here to read the Executive Summary

 

Faculty Lead: Robyn Meeks

Project Manager: Bernard Coles

Bernice Meja (Philosophy, Physics), Jessica Yang (Computer Science, ECE), and Tracey Chen (Computer Science, Mechanical Engineering) spent ten weeks building methods for Duke’s Office of Information Technology (OIT) to better understand information arising from “smart” (IoT) devices on campus. Working with data provided by an IoT testbed set up by OIT professionals, the team used a mixture of supervised and unsupervised machine-learning techniques and built a prototype device classifier.

 

Click here to read the Executive Summary

 

Project Lead: Will Brockselsby

Interested in understanding the types of attacks targeting Duke and other universities? Led by OIT and the IT Security Office, students will learn to analyze threat-intelligence data to identify trends and patterns of attacks. Duke blocks an average of 1.5 billion malicious connection attempts per day and is working with other universities to share attack data. One untapped area is research into the types of attacks and how universities are targeted. Students will work alongside security and IT professionals to analyze the data and discern patterns.

Project Lead: Jesse Bowling

Project Manager: Susan Jacobs

Katelyn Chang (Computer Science, Math) and Haynes Lynch (Environmental Science, Policy) spent ten weeks building tools to analyze and visualize geospatial and remote sensing data arising from the Alligator River National Wildlife Refuge (ARNWR). The team produced interactive maps of physical characteristics that were tailored to specific refuge management professionals, and also built classifiers for vegetation detection in LandSat imagery.

 

Click here to read the Executive Summary

 

Faculty Leads: Justin Wright, Emily Bernhardt

Project Manager: Emily Ury

Dennis Harrsch, Jr. (Computer Science), Elizabeth Loschiavo (Sociology), and Zhixue (Mary) Wang (Computer Science, Statistics) spent ten weeks improving the team’s web platform, which allows users to examine contraceptive-use data for low- and middle-income countries (LMICs) collected via the Demographic and Health Survey (DHS) contraceptive calendar. The team improved load times and data-visualization latency, and increased the number of country surveys available in the platform from 3 to 55. They also created a new app that allows users to explore the results of machine learning on this large dataset.

This project will continue into the academic year via Bass Connections, where student teams will refine the machine-learning model results and explore whether and how policymakers can use these tools to improve family planning in LMIC settings.

 

Click here to view the Executive Summary

 

Faculty Lead: Megan Huchko

Project Manager: Amy Finnegan

Nathaniel Choe (ECE) and Mashal Ali (Neuroscience) spent ten weeks developing machine-learning tools to analyze urodynamic detrusor pressure data of pediatric spina bifida patients from the Duke University Hospital. The team built a pipeline that went from raw time series data to signal analysis to dimension reduction to classification, and has the potential to assist in clinician diagnosis.
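A pipeline of the shape described (raw trace, then signal processing, then dimension reduction to a few features, then classification) can be sketched as follows; the moving-average smoother, the peak/mean features, the threshold rule standing in for a learned classifier, and the synthetic pressure trace are all assumptions for illustration, not the team's actual parameters.

```python
# Sketch of a raw-signal-to-label pipeline: smoothing -> features -> label.
# Window size, feature choice, and threshold are hypothetical.

def smooth(signal, window=3):
    """Moving-average smoothing of a pressure trace."""
    half = window // 2
    return [sum(signal[max(0, i - half):i + half + 1]) /
            len(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]

def extract_features(signal):
    """Reduce a trace to a small feature vector (peak and mean pressure)."""
    return {"peak": max(signal), "mean": sum(signal) / len(signal)}

def classify(features, peak_threshold=40.0):
    """Toy rule-based stand-in for the learned classifier."""
    return "flagged" if features["peak"] > peak_threshold else "normal"

trace = [10, 12, 11, 55, 54, 12, 10]  # synthetic pressure values
print(classify(extract_features(smooth(trace))))
```

The appeal of this structure is that each stage can be swapped out independently, e.g. replacing the threshold rule with a trained model.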

 

Click here to read the Executive Summary

 

Faculty Leads: Wilkins Aquino, Jonathan Routh

Project Manager: Zekun Cao

Varun Nair (Economics, Physics), Paul Rhee (Computer Science), Jichen Yang (Computer Science, ECE), and Fanjie Kong (Computer Vision) spent ten weeks helping to adapt deep learning techniques to inform energy access decisions.

 

Click here to read the Executive Summary

 

Faculty Lead: Kyle Bradbury

Project Manager: Fanjie Kong

Yoav Kargon (Mechanical Engineering) and Tommy Lin (Chemistry, Computer Science) spent ten weeks working with data from the Water Quality Portal (WQP), a large national dataset of water quality measurements aggregated by the USGS and EPA. The team went all the way from raw data to the production of Pondr, an interactive and comprehensive tool built with R Shiny that permits users to investigate and visualize data coverage, values, and trends from the WQP.

 

Click here to read the Executive Summary

 

Faculty Lead: Jim Heffernan

Project Manager: Nick Bruns

Marco Gonazales Blancas (Civil Engineering) and Mengjie Xiu (Masters, BioStatistics) spent ten weeks building tools to help Duke reduce its energy footprint and achieve carbon neutrality by 2024. The team processed and analyzed troves of utility consumption data and then created practical monthly energy use reports for each school at Duke. These reports show historical usage trends, provide energy benchmarks for comparison, and make practical suggestions for energy savings.

Click here to read the Executive Summary

 

Faculty Lead: Billy Pizer

Project Manager: Sophia Ziwei Zhu

Cathy Lee (Statistics) and Jennifer Zheng (Math, Emory University) spent ten weeks building tools to help Duke University Libraries better understand its journal purchasing practice. Using a combination of web-scraping and data-merging algorithms, the team created a dashboard to help library strategists visualize and optimize journal selection.

 

Click here to read the Executive Summary

 

Faculty Leads: Angela Zoss, Jeff Kosokoff

Project Manager: Chi Liu

Micalyn Struble (Computer Science, Public Policy), Xiaoqiao Xing (Economics), and Eric Zhang (Math) spent ten weeks exploring the use of neuroscience as evidence in criminal trials. Working with a large set of case files downloaded from WestLaw, the team used natural language processing to build a predictive model that could help automate the identification of neuroscience-relevant cases in legal databases.

 

Click here to read the Executive Summary

 

Faculty Lead: Nita Farahany

Project Manager: William Krenzer

The Middle Passage, the route by which most enslaved persons were brought across the Atlantic to North America, is a critical locus of modern history—yet it has been notoriously difficult to document or memorialize. The ultimate aim of this project is to employ the resources of digital mapping technologies as well as the humanistic methods of history, literature, philosophy, and other disciplines to envision how best to memorialize the enslaved persons who lost their lives between their homelands and North America. To do this, the students combined previously disparate data and archival sources to discover where on their journeys enslaved persons died. Because of the nature of the data itself and the history it represents, the team engaged in ongoing conversations about various ways of visualizing its findings, and continuously evaluated the ethics of the data’s provenance and their own methodologies and conclusions. A central goal for the students was to discover what contribution digital data analysis methods could make to the project of remembering itself.

 

The group worked with two datasets: the Trans-Atlantic Slave Trade Database (www.slavevoyages.org), an SPSS-formatted database currently run out of Emory University, containing data on 36,002 individual slaving expeditions between 1514 and 1866; and the Climatological Database for the World’s Oceans 1750-1850 (CLIWOC) (www.kaggle.com/cwiloc/climate-data-from-ocean-ships), a dataset composed of digitized records from the daily logbooks of ocean vessels, originally funded by the European Union in 2001 for purposes of tracking historical climate change. This second dataset includes 280,280 observational records of daily ship locations, climate data, and other associated information. The team employed archival materials to confirm (and disconfirm) overlaps between the two datasets: the students identified 316 ships bearing the same name across the datasets, of which they confirmed 35 matching slaving voyages.
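The name-matching step across the two datasets can be sketched as a normalize-then-intersect operation; the `normalize` helper and the tiny ship lists below are invented for illustration, and the project's actual matching also relied on archival materials to confirm or disconfirm each candidate overlap.

```python
# Sketch: align ship names from two datasets by normalizing spelling,
# then intersecting the name sets. Records here are illustrative only.

def normalize(name):
    """Lowercase, strip punctuation and extra whitespace so spellings align."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

slave_voyages = ["La Amistad", "Brookes", "  ZONG  "]   # stand-in records
cliwoc_ships = ["la amistad", "Zong.", "Endeavour"]     # stand-in records

matches = {normalize(a) for a in slave_voyages} & {normalize(b) for b in cliwoc_ships}
print(sorted(matches))
```

A shared name is only a candidate match: as the 316-to-35 ratio above shows, most same-name pairs did not survive archival confirmation.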

 

The students had two central objectives: first, to locate where and why enslaved Africans died along the Middle Passage, and, second, to analyze patterns in the mortality rates. The group found significant patterns in the mortality data in both spatial and temporal terms (full results can be found here). At the same time, the team also examined the ethics of creating visualizations based on data that were recorded by the perpetrators of the slave trade—opening up space for further developments of this project that would include more detailed archival and theoretical work.

 

Click here to read the Executive Summary

 

Image credit:

J.M.W. Turner, Slave Ship, 1840, Museum of Fine Arts, Boston (public domain)

Faculty Lead: Charlotte Sussman

Project Manager: Emma Davenport

Ellis Ackerman (Math, NCSU), Rodrigo Araujo (Computer Science), and Samantha Miezio (Public Policy) spent ten weeks building tools to help understand the scope, cause, and effects of evictions in Durham County. Using evictions data recorded by the Durham County Sheriff’s Department and demographic data from the American Community Survey, the team investigated relationships between rent and evictions, created cost-benefit models for eviction diversion efforts, and built interactive visualizations of eviction trends. They had the opportunity to consult with analytics professionals from DataWorks NC.

Project Leads: Tim Stallmann, John Killeen, Peter Gilbert

Project Manager: Libby McClure

 

The aim of this project was to explore how U.S. mass media, particularly newspapers, enlist text and imagery to portray human rights, genocide, and crimes against humanity from World War II to the present. From the Holocaust to Cambodia, from Rwanda to Myanmar, such representation has political consequences. Coined by Raphael Lemkin, a Polish lawyer who fled Hitler’s antisemitism, the term “genocide” was first introduced to the American public in a Washington Post op-ed in 1944. Since its legal codification by the United Nations Genocide Convention in 1948, the term has circulated, been debated, been used to describe events that pre-date it (such as the displacement and genocide of Native Peoples in the Americas), and been shaped by numerous forces, especially the words and images published in newspapers. Alongside the definition of “genocide,” other key concepts, notably “crimes against humanity,” have attempted to label, and thus to name the story of, targeted mass violence. Conversely, the concept of “human rights,” enshrined in the 1948 UN Declaration, seeks to name a presence of rights instead of their absence.

 

During the summer, the team focused their work on evaluating the language used in Western media to represent instances of genocide and how such language varied based on the location and time period of the conflict. In particular, the team’s efforts centered on Rwanda and Bosnia as important case studies, affording them the chance to compare nearly simultaneous reporting on two well-known genocides. The language used by reporters in these two cases showed distinct polarizations of terminology (for instance, while “slaughter” was much more common than “murder” in discussions of the Rwanda genocide, the inverse was true for Bosnia).
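The terminology comparison described can be sketched as a relative-frequency calculation over two corpora; the `term_rate` helper and the tiny synthetic snippets below are stand-ins for the full news archives the team actually analyzed.

```python
# Sketch: compare how often two terms appear, per word, in two
# (tiny, synthetic) article collections.

def term_rate(texts, term):
    """Occurrences of `term` per total word count across a list of texts."""
    words = [w.strip(".,").lower() for t in texts for w in t.split()]
    return words.count(term) / len(words)

rwanda = ["the slaughter continued", "reports of slaughter and killing"]
bosnia = ["murder in the enclave", "witnesses described murder"]

# The polarization described above: "slaughter" dominates one corpus,
# "murder" the other.
print(term_rate(rwanda, "slaughter"), term_rate(bosnia, "murder"))
```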

 

Click here to read the Executive Summary

 

Faculty Leads: Nora Nunn, Astrid Giugni

How Much Profit is Too Much Profit?

Chris Esposito (Economics), Ruoyu Wu (Computer Science), and Sean Yoon (Masters, Decision Sciences) spent ten weeks building tools to investigate the historical trends of price gouging and excess profits taxes in the United States of America from 1900 to the present. The team used a variety of text-mining methods to create a large database of historical documents, analyzed historical patterns of word use, and created an interactive R Shiny app to display their data and analyses.

Click here to read the Executive Summary

 

(cartoon from The Masses July 1916)

Faculty Lead: Sarah Deutsch

Project Manager: Evan Donahue

Maria Henriquez (Computer Science, Statistics) and Jacob Sumner (Biology) spent ten weeks building tools to help the Michael W. Krzyzewski Human Performance Lab best utilize its data from Duke University student athletes. The team worked with a large collection of athlete strength, balance, and flexibility measurements collected by the lab. They improved the K Lab’s data pipeline, created a predictive model for injury risk, and developed interactive web-based individualized injury risk reports.

Click here to read the Executive Summary

Faculty Lead: Dr. Tim Sell
Project Manager: Brinnae Bent

 

 

Vincent Wang (Computer Science, CE), Karen Jin (Bio/Stats), and Katherine Cottrell (Computer Science) spent ten weeks building tools to educate the public about lake dynamics and ecosystem health. Using data collected over a period of 50 years at the Experimental Lake Area (ELA) in Ontario, the team preprocessed and merged datasets, made a series of data visualizations, and produced an interactive website using R Shiny.

Click here to read the Executive Summary

 

Faculty Lead: Kateri Salk

Project Manager: Kim Bourne

Vivek Sahukar (Masters, Data Science), Yuval Medina (Computer Science), and Jin Cho (Computer Science, Electrical & Computer Engineering) spent ten weeks creating tools to help augment the experience of users in the StreamPULSE community. The team created an interactive guide and used data sonification methods to help users navigate and understand the data, and they used a mixture of statistical and machine-learning methods to build out an outlier detection and data cleaning pipeline.

Click here to read the Executive Summary

Faculty Leads: Emily Bernhardt, Jim Heffernan

Project Managers: Alice Carter, Michael Vlah

Aidan Fitzsimmons (Public Policy, Mathematics, Electrical & Computer Engineering), Joe Choo (Mathematics, Economics), and Brooke Scheinberg (Mathematics) spent ten weeks partnering with the Durham Crisis Intervention Team, the Criminal Justice Resource Center, and the Stepping Up Initiative. Using booking data on 57,346 individuals provided by the Durham County Jail, the team created visualizations and predictive models that illustrate patterns of recidivism, with a focus on the subset of the population with serious mental illness (SMI). These results could assist current efforts to divert people with SMI from the criminal justice system and into care.

Click here to read the Executive Summary

Faculty Leads: Nicole Schramm-Sapyta, Michele Easter

Project Manager: Ruth Wygle

The students in this project worked on a pervasive question in literary, film, and copyright studies: how do we know when a new work of fiction borrows from an older one? Many times, works are appropriated, rather than straightforwardly adapted, which makes them difficult for human readers to trace. As we continue to remake and repurpose previous texts into new forms that combine hundreds of references to other works (such as Ready Player One), it becomes increasingly laborious to track all the intertextual elements of a single text. While some borrowings are easy to spot, as in the case of Marvel films that are straightforward adaptations of comic book storylines and aesthetics, others are more subtle, as when Disney reinterpreted Hamlet and African oral traditions to create The Lion King. Thousands of new stories are created each day, but how do we know if we are borrowing or appropriating a previous text? Are there works that have adapted previous ones that we have yet to identify?

 

The students worked with data from over 16.7 million books from Hathitrust, with critical analysis in scholarly articles accessible through JSTOR, and with the topic categories in Wikipedia. The group used Latent Dirichlet Allocation (LDA), a generative model that assumes that all documents are a mixture of topics, to represent key themes and topics as a distribution over words. The students developed a flexible and graduated heuristic for identifying a work as an adaptation; the more pre-selected categories a work fit under, the more likely it was to be marked as an adaptation by their model. Over the summer, the students came to appreciate that all digital humanistic methodologies are contestable and dependent on traditional critical work.
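The graduated heuristic can be sketched as a category-overlap score: the more pre-selected categories a candidate work matches, the higher its adaptation score. The Hamlet-style categories and topic assignments below are invented for illustration; in the project, topics came from LDA and the categories from sources such as Wikipedia.

```python
# Sketch of a graduated adaptation heuristic: score a work by how many
# pre-selected reference categories its inferred topics match.
# Categories and topic labels here are invented examples.

HAMLET_CATEGORIES = {"usurping uncle", "ghost of the father", "revenge", "succession"}

def adaptation_score(work_topics, reference_categories=HAMLET_CATEGORIES):
    """Fraction of reference categories present among a work's inferred topics."""
    return len(set(work_topics) & reference_categories) / len(reference_categories)

lion_king_topics = ["usurping uncle", "ghost of the father", "succession", "savanna"]
print(adaptation_score(lion_king_topics))  # high overlap -> likely adaptation
```

Because the score is graduated rather than binary, borderline works can be routed to traditional critical inspection, in line with the team's conclusion that these methods remain dependent on critical work.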

 

Click here to read the Executive Summary

Faculty Lead: Grant Glass

Jett Hollister (Mechanical Engineering) and Lexx Pino (Computer Science, Math) joined Economics majors Shengxi Hao and Cameron Polo in a ten-week study of the late-2000s housing bubble. The team scraped, merged, and analyzed a variety of datasets to investigate different proposed causes of the bubble. They also created interactive visualizations of their data, which will eventually appear on a public-facing website.

Click here to read the Executive Summary

 

Faculty Lead: Lee Reiners

Project Manager: Kate Coulter

Cassandra Turk (Economics) and Alec Ashforth (Economics, Math) spent ten weeks building tools to help minimize the risk of trading electricity on the wholesale energy market. The team combined data from many sources and employed a variety of outlier-detection methods and other statistical tools in order to create a large dataset of extreme energy events and their causes. They had the opportunity to consult with analytics professionals from Tether Energy.
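One common outlier-detection approach for price series, a trailing-window z-score rule, can be sketched as follows; the window size, threshold, and price values are illustrative assumptions, not the team's actual parameters or market data.

```python
# Sketch: flag extreme energy prices that deviate more than `z` standard
# deviations from a trailing-window mean. All numbers are illustrative.
import statistics

def extreme_events(prices, window=5, z=3.0):
    """Return indices where price deviates > z sigma from the trailing window."""
    flags = []
    for i in range(window, len(prices)):
        past = prices[i - window:i]
        mu = statistics.mean(past)
        sigma = statistics.pstdev(past)
        if sigma > 0 and abs(prices[i] - mu) > z * sigma:
            flags.append(i)
    return flags

prices = [30, 31, 29, 30, 31, 30, 900, 31, 30]  # $/MWh with one price spike
print(extreme_events(prices))
```

Flagged timestamps would then be joined against weather, demand, and outage records to attribute a cause to each extreme event.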

Click here to read the Executive Summary

 

Project Lead: Eric Butter, Tether

Andre Wang (Math, Statistics), Michael Xue (Computer Science, ECE), and Ryan Culhane (Computer Science) spent ten weeks exploring the role played by emotion in speech-focused machine learning. The team used a variety of techniques to build emotion-recognition pipelines and incorporated emotion into generated speech during text-to-speech synthesis.

Click here to read the Executive Summary

 

Faculty Leads: Vahid Tarokh, Jie Ding

Project Manager: Enmao Diao