Data+

Data+ is a 10-week summer research experience that welcomes Duke undergraduates interested in exploring new data-driven approaches to interdisciplinary challenges. Students join small project teams, collaborating with other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science.

Data+ 2020

Scroll down to see our upcoming projects!

  • “I feel I have a better understanding of how to communicate my work to different groups of people. We met with stakeholders, other Data+ teams, and people from the University, all with different levels of technical knowledge, and this really allowed me to be adaptable in how I present what I've worked on. Before, I really thought "data scientist" described what I wanted to be. Now, I have a better understanding of the things I like and dislike. Working extensively on modeling this summer has made me aware that I really enjoy project management on real world projects, which is great insight to have so early in my undergraduate career.”
    Meredith Brown
    Getting Granular on Social Determinants of Health

  • “I have gained experience in applying my classroom knowledge of data science to real world problems. I have made connections that could help further my future data science career. I have also received offers and opportunities to continue working on the Data+ project and other related projects in the coming semesters. At first, I thought data science and statistics were basically the same thing. I now realize how insanely wrong I was. Data science is this incredible combination of statistics, computer science, communications, and any other hobby or interest you so choose. I never would have realized the infinite ways data science could be applied, and I am so thankful that this program has changed my perception of Data Science research.”
    Maria Henriquez
    Optimizing Risk Assessment for Duke University Student Athlete Injury Prevention

  • “I’ve gained technical/coding skills, tangible experience with real big data, communication and public speaking skills, patience, and flexibility. There is way more problem-solving, dirty data, and things you have to account for than I thought in data science. Also, nothing is straightforward. Anything "shown" through data has used specific algorithms to get there.”
    Brooke Scheinberg
    Recidivism in Durham County Jails

  • “Data+ enabled me to hone my research, communication, and presentation skills. I never fully understood the data science discipline until this experience.”
    Victoria Worsham
    Investigating Oil and Gas Production in the United Kingdom

  • “As a second time participant in Data+, I gained experience in a different type of data science project. Last summer, my team started with dirty data which we cleaned and performed NLP/machine learning classification on. This summer, my team had to web scrape and make API requests to collect a significant portion of data. Collecting and cleaning data enabled me to think more about how to obtain and organize data in a way that is convenient for the next person using it. Building a dashboard to visualize trends in the data introduced me to iterative dashboard design, in which a series of client-specific changes are made after the general structure and functionality of the dashboard is built. For example, changing the user input bar from a sidebar to a horizontal bar displayed input options more clearly and made it more user-friendly for our client.”
    Cathy Lee
    Breaking the Bundle

  • “The exposure to lots of different methods and applications across data science has been very eye-opening. I've also gained perspective on the kind of skills, both "soft" and "hard," that are useful both in academic environments and the real world. I didn't realize how much of data science was learning and researching topics on the fly. This is encouraging, because it means that if you can develop this skill of being able to quickly learn a topic/language/method, you're rarely under-qualified for a job or position.”
    Alex Bendeck
    Basketball Analytics Pipeline

  • “I gained a passion for sports analytics and learned important ML concepts. You really can apply data science to any field you like.”
    Anshul Shah
    Basketball Analytics Pipeline

  • “I feel like I gained a lot of hard skills related to coding in different languages like Python and R. I'm really happy I had a chance to dive into some more complex machine learning problems like computer vision. Most importantly, I feel like I've gained enough confidence in my data science skills to pursue a related career after college.”
    Niyaz Nurbhasha
    Basketball Analytics Pipeline

  • “The most important thing I've gained this summer is a sense of understanding of what it's like to work for a client and produce deliverables on a timeline. The biggest takeaway I've gotten about data science is how much data is missing in the world, and how hard it can be to find. There are a lot of really beautiful, well formatted datasets out there, but sometimes, there just isn't a way to get what you're looking for--and that's often where you come in. There's a lot of time spent putting together these datasets in a meaningful way, and figuring out how to format them in the most helpful way for future work, especially if you're not going to be the ones using them. It's a lot like commenting code, but tagging your columns is really important to ensure that everyone going forward knows how to work with what you've put together.”
    Cassandra Turk
    Identifying Extreme Events in Wholesale Energy Markets

  • “I now understand there are many ways to be useful in a collaborative environment. As the only non-technical major on my team, I first believed I couldn't bring as much to the table, but I found there is immense utility in being the person with the least technical experience. Not only was I able to perceive problems in a different way, offering fresh solutions, but I was also able to gain much more knowledge from my teammates. Additionally, I was able to incorporate my sociological understanding of people into the design aspect of our project to create a truly useful user experience. No matter your background, technical or non-technical, you always have something important to share and to learn.”
    Elizabeth Loschiavo
    Big Data for Reproductive Health (Year 2)

  • “I think that Data+ has taught me a lot about how to create something really cool out of nothing. We had to change our project goals because of technical issues and holdups, and I think I learned a lot about how to go from just raw data and ask really interesting questions.”
    Aidan Fitzsimons
    Recidivism in Durham County Jails

  • “I gained valuable experience working with data, something I had never had the chance to do before, but perhaps more importantly I gained insight into working with a small team alongside a project manager.”
    Dennis Harrsch
    Big Data for Reproductive Health (Year 2)

  • “I didn't know Data Science research was a thing at the undergraduate level, so it showed me some possibilities. I have gained a better understanding of machine learning topics and when to apply different methods. I've also learned a lot looking at results from our models and pulling meaning out of them–even if that meaning is just that there was a bug and we need to redo it differently.”
    Varun Nair
    Deep Learning and Energy Access Decisions

  • “Data+ in many ways widened my perception of what it means to participate in Data Science research. My particular project was pretty in line with my perception before coming to Data+ but many of the other projects looked very different than I imagined a Data Science research project looking in terms of the type of data they were using and the clients.”
    Ryan Culhane
    Speech Emotion Analysis

  • “I feel I have gained a much greater understanding of how to approach problems computationally as well as working on a small team in order to solve/fix these problems! It has been a great experience overall, and I wish I could do it again. Working with the stakeholders we had was great, because they were also so involved and enthusiastic. Our project manager couldn't have been better either. I had a great summer with Data+! It was honestly more exciting and fun than I realized, and participating in Data+ helped me see that I could do computational work in the future and be happy doing it.”
    Jake Sumner
    Athletic Injury Risk Assessment

  • “This summer of Data+ showed me that data science research is also interested in intensely analyzing and understanding outliers of a dataset, rather than just the averages. Many probability and statistics classes emphasize the mean, variation, and overall distribution of a dataset without paying too much attention to the outliers and asking if those outliers can really be explained by the same set of processes affecting 99%+ of the other data points. This summer of Data+ showed me that data science research is also interested in intensely analyzing and understanding these outliers.”
    Alec Ashforth
    Identifying Extreme Events in Wholesale Energy Markets

  • “Having no prior exposure, outside of a statistics course, to data science and data visualization, I feel like I got a lot out of this program. At least for me, there was a little bit of "drinking from a firehose," and a steep learning curve, but I learned a lot very quickly. The data visualization programs with which I worked are definitely ones I could see myself using in the future.”
    Anonymous student
    Remembering the Middle Passage

  • “I have gained extensive knowledge about all the human rights conflicts in the world and also just a general understanding of many different facets of data science. I realized that I am not as interested in research as a profession, but I did learn that the digital humanities are a very legitimate field.”
    Anonymous student

  • “I have gained some knowledge in R as well as great teamwork skills.”
    Marco Gonzalez Blancas
    Duke Building Energy Use Report

  • “I feel I have gained a lot of technical skills regarding natural language processing which I really wanted to experience. I also got to experience what it was like working on a research team and attempting to solve a problem in that setting, which I found really valuable. I realized that data science really is a part of every field and it is so broad in terms of scope and what kind of work you will be doing.”
    Nikhil Kaul
    Invisible Adaptations

  • “I went from fumbling through CS101 my freshman fall to playing around with the hyperparameters of machine learning algorithms! That's a success in my book. I learned that data science projects take time and many little failures before you can arrive at anything close to the outcome that you'd like. I learned that, seemingly counterintuitively, I can't rely on other people to teach me things I can learn on my own, but there exists a whole community of people in data science who want to help me learn and grow. I feel more confident tackling problems that have no clear solution, and I've found a new appreciation for API documentation. I'm so thankful that I had the opportunity to spend my summer with Data+!”
    Micalyn Struble
    Neuroscience in the Courtroom

  • “I can code significantly faster now. I have also been through a truncated version of the problem solving process in the real world.”
    Ellis Ackerman
    Durham Evictions

  • “I have gotten a better look at where humanities can be useful and frankly, necessary, in implementing new methodologies and approaches to the data science field.”
    Anonymous

  • “I have learned a lot about JavaScript and Google Earth Engine and using remote sensing data to see what changes are happening at Alligator River National Wildlife Refuge and how they are happening. I have also learned that working on a project like this is a great way to learn and make something at the same time.”
    Katelyn Chang
    Saltwater Intrusion on Coastal Ecosystems

  • “I’ve learned many technical tools for data analysis as well as improved my ability to work as part of a team. It was a unique experience working in data science research because this type of work isn’t offered in the lab portion of many classes. It’s an incredible opportunity to engage in data science research as an undergraduate student, especially when the decision making process is mostly up to you and your team.”
    Jeevan Tewari
    Human Rights in the Postwar World

  • “Data+ helped me learn a lot about how to create a machine learning pipeline in order to discover meaningful relationships in data. It also gave me a broader view of data science and career opportunities. I also learned that data science research involves a lot of preprocessing and simply turning the data into a form that a model can digest.”
    Michael Xue
    Speech Emotion Analysis

  • “I've learned so much about data science and I now feel very comfortable coding in several languages. There were multiple times during the summer where I was asked to do something and had absolutely no clue what I was doing, but Data+ gave me the skills and confidence to solve problems beyond what I thought was the limit of my knowledge. (Ultimately, I became really good at googling how to code!). I learned the importance of communication. It's very easy to spend all day coding, but if you're not able to communicate with the public on what you are doing, the code is meaningless.”
    Anonymous

  • “I have gained a more realistic understanding of what working in a data scientist role looks like. I have learned a lot more than I was expecting to and cemented the skills that I had learned in class. I came into the summer unsure of what area of technology I wanted to pursue as a career, but now I am more confident of my role within data science. I have further learned what it looks like to work within a team, with a manager, and with other coworkers.”
    Jackson Hubbard
    Basketball Analytics Pipeline

10 weeks during the summer
2-3 undergraduates per team
1-2 grad student mentors
25 projects sharing ideas and code

Projects

A team of students, led by University Archivist Valerie Gillispie and Professor Don Taylor, will take a closer look at how the student body at Duke has transformed into a coeducational student body from around the world enrolled in ten different schools. Students will seek to transform digital and historical data into a dynamic visual display which allows viewers to examine changes in the student body over time in terms of three dimensions: geographic origin, gender, and school. The students will use born-digital data along with historical, paper-based data to assemble a data corpus. The goal is to demonstrate trends and changes over time in terms of where Duke students have come from, identifying statistically significant shifts and patterns that warrant further study.

Project Leads: Don Taylor, Valerie Gillispie

A team of students led by researchers in the Energy Initiative and the Energy Access Project will explore historical data on the U.S. Electric Farm Equipment (EFE) demonstration show that ran between 1939 and 1941, which aimed to increase usage of electricity in rural areas. Students will compile data collected by the Rural Electrification Agency into a machine-readable form, and then use that data to explore and visualize the EFE’s impact. If time allows, they will then compare data from the EFE and a related, smaller-scale project from 1923 (“Red Wing Project”) to current data on appliance promotion programs in villages in East Africa that have recently gained access to electricity. The outcomes of this analysis would offer evidence on the successes and limitations of these types of programs, and the relevance of the historical U.S. case to countries that are currently facing similar challenges.

Project Leads: Victoria Plutshack, Jonathon Free, Robert Fetter

A team of students led by the Nunn lab and its collaborators will investigate the ecological and behavioral factors that determine parasitism in different species of primates. Based on publicly available data and evolutionary trees, students will investigate parasitism by developing a network of primate-parasite relationships. This network will then be used to infer the ecological and behavioral characteristics that best predict parasitism. The findings are relevant to identifying emerging infectious diseases in humans, and also for conservation efforts globally.

Project Leads: Jim Moody, Charles Nunn

Project Manager: Marie Claire Chelini

A team of students led by researchers from the Internet of Water project at the Nicholas Institute will develop an online tool that allows local water systems to update and verify their service boundaries while maintaining data security and functionality for state regulators. States oversee hundreds of water systems with system service areas and boundaries that change over time. An online tool enabling water system managers to update their service areas would enable an improved, time-saving process for creating and maintaining up-to-date water system boundaries. Students will have the opportunity to interact with state regulators and water system managers in North Carolina and California who will provide feedback on design and usability. This tool will improve system boundary data that are used for planning and decision-making purposes. Additionally, the tool may include functionality for basic spatial analyses such as overlaying boundaries on sociodemographic, economic, and environmental data. This would enable impact analyses, the identification of utilities and vulnerable populations affected by environmental hazards to water systems, and multi-system regional water supply projections.

Project Leads: Megan Mullin, Lauren Patterson, Kyle Onda

A team of students led by eating disorders expert Nancy Zucker and engineering professor Guillermo Sapiro will develop multimodal computational tools to help improve the nutritional status and food enjoyment of young children with Avoidant/Restrictive Food Intake Disorder (ARFID), children who are not eating enough food or are eating an inadequate variety of food to the degree that it impairs functioning. Students will analyze facial affect and behavior from videos of children trying new foods and will derive sensory profiles based on children’s patterns of food acceptance. These analyses will serve as the basis for personalized recommendations for parents that will suggest actionable next steps to increase their child’s food acceptance.

Project Leads: Guillermo Sapiro, Nancy Zucker

Project Manager: Julia Nichols

A team of students led by Humanities Unbounded Fellow Eva Michelle Wheeler will explore how culturally-bound language in African-American literature and film is rendered for international audiences and will map where and into which languages these translations are occurring. Students will use a reference dataset to build and annotate a translation corpus, explore the lexical choices and translation strategies employed by translators, and conduct a macro-level analysis of the geographic and linguistic spread of these types of translations. The results of this project will bring a quantitative dimension to what has largely been a qualitative analysis and will contribute to ongoing academic conversations about language, race, and globalization.  

Project Lead: Eva Wheeler

Human activity recognition (HAR) is a rapidly expanding field with a variety of applications from biometric authentication to developing home-based rehabilitation for people suffering from traumatic brain injuries. While HAR is traditionally performed using accelerometry data, a team of students led by researchers in the BIG IDEAS Lab will explore HAR with physiological data from wrist wearables. Using deep learning methods, students will extract features from wearable sensor data to classify human activity. The student team will develop a reproducible machine learning model that will be integrated into the BIG IDEAS Lab Digital Biomarker Discovery Pipeline (DBDP), which is a source of code for researchers and clinicians developing digital biomarkers from wearable sensors and mobile health technologies.

Project Lead: Jessilyn Dunn

Project Manager: Brinnae Brent

Disciplines involved: Health, Biology, Biomedical Engineering

Social and environmental contexts are increasingly recognized as factors that impact health outcomes of patients. This team will have the opportunity to collaborate directly with clinicians and medical data in a real-world setting. They will examine the association between social determinants and risk prediction for hospital admissions, and assess whether social determinants bias that risk in a systematic way. Applied methods will include machine learning, risk prediction, and assessment of bias. This Data+ project is sponsored by the Forge, Duke's center for actionable data science.

Project Leads: Shelly Rusincovitch, Ricardo Henao, Azalea Kim

Project Manager: Austin Talbot

Aaron Chai (Computer Science, Math) and Victoria Worsham (Economics, Math) spent ten weeks building tools to understand characteristics of successful oil and gas licenses in the North Sea. The team used data-scraping, merging, and OCR methods to create a dataset containing license information and work obligations, and they also produced ArcGIS visualizations of license and well locations. They had the chance to consult frequently with analytics professionals at ExxonMobil.

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Artem Streltsov

Yueru Li (Math) and Jiacheng Fan (Economics, Finance) spent ten weeks investigating abnormal behavior by companies bidding for oil and gas rights in the Gulf of Mexico. Working with data provided by the Bureau of Ocean Energy Management and ExxonMobil, the team used outlier detection methods to automate the flagging of abnormal behavior, and then used statistical methods to examine various factors that might predict such behavior. They had the chance to consult frequently with analytics professionals at ExxonMobil.

 

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Hyeongyul Roh

Team A: Video data extraction

Alexander Bendeck (Computer Science, Statistics) and Niyaz Nurbhasha (Economics) spent ten weeks building tools to extract player and ball movement in basketball games. Using freely available broadcast-angle video footage which required much cleaning and pre-processing, the team used OpenPose software and employed neural network methodologies. Their pipeline fed into the predictive models of Team C.

Click here to read the Executive Summary

 

Team B: Modeling basketball data: offense

Anshul Shah (Computer Science, Statistics), Jack Lichtenstein (Statistics), and Will Schmidt (Mechanical Engineering) spent ten weeks building tools to analyze offensive play in basketball. Using 2014-15 Duke Men’s Basketball player-tracking data provided by SportVU, the team constructed statistical models that explored the relationship between different metrics of offensive productivity, and also used computational geometry methods to analyze the off-ball “gravity” of an offensive player.

Click here to read the Executive Summary

 

Team C: Modeling basketball data: defense

Lukengu Tshiteya (Statistics), Wenge Xie (ECE), and Joe Zuo (Computer Science, Statistics) spent ten weeks building tools to predict player movement in basketball games. Using SportVU data, including some pre-processed by Team A, the team built predictive RNN models that distinguish between 6 typical movement types, and created interactive visualizations of their findings in R Shiny.

Click here to read the Executive Summary

 

Team D: Visualizing basketball data

Shixing Cao (ECE) and Jackson Hubbard (Computer Science, Statistics) spent ten weeks building visualizations to help analyze basketball games. Using player tracking data from Duke basketball games, the team created visualizations of game flow and of networks of points and assists, and integrated all of their tools into an R Shiny app.

Click here to read the Executive Summary

 

Faculty Leads: Alexander Volfovsky, James Moody, Katherine Heller

Project Managers: Fan Bu, Heather Matthews, Harsh Parikh, Joe Zuo

Yanchen Ou (Computer Science) and Jiwoo Song (Chemistry, Mechanical Engineering) spent ten weeks building tools to assist in the analysis of smart meter data. Working with a large dataset of transformer and household data from the Kyrgyz Republic, the team built a data preprocessing pipeline and then used unsupervised machine-learning techniques to assess energy quality and construct typical user profiles.

 

Click here to read the Executive Summary

 

Faculty Lead: Robyn Meeks

Project Manager: Bernard Coles

Bernice Meja (Philosophy, Physics), Jessica Yang (Computer Science, ECE), and Tracey Chen (Computer Science, Mechanical Engineering) spent ten weeks building methods for Duke’s Office of Information Technology (OIT) to better understand information arising from “smart” (IoT) devices on campus. Working with data provided by an IoT testbed set up by OIT professionals, the team used a mixture of supervised and unsupervised machine-learning techniques and built a prototype device classifier.

 

Click here to read the Executive Summary

Project Lead: Will Brockselsby

Interested in understanding the types of attacks targeting Duke and other universities? Led by OIT and the IT Security Office, students will learn to analyze threat intelligence data to identify trends and patterns of attacks. Duke blocks an average of 1.5 billion malicious connection attempts per day and is working with other universities to share the attack data. One untapped area is research into the types of attacks and how universities are targeted. Students will work alongside security and IT professionals to analyze the data and discern patterns.

Project Lead: Jesse Bowling

Project Manager: Susan Jacobs

Katelyn Chang (Computer Science, Math) and Haynes Lynch (Environmental Science, Policy) spent ten weeks building tools to analyze and visualize geospatial and remote sensing data arising from the Alligator River National Wildlife Refuge (ARNWR). The team produced interactive maps of physical characteristics that were tailored to specific refuge management professionals, and also built classifiers for vegetation detection in Landsat imagery.

 

Click here to read the Executive Summary

 

Faculty Leads: Justin Wright, Emily Bernhardt

Project Manager: Emily Ury

Dennis Harrsch, Jr. (Computer Science), Elizabeth Loschiavo (Sociology), and Zhixue (Mary) Wang (Computer Science, Statistics) spent ten weeks improving upon the team’s web platform that allows users to examine contraceptive-use data from low- and middle-income countries (LMICs) collected through the Demographic and Health Survey (DHS) contraceptive calendar. The team improved load times and data visualization latency, and increased the number of country surveys available in the platform from 3 to 55. The team also created a new app that allows users to explore the results of machine learning using this big data set.

This project will continue into the academic year via Bass Connections where student teams will refine the machine learning model results and explore the question of whether and how policymakers can use these tools to improve family planning in LMIC settings.

 

Click here to view the Executive Summary

 

Faculty Lead: Megan Huchko

Project Manager: Amy Finnegan

Nathaniel Choe (ECE) and Mashal Ali (Neuroscience) spent ten weeks developing machine-learning tools to analyze urodynamic detrusor pressure data of pediatric spina bifida patients from the Duke University Hospital. The team built a pipeline that went from raw time series data to signal analysis to dimension reduction to classification, and has the potential to assist in clinician diagnosis.

 

Click here to read the Executive Summary

 

Faculty Leads: Wilkins Aquino, Jonathan Routh

Project Manager: Zekun Cao

Varun Nair (Economics, Physics), Paul Rhee (Computer Science), Jichen Yang (Computer Science, ECE), and Fanjie Kong (Computer Vision) spent ten weeks helping to adapt deep learning techniques to inform energy access decisions.

 

Click here to read the Executive Summary

 

Faculty Lead: Kyle Bradbury

Project Manager: Fanjie Kong

Yoav Kargon (Mechanical Engineering) and Tommy Lin (Chemistry, Computer Science) spent ten weeks working with data from the Water Quality Portal (WQP), a large national dataset of water quality measurements aggregated by the USGS and EPA. The team went all the way from raw data to the production of Pondr, an interactive and comprehensive tool built with R Shiny that permits users to investigate and visualize data coverage, values, and trends from the WQP.

 

Click here to read the Executive Summary

 

Faculty Lead: Jim Heffernan

Project Manager: Nick Bruns

Marco Gonzalez Blancas (Civil Engineering) and Mengjie Xiu (Masters, Biostatistics) spent ten weeks building tools to help Duke reduce its energy footprint and achieve carbon neutrality by 2024. The team processed and analyzed troves of utility consumption data and then created practical monthly energy use reports for each school at Duke. These reports show historical usage trends, provide energy benchmarks for comparison, and make practical suggestions for energy savings.

Click here to read the Executive Summary

 

Faculty Lead: Billy Pizer

Project Manager: Sophia Ziwei Zhu

Cathy Lee (Statistics) and Jennifer Zheng (Math, Emory University) spent ten weeks building tools to help Duke University Libraries better understand its journal purchasing practice. Using a combination of web-scraping and data-merging algorithms, the team created a dashboard to help library strategists visualize and optimize journal selection.

 

Click here to read the Executive Summary

Faculty Leads: Angela Zoss, Jeff Kosokoff

Project Manager: Chi Liu

Micalyn Struble (Computer Science, Public Policy), Xiaoqiao Xing (Economics), and Eric Zhang (Math) spent ten weeks exploring the use of neuroscience as evidence in criminal trials. Working with a large set of case files downloaded from WestLaw, the team used natural language processing to build a predictive model that has the potential to automate the process of locating neuroscience-relevant cases in databases.

 

Click here to read the Executive Summary

 

Faculty Lead: Nita Farahany

Project Manager: William Krenzer

The Middle Passage, the route by which most enslaved persons were brought across the Atlantic to North America, is a critical locus of modern history, yet it has been notoriously difficult to document or memorialize. The ultimate aim of this project is to employ the resources of digital mapping technologies as well as the humanistic methods of history, literature, philosophy, and other disciplines to envision how best to memorialize the enslaved persons who lost their lives between their homelands and North America. To do this, the students combined previously disparate data and archival sources to discover where on their journeys enslaved persons died. Because of the nature of the data itself and the history it represents, the team engaged in ongoing conversations about various ways of visualizing its findings, and continuously evaluated the ethics of the data’s provenance and their own methodologies and conclusions. A central goal for the students was to discover what contribution digital data analysis methods could make to the project of remembering itself.

 

The group worked with two datasets: the Trans-Atlantic Slave Trade Database (www.slavevoyages.org), an SPSS-formatted database currently run out of Emory University, containing data on 36,002 individual slaving expeditions between 1514 and 1866; and the Climatological Database for the World’s Oceans 1750-1850 (CLIWOC) (www.kaggle.com/cwiloc/climate-data-from-ocean-ships), a dataset composed of digitized records from the daily logbooks of ocean vessels, originally funded by the European Union in 2001 for purposes of tracking historical climate change. This second dataset includes 280,280 observational records of daily ship locations, climate data, and other associated information. The team employed archival materials to confirm (and disconfirm) overlaps between the two datasets: the students identified 316 ships bearing the same name across the datasets, of which they confirmed 35 matching slaving voyages.
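The name-matching step described above can be illustrated with a minimal sketch. It is not the team's code: the record fields (`id`, `ship`, `year`), the same-year rule, and the toy records are all hypothetical, and a match found this way is only a candidate that would still need the archival confirmation the students carried out.

```python
from collections import defaultdict

def normalize(name: str) -> str:
    """Lowercase a ship name and collapse punctuation and extra spaces."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())

def candidate_matches(voyages, logbooks):
    """Pair voyage and logbook records sharing a normalized ship name and the same year."""
    by_name = defaultdict(list)
    for rec in logbooks:
        by_name[normalize(rec["ship"])].append(rec)
    pairs = []
    for voyage in voyages:
        for log in by_name.get(normalize(voyage["ship"]), []):
            if voyage["year"] == log["year"]:
                pairs.append((voyage["id"], log["id"]))
    return pairs

# Toy records (hypothetical; not drawn from either database)
voyages = [
    {"id": "V1", "ship": "Example Ship", "year": 1780},
    {"id": "V2", "ship": "Other  Vessel", "year": 1791},
]
logbooks = [
    {"id": "L1", "ship": "example ship", "year": 1780},
    {"id": "L2", "ship": "Other Vessel", "year": 1805},
]
pairs = candidate_matches(voyages, logbooks)
print(pairs)
```

In this toy input both ship names appear in both lists, but only one pair also agrees on the year, which mirrors why a shared name alone leaves many more candidates than confirmed voyages.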

 

The students had two central objectives: first, to locate where and why enslaved Africans died along the Middle Passage, and, second, to analyze patterns in the mortality rates. The group found significant patterns in the mortality data in both spatial and temporal terms (full results can be found here). At the same time, the team also examined the ethics of creating visualizations based on data that were recorded by the perpetrators of the slave trade—opening up space for further developments of this project that would include more detailed archival and theoretical work.

 

Click here to read the Executive Summary

 

Image credit:

J.M.W. Turner, Slave Ship, 1840, Museum of Fine Arts, Boston (public domain)

Faculty Lead: Charlotte Sussman

Project Manager: Emma Davenport

Ellis Ackerman (Math, NCSU), Rodrigo Araujo (Computer Science), and Samantha Miezio (Public Policy) spent ten weeks building tools to help understand the scope, cause, and effects of evictions in Durham County. Using evictions data recorded by the Durham County Sheriff’s Department and demographic data from the American Community Survey, the team investigated relationships between rent and evictions, created cost-benefit models for eviction diversion efforts, and built interactive visualizations of eviction trends. They had the opportunity to consult with analytics professionals from DataWorks NC.

Project Leads: Tim Stallmann, John Killeen, Peter Gilbert

Project Manager: Libby McClure

 

The aim of this project was to explore how U.S. mass media, particularly newspapers, enlist text and imagery to portray human rights, genocide, and crimes against humanity from World War II until the present. From the Holocaust to Cambodia, from Rwanda to Myanmar, such representation has political consequences. Coined by Raphael Lemkin, a Polish lawyer who fled Hitler’s antisemitism, the term “genocide” was first introduced to the American public in a Washington Post op-ed in 1944. Since its legal codification by the United Nations Convention for the Prevention of Genocide in 1948, the term has circulated, been debated, been used to describe events that pre-date it (such as the displacement and genocide of Native People in the Americas), and been shaped by numerous forces, especially the words and images published in newspapers. Alongside the definition of “genocide,” other key concepts, specifically “crimes against humanity,” have attempted to label, and thus name the story of, targeted mass violence. Conversely, the concept of “human rights,” enshrined in the 1948 UN Declaration, seeks to name a presence of rights instead of their absence.

 

During the summer, the team focused their work on evaluating the language used in Western media to represent instances of genocide and how such language varied based on the location and time period of the conflict. In particular, the team’s efforts centered on Rwanda and Bosnia as important case studies, affording them the chance to compare nearly simultaneous reporting on two well-known genocides. The language used by reporters in these two cases showed distinct polarizations of terminology (for instance, while “slaughter” was much more common than “murder” in discussions of the Rwanda genocide, the inverse was true for Bosnia).
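One simple way to compare terminology across two bodies of reporting is to compute how often each term of interest appears per 1,000 tokens in each corpus. The sketch below is an illustration under stated assumptions, not the team's method: the function name `term_rates`, the tokenizer, and the placeholder corpora (neutral tokens such as "alpha" and "beta" standing in for the terms being compared) are all hypothetical.

```python
import re
from collections import Counter

def term_rates(documents, terms):
    """Return occurrences of each term per 1,000 tokens across the documents."""
    counts = Counter()
    total = 0
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())
        total += len(tokens)
        counts.update(t for t in tokens if t in terms)
    return {t: (1000 * counts[t] / total if total else 0.0) for t in terms}

# Placeholder corpora (hypothetical; not real newspaper text)
terms = {"alpha", "beta"}
corpus_a = ["alpha beta alpha gamma", "alpha delta"]
corpus_b = ["beta beta alpha gamma"]

rates_a = term_rates(corpus_a, terms)
rates_b = term_rates(corpus_b, terms)
print(rates_a, rates_b)
```

Normalizing by corpus size matters here because the two sets of articles are unlikely to be the same length; raw counts alone could suggest a difference that only reflects how much was written.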

 

Click here to read the Executive Summary

 

Faculty Leads: Nora Nunn, Astrid Giugni

How Much Profit is Too Much Profit?

Chris Esposito (Economics), Ruoyu Wu (Computer Science), and Sean Yoon (Master's, Decision Sciences) spent ten weeks building tools to investigate the historical trends of price gouging and excess profits taxes in the United States of America from 1900 to the present. The team used a variety of text-mining methods to create a large database of historical documents, analyzed historical patterns of word use, and created an interactive R Shiny app to display their data and analyses.

Click here to read the Executive Summary

 

(Cartoon from The Masses, July 1916)

Faculty Lead: Sarah Deutsch

Project Manager: Evan Donahue

Maria Henriquez (Computer Science, Statistics) and Jacob Sumner (Biology) spent ten weeks building tools to help the Michael W. Krzyzewski Human Performance Lab best utilize its data from Duke University student athletes. The team worked with a large collection of athlete strength, balance, and flexibility measurements collected by the lab. They improved the K Lab’s data pipeline, created a predictive model for injury risk, and developed interactive web-based individualized injury risk reports.

Click here to read the Executive Summary

Faculty Lead: Dr. Tim Sell
Project Manager: Brinnae Bent

 

 

Vincent Wang (Computer Science, CE), Karen Jin (Bio/Stats), and Katherine Cottrell (Computer Science) spent ten weeks building tools to educate the public about lake dynamics and ecosystem health. Using data collected over a period of 50 years at the Experimental Lakes Area (ELA) in Ontario, the team preprocessed and merged datasets, made a series of data visualizations, and produced an interactive website using R Shiny.

Click here to read the Executive Summary

 

Faculty Lead: Kateri Salk

Project Manager: Kim Bourne

Vivek Sahukar (Master's, Data Science), Yuval Medina (Computer Science), and Jin Cho (Computer Science/Electrical & Computer Engineering) spent ten weeks creating tools to help augment the experience of users in the StreamPULSE community. The team created an interactive guide and used data sonification methods to help users navigate and understand the data, and they used a mixture of statistical and machine-learning methods to build out an outlier detection and data cleaning pipeline.

Click here to read the Executive Summary

Faculty Leads: Emily Bernhardt, Jim Heffernan

Project Managers: Alice Carter, Michael Vlah

Aidan Fitzsimmons (Public Policy, Mathematics, Electrical & Computer Engineering), Joe Choo (Mathematics, Economics), and Brooke Scheinberg (Mathematics) spent ten weeks partnering with the Durham Crisis Intervention Team, the Criminal Justice Resource Center, and the Stepping Up Initiative. Using booking data on 57,346 individuals provided by the Durham County Jail, the team created visualizations and predictive models that illustrate patterns of recidivism, with a focus on the subset of the population with serious mental illness (SMI). These results could assist current efforts in diverting people with SMI from the criminal justice system and into care.

Click here to read the Executive Summary

Faculty Leads: Nicole Schramm-Sapyta, Michele Easter

Project Manager: Ruth Wygle

The students in this project worked on a pervasive question in literary, film, and copyright studies: how do we know when a new work of fiction borrows from an older one? Works are often appropriated rather than straightforwardly adapted, which makes the borrowing difficult for human readers to trace. As we continue to remake and repurpose previous texts into new forms that combine hundreds of references to other works (such as Ready Player One), it becomes increasingly laborious to track all the intertextual elements of a single text. While some borrowings are easy to spot, as in the case of Marvel films that are straightforward adaptations of comic book storylines and aesthetics, others are more subtle, as when Disney reinterpreted Hamlet and African oral traditions to create The Lion King. Thousands of new stories are created each day, but how do we know if we are borrowing or appropriating a previous text? Are there works that have adapted previous ones that we have yet to identify?

 

The students worked with data from over 16.7 million books from HathiTrust, with critical analysis in scholarly articles accessible through JSTOR, and with the topic categories in Wikipedia. The group used Latent Dirichlet Allocation (LDA), a generative model that assumes every document is a mixture of topics, to represent each key theme or topic as a distribution over words. The students developed a flexible and graduated heuristic for identifying a work as an adaptation; the more pre-selected categories a work fit under, the more likely it was to be marked as an adaptation by their model. Over the summer, the students came to appreciate that all digital humanistic methodologies are contestable and dependent on traditional critical work.
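A graduated heuristic of the kind described above can be sketched as a score that rises with the number of pre-selected categories a work falls under. The summary does not say how the team's score was computed, so the fraction used here is an assumption, and the function name `adaptation_score` and the example categories are hypothetical.

```python
def adaptation_score(work_categories, selected_categories):
    """Fraction of the pre-selected categories that the work falls under (0.0 to 1.0)."""
    selected = set(selected_categories)
    if not selected:
        return 0.0
    return len(selected & set(work_categories)) / len(selected)

# Hypothetical categories chosen in advance for one source text
selected = {"revenge", "royal succession", "ghost"}

# Hypothetical category labels assigned to two candidate works
work_a = {"revenge", "ghost", "comedy"}
work_b = {"comedy"}

score_a = adaptation_score(work_a, selected)  # matches 2 of 3 categories
score_b = adaptation_score(work_b, selected)  # matches none
print(score_a, score_b)
```

Because the score is graded rather than yes-or-no, a reader can rank candidate works and then decide where critical judgment should take over, which fits the students' conclusion that such methods depend on traditional critical work.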

 

Click here to read the Executive Summary

Faculty Lead: Grant Glass

Jett Hollister (Mechanical Engineering) and Lexx Pino (Computer Science, Math) joined Economics majors Shengxi Hao and Cameron Polo in a ten-week study of the late 2000s housing bubble. The team scraped, merged, and analyzed a variety of datasets to investigate different proposed causes of the bubble. They also created interactive visualizations of their data, which will eventually appear on a public website.

Click here to read the Executive Summary

 

Faculty Lead: Lee Reiners

Project Manager: Kate Coulter

Cassandra Turk (Economics) and Alec Ashforth (Economics, Math) spent ten weeks building tools to help minimize the risk of trading electricity on the wholesale energy market. The team combined data from many sources and employed a variety of outlier-detection methods and other statistical tools in order to create a large dataset of extreme energy events and their causes. They had the opportunity to consult with analytics professionals from Tether Energy.

Click here to read the Executive Summary

 

Project Lead: Eric Butter, Tether

Andre Wang (Math, Statistics), Michael Xue (Computer Science, ECE), and Ryan Culhane (Computer Science) spent ten weeks exploring the role played by emotion in speech-focused machine learning. The team used a variety of techniques to build emotion recognition pipelines, and incorporated emotion into generated speech during text-to-speech synthesis.

Click here to read the Executive Summary

 

Faculty Leads: Vahid Tarokh, Jie Ding

Project Manager: Enmao Diao

Past Projects

Brooke Erikson (Economics/Computer Science), Alejandro Ortega (Math), and Jade Wu (Computer Science) spent ten weeks developing open-source tools for automatic document categorization, PDF table extraction, and data identification. Their motivating application was provided by Power for All’s Platform for Energy Access Knowledge, and they frequently collaborated with professionals from that organization.

Click here to read the Executive Summary

 

Jake Epstein (Statistics/Economics), Emre Kiziltug (Economics), and Alexander Rubin (Math/Computer Science) spent ten weeks investigating the existence of relative value opportunities in global corporate bond markets. They worked closely with a dataset provided by a leading asset management firm.

Click here for the Executive Summary

Maksym Kosachevskyy (Economics) and Jaehyun Yoo (Statistics/Economics) spent ten weeks understanding temporal patterns in the used construction machinery market and investigating the relationship between these patterns and macroeconomic trends. They worked closely with a large dataset provided by MachineryTrader.com, and discussed their findings with analytics professionals from a leading asset management firm.

Click here to read the Executive Summary