• “I feel I have a better understanding of how to communicate my work to different groups of people. We met with stakeholders, other Data+ teams, and people from the University, all with different levels of technical knowledge, and this really allowed me to be adaptable in how I present what I've worked on. Before, I really thought "data scientist" described what I wanted to be. Now, I have a better understanding of the things I like and dislike. Working extensively on modeling this summer has made me aware that I really enjoy project management on real world projects, which is great insight to have so early in my undergraduate career.”
    Meredith Brown
    Getting Granular on Social Determinants of Health

  • “I have gained experience in applying my classroom knowledge of data science to real world problems. I have made connections that could help further my future data science careers. I have also received offers and opportunities to continue working on the Data+ project and other related projects in the next coming semesters. At first, I thought data science and statistics were basically the same thing. I now realize how insanely wrong I was. Data science is this incredible combination of statistics, computer science, communications, and any other hobby or interest you so choose. I never would have realized the infinite ways data science could be applied, and I am so thankful that this program has changed my perception of Data Science research.”
    —Maria Henriquez
    Optimizing Risk Assessment for Duke University Student Athlete Injury Prevention

  • “I’ve gained technical/coding skills, tangible experience with real big data, communication and public speaking skills, patience, and flexibility. There is way more problem-solving, dirty data, and things you have to account for than I thought in data science. Also, nothing is straightforward. Anything "shown" through data has used specific algorithms to get there.”
    —Brooke Scheinberg
    Recidivism in Durham County Jails

  • “Data+ enabled me to hone my research, communication, and presentation skills. I never fully understood the data science discipline until this experience.”
    —Victoria Worsham
    Investigating Oil and Gas Production in the United Kingdom

  • “As a second-time participant in Data+, I gained experience in a different type of data science project. Last summer, my team started with dirty data which we cleaned and performed NLP/machine learning classification on. This summer, my team had to web scrape and make API requests to collect a significant portion of data. Collecting and cleaning data enabled me to think more about how to obtain and organize data in a way that is convenient for the next person using it. Building a dashboard to visualize trends in the data introduced me to iterative dashboard design, in which a series of client-specific changes are made after the general structure and functionality of the dashboard is built. For example, changing the user input bar from a sidebar to a horizontal bar displayed input options more clearly and made the dashboard more user-friendly for our client.”
    Cathy Lee
    Breaking the Bundle

  • “The exposure to lots of different methods and applications across data science has been very eye-opening. I've also gained perspective on the kind of skills, both "soft" and "hard," that are useful both in academic environments and the real world. I didn't realize how much of data science was learning and researching topics on the fly. This is encouraging, because it means that if you can develop this skill of being able to quickly learn a topic/language/method, you're rarely under-qualified for a job or position.”
    Alex Bendeck
    Basketball Analytics Pipeline

  • “I gained a passion for sports analytics and learned important ML concepts. You really can apply data science to any field you like.”
    Anshul Shah
    Basketball Analytics Pipeline

  • “I feel like I gained a lot of hard skills related to coding in different languages like Python and R. I'm really happy I had a chance to dive into some more complex machine learning problems like computer vision. Most importantly, I feel like I've gained enough confidence in my data science skills to pursue a related career after college.”
    Niyaz Nurbhasha
    Basketball Analytics Pipeline

  • “The most important thing I've gained this summer is a sense of understanding of what it's like to work for a client and produce deliverables on a timeline. The biggest takeaway I've gotten about data science is how much data is missing in the world, and how hard it can be to find. There are a lot of really beautiful, well formatted datasets out there, but sometimes, there just isn't a way to get what you're looking for--and that's often where you come in. There's a lot of time spent putting together these datasets in a meaningful way, and figuring out how to format them in the most helpful way for future work, especially if you're not going to be the ones using them. It's a lot like commenting code, but tagging your columns is really important to ensure that everyone going forward knows how to work with what you've put together.”
    Cassandra Turk
    Identifying Extreme Events in Wholesale Energy Markets

  • “I now understand there are many ways to be useful in a collaborative environment. As the only non-technical major on my team, I first believed I couldn't bring as much to the table, but I found there is immense utility in being the person with the least technical experience. Not only was I able to perceive problems in a different way, offering fresh solutions, but I was also able to gain much more knowledge from my teammates. Additionally, I was able to incorporate my sociological understanding of people into the design aspect of our project to create a truly useful user experience. No matter your background, technical or non-technical, you always have something important to share and to learn.”
    Elizabeth Loschiavo
    Big Data for Reproductive Health (Year 2)

  • “I think that Data+ has taught me a lot about how to create something really cool out of nothing. We had to change our project goals because of technical issues and holdups, and I think I learned a lot about how to go from just raw data and ask really interesting questions.”
    Aidan Fitzsimons
    Recidivism in Durham County Jails


  • “I gained valuable experience working with data, something I had never had the chance to do before, but perhaps more importantly I gained insight into working with a small team alongside a project manager.”
    Dennis Harssch
    Big Data for Reproductive Health (Year 2)

  • “I didn't know Data Science research was a thing at the undergraduate level, so it showed me some possibilities. I have gained a better understanding of machine learning topics and when to apply different methods. I've also learned a lot looking at results from our models and pulling meaning out of them–even if that meaning is just that there was a bug and we need to redo it differently.”
    Varun Nair
    Deep Learning and Energy Access Decisions

  • “Data+ in many ways widened my perception of what it means to participate in Data Science research. My particular project was pretty in line with my perception before coming to Data+ but many of the other projects looked very different than I imagined a Data Science research project looking in terms of the type of data they were using and the clients.”
    Ryan Culhane
    Speech Emotion Analysis

  • “I feel I have gained a much greater understanding of how to approach problems computationally as well as working on a small team in order to solve/fix these problems! It has been a great experience overall, and I wish I could do it again. Working with the stakeholders we had was great, because they were also so involved and enthusiastic. Our project manager couldn't have been better either. I had a great summer with Data+! It was honestly more exciting and fun than I realized, and participating in Data+ helped me see that I could do computational work in the future and be happy doing it.”
    Jake Sumner
    Athletic Injury Risk Assessment

  • “Many probability and statistics classes emphasize the mean, variation, and overall distribution of a dataset without paying too much attention to the outliers and asking if those outliers can really be explained by the same set of processes affecting 99%+ of the other data points. This summer of Data+ showed me that data science research is also interested in intensely analyzing and understanding these outliers.”
    —Alec Ashforth
    Identifying Extreme Events in Wholesale Energy Markets

  • “Having no prior exposure, outside of a statistics course, to data science and data visualization, I feel like I got a lot out of this program. At least for me, there was a little bit of "drinking from a firehose" and a steep learning curve, but I learned a lot very quickly. The data visualization programs with which I worked are definitely ones I could see myself using in the future.”
    —Anonymous student
    Remembering the Middle Passage

  • “I have gained extensive knowledge about all the human rights conflicts in the world and also just a general understanding of many different facets of data science. I realized that I am not as interested in research as a profession, but I did learn that the digital humanities are a very legitimate field.”
    Anonymous student

  • “I have gained some knowledge in R as well as great teamwork skills.”
    Marco Gonzalez Blancas
    Duke Building Energy Use Report

  • “I feel I have gained a lot of technical skills regarding natural language processing which I really wanted to experience. I also got to experience what it was like working on a research team and attempting to solve a problem in that setting, which I found really valuable. I realized that data science really is a part of every field and it is so broad in terms of scope and what kind of work you will be doing.”
    Nikhil Kaul
    Invisible Adaptations

  • “I went from fumbling through CS101 my freshman fall to playing around with the hyperparameters of machine learning algorithms! That's a success in my book. I learned that data science projects take time and many little failures before you can arrive at anything close to the outcome that you'd like. I learned that, seemingly counterintuitively, I can't rely on other people to teach me things I can learn on my own, but there exists a whole community of people in data science who want to help me learn and grow. I feel more confident tackling problems that have no clear solution, and I've found a new appreciation for API documentation. I'm so thankful that I had the opportunity to spend my summer with Data+!”
    Micalyn Struble
    Neuroscience in the Courtroom

  • “I can code significantly faster now. I have also been through a truncated version of the problem-solving process in the real world.”
    —Ellis Ackerman 
    Durham Evictions

  • “I have gotten a better look at where humanities can be useful and frankly, necessary, in implementing new methodologies and approaches to the data science field.”

  • “I have learned a lot about JavaScript and Google Earth Engine and using remote sensing data to see what changes are happening at Alligator River National Wildlife Refuge and how they are happening. I have also learned that working on a project like this is a great way to learn and make something at the same time.”
    Katelyn Chang
    Saltwater Intrusion on Coastal Ecosystems

  • “I’ve learned many technical tools for data analysis as well as improved my abilities as a working part of a team. It was a unique experience working in data science research because this type of work isn’t offered in the lab portion of many classes. It’s an incredible opportunity to engage in data science research as an undergraduate student, especially when the decision-making process is mostly up to you and your team.”
    Jeevan Tewari
    Human Rights in the Postwar World

  • “Data+ helped me learn a lot about how to create a machine learning pipeline in order to discover meaningful relationships in data. It also gave me a broader view of data science and career opportunities. I also learned that data science research involves a lot of preprocessing and simply turning the data into a form that a model can digest.”
    Michael Xue
    Speech Emotion Analysis

  • “I've learned so much about data science and I now feel very comfortable coding in several languages. There were multiple times during the summer where I was asked to do something and had absolutely no clue what I was doing, but Data+ gave me the skills and confidence to solve problems beyond what I thought was the limit of my knowledge. (Ultimately, I became really good at googling how to code!). I learned the importance of communication. It's very easy to spend all day coding, but if you're not able to communicate with the public on what you are doing, the code is meaningless.”

  • “I have gained a more realistic understanding of what working in a data scientist role looks like. I have learned a lot more than I was expecting to and cemented the skills that I had learned in class. I came into the summer unsure of what area of technology I wanted to pursue as a career, but now I am more confident of my role within data science. I have further learned what it looks like to work within a team, with a manager, and with other coworkers.”
    Jackson Hubbard
    Basketball Analytics Pipeline




A team of students who worked together for a semester in the Mission Driven Startups class will obtain and analyze data to create a predictive maintenance model for F-15E fighter jets from Seymour Johnson Air Force Base. Using data provided by the Base, the Data+ team will evaluate the relationship between unscheduled maintenance and external factors such as weather, sortie hours between repairs, and failure frequency of aircraft components. These findings will then feed into a predictive maintenance model to enhance the Air Force crew’s ability to anticipate maintenance needs, helping to minimize unscheduled aircraft downtime.


Faculty Lead: Dr. Emma Rasiel

Client Lead: Lt. Devon Burger

Project Manager: Vignesh Kumaresan

A team of students, led by Electrical and Computer Engineering professor Vahid Tarokh, will develop methods to improve the efficiency of information processing with adaptive decisions according to the structure of new incoming data. Students will have the opportunity to explore data-driven adaptive strategies based on neural networks and statistical learning models, investigate trade-offs between error threshold and computational complexity for various fundamental operations, and implement software prototypes. The outcome of this project can potentially speed up many systems and networks involving data sensing, acquisition, and computation.

Project Leads: Yi Feng, Vahid Tarokh

A team of students will explore new ways of reading pre-modern maps and perspectival views through image tagging, annotation and 3D modeling. Each student will build a typology of icons found in these early maps (for example, houses, churches, roads, and rivers). By extracting, modeling, and cataloging these features, the team will create a library of 2D and 3D objects that will be used to (a) identify patterns in how space and power are represented across these maps, and (b) create a model for “experiencing” these maps in 3D, using the Unity game engine platform. This is a combined Data+ / Bass Connections project that will instruct students in qualitative and quantitative mapping techniques, basic 3D modeling and the history of cartography.

Project Leads: Philip Stern, Ed Triplett

Project Manager: Sam Horewood

A team of students will explore ways in which data science can help support the mission of Rewriting the Code, a national non-profit organization dedicated to empowering a community of college women with a passion for technology.

In particular, students will perform statistical analyses of past survey data, build out interactive dashboards that help visualize trends in student experience, and help design future survey questions.

Project Lead: Sue Harnett

Faculty Lead: Alexandra Cooper

Project Manager: Imari Smith

A team of students will explore how artificial intelligence tools can be used to support the investment office at the Duke University Management Company (DUMAC).

In particular, the team will investigate natural language processing and other AI methods for supporting the legal review process, investment analysis, and financial reporting.

Project Lead: Robert McGrail, DUMAC

Project Manager: Yi Wang

Over the past several months, Duke's Information Technology Security Office (ITSO) has begun applying the MITRE ATT&CK framework as a basis for how the team collects, assesses, identifies and responds to attacker tactics, techniques, and procedures (TTPs). As the team rolls out new processes to "hunt" for attackers, a model that transitions the team's primary functions from defensive/reactive to offensive/proactive, the team will need to incorporate real-time and longitudinal data analytics as well as automated responses based on these analyses. This orchestration of the various tools and analysis of data will facilitate the automation of responses to attacker incursions. Given the amount of data and the speed needed to respond, application of machine learning techniques will be a necessary component.

Project Leads: Phillip Batton, Nick Tripp

Project Manager: Joao Alberto Capanema Mansur

Duke season ticket holders are both strategically and financially important to Duke Athletics. One of the major challenges in retaining season ticket holders is understanding which are most likely to churn, i.e., not renew their tickets. A team of students, in conjunction with Duke’s Office of Information Technology and Duke Athletics, will use data from Duke’s ticketing system to build a set of models that seek to predict the profiles and timing of non-renewal of season ticket holders and annual donors.

Project Leads: John Haws, Larry Cleaver

Project Manager: Andrew Carr

The natural and built environment can both promote and harm the public’s health. Some states have created interactive web-portals to help visualize how health and environmental measures relate…North Carolina wants to be next! The Data+ student team, led by epidemiologist Mike Dolan Fliss and colleagues from the NC Division of Public Health (DPH), will build a pilot Environmental Public Health Tracking (EPHT) tool for NC. Students will analyze and combine spatial health, environmental, and point-source data from NC DPH and other partners, then co-design and prototype visual dashboards for public use.

Project Leads: Mike Dolan Fliss, Kim Gaetz

Project Manager: Melyssa Minto

A team of students, led by researchers in the Global Financial Markets Center at Duke Law, will carry forward the work of a 2019-20 Bass Connections team to better understand the state of the home mortgage market leading up to the financial crisis. The Data+ team will expand the scope of their analysis outside North Carolina and begin the process of developing a complete quantitative portrait of the state of the mortgage market in Sun Belt states. Following the work done this year, the Data+ team would be largely responsible for creating tools to visualize different mortgage market statistics at the census tract level for the entire US, based on the NC version created this year. Additionally, a model would be created to identify whether a loan is predatory or not. The output for this project will be displayed on a comprehensive website that is currently being constructed by the Bass Connections team.

Project Lead:  Lee Reiners

Project Manager: Eric Autry

A team of students led by researchers at the Duke River Center will develop tools to link water quality and aquatic ecosystem condition to urban and other land uses by combining existing geospatial data including land cover maps, LiDAR, and remotely-sensed images with time series of estimates of ecosystem metabolism found within the StreamPULSE data portal.  Students will develop clustering tools for rapid identification of land use and other gradients that minimize confounding factors, and then will compare metabolic time series along these gradients to identify connections between catchment attributes and the seasonal and stochastic components of ecosystem function.  This work will help Duke researchers determine thresholds of land use (or other catchment characteristics) that protect aquatic ecosystem condition and will also generate generalizable workflows and data infrastructure that supports the scientific community’s use of our open science data portal.

Project Leads: Jim Heffernan, Phil Savoy

A team of students will analyze sensor data from a shipping fleet to develop predictive models that help prevent mechanical failures at sea and identify the best time for part replacement. They will have the opportunity to collaborate closely with analytics professionals from Fleet Management Limited, the world’s third-largest ship management company, which looks after 520+ vessels on behalf of owners.

Faculty Sponsor: Paul Bendich

Project Manager: Anil Ganti

Client Lead: Shah Irani, Fleet Management Limited

A team of students led by researchers in the Energy Data Analytics Lab, Electrical & Computer Engineering, and with participation from the Energy Access Project will investigate how to use synthetically-generated satellite imagery to improve the identification of energy infrastructure in satellite imagery. The detected energy infrastructure will fill outstanding data gaps in the ability to identify pathways for electrification in low-income countries. The team will build the foundation for research that can identify objects that appear relatively rarely in satellite imagery and accomplish this using very limited training examples by creating realistic synthetic 3D models of those rare objects.  This would greatly scale up the applicability of computer vision techniques for energy object identification in overhead imagery.

Project Lead:  Kyle Bradbury

A team of students led by Physics professors Dan Scolnic, Michael Troxel and Chris Walter will build their own algorithms to use images taken as part of The Dark Energy Survey, one of the largest cosmological surveys, to learn more about all the things we find in space that we aren’t looking for. These can be anything from image artifacts, to cosmic ray hits, to satellite trails, to Elon Musk's car (see picture). Each of these different things has its own signature on the images, and automatic detection and identification algorithms would enable improved image processing. As surveys attempt to measure increasingly difficult and subtle features of the universe, like the imprint of dark energy and dark matter, identification of any kind of artifact will be critical.

Project Leads: Dan Scolnic, Michael Troxel, Chris Walter

A team of students led by the Data and Analytics Practice at OIT will develop a robust forecasting model for predicting energy usage for different facilities on campus. Students will explore a wide range of real-world time-series data challenges, from anomaly detection and handling to benchmarking traditional statistical and modern machine learning models for forecasting. Students will also gain valuable experience developing an interactive application with the latest open-source libraries for converting Jupyter notebooks into web applications, facilitating effective stakeholder collaboration. This work will enable several critical analyses for Duke Facilities Management to optimize their operations and significantly reduce costs.

Project Leads: John Haws, Gagandeep Kaur

Project Manager: Billy Carson
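
As a rough illustration of the anomaly detection and baseline forecasting ideas mentioned in this project description, the sketch below flags readings that deviate sharply from a trailing window and produces a seasonal-naive forecast of the kind often used as a benchmark. It is not the team's method: the function names, window size, threshold, and the synthetic hourly load series are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=24, threshold=3.0):
    """Return indices whose value lies more than `threshold` sample standard
    deviations from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat windows, where a z-score is undefined.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


def seasonal_naive_forecast(readings, season=24, horizon=24):
    """Forecast each future step as the value seen at the same position
    in the most recent full season of the series."""
    start = len(readings) - season
    return [readings[start + (h % season)] for h in range(horizon)]


# Hypothetical hourly load (kWh): higher during working hours, four days long.
load = [100 + (10 if 8 <= h % 24 < 18 else 0) for h in range(24 * 4)]
load[60] = 400  # injected spike standing in for a faulty meter reading

anomalies = flag_anomalies(load)
forecast = seasonal_naive_forecast(load)
print("anomalous hours:", anomalies)
print("next-day forecast (first 5 hours):", forecast[:5])
```

A seasonal-naive forecast is a common sanity check: a statistical or machine learning model that cannot beat "same hour yesterday" on held-out data is probably not worth deploying.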

A team of students led by professor of Public Policy William Darity Jr. will chart the evolution of racial inequality in housing in a subset of Durham’s neighborhoods over the course of the 20th century, using census data and Durham County housing records. Students will select a sample of homes from those that appear in de-anonymized decennial censuses between 1920 and 1940, noting homeowner race and reported home value. Tenure (time since last sale), assessed home values and occupancy will be collected from county records for the period between 1940 and 2018. The set of homes will be selected to include a range of neighborhoods that vary in racial composition, zoning designation, and credit riskiness as determined by HOLC’s residential security (redlining) maps. The proposed approach allows the Data+ team to document racial differences in the evolution of home values, tenure and occupancy across neighborhoods.

Project Lead: William Darity Jr.

Project Manager: Omer Ali

A team of students led by researchers in the Duke Eye Center and Department of Statistical Science will develop statistical models to assess the risk of legal blindness in glaucoma patients using electronic health records (EHR) from Duke Health. Students will focus on identifying risk factors relevant locally to the Durham County patient population and will enrich the available EHR data with detailed social and environmental data using the Durham Neighborhood Compass. A priority of the research will be to develop an app to make the prediction model accessible, so that real-time decisions about medical care related to blindness can be made. For the greatest impact, the app will be created in close collaboration with clinicians and decision makers at Duke Health.

Project Leads: Samuel Berchuck, Sayan Mukherjee, Felipe Medeiros

Project Manager: Kimberly Roche

A team of students led by data scientists and engineers from the Office of Information Technology will work to visualize foot traffic patterns in the Bryan Center. Students will be given a large dataset consisting of wifi data, which they will analyze to gain insight into usage patterns of the Bryan Center over various time periods. The work will help to identify areas of the center that experience high wear and tear, particularly during high-volume events such as basketball games.

Project Leads: John Haws, Mary Thompson, Eric Hope, Sean Dilda

Project Manager: Hunter Klein

Mental illness is over-represented in the incarcerated population, and is correlated with higher rates of re-arrest. In recent years, Durham County has taken many steps to break this unfortunate cycle, including helping incarcerated people to engage with mental health treatment resources. This team will work with collaborators at the Durham County Detention Facility, the Criminal Justice Resource Center, and the Duke Health System to determine if recently incarcerated people in Durham are using the resources available to them, and if outcomes are improving. The team will use descriptive statistics and construct statistical models, and welcomes students from all majors, especially those interested in mental health and policy. This team is a combined Data+/Bass Connections project, so students will be expected to commit to the project for Summer 2020 as well as academic year 2020-2021.

Project Leads: Nicole Schramm-Sapyta, Maria Tackett

Project Manager: Ruth Wygle

A team of students led by History Professor Cecilia Márquez will use census data to understand the long history of Latinxs in the U.S. South. Despite a growing focus of historians and social scientists on the historical and contemporary Latinx South, there has not yet been a thorough data analysis of the historical presence of Latinxs in the South. The Data+ team will search the U.S. Federal Census, immigration records, and marriage records to determine the location of Latinxs in the U.S. South over the course of the late nineteenth and early twentieth centuries. This work will provide an invaluable data set to help us understand the long southern history of Latinxs. 

Project Lead: Cecilia Márquez

Project Manager: Susan Jacobs

A team of students led by researchers from the Duke Human Performance Optimization Lab (OptiLab) and the Michael W. Krzyzewski Human Performance Laboratory (K-Lab) will develop an analytic and report-generating application to test if baseline vision and movement screening measures are able to predict on-field baseball performance in a cohort of nearly 300 athletes who participated in the USA Baseball Prospect Development Pipeline (PDP). Using machine learning and Bayesian hierarchical modeling, students will test data provided by USA Baseball to identify relationships between baseline characteristics and performance in NCAA-sanctioned and collegiate summer league games during the 2018 and 2019 seasons. The final deliverable will be both a report of the findings and an analytic toolset that can be used within the PDP to provide direct feedback to the athletes about their future performance potential immediately following testing. As such, this program will provide valuable new information about the characteristics that predict successful athletic performance in demanding situations, and could be used to develop new approaches for talent identification within and beyond baseball.

Project Leads: Greg Appelbaum, Marc Richard


Are the concepts of a “consumer” and of a “consumer society” modern ideas? Is greed good, as Michael Douglas’s Gordon Gekko in the 1987 movie Wall Street claimed, or is it a destructive sin?

A team of students led by Dr. Astrid Giugni (Duke, English and ISS) and Dr. Jessica Hines (Birmingham-Southern College, English) will address the question of how to trace concepts that slowly developed alongside changing economic and social realities.


We will track a set of related terms (such as consumer, greed, speculation, profit) in order to begin assessing how the ethical, political, and economic language of goods-consumption changed around the Protestant Reformation and the rise of the market economy. Using large databases that contain scans and machine-readable Medieval and Early Modern texts, namely EEBO (ProQuest), ECCO (Gale), HathiTrust, and TEAMS (University of Rochester), the group will track and analyze pamphlets, sermons, satires, and images to understand how the ethical discourse of consumerism changed over time.

Project Leads: Astrid Giugni, Jessica Hines

Project Manager: Chris Huebner
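
One simple way to picture the term tracking described above is to count how often each tracked term appears per thousand words in each time period. The sketch below does this on a few invented sentences; it is only an illustration under stated assumptions. The function name, the 50-year period width, and the toy snippets are hypothetical, and real Medieval and Early Modern texts would also need spelling normalization that this example ignores.

```python
import re
from collections import Counter, defaultdict

# Terms named in the project description.
TRACKED_TERMS = {"consumer", "greed", "speculation", "profit"}


def term_rates_by_period(documents, period=50):
    """Map each period's start year to {term: occurrences per 1,000 words}.

    `documents` is an iterable of (year, text) pairs.
    """
    counts = defaultdict(Counter)  # period start -> tracked-term counts
    totals = Counter()             # period start -> total word count
    for year, text in documents:
        start = year - year % period
        words = re.findall(r"[a-z]+", text.lower())
        totals[start] += len(words)
        counts[start].update(w for w in words if w in TRACKED_TERMS)
    return {
        start: {t: 1000 * counts[start][t] / totals[start]
                for t in sorted(TRACKED_TERMS)}
        for start in sorted(totals)
        if totals[start] > 0
    }


# Invented toy snippets, not quotations from any of the databases above.
docs = [
    (1480, "greed is a sin and greed destroys the soul"),
    (1620, "the merchant seeks profit and fears greed"),
    (1640, "profit from speculation tempts the consumer"),
]

rates = term_rates_by_period(docs)
for start, by_term in rates.items():
    print(start, {t: round(r, 1) for t, r in by_term.items()})
```

Normalizing by total word count matters here because the number of surviving texts differs greatly between periods, so raw counts alone would mostly reflect corpus size.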

The promoters of modern American capitalism have long encouraged individuals, including those of modest means, to build their wealth through investments. But how have ordinary investors learned about the opportunities and risks of putting their savings to work on Wall Street? A team of students working with History professor Ed Balleisen will delve into the evolving nature of investment advice from the early twentieth century up to the start of the internet age. Creating datasets from financial advice columns in large-circulation American newspapers and magazines, they will use text mining techniques and sentiment analysis to see how advice changed in response to the business cycle, the emergence of new types of investments, financial products, and investors, and the evolution of financial regulation. This is a chance to link data science to historical analysis of a key facet of finance capitalism.

Project Lead: Ed Balleisen


A team of students, led by University Archivist Valerie Gillispie and Professor Don Taylor, will take a closer look at how the student body at Duke has transformed into a coeducational student body from around the world enrolled in ten different schools. Students will seek to transform digital and historical data into a dynamic visual display which allows viewers to examine changes in the student body over time in terms of three dimensions: geographic origin, gender, and school. The students will use born-digital data along with historical, paper-based data to assemble a data corpus. The goal is to demonstrate trends and changes over time in terms of where Duke students have come from, identifying statistically significant shifts and patterns that warrant further study.

Project Leads: Don Taylor, Valerie Gillispie

Project Manager: Anna Holleman

A team of students led by researchers in the Energy Initiative and the Energy Access Project will explore historical data on the U.S. Electric Farm Equipment (EFE) demonstration show that ran between 1939 and 1941, which aimed to increase usage of electricity in rural areas. Students will compile data collected by the Rural Electrification Administration into a machine-readable form, and then use that data to explore and visualize the EFE’s impact. If time allows, they will then compare data from the EFE and a related, smaller-scale project from 1923 (“Red Wing Project”) to current data on appliance promotion programs in villages in East Africa that have recently gained access to electricity. The outcomes of this analysis would offer evidence on the successes and limitations of these types of programs, and the relevance of the historical U.S. case to countries that are currently facing similar challenges.

Project Leads: Victoria Plutshack, Jonathon Free, Robert Fetter

A team of students led by the Nunn lab and its collaborators will investigate the ecological and behavioral factors that determine parasitism in different species of primates. Based on publicly available data and evolutionary trees, students will investigate parasitism by developing a network of primate-parasite relationships. This network will then be used to infer the ecological and behavioral characteristics that best predict parasitism. The findings are relevant both to identifying emerging infectious diseases in humans and to conservation efforts globally.

Project Leads: Jim Moody, Charles Nunn

Project Manager: Marie Claire Chelini

A team of students led by researchers from the Internet of Water project at the Nicholas Institute will develop an online tool that allows local water systems to update and verify their service boundaries while maintaining data security and functionality for state regulators. States oversee hundreds of water systems with system service areas and boundaries that change over time. An online tool enabling water system managers to update their service areas would enable an improved, time-saving process for creating and maintaining up-to-date water system boundaries. Students will have the opportunity to interact with state regulators and water system managers in North Carolina and California who will provide feedback on design and usability. This tool will improve system boundary data that are used for planning and decision-making purposes. Additionally, the tool may include functionality for basic spatial analyses such as overlaying boundaries on sociodemographic, economic, and environmental data. This would enable impact analyses, the identification of utilities and vulnerable populations affected by environmental hazards to water systems, and multi-system regional water supply projections.

Project Leads: Megan Mullin, Lauren Patterson

Project Manager: Kyle Onda

A team of students led by eating disorders expert Nancy Zucker and engineering professor Guillermo Sapiro will develop multimodal computational tools to help improve the nutritional status and food enjoyment of young children with Avoidant/Restrictive Food Intake Disorder (ARFID), children who are not eating enough food or are eating an inadequate variety of food to the degree that it impairs functioning. Students will analyze facial affect and behavior from videos of children trying new foods and will derive sensory profiles based on children’s patterns of food acceptance. These analyses will serve as the basis for personalized recommendations for parents that will suggest actionable next steps to increase their child’s food acceptance.

Project Leads: Guillermo Sapiro, Nancy Zucker

Project Manager: Julia Nichols

A team of students led by Humanities Unbounded Fellow Eva Michelle Wheeler will explore how culturally-bound language in African-American literature and film is rendered for international audiences and will map where and into which languages these translations are occurring. Students will use a reference dataset to build and annotate a translation corpus, explore the lexical choices and translation strategies employed by translators, and conduct a macro-level analysis of the geographic and linguistic spread of these types of translations. The results of this project will bring a quantitative dimension to what has largely been a qualitative analysis and will contribute to ongoing academic conversations about language, race, and globalization.  

Project Lead: Eva Wheeler

Project Manager: Bernard Coles

Human activity recognition (HAR) is a rapidly expanding field with a variety of applications from biometric authentication to developing home-based rehabilitation for people suffering from traumatic brain injuries. While HAR is traditionally performed using accelerometry data, a team of students led by researchers in the BIG IDEAS Lab will explore HAR with physiological data from wrist wearables. Using deep learning methods, students will extract features from wearable sensor data to classify human activity. The student team will develop a reproducible machine learning model that will be integrated into the BIG IDEAS Lab Digital Biomarker Discovery Pipeline (DBDP), which is a source of code for researchers and clinicians developing digital biomarkers from wearable sensors and mobile health technologies.

Project Lead: Jessilyn Dunn

Project Manager: Brinnae Brent

Disciplines involved: Health, Biology, Biomedical Engineering

Past Projects

Social and environmental contexts are increasingly recognized as factors that impact health outcomes of patients. This team will have the opportunity to collaborate directly with clinicians and medical data in a real-world setting. They will examine the association between social determinants and risk prediction for hospital admissions, and assess whether social determinants bias that risk in a systematic way. Applied methods will include machine learning, risk prediction, and assessment of bias. This Data+ project is sponsored by the Forge, Duke's center for actionable data science.

Project Leads: Shelly Rusincovitch, Ricardo Henao, Azalea Kim

Project Manager: Austin Talbot

Aaron Chai (Computer Science, Math) and Victoria Worsham (Economics, Math) spent ten weeks building tools to understand characteristics of successful oil and gas licenses in the North Sea. The team used data-scraping, merging, and OCR methods to create a dataset containing license information and work obligations, and they also produced ArcGIS visualizations of license and well locations. They had the chance to consult frequently with analytics professionals at ExxonMobil.

Click here to read the Executive Summary


Project Lead: Kyle Bradbury

Project Manager: Artem Streltsov

Yueru Li (Math) and Jiacheng Fan (Economics, Finance) spent ten weeks investigating abnormal behavior by companies bidding for oil and gas rights in the Gulf of Mexico. Working with data provided by the Bureau of Ocean Energy Management and ExxonMobil, the team used outlier detection methods to automate the flagging of abnormal behavior, and then used statistical methods to examine various factors that might predict such behavior. They had the chance to consult frequently with analytics professionals at ExxonMobil.


Click here to read the Executive Summary


Project Lead: Kyle Bradbury

Project Manager: Hyeongyul Roh