Data+

Our Virtual Finale on July 31, 2020 was a huge success. If you would like to see the presentations our Data+ 2020 teams gave, please visit our YouTube Channel or visit the individual project pages!

 

Our Data+ 2021 season is open for proposals!

 

Please read our CFP here to submit a proposal for Data+ summer 2021. Proposals are due on November 2, 2021. If you would like help or have questions about your proposal, please contact Paul Bendich (bendich@math.duke.edu).

 

  • I learned there’s much more to it than looking at data. It’s also a way of thinking and organizing what you have analyzed to help others who aren’t able to look at data in such a way to understand it. It’s also a bit of storytelling in a way.

    - Jessica Ho, Math and Neuroscience ‘22
    Predicting Baseball Players’ Athletic Performance Utilizing Baseline Assessments of Vision

  • I didn't really know how data science research applied to social science, but Data+ showed me that it can be a really successful avenue for discovery and change.

    - Nick Datto, Neuroscience, Computer Science, and Cultural Anthropology ‘23
    Race and Housing in Durham over the Course of the 20th Century

  • I’ve learned how interdisciplinary data science is, and how a team of people with many different academic trajectories can work together on the same project, something that I don't think happens very often in other areas.

    - Anonymous

  • What I have discovered is that a majority of data research is about communication. How you interact with your teammates and superiors is just as important, if not more important, than being a genius in your field.

    - Andrew Scofield, Computer Science ’22, Birmingham-Southern College
    For love of greed: tracing the early history of consumer culture

  • I had expected it to be very analytical, but I was surprised at the creativity that was also required. I enjoyed this aspect a lot.

    - Amber Potter, Computer Science ‘23
    Predicting Baseball Players’ Athletic Performance Utilizing Baseline Assessments of Vision

  • I've gained a lot of valuable insight into the career fields of environmental health and epidemiology. I've also learned a lot about project workflow and how to work through the different phases of a long term project with a team. In addition, my skills in R coding and Tableau have improved a ton.

    - Anonymous

  • My group has been focused on cybersecurity and automation methods to prevent and seek out attackers to keep Duke websites and accounts from being compromised. I have learned a lot about cybersecurity, a field that I otherwise might not have pursued. It has been a very interesting and enlightening experience so far and I am excited to continue learning from the Duke OIT staff.

    - John Taylor, Computer Science ‘21
    Applying Security Orchestration, Automation & Response (SOAR) to security threat hunting with Duke’s ITSO

  • I have gained so much knowledge and confidence! And it is not limited to the area of technology, although I have learned to code in R, navigate PACE, and so much more. I have better discovered the benefit of working with a team and received motivation and mentors by seeing female-identifying students, like myself, succeed. Hearing their success stories via panels or team meetings has given me so much more confidence as a young woman wanting to pursue a career in STEM. I see that it is possible! I also have never worked with data before Data+, but never felt behind in my lack of knowledge as my team is super supportive. They Zoom me outside of the workday and send me resources to help me complete my assignments. I've also realized that I do have an interest in Data Science and feel like I'm making a difference in the world through this program. Knowing that my project (Predicting Blindness in Duke's Glaucoma Patient Population) is going to help so many clinicians, government officials, patients and more is so empowering. It is crazy to think that I am just 19 years old and working on such an advanced project with beyond accomplished students, doctors, and professors, but I'm doing it! Data+ truly has given me the opportunity to expand my knowledge and network in a safe environment. I find these takeaways pretty impressive, especially since it is all remote this year.

    - Sydney Hunt, Engineering ‘23
    Predicting Blindness in Duke’s Glaucoma Patient Population

  • Beyond solid technical machine learning skills, I've received a greater appreciation for data science as a tool to understand everything--from aircraft maintenance to the humanities. Before, I'd never expected that conducting humanities research would teach me how to wield and utilize the most cutting-edge research in machine learning and natural language processing. My team is using new package libraries and research papers written by lead researchers this year to conduct our analysis of ancient texts. In Data+, New meets Old.

    - Albert Sun, Computer Science and Public Policy ‘23
    For love of greed: tracing the early history of consumer culture

  • Working remotely has made coordination much more difficult. However, we really have been embracing GitHub and Box to overcome these challenges. I have learned a lot about RNNs and the applications of GRUs and LSTMs and how to implement such layers, in addition to learning how to use PyTorch, as previously I only used TensorFlow.

    - Nathan Warren, MIDS
    Human Activity Recognition using Physiological Data from Wearables

  • As a Biology pre-med, I made the mistake of thinking that coding was irrelevant to me. That changed when I took a biology class where we used R to analyze lab data. That was when I realized that coding (and the problem solving skills that come with it) is invaluable in research. It was difficult at first to jump into Data+, but doing this has benefited me in a few ways. Having to learn Python on my own, in a very short amount of time, with almost no prior coding experience (I didn't even know what a package was) and quickly turning around and using those skills taught me that I am capable of flexibility and learning on the job. Coding also requires an immense amount of problem solving and independence. Although my mentors are fantastic, it's up to me to figure out where I want to take the project and how I want to do it. Finally, Data+ has been a really invaluable exercise in teamwork. This has been especially challenging with remote learning. However, I still feel like our team has grown very close in working toward a common goal.

    - Ellen Mines, Biology and Philosophy ‘21
    Computational Tools to Improve Healthy and Pleasurable Eating in Young Children

  • My coding skills and machine learning knowledge had a huge leap. I learned how to better work in a team as well.

    - Noah Lanier, Psychology ‘22
    Human Activity Recognition using Physiological Data from Wearables

  • I learned a lot about data science and using code to manipulate data. I learned how to properly use a terminal, deep learning/machine learning, pandas, and many other skills. Also, I gained collaboration skills when it comes to developing code.

    - Pavani Jairam, Physics ‘23
    Finding Space Junk with the World’s Biggest Telescopes

  • I’ve learned to work through the entire process of a data science project, from assembling data sources all the way through presenting our findings. I’ve also developed insight into working in a team with people of different backgrounds and interests, which enabled us to contribute to the project in different ways. I’ve taken various lessons and hard skills that will carry with me into my future academic and professional endeavors.

    - Benjamin Chen, Computer Science, Economics ‘22
    Protecting American Investors? Financial Advice from before the New Deal to the Birth of the Internet

  • Data+ absolutely changed my perception of data science research. Learning data science has been more intuitive than expected. There are also resources all over the Internet in addition to team members that are able to provide assistance when one is facing difficulty with an aspect of a project. Data science is also able to be applied to many more scenarios than I expected; I look forward to continuing data science research in the future.

    - Malik Scott, Global Health ‘22
    Predicting Baseball Players’ Athletic Performance Utilizing Baseline Assessments of Vision

  • I gained concrete skills in R and Tableau, the ability to collaborate in a virtual environment, and a better understanding of what data science actually means. I also got a glimpse into the public health field and got to learn what many different public health careers might actually entail.

    - Anna Zolotor, Undeclared
    Piloting an Environmental Public Health Tracking Tool for North Carolina

  • I have gained a significant amount of knowledge of the cybersecurity industry and attack methods due to the nature of the background research I had to do for my project. In addition, I was able to apply my knowledge of statistical analysis to real data and learn new techniques to arrange data such as time series analysis.

    - Matthew Feder, Computer Science ‘22
    Applying Security Orchestration, Automation & Response (SOAR) to security threat hunting with Duke’s ITSO

  • Since I've never participated in research before, especially not research this independently oriented, the main thing I feel I've gained from this experience is confidence. I feel like I have a much better understanding of my own capabilities, and I honestly feel much less intimidated by the idea of pursuing research, not just in Data Science.

    - Donald Pepka, Math, Political Science, and Creative Writing ‘21
    For love of greed: tracing the early history of consumer culture

  • I learned a number of hard skills in terms of coding languages as well as some soft skills along the lines of working with a team and coordinating with a client.

    - Benjamin Williams, ECE ‘21
    ABOUT-US – A BOundary Update Tool for Utility Services

  • I definitely gained a lot of experience in R and in Tableau, but I also learned a ton about the fields of data science and public health. We had several interviews with community partners that helped me learn a lot about the different types of careers in data science, environmental advocacy, and environmental health.

    - Leah Roffman, Environmental Science ‘23
    Piloting an Environmental Public Health Tracking Tool for North Carolina

  • I learned about team communication and organizational skills, time management, and I think I have a greater appreciation for how socio-cultural analysis from a humanities perspective can work in tandem with STEM based modes of collecting information/data.

    - Luci Jones, Environmental Studies, Brown University
    When Black Stories Go Global: Analyzing the Translation of African-American Literature and Film

  • Through the program, I not only developed my technical skills with regards to programming and data visualization, but I also learned a lot more about finance and the intersections of finance and data science. This program really incited my love for programming and problem-solving with data, and has made me even more interested in studying statistical science and data science at Duke. Finally, I learned how to effectively collaborate and communicate with a team in a virtual environment.

    - Helen Chen, Statistics ‘23
    AI in the Investment Office

10 weeks during the summer
2-3 undergraduates per team
1-2 grad student mentors
25 projects sharing ideas and code

Projects

The Air Force’s F-15E Strike Eagle jets have parts that wear down and break, causing unscheduled maintenance events that take away valuable time in the air for critical missions and training. Our team, Limitless Data, is working with Seymour Johnson Air Force Base to mine manually entered maintenance data to visualize and predict aircraft failures. We created a prototype data visualization product that will help maintainers on the flight line identify and repair critical failures before they happen, keeping jets ready to fly, fight and win.

 

Faculty Lead: Dr. Emma Rasiel

Client Lead: Lt. Devon Burger

Project Manager: Vignesh Kumaresan

This project aims to improve the computational efficiency of signal operations, e.g., sampling and multiplying signals. We design machine learning-based signal processing modules that use an adaptive sampling strategy and interpolation to generate a good approximation of the exact output. While ensuring a low error level, improvements in computational efficiency can be expected for digital signal processing systems using the implemented self-adjusting modules.

Project Leads: Yi Feng, Vahid Tarokh

 

Click here to view the project team's poster

 

Watch the team's final presentation (on Zoom) here:

 

Mapping History has focused on the categorizing, labelling, digitization, and 3D reconstruction of 16th- and 17th-century maps and atlases of London and Lisbon. Over the course of the summer, the Mapping History team has developed its own unique analytical dataset by painstakingly labelling every element contained within these maps, used Python to digitize this dataset, and, now in the project's final stage, has begun the process of reconstructing these historical perspectives in a 3D game engine.

Project Leads: Philip Stern, Ed Triplett

Project Manager: Sam Horewood

 

View the team's final poster here

Watch the team's final presentation (on Zoom) below:

For our Data+ project, we partnered with Rewriting the Code (RTC), a non-profit organization committed to empowering and fostering a community of college women with a passion for technology. We developed company and industry profiles for the recruitment process that included information ranging from interview and offer rates to negotiation success for salary and benefits. Additionally, we conducted text analysis of resumes to understand the structure of an optimal tech resume and used linear regression to determine the influence of different variables such as ethnicity or college rank.

Project Lead: Sue Harnett

Faculty Lead: Alexandra Cooper

Project Manager: Imari Smith

 

Watch the team's final presentation (on Zoom) here:

Our team used artificial intelligence to help Duke University Management Company (DUMAC) operate more efficiently by building a cost optimization tool and analyzing and visualizing venture capital data. In our first project centered around cost optimization, we designed a Python script that suggests optimal cash transfers between prime brokers. In our second project, we utilized web scraping and Tableau to aid DUMAC in understanding the relationship between company age and length of current investments.

Project Lead: Robert McGrail, DUMAC

Project Manager: Yi Wang

 

Click here to view the team's final summary

 

Watch the team's final presentation (on Zoom) here:

We utilize elements of data science and analysis in order to scour weblogs for potential malicious attacks on Duke’s servers. Additionally, we seek to identify patterns within the data that could be indicative of malicious intent and hope to apply these to real-time data.

Project Leads: Phillip Batton, Nick Tripp

Project Manager: Joao Alberto Capanema Mansur

 

Click here to view the team's project summary slides

 

Watch the team's final presentation (on Zoom) here:

Predictive Churn Models for Duke Season Ticket Holders and Annual Donors is centered around understanding which annual donors are most likely to churn, i.e. not donate the following year. To solve this problem the team built different models to predict the profiles and timing of donor churn. The team made use of Duke Athletics’ internal data supplemented by external data to build predictive machine learning models.

Project Leads: John Haws, Larry Cleaver

Project Manager: Andrew Carr

 

Click here to view the team's final project summary

 

Watch the team's final presentation (on Zoom) here:

Our team members have spent the summer working with the North Carolina Division of Public Health Occupational and Environmental Epidemiology Branch to build a pilot environmental public health data dashboard, with the hope that the pilot tool will be used in DPH’s grant proposal to the CDC for a fully-funded tool. The pilot tool, which is a Tableau dashboard, displays population, health, and environmental data for North Carolina counties and census tracts. The project involved data processing in R, the creation of a detailed metadata table, and building interactive visualizations in Tableau.

Project Leads: Mike Dolan Fliss, Kim Gaetz

Project Manager: Melyssa Minto

 

Click here to view the team's final project poster

 

Watch the team's final presentation (on Zoom) here:

Carrying forward the work of a 2019-20 Bass Connections team, our Data+ team has worked to better understand the state of the home mortgage market leading up to the financial crisis. The team has built a more in-depth analysis of North Carolina to understand its different regions. We have also expanded the scope of the analysis, developing a quantitative portrait of the mortgage market in Arizona, Florida, Massachusetts, Georgia, and Ohio and creating visualizations for different mortgage market statistics.

 

Project Lead: Lee Reiners

Project Manager: Eric Autry

 

Click here to view the team's project poster

Watch the project team's final presentation (over Zoom) below:

 

 

Our group aims to reveal the effects of urban and agricultural land use on the metabolic productivity of rivers through statistical manipulation and visualization. During this summer, we classified sites and conducted covariate analyses based on patterns of metabolism, and produced reproducible code that can be used by researchers with similar research goals. We hope that our findings will suggest hypotheses about how land development causes disruption and which factors land planners should avoid introducing.

Project Leads: Jim Heffernan, Phil Savoy

 

Click here to view the team's final poster

 

Watch the team's final presentation here:

Our team used years of unanalyzed data in a cloud computing environment to conduct exploratory data analysis using natural language processing techniques, as well as visualizations, for Fleet Management Limited. Through this, and preliminary predictive modelling, we hope to help management decrease the number of preventable incidents as each one costs FML more than $10,000.

 

Faculty Sponsor: Paul Bendich

 

Project Manager: Anil Ganti

 

Client Lead: Shah Irani, Fleet Management Limited

 

Click here to view the project team's poster

 

Watch the team's final presentation (on Zoom) below:

We trained an object detection model to locate wind turbines in overhead satellite imagery. Because these deep learning models require large amounts of training data, and satellite imagery of wind turbines is rare and expensive to collect, we created synthetic satellite imagery using 3D modeling software. We then supplemented our real-world training dataset with the synthetic imagery and observed changes in performance.

The team created a website covering their work: https://dataplus-2020.github.io/

Project Lead: Kyle Bradbury

 

Click here to view the team's final project summary

 

Watch the team's final presentation (on Zoom) here:

Astronomers from the Dark Energy Survey rely on images of deep space to understand the nature of the universe, but these images are often polluted with "space junk": asteroids, comets, satellites, or other objects from our own solar system obstructing the telescope's view. In order to perform their analysis, scientists must first manually identify and mask out such objects from images, a time-consuming process. With leads Michael Troxel, Dan Scolnic, and Chris Walter, we've leveraged deep learning-based computer vision techniques to build models to automatically identify and localize space junk in deep space imagery. 

Project Leads: Dan Scolnic, Michael Troxel, Chris Walter

 

Click here to view the team's final project

In collaboration with Data and Analytics Practice at OIT, our team has completed a series of critical analyses aiding Duke Facilities Management in further optimizing campus energy usage. Data cleaning tools, imputation techniques, and a variety of time series prediction methods ranging from autoregressive models to deep learning networks have been seamlessly integrated into a single interactive forecasting application allowing collaborators to provide accurate and comprehensive utility usage estimates.

Project Leads: John Haws, Gagandeep Kaur

Project Manager: Billy Carson

 

Click here to view the project team's final poster

 

Watch the team's final presentation (on Zoom) here:

Our team examined the relationship between race and home values across several units of analysis (household, address, HOLC rating area, census block, block group, and tract) in Durham, NC. We combined data from the decennial censuses (1940-2010), American Community Survey (2005-2018), Durham County Register of Deeds (1997-2020), and Durham County Tax Administration (1997-2021). We find that home values are strongly associated with the racial composition of areas, that homes in black neighborhoods are worth less, and that they accumulate less value over time.

Project Lead: William Darity Jr.

Project Manager: Omer Ali

Click here to view the team's final project slides

 

Watch the team's final presentation (on Zoom) here:

This project involves predicting the incidence of blindness in glaucoma patients at Duke Eye Center (DEC) -- specifically, the likelihood of a patient presenting legally blind (i.e. with very advanced disease) at their first visit. We will assemble a novel data set of electronic health records from thousands of DEC glaucoma patients and data from the Durham Neighborhood Compass project, a repository of geospatially resolved socioeconomic statistics on Durham County that includes features like average distance to a healthcare facility. We aim to identify risk factors associated with delayed care for glaucoma in the Durham and wider NC communities.

Project Leads: Samuel Berchuck, Sayan Mukherjee, Felipe Medeiros

Project Manager: Kimberly Roche

 

Click here to view the project team's final poster

 

Watch the team's final project presentation (on Zoom) here:

In light of Duke’s reopening amidst the COVID-19 pandemic, this project aims to track foot traffic in and around the Bryan Center by analyzing Wi-Fi log data from all users connected to wireless networks in the center during February 2020. Our team employed Markov Chains, Kernel Density Estimations, and data analysis and visualization tools such as Python and Tableau to create a map of Wi-Fi access points in the Bryan Center and a heatmap that visualizes congestion in different floor areas across time. Our goal is to provide Duke OIT and Student Affairs with valuable information on highly congested areas and frequented paths, directing social distancing measures and suggesting alternative paths that can reduce transmission risk this coming academic year.

Project Leads: John Haws, Mary Thompson, Eric Hope, Sean Dilda

Project Manager: Hunter Klein

Click here to view the project team's poster

Watch the team's final project presentation (on Zoom) here:

 

This summer, our objective was to take data provided by the Durham County Detention Facility (DCDF), Duke Health, and Lincoln Community Health Center and analyze trends across the local justice system and these health care institutions, specifically with regard to individuals with mental illness. We analyzed the experience of individuals who were incarcerated by looking at their demographic characteristics, emergency department usage, and criminal justice encounters. Building on these initial findings, we hope to continue this work during the school year through a Bass Connections team to better understand the relationship between health care utilization and rates of recidivism in Durham County.

Project Leads: Nicole Schramm-Sapyta, Maria Tackett

Project Manager: Ruth Wygle

 

Click here to view the team's final project summary

 

Watch the team's final presentation (on Zoom) here:

 

 

Hate groups such as the Alt-Right gained mainstream visibility in contemporary political culture during the Unite the Right Rally in Charlottesville, VA in 2017. This project aims to explore methods to quantify the presence of Latinxs within the Alt-Right, particularly in how they racialize themselves in a space that often spews hate towards Mexicans and other marginalized groups from Latin America. Using data from multiple sources (such as Twitter, Stormfront, and Breitbart), we developed a corpus of tweets, subthreads, and articles, and analyzed this data using basic natural language processing (NLP) techniques.

Project Lead: Cecilia Márquez

Project Manager: Susan Jacobs

 

Click here to view the team's project summary slides

 

Watch the team's final presentation (on Zoom) here:

This project aims to analyze assessment and performance data collected from baseball players to make predictions about baseball performance based on vision and physical abilities. We use hierarchical regression analyses to identify characteristics that correlate with batting performance in order to inform scouts about the likely production of developmental prospects. The final product is an application that uses an athlete's assessment results to produce performance summary graphs for the individual compared to other athletes and inferential models for the relationships between assessments and performance.

Project Leads: Greg Appelbaum, Marc Richard

 

Click here to view the team's project poster

 

Watch the team's final presentation (on Zoom) here:

 

We apply word embedding models to corpora from the start of the Early Modern period, when the market economy began to dramatically expand in England. Word embedding models use neural networks to map words to vectors so that semantic relationships are preserved within the vectors’ geometry. Such models have been successful in understanding cultural trends and stereotypes in large corpora of texts, but these techniques are infrequently used on texts dating much farther back than the 19th century. Using newly developed methods for analyzing word embeddings, we track the development of the meanings of words related to consumerism, including their relationships with gender over time.

 

Project Leads: Astrid Giugni, Jessica Hines

Project Manager: Chris Huebner

 

Click here to view the team's final poster

 

Watch the team's final presentation (on Zoom) here:

The Protecting American Investors project investigates the evolving structure and content of financial advice from the early 20th century to the birth of the Internet. By converting and cleaning thousands of investment advice columns from historical newspapers and magazines, we assembled a large corpus to address our research questions. Through text analysis methods like topic modeling, we have seen how the business cycle affects the nature of advice, the speed at which different financial innovations were integrated, and how advice differs among various targeted social groups.

Project Lead: Ed Balleisen

 

Click here to view the team's final project summary slides

 

Watch the team's final presentation (on Zoom) here:

 

 

Duke’s enrollment data over the past 50 years speaks volumes about the evolution of the University. Using physical scans courtesy of the Duke University Archives, we manipulate decades of demographic and geospatial data across all Duke schools to create an interactive application. Giving users control over how they want to dive into the data, we hope to both illuminate questions and inspire further research into the composition of Duke’s student body--highlighting the history, origins, and people that have helped make it what it is today.

Project Leads: Don Taylor, Valerie Gillispie

Project Manager: Anna Holleman

 

Watch the project team's final presentation (on Zoom) here:

 

Between 1935 and 1945, rural electricity access shot up from roughly 10% to 90%. During this time, the Rural Electrification Administration funded an Electric Farm Equipment (EFE) Roadshow as part of its mission to expand electricity access and demand. Digitizing massive amounts of archival data, our team has sought to quantify the effect of the EFE Roadshow on the larger trend of growing residential electricity consumption in rural U.S. towns from 1938 to 1945. We hope that understanding this crucial chapter in our own history will help inform present-day electrification efforts in the developing world.

 

Project Leads: Victoria Plutshack, Jonathon Free, Robert Fetter

 

View the team's final project summary here

 

Watch the team's final presentation (on Zoom) here:

The Disease Emergence and Richness in Primates team uses existing databases to quantify parasite richness across primates and to identify ecological predictors of parasitism. By integrating phylogenetic generalized least squares regression and network-based approaches, the team ultimately aims to predict missing interactions between primates and parasites, which, combined with exploring ecological predictors, will provide better capabilities for identifying emerging infectious diseases in humans.

 

Project Leads: Jim Moody, Charles Nunn

Project Manager: Marie Claire Chelini

 

Click here to view the team's final project summary

We all need water to survive, but how many of us really know where our water comes from? Team 3 has created a functioning website from scratch to give consumers more easily accessible information about who provides their water and how current supply compares with the past 30 years.

Project Leads: Megan Mullin, Lauren Patterson

 

Project Manager: Kyle Onda

Watch the team's final presentation (on Zoom) here:

Our project is about building a food recommendation system for Avoidant/Restrictive Food Intake Disorder (ARFID) patients and understanding the relationship between ARFID and clinical variables. Our stakeholders include young picky eaters and their parents, as well as clinicians who work with ARFID patients. We created an interactive visualization for ARFID patients to encourage them to explore different foods and also built a visualization to represent the relationship between ARFID and clinical variables.

Project Leads: Guillermo Sapiro, Nancy Zucker

Project Manager: Julia Nichols

 

Interact with the team's visualization tool here: http://foodrecbucket.s3-website-us-west-1.amazonaws.com/

 

View the team's final project presentation slides here

 

Watch the team's final presentation below:

Led by Dr. Eva Wheeler, this project considers how racial language in African American literature and film is rendered for international audiences and traces the spread of these translations. To address the study’s primary questions, the team analyzed a preliminary dataset and explored the relationship between translation strategy and different categories of racial language. The team also conducted a macro-level analysis of the linguistic, temporal, and geographic spread of African American stories using the IMDB and WorldCat databases. We have found a large amount of variation in how African American stories are rendered, which can in part be explained through a social scientific lens.

 

Project Lead: Eva Wheeler

 

Project Manager: Bernard Coles

 

Click here to view the team's project poster

 

Watch the team's final presentation (on Zoom) here:

Traditional Human Activity Recognition (HAR) utilizes accelerometry (movement) data to classify activities. This summer, Team #4 examined using physiological sensors to improve HAR accuracy and generalizability. The team developed ML models that will be available as open source in the Digital Biomarker Discovery Pipeline (DBDP) to enable other researchers and clinicians to gain useful insights in the field of HAR.

 

Project Lead: Jessilyn Dunn

Project Manager: Brinnae Brent

Click here to view the team's project poster

Watch the team's final presentation (on Zoom) below:

 

Past Projects

Social and environmental contexts are increasingly recognized as factors that impact health outcomes of patients. This team will have the opportunity to collaborate directly with clinicians and medical data in a real-world setting. They will examine the association between social determinants and risk prediction for hospital admissions, and assess whether social determinants bias that risk in a systematic way. Applied methods will include machine learning, risk prediction, and assessment of bias. This Data+ project is sponsored by the Forge, Duke's center for actionable data science.

Project Leads: Shelly Rusincovitch, Ricardo Henao, Azalea Kim

Project Manager: Austin Talbot

Aaron Chai (Computer Science, Math) and Victoria Worsham (Economics, Math) spent ten weeks building tools to understand characteristics of successful oil and gas licenses in the North Sea. The team used data-scraping, merging, and OCR methods to create a dataset containing license information and work obligations, and they also produced ArcGIS visualizations of license and well locations. They had the chance to consult frequently with analytics professionals at ExxonMobil.

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Artem Streltsov

Yueru Li (Math) and Jiacheng Fan (Economics, Finance) spent ten weeks investigating abnormal behavior by companies bidding for oil and gas rights in the Gulf of Mexico. Working with data provided by the Bureau of Ocean Energy Management and ExxonMobil, the team used outlier detection methods to automate the flagging of abnormal behavior, and then used statistical methods to examine various factors that might predict such behavior. They had the chance to consult frequently with analytics professionals at ExxonMobil.

 

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Hyeongyul Roh