The Air Force’s F-15E Strike Eagle jets have parts that wear down and break, causing unscheduled maintenance events that take valuable flight time away from critical missions and training. Our team, Limitless Data, is working with Seymour Johnson Air Force Base to mine manually entered maintenance data to visualize and predict aircraft failures. We created a prototype data visualization product that supports maintainers on the flight line, helping them identify and repair critical failures before they happen and keeping jets ready to fly, fight, and win.
This project aims to improve the computational efficiency of signal operations, e.g., sampling and multiplying signals. We design machine learning-based signal processing modules that use an adaptive sampling strategy and interpolation to generate a good approximation of the exact output. While keeping the error level low, these self-adjusting modules can improve the computational efficiency of digital signal processing systems.
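The underlying idea, computing an exact operation only at a coarse subset of samples and interpolating the rest, can be sketched in a few lines of NumPy. This is a minimal illustration with a fixed sampling step; the project's modules instead choose the step adaptively with machine learning:

```python
import numpy as np

def approx_multiply(x, y, t, step):
    """Approximate the elementwise product x * y by computing it only at
    every `step`-th sample and linearly interpolating in between."""
    coarse_t = t[::step]
    coarse_prod = x[::step] * y[::step]  # exact product at coarse samples
    return np.interp(t, coarse_t, coarse_prod)

# Two smooth test signals on a fine grid
t = np.linspace(0, 1, 1001)
x = np.sin(2 * np.pi * t)
y = np.cos(2 * np.pi * t)

approx = approx_multiply(x, y, t, step=10)
exact = x * y
print("max abs error:", np.max(np.abs(approx - exact)))
```

For smooth signals the interpolation error stays small while the number of multiplications drops by roughly the sampling step, which is the efficiency-versus-error trade-off the adaptive modules tune.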
Mapping History has focused on the categorization, labelling, digitization, and 3D reconstruction of 16th- and 17th-century maps and atlases of London and Lisbon. Over the course of the summer, the Mapping History team has developed its own unique analytical dataset by painstakingly labelling every element contained within these maps, used Python to digitize this dataset, and, now in the project's final stage, has begun reconstructing these historical perspectives in a 3D game engine.
For our Data+ project, we partnered with Rewriting the Code (RTC), a non-profit organization committed to empowering and fostering a community of college women with a passion for technology. We developed company and industry profiles for the recruitment process that included information ranging from interview and offer rates to success in negotiating salary and benefits. Additionally, we conducted text analysis of resumes to understand the structure of an optimal tech resume and used linear regression to determine the influence of variables such as ethnicity and college rank.
Our team used artificial intelligence to help Duke University Management Company (DUMAC) operate more efficiently by building a cost optimization tool and by analyzing and visualizing venture capital data. In our first project, centered on cost optimization, we designed a Python script that suggests optimal cash transfers between prime brokers. In our second project, we used web scraping and Tableau to help DUMAC understand the relationship between company age and length of current investments.
We use data science and analysis techniques to scour web logs for potential malicious attacks on Duke’s servers. Additionally, we seek to identify patterns within the data that could be indicative of malicious intent and hope to apply these findings to real-time data.
Predictive Churn Models for Duke Season Ticket Holders and Annual Donors is centered on understanding which annual donors are most likely to churn, i.e., not donate the following year. To solve this problem, the team built different models to predict the profiles and timing of donor churn. The team made use of Duke Athletics’ internal data, supplemented by external data, to build predictive machine learning models.
Our team members have spent the summer working with the North Carolina Division of Public Health Occupational and Environmental Epidemiology Branch to build a pilot environmental public health data dashboard, with the hope that the pilot tool will be used in DPH’s grant proposal to the CDC for a fully-funded tool. The pilot tool, which is a Tableau dashboard, displays population, health, and environmental data for North Carolina counties and census tracts. The project involved data processing in R, the creation of a detailed metadata table, and the building of interactive visualizations in Tableau.
Carrying forward the work of a 2019-20 Bass Connections team, our Data+ team has worked to better understand the state of the home mortgage market leading up to the financial crisis. The team has built a more in-depth analysis of North Carolina to understand its different regions. We have also expanded the scope of the analysis, developing a quantitative portrait of the state of the mortgage market in Arizona, Florida, Massachusetts, Georgia, and Ohio and creating visualization tools for different mortgage market statistics.
Our group aims to reveal the effects of urban and agricultural land use on the metabolic productivity of rivers through statistical analysis and visualization. During the summer, we classified sites and conducted covariate analyses based on patterns of metabolism, and produced reproducible code that can be used by researchers with similar research goals. We hope our findings will suggest hypotheses about how land development causes disruption and which factors land planners should avoid introducing.
Our team used years of unanalyzed data in a cloud computing environment to conduct exploratory data analysis using natural language processing techniques, as well as visualizations, for Fleet Management Limited. Through this, and preliminary predictive modelling, we hope to help management decrease the number of preventable incidents as each one costs FML more than $10,000.
We trained an object detection model to locate wind turbines in overhead satellite imagery. Because these deep learning models require large amounts of training data, and satellite imagery of wind turbines is rare and expensive to collect, we created synthetic satellite imagery using 3D modeling software. We then supplemented our real-world training dataset with the synthetic imagery and observed changes in performance.
Astronomers from the Dark Energy Survey rely on images of deep space to understand the nature of the universe, but these images are often polluted with "space junk": asteroids, comets, satellites, or other objects from our own solar system obstructing the telescope's view. In order to perform their analysis, scientists must first manually identify and mask out such objects from images, a time-consuming process. With leads Michael Troxel, Dan Scolnic, and Chris Walter, we've leveraged deep learning-based computer vision techniques to build models to automatically identify and localize space junk in deep space imagery.
In collaboration with the Data and Analytics Practice at OIT, our team completed a series of critical analyses to help Duke Facilities Management further optimize campus energy usage. Data cleaning tools, imputation techniques, and a variety of time series prediction methods, ranging from autoregressive models to deep learning networks, have been integrated into a single interactive forecasting application that allows collaborators to produce accurate and comprehensive utility usage estimates.
Our team examined the relationship between race and home values across several units of analysis (household, address, HOLC rating area, census block, block group, and tract) in Durham, NC. We combined data from the decennial censuses (1940-2010), American Community Survey (2005-2018), Durham County Register of Deeds (1997-2020), and Durham County Tax Administration (1997-2021). We find that home values are strongly associated with the racial composition of areas, that homes in black neighborhoods are worth less, and that they accumulate less value over time.
This project involves predicting the incidence of blindness in glaucoma patients at Duke Eye Center (DEC) -- specifically, the likelihood of a patient presenting legally blind (i.e. with very advanced disease) at their first visit. We will assemble a novel data set of electronic health records from thousands of DEC glaucoma patients and data from the Durham Neighborhood Compass project, a repository of geospatially resolved socioeconomic statistics on Durham county that includes features like average distance to a healthcare facility. We aim to identify risk factors associated with delayed care for glaucoma in the Durham and wider NC communities.
In light of Duke’s reopening amidst the COVID-19 pandemic, this project aims to track foot traffic in and around the Bryan Center by analyzing WiFi log data from all users connected to wireless networks in the center during February 2020. Our team employed Markov chains, kernel density estimation, and data analysis and visualization tools such as Python and Tableau to create a map of WiFi access points in the Bryan Center and a heatmap that visualizes congestion in different floor areas over time. Our goal is to provide Duke OIT and Student Affairs with valuable information on highly congested areas and frequented paths, guiding social distancing measures and suggesting alternative paths that can reduce transmission risk this coming academic year.
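A first-order Markov chain over access points can be estimated directly from per-device sequences of associations; the transition probabilities then describe where traffic tends to flow. A minimal sketch (the access-point names and paths below are invented for illustration):

```python
from collections import defaultdict

def transition_probs(paths):
    """Estimate first-order Markov transition probabilities between
    WiFi access points from a list of device paths (AP sequences)."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for a, b in zip(path, path[1:]):  # count consecutive AP pairs
            counts[a][b] += 1
    probs = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        probs[a] = {b: n / total for b, n in nxt.items()}
    return probs

# Hypothetical device paths between access points
paths = [
    ["lobby", "cafe", "theater"],
    ["lobby", "cafe", "lobby"],
    ["cafe", "theater"],
]
P = transition_probs(paths)
print(P["lobby"])  # in this toy data, all lobby traffic heads to the cafe
```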
Project Leads: John Haws, Mary Thompson, Eric Hope, Sean Dilda
Project Manager: Hunter Klein
This summer, our objective was to take data provided by the Durham County Detention Facility (DCDF), Duke Health, and Lincoln Community Health Center and analyze trends across the local justice system and these health care institutions, specifically with regard to individuals with mental illness. We analyzed the experience of individuals who were incarcerated by looking at their demographic characteristics, emergency department usage, and criminal justice encounters. Using these initial findings, we hope to better understand the relationship between health care utilization and rates of recidivism in Durham County, continuing this work during the school year through a Bass Connections team.
Project Manager: Ruth Wygle
Hate groups such as the Alt-Right became visible in mainstream contemporary political culture during the Unite the Right Rally in Charlottesville, VA in 2017. This project aims to explore methods to quantify the presence of Latinxs within the Alt-Right, particularly in how they racialize themselves in a space that often spews hate towards Mexicans and other marginalized groups from Latin America. Using data from multiple sources (such as Twitter, Stormfront, and Breitbart), we developed a corpus of tweets, subthreads, and articles, and analyzed this data using basic natural language processing (NLP) techniques.
This project aims to analyze assessment and performance data collected from baseball players to make predictions about baseball performance based on vision and physical abilities. We use hierarchical regression analyses to identify characteristics that correlate with batting performance in order to inform scouts about the likely production of developmental prospects. The final product is an application that uses an athlete's assessment results to produce performance summary graphs for the individual compared to other athletes and inferential models for the relationships between assessments and performance.
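Hierarchical regression enters predictors in blocks and measures how much each block improves fit, usually as a change in R². A minimal NumPy sketch on synthetic data (the assessment names and effect sizes are invented, not the study's values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical assessment scores (all variable names are illustrative)
vision = rng.normal(size=n)
strength = rng.normal(size=n)
batting = 0.6 * vision + 0.3 * strength + rng.normal(scale=0.5, size=n)

def r_squared(X, y):
    """R^2 of an OLS fit with intercept, via least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

# Step 1: physical block only; Step 2: add the vision block
r2_base = r_squared(strength[:, None], batting)
r2_full = r_squared(np.column_stack([strength, vision]), batting)
print(f"R^2 gain from vision block: {r2_full - r2_base:.3f}")
```

The R² increment attributable to each block is what tells scouts how much, say, vision assessments add beyond physical measures.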
We apply word embedding models to corpora from the start of the Early Modern period, when the market economy began to dramatically expand in England. Word embedding models use neural networks to map words to vectors so that semantic relationships are preserved in the vectors’ geometry. Such models have been successful in understanding cultural trends and stereotypes in large corpora of texts, but these techniques are infrequently used on texts dating much farther back than the 19th century. Using newly developed methods for analyzing word embeddings, we track the development of the meanings of words related to consumerism, including their relationships with gender over time.
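The geometric intuition can be shown with toy vectors; real embeddings (e.g., from word2vec-style models) have hundreds of learned dimensions, and the three-dimensional vectors below are invented purely for illustration:

```python
import numpy as np

# Toy 3-dimensional "embeddings" (invented; real models learn these from a corpus)
vecs = {
    "luxury": np.array([0.9, 0.1, 0.2]),
    "silk":   np.array([0.8, 0.2, 0.1]),
    "plough": np.array([0.1, 0.9, 0.3]),
}

def cosine(a, b):
    """Cosine similarity: the standard measure of closeness in embedding space."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vecs["luxury"], vecs["silk"]))    # high: related senses
print(cosine(vecs["luxury"], vecs["plough"]))  # low: unrelated senses
```

Tracking how such similarities shift between corpora from different decades is one way to measure semantic change over time.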
The Protecting American Investors project investigates the evolving structure and content of financial advice from the early 20th century to the birth of the Internet. By converting and cleaning thousands of investment advice columns from historical newspapers and magazines, we assembled a large corpus to address our research questions. Through text analysis methods like topic modeling, we have seen how the business cycle affects the nature of advice, the speed with which different financial innovations were integrated, and how advice differs among various targeted social groups.
Project Lead: Ed Balleisen
Duke’s enrollment data over the past 50 years speaks volumes about the evolution of the University. Using physical scans courtesy of the Duke University Archives, we manipulate decades of demographic and geospatial data across all Duke schools to create an interactive application. Giving users control over how they want to dive into the data, we hope to both illuminate questions and inspire further research into the composition of Duke’s student body--highlighting the history, origins, and people that have helped make it what it is today.
Project Manager: Anna Holleman
Between 1935 and 1945, rural electricity access shot up from roughly 10% to 90%. During this time, the Rural Electrification Administration funded an Electric Farm Equipment (EFE) Roadshow as part of its mission to expand electricity access and demand. Digitizing massive amounts of archival data, our team has sought to quantify the effect of the EFE Roadshow on the larger trend of growing residential electricity consumption in rural U.S. towns from 1938 to 1945. We hope that understanding this crucial chapter in our own history will help inform present-day electrification efforts in the developing world.
The Disease Emergence and Richness in Primates team uses existing databases to quantify parasite richness across primates and to identify ecological predictors of parasitism. By integrating phylogenetic generalized least squares regression and network-based approaches, the team ultimately aims to predict missing interactions between primates and parasites, which, combined with exploring ecological predictors, will provide better capabilities for identifying emerging infectious diseases in humans.
Project Manager: Marie Claire Chelini
We all need water to survive, but how many of us really know where our water comes from? Team 3 has created a functioning website from scratch to give consumers more easily accessible information about who provides their water and how supply compares with the past 30 years.
Project Manager: Kyle Onda
Our project builds a food recommendation system for Avoidant/Restrictive Food Intake Disorder (ARFID) patients and investigates the relationship between ARFID and clinical variables. Our stakeholders include young picky eaters and their parents, as well as clinicians who work with ARFID patients. We created an interactive visualization for ARFID patients to encourage them to explore different foods and also built visualizations to represent the relationship between ARFID and clinical variables.
Project Manager: Julia Nichols
Interact with the team's visualization tool here: http://foodrecbucket.s3-website-us-west-1.amazonaws.com/
Led by Dr. Eva Wheeler, this project considers how racial language in African American literature and film is rendered for international audiences and traces the spread of these translations. To address the study’s primary questions, the team analyzed a preliminary dataset and explored the relationship between translation strategy and different categories of racial language. The team also conducted a macro-level analysis of the linguistic, temporal, and geographic spread of African American stories using the IMDB and WorldCat databases. We have found a large amount of variation in how African American stories are rendered, which can in part be explained through a social scientific lens.
Traditional Human Activity Recognition (HAR) utilizes accelerometry (movement) data to classify activities. This summer, Team #4 examined using physiological sensors to improve HAR accuracy and generalizability. The team developed ML models that will be made available open source in the Digital Biomarker Discovery Pipeline (DBDP), enabling other researchers and clinicians to derive useful insights in the field of HAR.
Project Lead: Jessilyn Dunn
Project Manager: Brinnae Brent
Social and environmental contexts are increasingly recognized as factors that impact patients' health outcomes. This team will have the opportunity to collaborate directly with clinicians and medical data in a real-world setting. They will examine the association between social determinants and risk prediction for hospital admissions, and assess whether social determinants bias that risk in a systematic way. Applied methods will include machine learning, risk prediction, and assessment of bias. This Data+ project is sponsored by the Forge, Duke's center for actionable data science.
Project Manager: Austin Talbot
Aaron Chai (Computer Science, Math) and Victoria Worsham (Economics, Math) spent ten weeks building tools to understand characteristics of successful oil and gas licenses in the North Sea. The team used data-scraping, merging, and OCR methods to create a dataset containing license information and work obligations, and they also produced ArcGIS visualizations of license and well locations. They had the chance to consult frequently with analytics professionals at ExxonMobil.
Project Lead: Kyle Bradbury
Project Manager: Artem Streltsov
Yueru Li (Math) and Jiacheng Fan (Economics, Finance) spent ten weeks investigating abnormal behavior by companies bidding for oil and gas rights in the Gulf of Mexico. Working with data provided by the Bureau of Ocean Energy Management and ExxonMobil, the team used outlier detection methods to automate the flagging of abnormal behavior, and then used statistical methods to examine various factors that might predict such behavior. They had the chance to consult frequently with analytics professionals at ExxonMobil.
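One standard baseline for this kind of flagging is the modified z-score, which measures distance from the median in units of the median absolute deviation and is therefore not distorted by the very outliers it is trying to find. A minimal sketch (the bid amounts are invented, not BOEM data):

```python
import numpy as np

def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score, based on the median and the
    median absolute deviation (MAD), exceeds a threshold."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    modified_z = 0.6745 * (values - med) / mad
    return np.abs(modified_z) > threshold

# Hypothetical bids (in $M) on one lease block; the last bid is extreme
bids = [1.2, 1.5, 1.1, 1.4, 1.3, 9.8]
print(flag_outliers(bids))  # only the 9.8 bid is flagged
```

Automating a robust flag like this lets analysts concentrate their attention on the small set of bids that merit a closer statistical look.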
Project Lead: Kyle Bradbury
Project Manager: Hyeongyul Roh
Team A: Video data extraction
Alexander Bendeck (Computer Science, Statistics) and Niyaz Nurbhasha (Economics) spent ten weeks building tools to extract player and ball movement in basketball games. Using freely available broadcast-angle video footage which required much cleaning and pre-processing, the team used OpenPose software and employed neural network methodologies. Their pipeline fed into the predictive models of Team C.
Team B: Modeling basketball data: offense
Anshul Shah (Computer Science, Statistics), Jack Lichtenstein (Statistics), and Will Schmidt (Mechanical Engineering) spent ten weeks building tools to analyze offensive play in basketball. Using 2014-15 Duke Men’s Basketball player-tracking data provided by SportVU, the team constructed statistical models that explored the relationship between different metrics of offensive productivity, and also used computational geometry methods to analyze the off-ball “gravity” of an offensive player.
Team C: Modeling basketball data: defense
Lukengu Tshiteya (Statistics), Wenge Xie (ECE), and Joe Zuo (Computer Science, Statistics) spent ten weeks building tools to predict player movement in basketball games. Using SportVU data, including some pre-processed by Team A, the team built predictive RNN models that distinguish between 6 typical movement types, and created interactive visualizations of their findings in R Shiny.
Team D: Visualizing basketball data
Shixing Cao (ECE) and Jackson Hubbard (Computer Science, Statistics) spent ten weeks building visualizations to help analyze basketball games. Using player tracking data from Duke basketball games, the team created visualizations of gameflow, networks of points and assists, and integrated all of their tools into an R Shiny app.
Yanchen Ou (Computer Science) and Jiwoo Song (Chemistry, Mechanical Engineering) spent ten weeks building tools to assist in the analysis of smart meter data. Working with a large dataset of transformer and household data from the Kyrgyz Republic, the team built a data preprocessing pipeline and then used unsupervised machine-learning techniques to assess energy quality and construct typical user profiles.
Faculty Lead: Robyn Meeks
Project Manager: Bernard Coles
Bernice Meja (Philosophy, Physics), Jessica Yang (Computer Science, ECE), and Tracey Chen (Computer Science, Mechanical Engineering) spent ten weeks building methods for Duke’s Office of Information Technology (OIT) to better understand information arising from “smart” (IoT) devices on campus. Working with data provided by an IoT testbed set up by OIT professionals, the team used a mixture of supervised and unsupervised machine-learning techniques and built a prototype device classifier.
Project Lead: Will Brockselsby
Interested in understanding the types of attacks targeting Duke and other universities? Led by OIT and the IT Security Office, students will learn to analyze threat intelligence data to identify trends and patterns of attacks. Duke blocks an average of 1.5 billion malicious connection attempts per day and is working with other universities to share the attack data. One untapped area is research into the types of attacks and how universities are targeted. Students will collaborate with security and IT professionals in analyzing the data, with the intent to discern patterns.
Project Lead: Jesse Bowling
Project Manager: Susan Jacobs
Katelyn Chang (Computer Science, Math) and Haynes Lynch (Environmental Science, Policy) spent ten weeks building tools to analyze and visualize geospatial and remote sensing data arising from the Alligator River National Wildlife Refuge (ARNWR). The team produced interactive maps of physical characteristics that were tailored to specific refuge management professionals, and also built classifiers for vegetation detection in LandSat imagery.
Project Manager: Emily Ury
Dennis Harrsch, Jr. (Computer Science), Elizabeth Loschiavo (Sociology), and Zhixue (Mary) Wang (Computer Science, Statistics) spent ten weeks improving the team’s web platform, which allows users to examine contraceptive use in low- and middle-income (LMIC) countries as collected by the Demographic and Health Survey (DHS) contraceptive calendar. The team improved load times and data visualization latency, and increased the number of country surveys available in the platform from 3 to 55. The team also created a new app that allows users to explore the results of machine learning on this big dataset.
This project will continue into the academic year via Bass Connections where student teams will refine the machine learning model results and explore the question of whether and how policymakers can use these tools to improve family planning in LMIC settings.
Faculty Lead: Megan Huchko
Project Manager: Amy Finnegan
Nathaniel Choe (ECE) and Mashal Ali (Neuroscience) spent ten weeks developing machine-learning tools to analyze urodynamic detrusor pressure data of pediatric spina bifida patients from the Duke University Hospital. The team built a pipeline that went from raw time series data to signal analysis to dimension reduction to classification, and has the potential to assist in clinician diagnosis.
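The shape of such a pipeline, raw traces to summary features to dimension reduction to a classifier, can be sketched with NumPy alone. Everything below is a synthetic stand-in (two classes distinguished only by amplitude), not the clinical data:

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_features(signal):
    """Reduce a raw pressure trace to simple summary features."""
    return np.array([signal.mean(), signal.std(), signal.max(), np.ptp(signal)])

# Hypothetical traces: class 0 = low-amplitude, class 1 = high-amplitude
traces = [rng.normal(scale=s, size=500) for s in [1] * 20 + [3] * 20]
labels = np.array([0] * 20 + [1] * 20)
X = np.array([extract_features(t) for t in traces])

# Dimension reduction via PCA (SVD of the centered feature matrix)
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T  # project onto the first two principal components

# Nearest-centroid classification in the reduced space
centroids = np.array([X2[labels == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(np.linalg.norm(X2[:, None] - centroids, axis=2), axis=1)
print("training accuracy:", (pred == labels).mean())
```

A real diagnostic pipeline would validate on held-out patients and use a stronger classifier, but the staged structure is the same.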
Project Manager: Zekun Cao
Varun Nair (Economics, Physics), Paul Rhee (Computer Science), Jichen Yang (Computer Science, ECE), and Fanjie Kong (Computer Vision) spent ten weeks helping to adapt deep learning techniques to inform energy access decisions.
Faculty Lead: Kyle Bradbury
Project Manager: Fanjie Kong
Yoav Kargon (Mechanical Engineering) and Tommy Lin (Chemistry, Computer Science) spent ten weeks working with data from the Water Quality Portal (WQP), a large national dataset of water quality measurements aggregated by the USGS and EPA. The team went all the way from raw data to the production of Pondr, an interactive and comprehensive tool built with R Shiny that permits users to investigate and visualize data coverage, values, and trends from the WQP.
Faculty Lead: Jim Heffernan
Project Manager: Nick Bruns
Marco Gonazales Blancas (Civil Engineering) and Mengjie Xiu (Masters, BioStatistics) spent ten weeks building tools to help Duke reduce its energy footprint and achieve carbon neutrality by 2024. The team processed and analyzed troves of utility consumption data and then created practical monthly energy use reports for each school at Duke. These reports show historical usage trends, provide energy benchmarks for comparison, and make practical suggestions for energy savings.
Faculty Lead: Billy Pizer
Project Manager: Sophia Ziwei Zhu
Cathy Lee (Statistics) and Jennifer Zheng (Math, Emory University) spent ten weeks building tools to help Duke University Libraries better understand its journal purchasing practice. Using a combination of web-scraping and data-merging algorithms, the team created a dashboard to help library strategists visualize and optimize journal selection.
Project Manager: Chi Liu
Micalyn Struble (Computer Science, Public Policy), Xiaoqiao Xing (Economics), and Eric Zhang (Math) spent ten weeks exploring the use of neuroscience as evidence in criminal trials. Working with a large set of case files downloaded from WestLaw, the team used natural language processing to build a predictive model that has the potential to automate the process of locating neuroscience-relevant cases in databases.
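A simple keyword-flagging baseline illustrates the task; the term list and sample passage below are invented, and the team's actual model was learned from the case corpus rather than hand-written rules:

```python
import re

# Hypothetical term list; a learned model would derive its cues from labeled cases
NEURO_TERMS = {"neuroimaging", "fmri", "brain", "neurological", "eeg"}

def neuro_flag(text, threshold=2):
    """Flag a case file as neuroscience-relevant if it mentions at least
    `threshold` distinct terms from the list."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = tokens & NEURO_TERMS
    return len(hits) >= threshold, sorted(hits)

case = ("Defense counsel introduced fMRI scans and expert testimony "
        "on the defendant's neurological condition.")
print(neuro_flag(case))
```

A trained classifier improves on this by weighting terms and their contexts instead of counting a fixed list, but the screening role is the same.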
Faculty Lead: Nita Farahany
Project Manager: William Krenzer
The Middle Passage, the route by which most enslaved persons were brought across the Atlantic to North America, is a critical locus of modern history—yet it has been notoriously difficult to document or memorialize. The ultimate aim of this project is to employ the resources of digital mapping technologies as well as the humanistic methods of history, literature, philosophy, and other disciplines to envision how best to memorialize the enslaved persons who lost their lives between their homelands and North America. To do this, the students combined previously-disparate data and archival sources to discover where on their journeys enslaved persons died. Because of the nature of the data itself and the history it represents, the team engaged in ongoing conversations about various ways of visualizing its findings, and continuously evaluated the ethics of the data’s provenance and their own methodologies and conclusions. A central goal for the students was to discover what contribution digital data analysis methods could make to the project of remembering itself.
Ellis Ackerman (Math, NCSU), Rodrigo Araujo (Computer Science), and Samantha Miezio (Public Policy) spent ten weeks building tools to help understand the scope, cause, and effects of evictions in Durham County. Using evictions data recorded by the Durham County Sheriff’s Department and demographic data from the American Community Survey, the team investigated relationships between rent and evictions, created cost-benefit models for eviction diversion efforts, and built interactive visualizations of eviction trends. They had the opportunity to consult with analytics professionals from DataWorks NC.
Project Leads: Tim Stallmann, John Killeen, Peter Gilbert
Project Manager: Libby McClure
The aim of this project was to explore how U.S. mass media—particularly newspapers—enlists text and imagery to portray human rights, genocide, and crimes against humanity from World War II until the present. From the Holocaust to Cambodia, from Rwanda to Myanmar, such representation has political consequences. Coined by Raphael Lemkin, a Polish lawyer who fled Hitler’s antisemitism, the term “genocide” was first introduced to the American public in a Washington Post op-ed in 1944. Since its legal codification by the United Nations Convention for the Prevention of Genocide in 1948, the term has circulated, been debated, been used to describe events that pre-date it (such as the displacement and genocide of Native People in the Americas), and been shaped by numerous forces—especially the words and images published in newspapers. Alongside the definition of “genocide,” other key concepts, specifically “crimes against humanity,” have attempted to label, and thus name the story of, targeted mass violence. Conversely, the concept of “human rights,” enshrined in the 1948 UN Declaration, seeks to name a presence of rights instead of their absence.
During the summer, the team focused their work on evaluating the language used in Western media to represent instances of genocide and how such language varied based on the location and time period of the conflict. In particular, the team’s efforts centered on Rwanda and Bosnia as important case studies, affording them the chance to compare nearly simultaneous reporting on two well-known genocides. The language used by reporters in these two cases showed distinct polarizations of terminology (for instance, while “slaughter” was much more common than “murder” in discussions of the Rwanda genocide, the inverse was true for Bosnia).
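The comparison behind such findings reduces to term rates per corpus. A minimal sketch (the sentences below are invented stand-ins for the newspaper articles):

```python
from collections import Counter
import re

def term_rates(texts, terms):
    """Occurrences of each term per 1,000 words across a set of articles."""
    words = [w for t in texts for w in re.findall(r"[a-z]+", t.lower())]
    counts = Counter(words)
    total = len(words)
    return {term: 1000 * counts[term] / total for term in terms}

# Toy stand-in corpora (invented sentences, not real coverage)
rwanda = ["reports described the slaughter of civilians",
          "the slaughter continued for weeks"]
bosnia = ["witnesses described the murder of civilians",
          "the murder trial began"]

print(term_rates(rwanda, ["slaughter", "murder"]))
print(term_rates(bosnia, ["slaughter", "murder"]))
```

Normalizing by corpus size is what makes the rates comparable across conflicts that received very different volumes of coverage.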
How Much Profit is Too Much Profit?
Chris Esposito (Economics), Ruoyu Wu (Computer Science), and Sean Yoon (Masters, Decision Sciences) spent ten weeks building tools to investigate the historical trends of price gouging and excess profits taxes in the United States of America from 1900 to the present. The team used a variety of text-mining methods to create a large database of historical documents, analyzed historical patterns of word use, and created an interactive R Shiny app to display their data and analyses.
(cartoon from The Masses July 1916)
Faculty Lead: Sarah Deutsch
Project Manager: Evan Donahue
Maria Henriquez (Computer Science, Statistics) and Jacob Sumner (Biology) spent ten weeks building tools to help the Michael W. Krzyzewski Human Performance Lab best utilize its data from Duke University student athletes. The team worked with a large collection of athlete strength, balance, and flexibility measurements collected by the lab. They improved the K Lab’s data pipeline, created a predictive model for injury risk, and developed interactive web-based individualized injury risk reports.
Vincent Wang (Computer Science, CE), Karen Jin (Bio/Stats), and Katherine Cottrell (Computer Science) spent ten weeks building tools to educate the public about lake dynamics and ecosystem health. Using data collected over a period of 50 years at the Experimental Lakes Area (ELA) in Ontario, the team preprocessed and merged datasets, made a series of data visualizations, and produced an interactive website using R Shiny.
Faculty Lead: Kateri Salk
Project Manager: Kim Bourne
Vivek Sahukar (Masters, Data Science), Yuval Medina (Computer Science), and Jin Cho (Computer Science/Electrical & Computer Engineering) spent ten weeks creating tools to help augment the experience of users in the StreamPULSE community. The team created an interactive guide and used data sonification methods to help users navigate and understand the data, and they used a mixture of statistical and machine-learning methods to build out an outlier detection and data cleaning pipeline.
Aidan Fitzsimmons (Public Policy, Mathematics, Electrical & Computer Engineering), Joe Choo (Mathematics, Economics) and Brooke Scheinberg (Mathematics) spent ten weeks partnering with the Durham Crisis Intervention Team, the Criminal Justice Resource Center, and the Stepping Up Initiative. Utilizing booking data of 57,346 individuals provided by the Durham County Jail, this team was able to create visualizations and predictive models that illustrate patterns of recidivism, with a focus on the subset of the population with serious mental illness (SMI). These results could assist current efforts in diverting people with SMI from the criminal justice system and into care.
Project Manager: Ruth Wygle
The students in this project worked on a pervasive question in literary, film, and copyright studies: how do we know when a new work of fiction borrows from an older one? Often, works are appropriated rather than straightforwardly adapted, making such borrowings difficult for human readers to trace. As we continue to remake and repurpose previous texts into new forms that combine hundreds of references to other works (such as Ready Player One), it becomes increasingly laborious to track all the intertextual elements of a single text. While some borrowings are easy to spot, as in the case of Marvel films that are straightforward adaptations of comic book storylines and aesthetics, others are more subtle, as when Disney reinterpreted Hamlet and African oral traditions to create The Lion King. Thousands of new stories are created each day, but how do we know if we are borrowing or appropriating a previous text? Are there works that have adapted previous ones that we have yet to identify?
Jett Hollister (Mechanical Engineering) and Lexx Pino (Computer Science, Math) joined Economics majors Shengxi Hao and Cameron Polo in a ten-week study of the late 2000s housing bubble. The team scraped, merged, and analyzed a variety of datasets to investigate different proposed causes of the bubble. They also created interactive visualizations of their data which will eventually appear on a website for public consumption.
Faculty Lead: Lee Reiners
Project Manager: Kate Coulter
Cassandra Turk (Economics) and Alec Ashforth (Economics, Math) spent ten weeks building tools to help minimize the risk of trading electricity on the wholesale energy market. The team combined data from many sources and employed a variety of outlier-detection methods and other statistical tools in order to create a large dataset of extreme energy events and their causes. They had the opportunity to consult with analytics professionals from Tether Energy.
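As a hypothetical sketch of the kind of outlier detection such a pipeline relies on (not the team's actual method, and with invented prices), the interquartile-range rule flags observations far outside the bulk of the distribution:

```python
def iqr_outliers(values, k=3.0):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    xs = sorted(values)
    n = len(xs)

    def quantile(q):
        # linear interpolation between order statistics
        pos = q * (n - 1)
        lo, frac = int(pos), pos - int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] * (1 - frac) + xs[hi] * frac

    q1, q3 = quantile(0.25), quantile(0.75)
    spread = k * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]

# hypothetical hourly wholesale prices ($/MWh) with one spike event
prices = [28, 31, 30, 29, 27, 33, 32, 30, 29, 900]
print(iqr_outliers(prices))  # → [900]
```

A real pipeline would layer several such detectors over merged data sources and attach a likely cause to each flagged event.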
Project Lead: Eric Butter, Tether
Andre Wang (Math, Statistics), Michael Xue (Computer Science, ECE), and Ryan Culhane (Computer Science) spent ten weeks exploring the role played by emotion in speech-focused machine-learning. The team used a variety of techniques to build emotion recognition pipelines, and incorporated emotion into generated speech during text-to-speech synthesis.
Project Manager: Enmao Diao
Brooke Erikson (Economics/Computer Science), Alejandro Ortega (Math), and Jade Wu (Computer Science) spent ten weeks developing open-source tools for automatic document categorization, PDF table extraction, and data identification. Their motivating application was provided by Power for All’s Platform for Energy Access Knowledge, and they frequently collaborated with professionals from that organization.
Jake Epstein (Statistics/Economics), Emre Kiziltug (Economics), and Alexander Rubin (Math/Computer Science) spent ten weeks investigating the existence of relative value opportunities in global corporate bond markets. They worked closely with a dataset provided by a leading asset management firm.
Maksym Kosachevskyy (Economics) and Jaehyun Yoo (Statistics/Economics) spent ten weeks understanding temporal patterns in the used construction machinery market and investigating the relationship between these patterns and macroeconomic trends.
They worked closely with a large dataset provided by MachineryTrader.com, and discussed their findings with analytics professionals from a leading asset management firm.
Alec Ashforth (Economics/Math), Brooke Keene (Electrical & Computer Engineering), Vincent Liu (Electrical & Computer Engineering), and Dezmanique Martin (Computer Science) spent ten weeks helping Duke’s Office of Information Technology explore the development of an “e-advisor” app that recommends co-curricular opportunities to students based on a variety of factors. The team used collaborative and content-based filtering to create a recommender-system prototype in R Shiny.
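Content-based filtering of the kind used in such a recommender can be sketched in a few lines. The opportunity names, tag dimensions, and profiles below are invented for illustration (the team's actual prototype was built in R Shiny):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(student_profile, opportunities, top_n=2):
    """Rank opportunities by similarity to a student's interest profile."""
    ranked = sorted(opportunities.items(),
                    key=lambda kv: cosine(student_profile, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_n]]

# invented tag dimensions: [research, service, arts, tech]
opportunities = {
    "Data+":      [1, 0, 0, 1],
    "DukeEngage": [0, 1, 0, 0],
    "HackDuke":   [0, 1, 0, 1],
}
print(recommend([1, 0, 0, 1], opportunities))  # → ['Data+', 'HackDuke']
```

Collaborative filtering, the other approach the team used, would instead rank items by the choices of students with similar histories.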
Statistical Science majors Eidan Jacob and Justina Zou joined forces with Math major Mason Simon to build interactive tools that analyze and visualize the trajectories taken by wireless devices as they move across Duke’s campus and connect to its wireless network. They used de-identified data provided by Duke’s Office of Information Technology, and worked closely with professionals from that office.
Cecily Chase (Applied Math), Brian Nieves (Computer Science), and Harry Xie (Computer Science/Statistics) spent ten weeks understanding how algorithmic approaches can shed light on which data center tasks (“stragglers”) are typically slowed down by unbalanced or limited resources. Working with a real dataset provided by project client Lenovo, the team created a monitoring framework that flags stragglers in real time.
Integrating study data with public data from the American Community Survey, they built interactive visualization tools that will help researchers understand the study results and the representativeness of study participants.
Lucas Fagan (Computer Science/Public Policy), Caroline Wang (Computer Science/Math), and Ethan Holland (Statistics/Computer Science) spent ten weeks understanding how data science can contribute to fact-checking methodology. Training on audio data from major news stations, they adapted OpenAI methods to develop a pipeline that moves from audio data to an interface that enables users to search for claims related to other claims that had been previously investigated by fact-checking websites.
This project will continue into the academic year via Bass Connections.
A team of students led by Professors Jonathan Mattingly and Gregory Herschlag will investigate gerrymandering in political districting plans. Students will improve on and employ an algorithm to sample the space of compliant redistricting plans for both state and federal districts. The output of the algorithm will be used to detect gerrymandering for a given district plan; this data will be used to analyze and study the efficacy of the idea of partisan symmetry. This work will continue the Quantifying Gerrymandering project, seeking to understand the space of redistricting plans and to find justiciable methods to detect gerrymandering. The ideal team has a mixture of members with programming backgrounds (C, Java, Python), statistical experience including possibly R, mathematical and algorithmic experience, and exposure to political science or other social science fields.
Read the latest updates about this ongoing project by visiting Dr. Mattingly's Gerrymandering blog.
Varun Nair (Mechanical Engineering), Tamasha Pathirathna (Computer Science), Xiaolan You (Computer Science/Statistics), and Qiwei Han (Chemistry) spent ten weeks creating a ground-truthed dataset of electricity infrastructure that can be used to automatically map the transmission and distribution components of the electric power grid. This is the first publicly available dataset of its kind, and will be analyzed during the academic year as part of a Bass Connections team.
Kimberly Calero (Public Policy/Biology/Chemistry), Alexandra Diaz (Biology/Linguistics), and Cary Shindell (Environmental Engineering) spent ten weeks analyzing and visualizing data about disparities in Social Determinants of Health. Working with data provided by the MURDOCK Study, the American Community Survey, and the Google Places API, the team built a dataset and visualization tool that will assist the MURDOCK research team in exploring health outcomes in Cabarrus County, NC.
Alexandra Putka (Biology/Neuroscience), John Madden (Economics), and Lucy St. Charles (Global Health/Spanish) spent ten weeks understanding the coverage and timeliness of maternal and pediatric vaccines in Durham. They used data from DEDUCE, the American Community Survey, and the CDC.
This project will continue into the academic year via Bass Connections.
Dima Fayyad (Electrical & Computer Engineering), Sean Holt (Math), and David Rein (Computer Science/Math) spent ten weeks exploring tools that will operationalize the application of distributed computing methodologies in the analysis of electronic medical records (EMR) at Duke.
As a case study, they applied these systems to a Natural Language Processing project on clinical narratives about growth failure in premature babies.
Zhong Huang (Sociology) and Nishant Iyengar (Biomedical Engineering) spent ten weeks investigating the clinical profiles of rare metabolic diseases. Working with a large dataset provided by the Duke University Health System, the team used natural language processing techniques and produced an R Shiny visualization that enables clinicians to interactively explore diagnosis clusters.
Samantha Garland (Computer Science), Grant Kim (Computer Science, Electrical & Computer Engineering), and Preethi Seshadri (Data Science) spent ten weeks exploring factors that influence patient choices when faced with intermediate-stage prostate cancer diagnoses. They used topic modeling in an analysis of a large collection of clinical appointment transcripts.
Nathan Liang (Psychology, Statistics), Sandra Luksic (Philosophy, Political Science), and Alexis Malone (Statistics) began their ten-week project as an open-ended exploration of how women are depicted, both physically and figuratively, in women's magazines, seeking to consider what role magazines play in the imagined and real lives of women.
Jennie Wang (Economics/Computer Science) and Blen Biru (Biology/French) spent ten weeks building visualizations of various aspects of the lives of orphaned and separated children at six separate sites in Africa and Asia. The team created R Shiny interactive visualizations of data provided by the Positive Outcomes for Orphans study (POFO).
Aaron Crouse (Divinity), Mariah Jones (Sociology), Peyton Schafer (Statistics), and Nicholas Simmons (English/Education) spent ten weeks consulting with leadership from the Parents Teacher Association at Glenn Elementary School in Durham. The team set up infrastructure for data collection and visualization that will aid the PTA in forming future strategy.
In tracing the publication history, geographical spread, and content of “pirated” copies of Daniel Defoe’s Robinson Crusoe, Gabriel Guedes (Math, Global Cultural Studies), Lucian Li (Computer Science, History), and Orgil Batzaya (Math, Computer Science) explored the complications of a data set whose spelling and grammar changed drastically over three centuries, posing new challenges for data cleanup. By questioning the effectiveness of “distant reading” techniques for comparing thousands of different editions of Robinson Crusoe, the students learned how to think about the appropriateness of myriad computational methods like doc2vec and topic modeling. Through these methods, the students started to ask: at what point does one start seeing patterns that were invisible at a human scale of reading (one book at a time)? While the project did not definitively answer these questions, it did provide paths for further inquiry.
The team published their results at: https://orgilbatzaya.github.io/pirating-texts-site/
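A much simpler cousin of doc2vec, bag-of-words cosine similarity, illustrates both the promise of distant reading and the spelling problem the team faced. The toy "editions" below are invented; note that archaic spellings alone pull the score below a perfect 1.0, which is why cleanup of three centuries of spelling drift mattered so much:

```python
from collections import Counter
import math

def bow_cosine(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# invented "editions": archaic vs. modernized spelling
ed_1719 = "I was born in the yeare 1632 in the city of Yorke"
ed_1790 = "I was born in the year 1632 in the city of York"
print(round(bow_cosine(ed_1719, ed_1790), 2))  # → 0.88, not 1.0
```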
Melanie Lai Wai (Statistics) and Saumya Sao (Global Health, Gender Studies) spent ten weeks developing a platform which enables users to understand factors that influence contraceptive use and discontinuation. Their work combined data from the Demographic and Health Surveys contraceptive calendar with open data about reproductive health and social indicators from the World Bank, World Health Organization, and World Population Prospects. This project will continue into the academic year via Bass Connections.
Bob Ziyang Ding (Math/Stats) and Daniel Chaofan Tao (ECE) spent ten weeks understanding how deep learning techniques can shed light on single cell analysis. Working with a large set of single-cell sequencing data, the team built an autoencoder pipeline and a device that will allow biologists to interactively visualize their own data.
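A minimal sketch of the autoencoder idea, assuming nothing about the team's actual architecture or data: a linear autoencoder trained by gradient descent on a toy "expression" matrix whose true dimensionality is two, so reconstruction error can be driven near zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy "expression" matrix: 100 cells x 5 genes, intrinsically 2-D
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5))

d, k = X.shape[1], 2
W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder weights

lr = 0.01
for _ in range(3000):
    Z = X @ W_enc              # encode each cell to 2-D
    err = Z @ W_dec - X        # reconstruction error
    # gradients of mean squared reconstruction error
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_enc -= lr * g_enc
    W_dec -= lr * g_dec

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(mse)  # near zero for this rank-2 data
```

The 2-D codes in `Z` are what an interactive tool would plot; a practical single-cell pipeline would use nonlinear layers and a deep-learning framework rather than this hand-rolled loop.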
Ashley Murray (Chemistry/Math), Brian Glucksman (Global Cultural Studies), and Michelle Gao (Statistics/Economics) spent ten weeks analyzing how the meaning and use of the word “poverty” changed in presidential documents from the 1930s to the present. The students found that American presidential rhetoric about poverty has shifted in measurable ways over time. Presidential rhetoric, however, doesn’t necessarily affect policy change. As Michelle Gao explained, “The statistical methods we used provided another more quantitative way of analyzing the text. The database had around 130,000 documents, which is pretty impossible to read one by one and get all the poverty related documents by brute force. As a result, web-scraping and word filtering provided a more efficient and systematic way of extracting all the valuable information while minimizing human errors.” Through techniques such as linear regression, machine learning, and image analysis, the team effectively analyzed large swaths of textual and visual data. This approach allowed them to zero in on significant documents for closer and more in-depth analysis, paying particular attention to documents by presidents such as Franklin Delano Roosevelt or Lyndon B. Johnson, both leaders in what LBJ famously called “The War on Poverty.”
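The word filtering Gao describes can be sketched in a few lines; the keyword list and documents below are hypothetical, not the team's actual filter:

```python
KEYWORDS = {"poverty", "poor", "impoverished", "welfare"}

def mentions_poverty(document):
    """True if any keyword appears in the document (after light cleanup)."""
    words = {w.strip('.,;:"()').lower() for w in document.split()}
    return bool(words & KEYWORDS)

docs = [
    "We declare unconditional war on poverty in America.",
    "The treaty was signed in Geneva.",
]
print([mentions_poverty(d) for d in docs])  # → [True, False]
```

Applied after web-scraping, a filter like this reduces ~130,000 documents to the poverty-related subset worth close reading.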
Natalie Bui (Math/Economics), David Cheng (Electrical & Computer Engineering), and Cathy Lee (Statistics) spent ten weeks helping the Prospect Management and Analytics office of Duke Development understand how a variety of analytic techniques might enhance their workflow. The team used topic modeling and named entity recognition to develop a pipeline that clusters potential prospects into useful categories.
Tatanya Bidopia (Psychology, Global Health), Matthew Rose (Computer Science), Joyce Yoo (Public Policy/Psychology) spent ten weeks doing a data-driven investigation of the relationship between mental health training of law enforcement officers and key outcomes such as incarceration, recidivism, and referrals for treatment. They worked closely with the Crisis Intervention Team, and they used jail data provided by the Sheriff’s Office of Durham County.
Sophie Guo, Math/PoliSci major, Bridget Dou, ECE/CompSci major, Sachet Bangia, Econ/CompSci major, and Christy Vaughn spent ten weeks studying different procedures for drawing congressional boundaries, and quantifying the effects of these procedures on the fairness of actual election results.
Anna Vivian (Physics, Art History) and Vinai Oddiraju (Stats) spent ten weeks working closely with the director of the Durham Neighborhood Compass. Their goal was to produce metrics for things like ambient stress and neighborhood change, to visualize these metrics within the Compass system, and to interface with a variety of community stakeholders in their work.
Maddie Katz (Global Health and Evolutionary Anthropology Major), Parker Foe (Math/Spanish, Smith College), and Tony Li (Math, Cornell) spent ten weeks analyzing data from the National Transgender Discrimination Survey. Their goal was to understand how the discrimination faced by the trans community is realized on a state, regional, and national level, and to partner with advocacy organizations around their analysis.
Sharrin Manor, Arjun Devarajan, Wuming Zhang, and Jeffrey Perkins explored a large collection of imagery data provided by the U.S. Geological Survey, with the goal of identifying solar panels using image recognition. They worked closely with the Energy Data Analytics Lab, part of the Energy Initiative at Duke.
Yanmin (Mike) Ma, mathematics/economics major, and Manchen (Mercy) Fang, electrical and computer engineering/computer science major, spent ten weeks studying historical archives and building a model to predict pig prices as a function of a number of factors.
Luke Raskopf, PoliSci major and Xinyi (Lucy) Lu, Stats/CompSci major, spent ten weeks investigating the effectiveness of policies to combat unemployment and wage stagnation faced by working and middle-class families in the State of North Carolina. They worked closely with Allan Freyer at the North Carolina Justice Center.
David Clancy, a Stats/Math/EnvSci major, and Tianyi Mu, an ECE/CompSci major, spent ten weeks studying the effects of weather, surroundings, and climate on the operational behavior of water reservoirs across the United States. They used a large dataset compiled by the U.S. Army Corps of Engineers, and they worked closely with Lauren Patterson from the Water Policy Program at Duke's Nicholas Institute for Environmental Policy Solutions. Project mentorship was provided by Alireza Vahid, a postdoctoral candidate in Electrical Engineering.
Biomedical Engineering and Electrical and Computer Engineering major David Brenes, and Electrical and Computer Engineering/Computer Science majors Xingyu Chen and David Yang spent ten weeks working with mobile eye tracker data to optimize data processing and feature extraction. They generated their own video data with SMI Eye Tracking Glasses, and created computer vision algorithms to categorize subject gazing behavior in a grocery purchase decision-making environment.
Biomedical Engineering major Chi Kim Trinh, and Biostatistics MS student Can Cui spent ten weeks constructing a computational and statistical framework to evaluate the effects of health coaching on Type II Diabetes patients’ quality metrics, including Hemoglobin A1c, blood pressure, eye exam consistency, tobacco use, and prescription adherence to statins, aspirin, and angiotensin-converting enzyme (ACE) inhibitors/angiotensin receptor blockers (ARBs).
BME major Neel Prabhu, along with CompSci and ECE majors Virginia Cheng and Cheng Lu, spent ten weeks studying how cells from embryos of the common fruit fly move and change in shape during development. They worked with Cell-Sheet-Tracker (CST), an algorithm developed by former Data+ student Roger Zou and faculty lead Carlo Tomasi. This algorithm uses computer vision to model and track a dynamic network of cells using a deformable graph.
Xinyu (Cindy) Li (Biology and Chemistry) and Emilie Song (Biology) spent ten weeks exploring the Black Queen Hypothesis, which predicts that co-operation in animal societies could be a result of genetic/functional trait losses, as well as polymorphism of workers in eusocial animals such as ants and termites. The goal was to investigate this idea in four different eusocial insect species.
Weiyao Wang (Math) and Jennifer Du, along with NCCU Physics majors Jarrett Weathersby and Samuel Watson, spent ten weeks learning about how search engines often provide results that are not representative in terms of race and/or gender. Working closely with entrepreneur Winston Henderson, their goal was to understand how to frame this problem via statistical and machine-learning methodology, as well as to explore potential solutions.
Matthew Newman (Sociology), Sonia Xu (Statistics), and Alexandra Zrenner (Economics) spent ten weeks exploring giving patterns and demographic characteristics of anonymized Duke donors. They worked closely with the Duke Alumni Affairs and Development Office, with the goal of understanding the data and constructing tools to generate data-driven insight about donor behavior.
Artem Streltsov (Masters Economics) and IIT Mechanical Engineering major Vinod Ramakrishnan spent ten weeks exploring North Carolina state budget documents. Working closely with the Budget and Tax Center, part of the North Carolina Justice Center, their goal was to help build a keystone tool that can be used for analysis of the state budget as well as future budget proposals.
Yuangling (Annie) Wang, a Math/Stats major, and Jason Law, a Math/Econ major, spent ten weeks analyzing message-testing data about the 2015 Marijuana Legalization Initiative in Ohio; the data were provided by Public Opinion Strategies, one of the nation's leading public opinion research firms.
The goal was to understand how statistics and machine learning might help develop microtargeting strategies for use in future campaigns.
Devri Adams (Environmental Science), Annie Lott (Statistics), and Camila Vargas Restrepo (Visual Media Studies, Psychology) spent ten weeks creating interactive and exploratory visualizations of ecological data. They worked with over sixty years of data collected at the Hubbard Brook Experimental Forest (HBEF) in New Hampshire.
Ana Galvez (Cultural and Evolutionary Anthropology), Xinyu Li (Biology), and Jonathan Rub (Math, Computer Science) spent ten weeks studying the impact of diet on organ and bone growth in developing laboratory rats. The goal was to provide insight into the growth dynamics of these model organisms that could eventually be generalized to inform research on human development.
Robbie Ha (Computer Science, Statistics), Peilin Lai (Computer Science, Mathematics), and Alejandro Ortega (Mathematics) spent ten weeks analyzing the content and dissemination of images of the Syrian refugee crisis, as part of a general data-driven investigation of Western photojournalism and how it has contributed to our understanding of this crisis.
Over ten weeks, Computer Science Majors Amber Strange and Jackson Dellinger joined forces with Psychology major Rachel Buchanan to perform a data-driven analysis of mental health intervention practices by Durham Police Department. They worked closely with leadership from the Durham Crisis Intervention Team (CIT) Collaborative, made up of officers who have completed 40 hours of specialized training in mental illness and crisis intervention techniques.
Over ten weeks, Computer Science majors Daniel Bass-Blue and Susie Choi joined forces with Biomedical Engineering major Ellie Wood to prototype interactive interfaces from Type II diabetics' mobile health data. Their specific goals were to encourage patient self-management and to effectively inform clinicians about patient behavior between visits.
Building off the work of a 2016 Data+ team, Yu Chen (Economics), Peter Hase (Statistics), and Ziwei Zhao (Mathematics) spent ten weeks working closely with analytical leadership at Duke's Office of University Development. The project goal was to identify distinguishing characteristics of major alumni donors and to model their lifetime giving behavior.
A team of students led by Dr. Shanna Sprinkle of Duke Surgery will combine success metrics of Duke Surgery residents from a set of databases and create a user interface for residency program directors and possibly residents themselves to view and better understand residency program performance.
Lauren Fox (Cultural Anthropology) and Elizabeth Ratliff (Statistics, Global Health) spent ten weeks analyzing and mapping pedestrian, bicycle, and motor vehicle data provided by Durham's Department of Transportation. This project was a continuation of a seminar on "ghost bikes" taught by Prof. Harris Solomon.
Boning Li (Masters Electrical and Computer Engineering), Ben Brigman (Electrical and Computer Engineering), Gouttham Chandrasekar (Electrical and Computer Engineering), Shamikh Hossain (Computer Science, Economics), and Trishul Nagenalli (Electrical and Computer Engineering, Computer Science) spent ten weeks creating datasets of electricity access indicators that can be used to train a classifier to detect electrified villages. This coming academic year, a Bass Connections Team will use these datasets to automatically find power plants and map electricity infrastructure.
Felicia Chen (Computer Science, Statistics), Nikkhil Pulimood (Computer Science, Mathematics), and James Wang (Statistics, Public Policy) spent ten weeks working with Counter Tools, a local nonprofit that provides support to over a dozen state health departments. The project goal was to understand how open source data can lead to the creation of a national database of tobacco retailers.
Selen Berkman (ECE, CompSci), Sammy Garland (Math), and Aaron VanSteinberg (CompSci, English) spent ten weeks undertaking a data-driven analysis of the representation of women in film and in the film industry, with special attention to a metric called the Bechdel Test. They worked with data from a number of sources, including fivethirtyeight.com and the-numbers.com.
Over ten weeks, BME and ECE majors Serge Assaad and Mark Chen joined forces with Mechanical Engineering Masters student Guangshen Ma to automate the diagnosis of vascular anomalies from Doppler Ultrasound data, with goals of improving diagnostic accuracy and reducing physician time spent on simple diagnoses. They worked closely with Duke Surgeon Dr. Leila Mureebe and Civil and Environmental Engineering Professor Wilkins Aquino.
Over ten weeks, Math/CompSci majors Benjamin Chesnut and Frederick Xu joined forces with International Comparative Studies major Katharyn Loweth to understand the myriad academic pathways traveled by undergraduate students at Duke. They focused on data from Mathematics and the Duke Global Health Institute, and worked closely with departmental leadership from both areas.
Liuyi Zhu (Computer Science, Math), Gilad Amitai (Masters, Statistics), Raphael Kim (Computer Science, Mechanical Engineering), and Andreas Badea (East Chapel Hill High School) spent ten weeks streamlining and automating the process of electronically rejuvenating medieval artwork. They used a 14th-century altarpiece by Francescuccio Ghissi as a working example.
Angelo Bonomi (Chemistry), Remy Kassem (ECE, Math), and Han (Alessandra) Zhang (Biology, CompSci) spent ten weeks analyzing data from social networks for communities of people facing chronic conditions. The social network data, provided by MyHealth Teams, contained information shared by community members about their diagnoses, symptoms, co-morbidities, treatments, and details about each treatment.
Zijing Huang (Statistics, Finance), Artem Streltsov (Masters Economics), and Frank Yin (ECE, CompSci, Math) spent ten weeks exploring how Internet of Things (IoT) data could be used to understand potential online financial behavior. They worked closely with analytical and strategic personnel from TD Bank, who provided them with a massive dataset compiled by Epsilon, a global company that specializes in data-driven marketing.
Over ten weeks, Mathematics/Economics majors Khuong (Lucas) Do and Jason Law joined forces with Analytical Political Economy Masters student Feixiao Chen to analyze the spatio-temporal distribution of birth addresses in North Carolina. The goal of the project was to understand how/whether the distributions of different demographic categories (white/black, married/unmarried, etc.) differed, and how these differences connected to a variety of socioeconomic indicators.
Furthering the work of a 2016 Data+ team, Siwei Zhang (Masters Biostatistics) and Jake Ukleja (Computer Science) spent ten weeks building a model to predict pancreatic cancer from electronic medical record (EMR) data. They worked with nine years' worth of EMR data, including ICD9 diagnostic codes, covering over 200,000 patients.
Over ten weeks, Public Policy major Amy Jiang and Mathematics and Computer Science major Kelly Zhang joined forces with Economics Masters student Amirhossein Khoshro to investigate academic hiring patterns across American universities and to analyze the educational backgrounds of faculty. They worked closely with Academic Analytics, a provider of data and solutions for universities in the U.S. and the U.K.
Linda Adams (CompSci), Amanda Jankowski (Sociology, Global Health), and Jessica Needleman (Statistics/Economics) spent ten weeks prototyping small-area mapping of public-health information within the Durham Neighborhood Compass, with a focus on mortality data. They worked closely with the director of DataWorks NC, an independent data intermediary dedicated to democratizing the use of quantitative information.
Over ten weeks, Biology major Jacob Sumner and Neuroscience major Julianna Zhang joined forces with Biostatistics Masters student Jing Lyu to analyze potential drug diversion in the Duke Medical Center. Early detection of drug diversion assists health care providers in helping patients recover from their condition, as well as in mitigating the effects on any patients under their care.
William Willis (Mechanical Engineering, Physics) and Qitong Gao (Masters Mechanical Engineering) spent ten weeks with the goal of mapping the ocean floor autonomously with high resolution and high efficiency. Their efforts were part of a team taking part in the Shell Ocean Discovery XPRIZE, and they made extensive use of simulation software built from Bellhop, an open-source program distributed by HLS Research.
Albert Antar (Biology) and Zidi Xiu (Biostatistics) spent ten weeks leveraging Duke electronic medical record (EMR) data to build predictive models of pancreatic ductal adenocarcinoma (PDAC). PDAC is the fourth leading cause of cancer deaths in the US and is most often diagnosed at stage IV, with a survival rate of only 1% and a life expectancy measured in months. Diagnosing PDAC is very challenging because of its deep anatomical placement and the significant risk imposed by traditional biopsy. The goal of this project was to use EMR data to identify potential avenues for diagnosing PDAC in its early, treatable stages.
Joy Patel (Math and CompSci) and Hans Riess (Math) spent ten weeks analyzing massive amounts of simulated weather data supplied by Spectral Sciences Inc. Their goal was to investigate ways in which advanced mathematical techniques could assist in quantifying storm intensity, helping to augment today's more qualitatively-based methods.
Computer Science and Psychology major Molly Chen and Neuroscience major Emily Wu spent ten weeks working with patient diagnosis co-occurrence data derived from Duke Electronic Medical Records to develop network visualizations of co-occurring disorders within demographic groups. Their goal was to make healthcare more holistic and reduce healthcare disparities by improving patient and provider awareness of co-occurring disorders for patients within similar demographic groups.
Emily Horn (Public Policy, Global Health), Aasha Reddy (Economics), and Shanchao Wang (Masters Economics) spent ten weeks working with data from the National Asset Scorecard for Communities of Color (NASCC), an ongoing survey project that gathers information about the assets and debts of households at a detailed racial and national-origin level. They worked closely with faculty and researchers from the Samuel Dubois Cook Center for Social Equity.
Vivek Sriram (Computer Science and Math), Lina Yang (Biostatistics), and Pablo Ortiz (BME) spent ten weeks working in close collaboration with the Department of Biostatistics and Bioinformatics, implementing an image analysis pipeline for immunofluorescence microscopy images of developing mouse lungs.
Statistical Science majors Nathaniel Brown and Corey Vernot, and Economics student Guan-Wun Hao spent ten weeks exploring changes in food purchase behavior and nutritional intake following the event of a new Metformin prescription for Type II Diabetes. They worked closely with Matthew Harding and researchers in the BECR Center, as well as Dr. Susan Spratt, an endocrinologist in Duke Medicine.
Anne Driscoll (Economics, Statistical Science), and Austin Ferguson (Math, Physics) spent ten weeks examining metrics for inter-departmental cooperativity and productivity, and developing a collaboration network of Duke faculty. This project was sponsored by the Duke Clinical and Translational Science Award, with the larger goal of promoting collaborative success in the School of Medicine and School of Nursing.
Joel Tewksbury (BME) and Miriam Goldman (Math and Statistics, Arizona State University) spent ten weeks analyzing time-series darkness visual adaptation scores from over 1200 study participants to identify trends in night vision, and ultimately genetic markers that might confer a visual advantage.
Lindsay Hirschhorn (Mechanical Engineering) and Kelsey Sumner (Global Health and Evolutionary Anthropology) spent ten weeks determining optimal vaccination clinic locations in Durham County for a simulated Zika virus outbreak. They worked closely with researchers at RTI International to construct models of disease spread and health impact, and developed an interactive visualization tool.
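One simple way to frame the clinic-siting question is as a p-median problem, solved greedily: repeatedly pick the candidate site that most reduces total weighted travel distance. The distance matrix and population weights below are invented, and the real project used RTI's disease-spread and health-impact models rather than this toy heuristic:

```python
def greedy_clinics(dist, weights, p):
    """Greedy p-median heuristic.

    dist[i][j]  : distance from population block i to candidate site j.
    weights[i]  : population of block i.
    Returns the p chosen site indices and the resulting total cost.
    """
    chosen, best_cost = [], float("inf")
    for _ in range(p):
        best_site = None
        for s in range(len(dist[0])):
            if s in chosen:
                continue
            trial = chosen + [s]
            # Each block travels to its nearest chosen clinic.
            cost = sum(w * min(dist[i][j] for j in trial)
                       for i, w in enumerate(weights))
            if cost < best_cost:
                best_site, best_cost = s, cost
        chosen.append(best_site)
    return chosen, best_cost

dist = [
    [1, 4, 6],
    [5, 1, 7],
    [6, 5, 1],
]
weights = [100, 80, 50]
sites, cost = greedy_clinics(dist, weights, 2)
print(sites, cost)  # [1, 0] 430
```

The greedy heuristic is not guaranteed optimal, but it scales well and is a common baseline before reaching for exact integer-programming solvers.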
The team built a ground truth dataset comprising satellite images, building footprints, and LIDAR-derived building heights for 40,000+ buildings, along with road annotations. This dataset can be used to train computer vision algorithms to determine a building's volume from an image, and is a significant contribution to the broader research community, with applications in urban planning, civil emergency mitigation, and human population estimation.
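As a hedged illustration of how a footprint and a LIDAR height combine into a volume label, a prism approximation works: footprint area (via the shoelace formula) times height. The rectangular footprint below is invented:

```python
def polygon_area(coords):
    """Shoelace formula for a simple polygon given (x, y) vertices in order."""
    n = len(coords)
    s = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def building_volume(footprint, height):
    """Prism approximation: footprint area times LIDAR-derived height."""
    return polygon_area(footprint) * height

# A made-up 20 m x 10 m rectangular footprint, 6 m tall.
vol = building_volume([(0, 0), (20, 0), (20, 10), (0, 10)], 6.0)
print(vol)  # 1200.0
```

Real rooftops are rarely flat, so in practice one might use a per-footprint mean or median of the LIDAR heights rather than a single value.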
Computer Science majors Erin Taylor and Ian Frankenburg, along with Math major Eric Peshkin, spent ten weeks understanding how geometry and topology, in tandem with statistics and machine learning, can aid in quantifying anomalous behavior in cyber-networks. The team was sponsored by Geometric Data Analytics, Inc., and used real anonymized Netflow data provided by Duke's Information Technology Security Office.
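The team's geometric and topological methods go well beyond simple thresholds, but a common statistical baseline for flagging anomalous flow volumes is the modified z-score built from the median and the median absolute deviation (MAD), which resists being skewed by the anomalies themselves. The per-minute flow counts below are invented:

```python
import statistics

def robust_anomalies(values, threshold=3.5):
    """Flag indices whose modified z-score (median/MAD based)
    exceeds the threshold -- a simple, robust anomaly baseline."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Made-up per-minute Netflow record counts with one obvious spike.
flows = [120, 118, 125, 121, 119, 122, 950, 117, 123]
print(robust_anomalies(flows))  # [6]
```

Topological approaches aim to catch subtler structural changes in the network that such per-series statistics miss.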
Molly Rosenstein, an Earth and Ocean Sciences major, and Tess Harper, an Environmental Science and Spanish major, spent ten weeks developing interactive data applications for use in Environmental Science 101, taught by Rebecca Vidra.
Undergraduate students Ellie Burton (BioPhysics/Math, Johns Hopkins University), Kevin Kuo (Electrical and Computer Engineering), and GiSeok Choi (Electrical and Computer Engineering/Math) joined a research group led by Douglas Boyer and Professor Ingrid Daubechies, testing and developing mathematical and statistical methodology for measuring similarities between bones and teeth.
Kelsey Sumner, an Evolutionary Anthropology and Global Health major, and Christopher Hong, a Computer Science/ECE major, spent ten weeks analyzing high-dimensional microRNA data taken from patients with viral and/or bacterial conditions. They worked closely with the medical faculty and practitioners who generated the data.
Kang Ni, a Math/Econ major, Kehan Zhang, an Econ/Stats major, and Alex Hong spent ten weeks investigating a large collection of grocery store transaction data. They worked closely with Matt Harding and the Behavioral Economics and Healthy Food Choice Research (BECR) Center.
Ethan Levine, Annie Tang, and Brandon Ho spent ten weeks investigating whether personality traits can be used to predict how people make risky decisions. They used a large dataset collected by the lab of Prof. Scott Huettel, and were mentored by graduate students Emma Wu Dowd and Jonathan Winkle.
Spenser Easterbrook, a Philosophy and Math double major, joined Biology majors Aharon Walker and Nicholas Branson in a ten-week exploration of the connections between journal publications from the humanities and the sciences. They were guided by Rick Gawne and Jameson Clarke, graduate students from Philosophy and Biology.
The goal of this project is to take a large amount of data from the Massive Open Online Courses offered by Duke professors and produce from it a coherent and compelling data analysis challenge that might then be used for a Duke-wide or nationwide data analysis competition.