Data+

 

Applications Now Open!

Data+ is a full-time, ten-week summer research experience that welcomes Duke undergraduate and master's students interested in exploring new data-driven approaches to interdisciplinary challenges. It is suitable for students from all class years and all majors.

Students join small project teams (at most 3 undergrads and 1 master's student per team), working alongside other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science. The projects (see below) come from an extremely diverse set of subject areas. It is our hope that students will be able to both dig deeply into their specific project and get a broad picture of most of the skills needed for modern data science.

Participants will receive a $5,000 stipend, out of which they must arrange their own housing and travel. Funding and infrastructure support are provided by a wide range of departments, schools, and initiatives from across Duke University, as well as by outside industry and community partners.

Data+ typically provides students with dedicated workspace in Gross Hall at Duke University. For the last two summers (2020 and 2021), Data+ ran entirely remotely due to the pandemic and was quite successful. We hope to resume in-person programming in the summer of 2022, per Duke University guidance.

See below for information about our past projects!

  • I learned there’s much more to it than looking at data. It’s also a way of thinking and organizing what you have analyzed to help others who aren’t able to look at data in such a way to understand it. It’s also a bit of storytelling in a way.

    - Jessica Ho, Math and Neuroscience ‘22
    Predicting Baseball Players’ Athletic Performance Utilizing Baseline Assessments of Vision

  • I didn't really know how data science research applied to social science, but Data+ showed me that it can be a really successful avenue for discovery and change.

    - Nick Datto, Neuroscience, Computer Science, and Cultural Anthropology ‘23
    Race and Housing in Durham over the Course of the 20th Century

  • I’ve learned how interdisciplinary data science is, and how a team of people with many different academic trajectories can work together on the same project, something that I don't think happens very often in other areas.

    - Anonymous

  • What I have discovered is that a majority of data research is about communication. How you interact with your teammates and superiors is just as important, if not more important, than being a genius in your field.

    - Andrew Scofield, Computer Science ’22, Birmingham-Southern College
    For love of greed: tracing the early history of consumer culture

  • I had expected it to be very analytical, but I was surprised at the creativity that was also required. I enjoyed this aspect a lot.

    - Amber Potter, Computer Science ‘23
    Predicting Baseball Players’ Athletic Performance Utilizing Baseline Assessments of Vision

  • I've gained a lot of valuable insight into the career fields of environmental health and epidemiology. I've also learned a lot about project workflow and how to work through the different phases of a long term project with a team. In addition, my skills in R coding and Tableau have improved a ton.

    - Anonymous

  • My group has been focused on cybersecurity and automation methods to prevent and seek out attackers to keep Duke websites and accounts from being compromised. I have learned a lot about cybersecurity, a field that I otherwise might not have pursued. It has been a very interesting and enlightening experience so far and I am excited to continue learning from the Duke OIT staff.

    - John Taylor, Computer Science ‘21
    Applying Security Orchestration, Automation & Response (SOAR) to security threat hunting with Duke’s ITSO

  • I have gained so much knowledge and confidence! And it is not limited to the area of technology, although I have learned to code in R, navigate PACE, and so much more. I have better discovered the benefit of working with a team and received motivation and mentors by seeing female-identifying students, like myself, succeed. Hearing their success stories via panels or team meetings has given me so much more confidence as a young woman wanting to pursue a career in STEM. I see that it is possible! I also have never worked with data before Data+, but never felt behind in my lack of knowledge as my team is super supportive. They Zoom me outside of the workday and send me resources to help me complete my assignments. I've also realized that I do have an interest in Data Science and feel like I'm making a difference in the world through this program. Knowing that my project (Predicting Blindness in Duke's Glaucoma Patient Population) is going to help so many clinicians, government officials, patients and more is so empowering. It is crazy to think that I am just 19 years old and working on such an advanced project with beyond accomplished students, doctors, and professors, but I'm doing it! Data+ truly has given me the opportunity to expand my knowledge and network in a safe environment. I find these takeaways pretty impressive, especially since it is all remote this year.

    - Sydney Hunt, Engineering ‘23
    Predicting Blindness in Duke’s Glaucoma Patient Population

  • Beyond solid technical machine learning skills, I've received a greater appreciation for data science as a tool to understand everything--from aircraft maintenance to the humanities. Before, I'd never expected that conducting humanities research would teach me how to wield and utilize the most cutting-edge research in machine learning and natural language processing. My team is using new package libraries and research papers written by lead researchers this year to conduct our analysis of ancient texts. In Data+, New meets Old.

    - Albert Sun, Computer Science and Public Policy ‘23
    For love of greed: tracing the early history of consumer culture

  • Working remotely has made coordination much more difficult. However, we really have been embracing GitHub and Box to overcome these challenges. I have learned a lot about RNNs and the applications of GRUs and LSTMs and how to implement such layers, in addition to learning how to use PyTorch, as previously I had only used TensorFlow.

    - Nathan Warren, MIDS
    Human Activity Recognition using Physiological Data from Wearables

  • As a Biology pre-med, I made the mistake of thinking that coding was irrelevant to me. That changed when I took a biology class where we used R to analyze lab data. That was when I realized that coding (and the problem-solving skills that come with it) is invaluable in research. It was difficult at first to jump into Data+, but doing this has benefited me in a few ways. Having to learn Python on my own, in a very short amount of time, with almost no prior coding experience (I didn't even know what a package was) and quickly turning around and using those skills taught me that I am capable of flexibility and learning on the job. Coding also requires an immense amount of problem solving and independence. Although my mentors are fantastic, it's up to me to figure out where I want to take the project and how I want to do it. Finally, Data+ has been a really invaluable exercise in teamwork. This has been especially challenging with remote learning. However, I still feel like our team has grown very close in working toward a common goal.

    - Ellen Mines, Biology and Philosophy ‘21
    Computational Tools to Improve Healthy and Pleasurable Eating in Young Children

  • My coding skills and machine learning knowledge had a huge leap. I learned how to better work in a team as well.

    - Noah Lanier, Psychology ‘22
    Human Activity Recognition using Physiological Data from Wearables

  • I learned a lot about data science and using code to manipulate data. I learned how to properly use a terminal, deep learning/machine learning, pandas, and many other skills. Also, I gained collaboration skills when it comes to developing code.

    - Pavani Jairam, Physics ‘23
    Finding Space Junk with the World’s Biggest Telescopes

  • I’ve learned to work through the entire process of a data science project, from assembling data sources all the way through presenting our findings. I’ve also developed insight into working in a team with people of different backgrounds and interests, which enabled us to contribute to the project in different ways. I’ve taken various lessons and hard skills that will carry with me into my future academic and professional endeavors.

    - Benjamin Chen, Computer Science, Economics ‘22
    Protecting American Investors? Financial Advice from before the New Deal to the Birth of the Internet

  • Data+ absolutely changed my perception of data science research. Learning data science has been more intuitive than expected. There are also resources all over the Internet in addition to team members that are able to provide assistance when one is facing difficulty with an aspect of a project. Data science is also able to be applied to many more scenarios than I expected; I look forward to continuing data science research in the future.

    - Malik Scott, Global Health ‘22
    Predicting Baseball Players’ Athletic Performance Utilizing Baseline Assessments of Vision

  • I gained concrete skills in R and Tableau, the ability to collaborate in a virtual environment, and a better understanding of what data science actually means. I also got a glimpse into the public health field and got to learn what many different public health careers might actually entail.

    - Anna Zolotor, Undeclared
    Piloting an Environmental Public Health Tracking Tool for North Carolina

  • I have gained a significant amount of knowledge of the cybersecurity industry and attack methods due to the nature of the background research I had to do for my project. In addition, I was able to apply my knowledge of statistical analysis to real data and learn new techniques to arrange data such as time series analysis.

    - Matthew Feder, Computer Science ‘22
    Applying Security Orchestration, Automation & Response (SOAR) to security threat hunting with Duke’s ITSO

  • Since I've never participated in research before, especially not research this independently oriented, the main thing I feel I've gained from this experience is confidence. I feel like I have a much better understanding of my own capabilities, and I honestly feel much less intimidated by the idea of pursuing research, not just in Data Science.

    - Donald Pepka, Math, Political Science, and Creative Writing ‘21
    For love of greed: tracing the early history of consumer culture

  • I learned a number of hard skills in terms of coding languages as well as some soft skills along the lines of working with a team and coordinating with a client.

    - Benjamin Williams, ECE ‘21
    ABOUT-US – A BOundary Update Tool for Utility Services

  • I definitely gained a lot of experience in R and in Tableau, but I also learned a ton about the fields of data science and public health. We had several interviews with community partners that helped me learn a lot about the different types of careers in data science, environmental advocacy, and environmental health.

    - Leah Roffman, Environmental Science ‘23
    Piloting an Environmental Public Health Tracking Tool for North Carolina

  • I learned about team communication and organizational skills, time management, and I think I have a greater appreciation for how socio-cultural analysis from a humanities perspective can work in tandem with STEM based modes of collecting information/data.

    - Luci Jones, Environmental Studies, Brown University
    When Black Stories Go Global: Analyzing the Translation of African-American Literature and Film

  • Through the program, I not only developed my technical skills with regards to programming and data visualization, but I also learned a lot more about finance and the intersections of finance and data science. This program really incited my love for programming and problem-solving with data, and has made me even more interested in studying statistical science and data science at Duke. Finally, I learned how to effectively collaborate and communicate with a team in a virtual environment.

    - Helen Chen, Statistics ‘23
    AI in the Investment Office

10 weeks during the summer
2-3 undergraduates per team
1-2 grad student mentors
25 projects sharing ideas and code

Projects

A team of students collaborating with Duke School of Medicine's Root Causes Fresh Produce Program, community members, and physicians throughout the Duke Health network will help integrate data from food deliveries to Duke Health patients with patient health record data and other available data sources to create a dashboard that can analyze, predict, and manage the Root Causes' "Food as Medicine" program. Specific outcomes will contribute to improving the Program's quantitative evaluation of its health impact as well as efficiency and satisfaction for its patients. Students will be assisted with IRB approval and mentorship from faculty and community advisors.

Project Leads: Esko Brummel, Willis Wong

A team of students led by researchers at the Duke Marine Lab will explore the changing distribution of krill around the Antarctic Peninsula. Krill are a key prey species in this ecosystem, supporting a number of animals including whales, seals, and penguins, but they are dependent on winter sea ice and may be in trouble as climate change progresses. Using data from acoustic zooplankton surveys, students will create maps and other products to visualize the spatial distribution of krill over the past 20 summers, then create metrics that allow us to quantify the way that krill distribution around the Antarctic Peninsula is changing as the climate shifts and ice melts. These results will be key to our understanding of the impacts of climate change on this polar ecosystem.

 

Project Lead: Douglas Nowacek

Project Manager: Amanda Lohmann

 

A team of students will partner closely with the City of Durham's newly formed Community Safety Department. The Community Safety Department's mission is to identify, implement, and evaluate new approaches to enhance public safety that may not involve a law enforcement response or the criminal justice system. The student team will (1) analyze and identify geographic and temporal patterns in 911 calls for service; (2) conceptualize and build an abstracted data pipeline and tools that would enrich currently available 911 data with other social, economic, and health-related data; (3) explore associations between areas of high call volume, indicators of mental health distress, and histories of dispossession; and (4) identify methods by which future researchers could examine connections between varied 911 incident responses (e.g., police response, unarmed response, joint police and mental health response) and life trajectories (e.g., arrest, jail time, hospitalization, unemployment).

 

Project Lead: Greg Herschlag, Anise Van, City of Durham

Today we design communication networks using mathematical models that describe components of the system that affect end-to-end performance. As wireless links become more highly variable, and system components become harder to model, this approach is losing ground. A team of students led by Dr. Robert Calderbank, Dr. Christ Richmond, Dr. Lingjia Liu, Dr. Jeff Reed, and Carl Dietrich will develop machine learning (ML) algorithms that take advantage of special features of new waveforms proposed for 6G wireless communication. The team will be highly interdisciplinary, and will include students from Virginia Tech familiar with wireless communication, as well as students interested in machine learning. Students will design experiments, collect data, and analyze over-the-air performance, some working onsite using Virginia Tech’s CORNET testbed (https://cornet.wireless.vt.edu/), some virtually using CORNET-based remote lab experiments. The team will present their findings to clients at the Air Force Research Lab in Rome, NY.

Project Leads: Robert Calderbank, Christ Richmond, Lingjia Liu, Jeff Reed, and Carl Dietrich 

This project is also part of Duke’s first Climate+ cohort

A team of students led by researchers at Duke and abroad will develop and evaluate machine learning solutions to model behavioral patterns of electric use, emphasizing data privacy. Data collected in different parts of the world will be analyzed to understand the electric patterns that characterize various appliances and how that information can model users' consumption profiles and prevent fraud. 

 

Project Lead: J. Matias Di Martino


Data+ has been in operation for 8 years, and several other linked programs have started up since, including Code+, which focuses on app development, and CS+, which focuses on team-based research in Computer Science.
A team of students led by John Haws (OIT) will collaborate with Plus Programs administrators to review the data that has been gathered on students since each Plus Program began. The team will then make recommendations and create a single data structure and dashboard for Plus Programs that can be used for years to come to report on participants, suggest program improvements, and develop alumni outreach opportunities.

 

Project Lead: John Haws

A team of students led by Professor Anru Zhang (Duke Biostatistics & Bioinformatics, Computer Science, Mathematics, and Statistical Science) will develop methods to investigate the courses of complex diseases through electronic health records. The team will apply tensor methods to identify key features to register the patient's timeline. This work will provide a basis for researchers at Duke and elsewhere to make fuller use of the high-dimensional and longitudinal information in electronic health record data. If time permits, the related methods and theory will also be studied.

 

Project Lead: Anru Zhang

The QSIDES Institute has pulled thousands of pages of police records from the Williamstown Police Department.  This project will analyze and identify patterns in police behavior in the town, work with journalists and other activists to use the data to develop action plans to address problems and enact public safety reform, and help to build an abstracted process and suite of tools that can be used by other small towns and municipalities to analyze their own data and empower police reform efforts across the nation. This project will interface with the Small Town Policing Accounting, or SToPA Lab. The overall goal for this project is to use the instance of the Williamstown Policing data to build out a toolkit that can be used in other small municipalities for policing transparency.
 

Project Leads: Jude Higdon, Greg Herschlag

A team of researchers associated with the Applied Machine Learning Lab in Duke’s ECE department will lead a team of students in developing novel machine learning techniques for improving brain-computer interfaces (BCIs) using electroencephalography (EEG) data. Students will learn how to pre-process EEG data, extract EEG features, and train machine learning algorithms for character selection in a spelling interface that allows “locked-in” individuals, like Stephen Hawking, to communicate with the outside world. In addition to developing machine learning algorithms, students will work to develop a dashboard to visualize EEG signals, trained classifier parameters, classifier outputs, and spelling decisions made by the BCI.

 

Project Lead: Leslie Collins

Data+ students led by Prof. Henri Gavin will develop AI models for on-site earthquake early warning, in which sensors at a site provide warnings at that site. The Data+ project will integrate into ongoing work on geophone sensors, IoT microcontrollers, and networking. The Data+ team will focus on machine learning aspects of the project by making use of extensive seismic databases. The machine learning model, an anisotropic Gaussian Process, will use waveform characteristics from the initial moments of shaking to predict the maximum expected shaking at the monitored site.

 

Project Lead: Henri Gavin

This project is also part of Duke’s first Climate+ cohort

A student team working with the Energy Data Analytics Lab will work to democratize access to data relevant to climate change mitigation and adaptation planning as well as the underlying models to acquire those data. This project will work towards building the first “foundation model” specifically for remote sensing imagery for the purpose of extracting climate change relevant content at scale to enable near real-time tracking of climate causes and impacts. A foundation model is a model (usually a deep neural network) that has been trained on a large and diverse set of data, after which it can be adapted to a variety of different (but related) inference tasks with a small fraction of additional training data and computation. Leveraging recent developments in self-supervised learning, we will develop the dataset for creating this foundation model and begin training it on real-world data. A model developed using such a dataset will enhance climate change mitigation/adaptation monitoring and planning through developing robust features that can be used to monitor a broad range of climate change contributing factors (e.g., energy infrastructure and use, agricultural activity) and impacts (e.g., economic impacts and human migration) for informing climate mitigation and adaptation strategies.

 

Project Lead: Kyle Bradbury

A team of students led by Courtnea Rainey, David Jamieson-Drake, and Edward Balleisen will explore survey data from PhD completers and Duke PhD alumni to establish important correlations, document key patterns and longitudinal trends, and develop visualizations that can inform institutional decision-making. In addition to updating the work of last year’s Data+ team, this year’s group will use text mining and topic modeling to draw out key themes from the textual answers to free response questions. Combining statistics with data science and natural language processing to analyze large datasets, this team will help refine our efforts to improve doctoral training across Duke University.

 

Project Lead: Ed Balleisen

A team of students led by researchers in the BIG IDEAs Lab will work to create a cloud-based infection detection platform that populates and translates wearable data from a variety of sources. The project will involve working with existing wearable data pipelines (e.g., APIs) to collect, process, and visualize wearable device data in real time as well as implementing machine learning techniques to produce insights into user health. The project will also involve app development and UI/UX considerations so that the platform is easily accessible and user-friendly. The ultimate goal of this work is to inform wearable device users of changes in their health condition before more serious symptoms occur.

 

Project Lead: Ali Roghanizad

A student team led by researchers at Duke Surgery and Global Health Institute will further develop the Alcohol Use Behavioral Phenotyping Test (AUBPT), a computer application that can help predict alcohol use and alcohol use disorder risks based on personal characteristics and behavioral performance on Research Domain Criteria paradigms/games. Students will build multi-tasking simulated AI agents using computational neuroscience and deep learning methods. These agents can 'mimic' human behaviors. Students will also use machine learning to model substance use risk from previously collected data. Together, these approaches will give a more rigorous understanding of causal relationships across behavioral paradigms and make AUBPT an adaptive application. This project is a part of the Bass Connections project 'Alcohol Use Behaviors across Countries and Cultures' (https://bassconnections.duke.edu/project-teams/alcohol-use-behaviors-across-countries-and-cultures-2021-2022). The Bass Connections team is highly interdisciplinary, with students working at the interface of digital, mental, and global health on the deployment, evaluation, and cultural adaptation of AUBPT across multiple global populations.

 

Project Leads: Siddesh Zadey, Dr. Catherine Staton, and Dr. Joao Vissoci

This project is also part of Duke’s first Climate+ cohort

A team of students led by researchers in the Hydroclimatological Lab will comprehensively quantify wetland carbon emissions across the entire Southeast (SE) US using machine learning techniques and various climate datasets, including in situ measurements, remote sensing data, climate observations, and hydrological model (PIHM-Wetland) outputs. Students will first apply machine learning techniques to establish the relationship between hydroclimatological variables and wetland carbon emissions at observational sites. They will then create spatially distributed estimates of carbon emissions for wetland ecosystems across the entire SE US. Based on the analysis of the current climate, future wetland carbon emissions will be predicted for a warming climate. This research will better assess wetland carbon emissions over the entire Southeast and provide critical information on future carbon budgets at regional scales.

Project Lead: Wenhong Li

A team of students led by Co-Principal Investigators Dr Jenny Immich and Dr Vicky McAlister will develop a geospatial methodology to automate data analysis originating from small unmanned aerial vehicles (SUAV) that seeks to identify the homes of ordinary medieval people within the modern Irish landscape. Known as aerial archaeology, this work investigates archaeological remnants on the surface of the earth without excavation. Students will create iterative Python scripts to analyze elevation datasets (hillshades) in areas surrounding medieval castles. Students will create advanced analysis methodologies for exploring low-lying archaeological features in the landscape, including principal component, standard deviation, and difference mean elevations. Students with interest and appropriate backgrounds will be challenged to develop machine learning methods for identification of linear archaeology. The resulting portfolio of work will be robust, including workflows, metadata, methodologies, and scripts. This work will solidify geospatial methodologies in advancing our knowledge of where everyday people lived in Ireland during the medieval period, a question that has long perplexed archaeologists. In addition, it will fundamentally ground our understanding of complex topics in urbanization to the modern period and shed light on the development of communities across time. 

 

Project Lead: Vicky McAlister, Southeast Missouri State University

The goal of this Data+ project is to apply and extend custom analytics solutions to understand and predict microbial population growth. An explosion of data has resulted from tracking the growth of bacteria in high throughput devices. These data were generated to understand how microbes grow. Better models that fit and predict these growth data are needed for better treatment of pathogenic bacterial infections, food safety, beer and bread fermentation, and understanding stress resilience of the microbiome. Using nonparametric statistical models to analyze how microbes grow under stress, the Schmid research lab at Duke has made important discoveries in these areas. These studies generated large data sets and developed statistical models to track and predict how microbes grow and change their gene expression when faced with extreme stress. We built a web application called phenom to make these models accessible to the broader community. In this Data+ project, students will beta test the web app and make improvements, including data visualization, extending the underlying statistical model, and analyzing data using the app.

 

Project Lead: Amy Schmid

Image credit: Tonner, P.D., Darnell, C.L., Bushell, F.M.L., Lund, P.A., Schmid, A.K.*, Schmidler, S.C. 2020. A Bayesian non-parametric mixed-Effects model of microbial growth curves. PLoS Comp Biol. 16(10): e1008366. https://doi.org/10.1371/journal.pcbi.1008366

A team of students led by Biomedical Engineering professor Lingchong You will predict pattern formation of bacterial colonies by integrating experimental results with both mechanistic modelling and machine learning methods. Bacterial colonies have the capability to self-organize into beautiful and intricate patterns. Students will contribute to a method for controlling the outcome of colony spatial patterning, which is an important challenge facing the field of synthetic biology.

 

Project Lead: Lingchong You

This project is also part of Duke’s first Climate+ cohort

Duke Data+ students, in collaboration with Dr. Emily Bernhardt (faculty advisor) and Audrey Thellman (graduate student), will evaluate how changing ice and snow conditions are impacting river ecosystems through classified ice imagery. Currently, our team has data from 7 field cameras that have been taking photos of the stream channel each day since 2018. We have created training data and code for a machine learning classifier to transform these photos into ecologically relevant indices, such as percent snow coverage. The Data+ team will modularize and visualize this classification pipeline to increase accessibility of our data product. Students will have the opportunity to work with a team of scientists at the New Hampshire site, U.S. Geological Survey partners with vested interest in the data product, and data scientists working in the Bernhardt Lab who have completed and are currently working on similar projects that increase availability and usability of environmental data (see https://cuahsi.shinyapps.io/macrosheds/).

 

Project Leads: Audrey Thellman and Emily Bernhardt

A team of students led by researchers at the Duke Center for Policy Impact in Global Health (CPIGH) will create a user-friendly interactive visualization tool to track the evolution of Universal Health Coverage (UHC) financing policies in low- and middle-income countries. The students will use the UHC policy surveillance data collected by the CPIGH Bass Connections team in the Fall 2021 and Spring 2022 semesters. This publicly available data visualization will provide an opportunity for policymakers and researchers to conduct quick comparative cross-country and longitudinal analyses to understand the impact of different policy experiments on UHC progress in different jurisdictions.

A team of students led by Statistical Science professors Mine Çetinkaya-Rundel and Maria Tackett will pull together all data associated with DataFest (https://www2.stat.duke.edu/datafest) for the purpose of retrospective archiving and documentation as well as creating a valuable resource that can serve as the one-stop shop for students interested in participating in DataFest in the future. Goals of this project include curating “metadata” on past DataFests to build an interactive dashboard that shows the growth of the event (over time and over geographies) and highlights winning projects and participants, creating educational materials based on data from past DataFests that students can use to prepare for the event, and building a website that will serve as the student landing page for the global DataFest event and showcase all products of the Data+ projects. Students will develop their skills working with R packages like Shiny (to build the dashboard), blogdown (to build the website), and a large number of packages for data wrangling, visualization, and modeling. All content developed as part of this project will be available to DataFest sites and over 3000 annual participants.

 

Project Leads: Mine Çetinkaya-Rundel and Maria Tackett

A team of students led by researchers in the Duke River Center will develop a publicly available and accessible website to serve as a portal to explore diverse and extensive datasets detailing the quality of waterways and the effectiveness of management efforts to reduce risks associated with chemical contaminants, stormwater flow, and flooding. For example, these datasets are critical in developing mitigation efforts to prepare for changing and new climatic patterns (e.g., extreme weather events and flooding, drought and protecting drinking water resources). However, the data are neither readily accessible nor easy to visualize for the general public. The website will be developed in collaboration with our community collaborator, the Ellerbe Creek Watershed Association, and Data+ teammates will have the unique opportunity to engage with a separate NSF-funded workshop focused on teaching tools for visualizing these datasets; the workshop will bring together ~30 early scientists across academia, government, and industry.

 

Project Leads: Emily Bernhardt and Jonathan Behrens

A team of students will collaborate with Duke librarians to use AI-powered Handwriting Text Recognition (HTR) tools to transform thousands of pages of handwritten text into machine-readable data. Using a large dataset of digitized 19th- and early 20th-century women’s travel diaries held in the Rubenstein Library, students will test and evaluate various HTR technologies, document methods and constraints for extracting text from historical manuscripts, and build an HTR toolset and proof-of-concept interface that the library can build on for future projects. This work will further the library’s initiative to make its historical collections more readily available for computational research (and future Data+ projects!).

Project Leads: Molly Bragg and Noah Huffman

A team of students led by researchers Nicole Schramm-Sapyta (Duke Institute for Brain Sciences) and Maria Tackett (Statistical Science) will explore the impacts of community health services and local laws and policies on the justice-involved population in Durham. The team will create a public-facing interactive timeline of the implementation of mental health services, drug laws, and court policies in Durham; the data used to create the timeline will also be incorporated into the team’s ongoing analyses. Students will also analyze data from the Durham County Detention Center and Duke Health to explore the impact of the COVID-19 pandemic on incarceration and health-care utilization for the justice-involved population, along with the effects of policies related to mental illness and justice involvement. The data sets and analysis produced in Data+ will provide a foundation for the team’s research in Bass Connections during the 2022-2023 academic year. Students are asked to commit to working with the team for Summer 2022 (Data+) as well as Fall and Spring 2022-2023 (Bass Connections).

 

Project Leads: Nicole Schramm-Sapyta and Maria Tackett

A team of students led by faculty from both Duke and Duke Kunshan will synthesize data from a variety of sources to investigate the social determinants of cancers in local areas, examine the impact of personal behaviors (such as diet, sleeping, exercise, smoking) and community characteristics (such as air/water/soil quality, built environment, social norms, discrimination, marginalization) on cancer-related outcomes, and conduct a systematic review and meta-analysis to evaluate the effectiveness of current cancer prevention and control policies and interventions in China. The findings will provide rich empirical evidence to describe the cancer burden/disparity and its causes, identify state-of-the-art cancer prevention and control practices, and inspire the development of region-specific and population-tailored policies and interventions to close the equity gap in cancer prevention and control in China.

Project Lead: Meifeng Chen

Using data from the Durham Compass and the NC School Report Card among many other sources, this team will continue the development of an interactive R Shiny dashboard that permits exploration of school statistical data.  The team aims to explore school zones through an asset-based lens in an effort to support ethical and imaginative partnerships between Duke, North Carolina Central, and Durham Public Schools.  This is a continuation of a joint Duke-NCCU Bass Connections project.

 

Project Lead: Alec Greenwald

Is there a right type and amount of consumption? The idea of ethical consumption has gained prominence in recent discourse, both in terms of what we purchase (from fair trade coffee to carbon offsets) and how much we consume (from rechargeable batteries to energy-efficient homes). These modes of ethical consumerism assume that individuals become political, as well as economic, actors through shopping. Concern with the morality of consumption is not new to capitalist societies, and we can see as much in the earliest discourses surrounding the market economy. During the Middle Ages and the Renaissance, acts of consumption became increasingly aligned with corruption, as individual and corporate bodies were depicted as altered, even damaged, by trade. And, in an increasingly global and colonial economy, authors debated the ethics of expanding into new markets from India to the Americas. These questions and others like them have shaped our own modern discourse around the ethics of consumption.

 

Building on the past two years of work, this project will extend the analysis of consumption performed in 2020 and 2021 to include the data collected and analyzed by the Bass Connections team from the rare materials archives of Duke’s Rubenstein Library and of the University of Alabama at Birmingham’s Lister Hill Library. These materials include Medieval and Early Modern medical manuals, which will help us to analyze the metaphorical translation from medical consumption of the human body to economic consumption of the body politic; seventeenth-century mercantile theory texts, which theorize how consumption is connected to trade and monetary policy; as well as treatises and legislative documents on monopolies, corporations, and trading companies.

 

Project Leads: Astrid Giugni, Jessica Hines

A team of students led by Dr. Liz DeMattia (Duke University Marine Lab) and Dr. Rachel Noble (UNC-IMS) will explore the Community Science Initiative’s AdoptADrain citizen science data collected during the 2021/2022 academic year (the first year of the program). Potential data analyses will include collating, organizing, comparing and contrasting, finding trends, and comparing the AdoptADrain data to other marine debris data from NC. In addition to analyses, students will also explore best practices for displaying information, communicate findings to the general public, and potentially develop new programs to help with data collection. This work will provide the needed mobilization of raw citizen science data into usable information for participants, the public, and policymakers.

Project Lead: Dr. Liz DeMattia

Past Projects

Alexa Goble (Finance) joined Econ majors Chavez Cheong and Eli Levine in a ten-week exploration of mortgage enforcement actions related to the financial crisis from earlier in this century. Using NLP techniques on mortgage data from Ohio and Massachusetts, the team validated a new experimental approach to understanding the dynamics between state regulatory agencies, mortgage lenders, brokers, and loan originators. This project was a continuation of two previous Data+ projects:

https://bigdata.duke.edu/projects/american-predatory-lending-global-financial-crisis

https://bigdata.duke.edu/projects/american-predatory-lending-and-global-financial-crisis-year-2

 

View the team's project poster here

Watch the team's final presentation on Zoom:

 

Project Lead: Lee Reiners

Project Manager: Malcolm Smith Fraser

Stats/Sociology major Mitchelle Mojekwu joined Neuroscience majors Kassie Hamilton and Zineb Jaidi in a ten-week exploration of data relevant to an upcoming public school zone redistricting in Durham County. Using information acquired from the General Social Survey and the US Census, the team applied modern mathematical and statistical methods for generating proposed redistricting plans, with the aim of providing decision-makers with information they can use to produce school districts that are equitable and reflective of the Durham County student population.

View the team's project poster here

Watch the team's final presentation on Zoom:

 

Faculty Lead: Greg Herschlag

Project Manager: Bernard Coles

 

Pryia Juarez (BME/ECE), Jonathan Pilland (ECE/BME), and Matthew Traum (CS/Econ) spent ten weeks analyzing sensor data synthesized by an agile waveform generator. The team used deep reinforcement learning techniques to understand the performance of different synthetic agents representing potential attackers of the sensor system.

 

View the team's project poster here

Watch the team's final presentation on Zoom:

 

Faculty leads: Robert Calderbank, Vahid Tarokh, Ali Pezeshki

Client leads: Dr. Lauren Huie, Dr. Elizabeth Bentley, Dr. Zola Donovan, Dr. Ashley Prater-Bennette, Dr. Erin Trip

Project Manager: Suya Wu