Data+

Data+ is a 10-week summer research experience that welcomes Duke undergraduates interested in exploring new data-driven approaches to interdisciplinary challenges. Students join small project teams, collaborating with other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science.

Data+ 2019 


  • “I feel I have a better understanding of how to communicate my work to different groups of people. We met with stakeholders, other Data+ teams, and people from the University, all with different levels of technical knowledge, and this really allowed me to be adaptable in how I present what I've worked on. Before, I really thought "data scientist" described what I wanted to be. Now, I have a better understanding of the things I like and dislike. Working extensively on modeling this summer has made me aware that I really enjoy project management on real world projects, which is great insight to have so early in my undergraduate career.”
    Meredith Brown
    Getting Granular on Social Determinants of Health

  • “I have gained experience in applying my classroom knowledge of data science to real world problems. I have made connections that could help further my future data science career. I have also received offers and opportunities to continue working on the Data+ project and other related projects in the coming semesters. At first, I thought data science and statistics were basically the same thing. I now realize how insanely wrong I was. Data science is this incredible combination of statistics, computer science, communications, and any other hobby or interest you so choose. I never would have realized the infinite ways data science could be applied, and I am so thankful that this program has changed my perception of Data Science research.”
    Maria Henriquez
    Optimizing Risk Assessment for Duke University Student Athlete Injury Prevention

  • “I’ve gained technical/coding skills, tangible experience with real big data, communication and public speaking skills, patience, and flexibility. There is way more problem-solving, dirty data, and things you have to account for than I thought in data science. Also, nothing is straightforward. Anything "shown" through data has used specific algorithms to get there.”
    Brooke Scheinberg
    Recidivism in Durham County Jails

  • “Data+ enabled me to hone my research, communication, and presentation skills. I never fully understood the data science discipline until this experience.”
    Victoria Worsham
    Investigating Oil and Gas Production in the United Kingdom

  • “As a second-time participant in Data+, I gained experience in a different type of data science project. Last summer, my team started with dirty data which we cleaned and performed NLP/machine learning classification on. This summer, my team had to web scrape and make API requests to collect a significant portion of data. Collecting and cleaning data enabled me to think more about how to obtain and organize data in a way that is convenient for the next person using it. Building a dashboard to visualize trends in the data introduced me to iterative dashboard design, in which a series of client-specific changes are made after the general structure and functionality of the dashboard is built. For example, changing the user input bar from a sidebar to a horizontal bar displayed input options more clearly and made it more user-friendly for our client.”
    Cathy Lee
    Breaking the Bundle

  • “The exposure to lots of different methods and applications across data science has been very eye-opening. I've also gained perspective on the kind of skills, both "soft" and "hard," that are useful both in academic environments and the real world. I didn't realize how much of data science was learning and researching topics on the fly. This is encouraging, because it means that if you can develop this skill of being able to quickly learn a topic/language/method, you're rarely under-qualified for a job or position.”
    Alex Bendeck
    Basketball Analytics Pipeline

  • “I gained a passion for sports analytics and learned important ML concepts. You really can apply data science to any field you like.”
    Anshul Shah
    Basketball Analytics Pipeline

  • “I feel like I gained a lot of hard skills related to coding in different languages like Python and R. I'm really happy I had a chance to dive into some more complex machine learning problems like computer vision. Most importantly, I feel like I've gained enough confidence in my data science skills to pursue a related career after college.”
    Niyaz Nurbhasha
    Basketball Analytics Pipeline

  • “The most important thing I've gained this summer is a sense of understanding of what it's like to work for a client and produce deliverables on a timeline. The biggest takeaway I've gotten about data science is how much data is missing in the world, and how hard it can be to find. There are a lot of really beautiful, well formatted datasets out there, but sometimes, there just isn't a way to get what you're looking for--and that's often where you come in. There's a lot of time spent putting together these datasets in a meaningful way, and figuring out how to format them in the most helpful way for future work, especially if you're not going to be the ones using them. It's a lot like commenting code, but tagging your columns is really important to ensure that everyone going forward knows how to work with what you've put together.”
    Cassandra Turk
    Identifying Extreme Events in Wholesale Energy Markets

  • “I now understand there are many ways to be useful in a collaborative environment. As the only non-technical major on my team, I first believed I couldn't bring as much to the table, but I found there is immense utility in being the person with the least technical experience. Not only was I able to perceive problems in a different way, offering fresh solutions, but I was also able to gain much more knowledge from my teammates. Additionally, I was able to incorporate my sociological understanding of people into the design aspect of our project to create a truly useful user experience. No matter your background, technical or non-technical, you always have something important to share and to learn.”
    Elizabeth Loschiavo
    Big Data for Reproductive Health (Year 2)

  • “I think that Data+ has taught me a lot about how to create something really cool out of nothing. We had to change our project goals because of technical issues and holdups, and I think I learned a lot about how to go from just raw data and ask really interesting questions.”
    Aidan Fitzsimons
    Recidivism in Durham County Jails

  • “I gained valuable experience working with data, something I had never had the chance to do before, but perhaps more importantly I gained insight into working with a small team alongside a project manager.”
    Dennis Harrsch
    Big Data for Reproductive Health (Year 2)

  • “I didn't know Data Science research was a thing at the undergraduate level, so it showed me some possibilities. I have gained a better understanding of machine learning topics and when to apply different methods. I've also learned a lot looking at results from our models and pulling meaning out of them–even if that meaning is just that there was a bug and we need to redo it differently.”
    Varun Nair
    Deep Learning and Energy Access Decisions

  • “Data+ in many ways widened my perception of what it means to participate in Data Science research. My particular project was pretty in line with my perception before coming to Data+ but many of the other projects looked very different than I imagined a Data Science research project looking in terms of the type of data they were using and the clients.”
    Ryan Culhane
    Speech Emotion Analysis

  • “I feel I have gained a much greater understanding of how to approach problems computationally as well as working on a small team in order to solve/fix these problems! It has been a great experience overall, and I wish I could do it again. Working with the stakeholders we had was great, because they were also so involved and enthusiastic. Our project manager couldn't have been better either. I had a great summer with Data+! It was honestly more exciting and fun than I realized, and participating in Data+ helped me see that I could do computational work in the future and be happy doing it.”
    Jake Sumner
    Athletic Injury Risk Assessment

  • “This summer of Data+ showed me that data science research is also interested in intensely analyzing and understanding outliers of a dataset, rather than just the averages. Many probability and statistics classes emphasize the mean, variation, and overall distribution of a dataset without paying too much attention to the outliers and asking if those outliers can really be explained by the same set of processes affecting 99%+ of the other data points. This summer of Data+ showed me that data science research is also interested in intensely analyzing and understanding these outliers.”
    Alec Ashforth
    Identifying Extreme Events in Wholesale Energy Markets

  • “Having no prior exposure, outside of a statistics course, to data science and data visualization, I feel like I got a lot out of this program. At least for me, there was a little bit of "drinking from a firehose," and a steep learning curve but I learned a lot very quickly. The data visualization programs with which I worked are definitely ones I could see myself using in the future.”
    Anonymous student
    Remembering the Middle Passage

  • “I have gained extensive knowledge about all the human rights conflicts in the world and also just a general understanding of many different facets of data science. I realized that I am not as interested in research as a profession, but I did learn that the digital humanities are a very legitimate field.”
    Anonymous student

  • “I have gained some knowledge in R as well as great teamwork skills.”
    Marco Gonzalez Blancas
    Duke Building Energy Use Report

  • “I feel I have gained a lot of technical skills regarding natural language processing which I really wanted to experience. I also got to experience what it was like working on a research team and attempting to solve a problem in that setting, which I found really valuable. I realized that data science really is a part of every field and it is so broad in terms of scope and what kind of work you will be doing.”
    Nikhil Kaul
    Invisible Adaptations

  • “I went from fumbling through CS101 my freshman fall to playing around with the hyperparameters of machine learning algorithms! That's a success in my book. I learned that data science projects take time and many little failures before you can arrive at anything close to the outcome that you'd like. I learned that, seemingly counterintuitively, I can't rely on other people to teach me things I can learn on my own, but there exists a whole community of people in data science who want to help me learn and grow. I feel more confident tackling problems that have no clear solution, and I've found a new appreciation for API documentation. I'm so thankful that I had the opportunity to spend my summer with Data+!”
    Micalyn Struble
    Neuroscience in the Courtroom

  • “I can code significantly faster now. I have also been through a truncated version of the problem-solving process in the real world.”
    Ellis Ackerman
    Durham Evictions

  • “I have gotten a better look at where humanities can be useful and frankly, necessary, in implementing new methodologies and approaches to the data science field.”
    Anonymous

  • “I have learned a lot about JavaScript and Google Earth Engine and using remote sensing data to see what changes are happening at Alligator River National Wildlife Refuge and how they are happening. I have also learned that working on a project like this is a great way to learn and make something at the same time.”
    Katelyn Chang
    Saltwater Intrusion on Coastal Ecosystems

  • “I’ve learned many technical tools for data analysis as well as improved my abilities at working as part of a team. It was a unique experience working in data science research because this type of work isn’t offered in the lab portion of many classes. It’s an incredible opportunity to engage in data science research as an undergraduate student, especially when the decision making process is mostly up to you and your team.”
    Jeevan Tewari
    Human Rights in the Postwar World

  • “Data+ helped me learn a lot about how to create a machine learning pipeline in order to discover meaningful relationships in data. It also gave me a broader view of data science and career opportunities. I also learned that data science research involves a lot of preprocessing and simply turning the data into a form that a model can digest.”
    Michael Xue
    Speech Emotion Analysis

  • “I've learned so much about data science and I now feel very comfortable coding in several languages. There were multiple times during the summer where I was asked to do something and had absolutely no clue what I was doing, but Data+ gave me the skills and confidence to solve problems beyond what I thought was the limit of my knowledge. (Ultimately, I became really good at googling how to code!). I learned the importance of communication. It's very easy to spend all day coding, but if you're not able to communicate with the public on what you are doing, the code is meaningless.”
    Anonymous

  • “I have gained a more realistic understanding of what working in a data scientist role looks like. I have learned a lot more than I was expecting to and cemented the skills that I had learned in class. I came into the summer unsure of what area of technology I wanted to pursue as a career, but now I am more confident of my role within data science. I have further learned what it looks like to work within a team, with a manager, and with other coworkers.”
    Jackson Hubbard
    Basketball Analytics Pipeline

10 weeks during the summer
2-3 undergraduates per team
1-2 grad student mentors
25 projects sharing ideas and code

Projects

Social and environmental contexts are increasingly recognized as factors that impact health outcomes of patients. This team will have the opportunity to collaborate directly with clinicians and work with medical data in a real-world setting. They will examine the association of social determinants with risk prediction for hospital admissions and assess whether social determinants bias that risk in a systematic way. Applied methods will include machine learning, risk prediction, and assessment of bias. This Data+ project is sponsored by the Forge, Duke's center for actionable data science.

Project Leads: Shelly Rusincovitch, Ricardo Henao, Azalea Kim

Project Manager: Austin Talbot

Aaron Chai (Computer Science, Math) and Victoria Worsham (Economics, Math) spent ten weeks building tools to understand characteristics of successful oil and gas licenses in the North Sea. The team used data-scraping, merging, and OCR methods to create a dataset containing license information and work obligations, and they also produced ArcGIS visualizations of license and well locations. They had the chance to consult frequently with analytics professionals at ExxonMobil.

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Artem Streltsov

Yueru Li (Math) and Jiacheng Fan (Economics, Finance) spent ten weeks investigating abnormal behavior by companies bidding for oil and gas rights in the Gulf of Mexico. Working with data provided by the Bureau of Ocean Energy Management and ExxonMobil, the team used outlier detection methods to automate the flagging of abnormal behavior, and then used statistical methods to examine various factors that might predict such behavior. They had the chance to consult frequently with analytics professionals at ExxonMobil.

 

Click here to read the Executive Summary

 

Project Lead: Kyle Bradbury

Project Manager: Hyeongyul Roh

Team A: Video data extraction

Alexander Bendeck (Computer Science, Statistics) and Niyaz Nurbhasha (Economics) spent ten weeks building tools to extract player and ball movement in basketball games. Working with freely available broadcast-angle video footage, which required much cleaning and pre-processing, the team used OpenPose software and employed neural network methodologies. Their pipeline fed into the predictive models of Team C.

Click here to read the Executive Summary

 

Team B: Modeling basketball data: offense

Anshul Shah (Computer Science, Statistics), Jack Lichtenstein (Statistics), and Will Schmidt (Mechanical Engineering) spent ten weeks building tools to analyze offensive play in basketball. Using 2014-15 Duke Men’s Basketball player-tracking data provided by SportVU, the team constructed statistical models that explored the relationship between different metrics of offensive productivity, and also used computational geometry methods to analyze the off-ball “gravity” of an offensive player.

Click here to read the Executive Summary

 

Team C: Modeling basketball data: defense

Lukengu Tshiteya (Statistics), Wenge Xie (ECE), and Joe Zuo (Computer Science, Statistics) spent ten weeks building tools to predict player movement in basketball games. Using SportVU data, including some pre-processed by Team A, the team built predictive RNN models that distinguish between six typical movement types, and created interactive visualizations of their findings in R Shiny.

Click here to read the Executive Summary

 

Team D: Visualizing basketball data

Shixing Cao (ECE) and Jackson Hubbard (Computer Science, Statistics) spent ten weeks building visualizations to help analyze basketball games. Using player tracking data from Duke basketball games, the team created visualizations of gameflow and of networks of points and assists, and integrated all of their tools into an R Shiny app.

Click here to read the Executive Summary

 

Faculty Leads: Alexander Volfovsky, James Moody, Katherine Heller

Project Managers: Fan Bu, Heather Matthews, Harsh Parikh, Joe Zuo

Yanchen Ou (Computer Science) and Jiwoo Song (Chemistry, Mechanical Engineering) spent ten weeks building tools to assist in the analysis of smart meter data. Working with a large dataset of transformer and household data from the Kyrgyz Republic, the team built a data preprocessing pipeline and then used unsupervised machine-learning techniques to assess energy quality and construct typical user profiles.

 

Click here to read the Executive Summary

 

Faculty Lead: Robyn Meeks

Project Manager: Bernard Coles

Bernice Meja (Philosophy, Physics), Jessica Yang (Computer Science, ECE), and Tracey Chen (Computer Science, Mechanical Engineering) spent ten weeks building methods for Duke’s Office of Information Technology (OIT) to better understand information arising from “smart” (IoT) devices on campus. Working with data provided by an IoT testbed set up by OIT professionals, the team used a mixture of supervised and unsupervised machine-learning techniques and built a prototype device classifier.

 

Click here to read the Executive Summary

 

Project Lead: Will Brockselsby

Interested in understanding the types of attacks targeting Duke and other universities? Led by OIT and the IT Security Office, students will learn to analyze threat intelligence data to identify trends and patterns of attacks. Duke blocks an average of 1.5 billion malicious connection attempts per day and is working with other universities to share the attack data. One untapped area is research into the types of attacks and learning how universities are targeted. Students will work alongside security and IT professionals to analyze the data, with the intent of discerning patterns.

Project Lead: Jesse Bowling

Project Manager: Susan Jacobs

Katelyn Chang (Computer Science, Math) and Haynes Lynch (Environmental Science, Policy) spent ten weeks building tools to analyze and visualize geospatial and remote sensing data arising from the Alligator River National Wildlife Refuge (ARNWR). The team produced interactive maps of physical characteristics that were tailored to specific refuge management professionals, and also built classifiers for vegetation detection in Landsat imagery.

 

Click here to read the Executive Summary

 

Faculty Leads: Justin Wright, Emily Bernhardt

Project Manager: Emily Ury

Dennis Harrsch, Jr. (Computer Science), Elizabeth Loschiavo (Sociology), and Zhixue (Mary) Wang (Computer Science, Statistics) spent ten weeks improving upon the team’s web platform that allows users to examine contraceptive use in low- and middle-income countries (LMIC), using data collected by the Demographic and Health Survey (DHS) contraceptive calendar. The team reduced load times and data visualization latency, and increased the number of country surveys available in the platform from 3 to 55. The team also created a new app that allows users to explore the results of machine learning using this big data set.

This project will continue into the academic year via Bass Connections where student teams will refine the machine learning model results and explore the question of whether and how policymakers can use these tools to improve family planning in LMIC settings.

 

Click here to view the Executive Summary

 

Faculty Lead: Megan Huchko

Project Manager: Amy Finnegan

Nathaniel Choe (ECE) and Mashal Ali (Neuroscience) spent ten weeks developing machine-learning tools to analyze urodynamic detrusor pressure data of pediatric spina bifida patients from the Duke University Hospital. The team built a pipeline that went from raw time series data to signal analysis to dimension reduction to classification, and has the potential to assist in clinician diagnosis.

 

Click here to read the Executive Summary

 

Faculty Leads: Wilkins Aquino, Jonathan Routh

Project Manager: Zekun Cao

Varun Nair (Economics, Physics), Paul Rhee (Computer Science), Jichen Yang (Computer Science, ECE), and Fanjie Kong (Computer Vision) spent ten weeks helping to adapt deep learning techniques to inform energy access decisions.

 

Click here to read the Executive Summary

 

Faculty Lead: Kyle Bradbury

Project Manager: Fanjie Kong

Yoav Kargon (Mechanical Engineering) and Tommy Lin (Chemistry, Computer Science) spent ten weeks working with data from the Water Quality Portal (WQP), a large national dataset of water quality measurements aggregated by the USGS and EPA. The team went all the way from raw data to the production of Pondr, an interactive and comprehensive tool built with R Shiny that permits users to investigate and visualize data coverage, values, and trends from the WQP.

 

Click here to read the Executive Summary

 

Faculty Lead: Jim Heffernan

Project Manager: Nick Bruns

Marco Gonzalez Blancas (Civil Engineering) and Mengjie Xiu (Masters, Biostatistics) spent ten weeks building tools to help Duke reduce its energy footprint and achieve carbon neutrality by 2024. The team processed and analyzed troves of utility consumption data and then created practical monthly energy use reports for each school at Duke. These reports show historical usage trends, provide energy benchmarks for comparison, and make practical suggestions for energy savings.

Click here to read the Executive Summary

 

Faculty Lead: Billy Pizer

Project Manager: Sophia Ziwei Zhu

Cathy Lee (Statistics) and Jennifer Zheng (Math, Emory University) spent ten weeks building tools to help Duke University Libraries better understand its journal purchasing practice. Using a combination of web-scraping and data-merging algorithms, the team created a dashboard to help library strategists visualize and optimize journal selection.

 

Click here to read the Executive Summary

Faculty Leads: Angela Zoss, Jeff Kosokoff

Project Manager: Chi Liu

Micalyn Struble (Computer Science, Public Policy), Xiaoqiao Xing (Economics), and Eric Zhang (Math) spent ten weeks exploring the use of neuroscience as evidence in criminal trials. Working with a large set of case files downloaded from WestLaw, the team used natural language processing to build a predictive model that has the potential to automate the process of locating neuroscience-relevant cases in databases.

 

Click here to read the Executive Summary

 

Faculty Lead: Nita Farahany

Project Manager: William Krenzer

A team of students will use a variety of data sets and mapping technologies to determine a feasible location for a deep-sea memorial to the transatlantic slave trade. While scholars have studied the overall mortality of the slave trade, little is known about where these deaths occurred. New mapping technologies can begin to supply this data. Led by English professor Charlotte Sussman, in association with the Representing Migrations Humanities Lab, this team will create a new database that combines previously-disparate data and archival sources to discover where on their journeys enslaved persons died, and then to visualize these journeys. This project will employ the resources of digital technologies as well as the humanistic methods of history, literature, philosophy, and other disciplines. The project welcomes students from a broad range of disciplines: computer science; mathematics; English and literature; history; African and African American studies; philosophy; art history; visual and media studies; geography; climatology; and ocean science.

 

Image credit:

J.M.W. Turner, Slave Ship, 1840, Museum of Fine Arts, Boston (public domain)

Faculty Lead: Charlotte Sussman

Project Manager: Emma Davenport

Ellis Ackerman (Math, NCSU), Rodrigo Araujo (Computer Science), and Samantha Miezio (Public Policy) spent ten weeks building tools to help understand the scope, cause, and effects of evictions in Durham County. Using evictions data recorded by the Durham County Sheriff’s Department and demographic data from the American Community Survey, the team investigated relationships between rent and evictions, created cost-benefit models for eviction diversion efforts, and built interactive visualizations of eviction trends. They had the opportunity to consult with analytics professionals from DataWorks NC.

Project Leads: Tim Stallmann, John Killeen, Peter Gilbert

Project Manager: Libby McClure

 

The American public first encountered the term “genocide” in a Washington Post op-ed published in 1944; since then, the word’s meaning has been circulated, debated, and shaped by numerous forces, especially by words and images in newspapers. With the support of Dr. Priscilla Wald (English), a team of students led by Nora Nunn (English graduate student) and Astrid Giugni (English and ISS) will analyze how U.S. mass media—particularly newspapers—enlist text and imagery such as press photographs to portray genocide, human rights, and crimes against humanity from World War II to the present. From the Holocaust to Cambodia, from Rwanda to Myanmar, such representation has political consequences. If time allows, students will also study the representation of collective violence in Hollywood film, querying the relationship between human rights and genre. The implications of these findings could inform future coverage of human rights-related issues at home and abroad.

Faculty Leads: Nora Nunn, Astrid Giugni

How Much Profit is Too Much Profit?

Chris Esposito (Economics), Ruoyu Wu (Computer Science), and Sean Yoon (Masters, Decision Sciences) spent ten weeks building tools to investigate the historical trends of price gouging and excess profits taxes in the United States of America from 1900 to the present. The team used a variety of text-mining methods to create a large database of historical documents, analyzed historical patterns of word use, and created an interactive R Shiny app to display their data and analyses.

Click here to read the Executive Summary

 

(cartoon from The Masses July 1916)

Faculty Lead: Sarah Deutsch

Project Manager: Evan Donahue

Maria Henriquez (Computer Science, Statistics) and Jacob Sumner (Biology) spent ten weeks building tools to help the Michael W. Krzyzewski Human Performance Lab best utilize its data from Duke University student athletes. The team worked with a large collection of athlete strength, balance, and flexibility measurements collected by the lab. They improved the K Lab’s data pipeline, created a predictive model for injury risk, and developed interactive web-based individualized injury risk reports.

Click here to read the Executive Summary

Faculty Lead: Dr. Tim Sell
Project Manager: Brinnae Bent

Vincent Wang (Computer Science, CE), Karen Jin (Bio/Stats), and Katherine Cottrell (Computer Science) spent ten weeks building tools to educate the public about lake dynamics and ecosystem health. Using data collected over a period of 50 years at the Experimental Lakes Area (ELA) in Ontario, the team preprocessed and merged datasets, made a series of data visualizations, and produced an interactive website using R Shiny.

Click here to read the Executive Summary

 

Faculty Lead: Kateri Salk

Project Manager: Kim Bourne

Vivek Sahukar (Masters, Data Science), Yuval Medina (Computer Science), and Jin Cho (Computer Science/Electrical & Computer Engineering) spent ten weeks creating tools to help augment the experience of users in the StreamPULSE community. The team created an interactive guide and used data sonification methods to help users navigate and understand the data, and they used a mixture of statistical and machine-learning methods to build out an outlier detection and data cleaning pipeline.

Click here to read the Executive Summary

Faculty Leads: Emily Bernhardt, Jim Heffernan

Project Managers: Alice Carter, Michael Vlah

Aidan Fitzsimons (Public Policy, Mathematics, Electrical & Computer Engineering), Joe Choo (Mathematics, Economics) and Brooke Scheinberg (Mathematics) spent ten weeks partnering with the Durham Crisis Intervention Team, the Criminal Justice Resource Center, and the Stepping Up Initiative. Utilizing booking data of 57,346 individuals provided by the Durham County Jail, this team was able to create visualizations and predictive models that illustrate patterns of recidivism, with a focus on the subset of the population with serious mental illness (SMI). These results could assist current efforts in diverting people with SMI from the criminal justice system and into care.

Click here to read the Executive Summary

Faculty Leads: Nicole Schramm-Sapyta, Michele Easter

Project Manager: Ruth Wygle

Have you ever read a book or watched a movie and realized that you have seen the same story before? How do you know if you are watching an adaptation? A team of students, led by UNC-Chapel Hill graduate student Grant Glass, will develop means to track the movement of adaptations within contemporary culture through machine learning techniques. Drawing upon a variety of textual information from historical and digital sources, the project team will have the opportunity to work with many different types of data. Students will identify features of different master narratives, which will be used to demonstrate how certain stories are modified and retold over and over again. Using this training dataset, the team will apply algorithms to identify adaptations in previously unidentified works. This will allow scholars to better understand at scale how certain narratives are adapted into new stories and forms.

Faculty Lead: Grant Glass

Project Manager: TBD

Jett Hollister (Mechanical Engineering) and Lexx Pino (Computer Science, Math) joined Economics majors Shengxi Hao and Cameron Polo in a ten-week study of the late 2000s housing bubble. The team scraped, merged, and analyzed a variety of datasets to investigate different proposed causes of the bubble. They also created interactive visualizations of their data which will eventually appear on a website for public consumption.

Click here to read the Executive Summary

 

Faculty Lead: Lee Reiners

Project Manager: Kate Coulter

Cassandra Turk (Economics) and Alec Ashforth (Economics, Math) spent ten weeks building tools to help minimize the risk of trading electricity on the wholesale energy market. The team combined data from many sources and employed a variety of outlier-detection methods and other statistical tools in order to create a large dataset of extreme energy events and their causes. They had the opportunity to consult with analytics professionals from Tether Energy.

Click here to read the Executive Summary

 

Project Lead: Eric Butter, Tether

Andre Wang (Math, Statistics), Michael Xue (Computer Science, ECE), and Ryan Culhane (Computer Science) spent ten weeks exploring the role played by emotion in speech-focused machine-learning. The team used a variety of techniques to build emotion recognition pipelines, and incorporated emotion into generated speech during text-to-speech synthesis.

Click here to read the Executive Summary

 

Faculty Leads: Vahid Tarokh, Jie Ding

Project Manager: Enmao Diao

Past Projects

Brooke Erikson (Economics/Computer Science), Alejandro Ortega (Math), and Jade Wu (Computer Science) spent ten weeks developing open-source tools for automatic document categorization, PDF table extraction, and data identification. Their motivating application was provided by Power for All’s Platform for Energy Access Knowledge, and they frequently collaborated with professionals from that organization.

Click here to read the Executive Summary

 

Jake Epstein (Statistics/Economics), Emre Kiziltug (Economics), and Alexander Rubin (Math/Computer Science) spent ten weeks investigating the existence of relative value opportunities in global corporate bond markets. They worked closely with a dataset provided by a leading asset management firm.

Click here for the Executive Summary

Maksym Kosachevskyy (Economics) and Jaehyun Yoo (Statistics/Economics) spent ten weeks understanding temporal patterns in the used construction machinery market and investigating the relationship between these patterns and macroeconomic trends.

They worked closely with a large dataset provided by MachineryTrader.com, and discussed their findings with analytics professionals from a leading asset management firm.

Click here to read the Executive Summary