Online Financial Behavior and the Internet of Things

Project Summary

Zijing Huang (Statistics, Finance), Artem Streltsov (Masters Economics), and Frank Yin (ECE, CompSci, Math) spent ten weeks exploring how Internet of Things (IoT) data could be used to understand potential online financial behavior. They worked closely with analytical and strategic personnel from TD Bank, who provided them with a massive dataset compiled by Epsilon, a global company that specializes in data-driven marketing.

Year: 2017

Contact: Paul Bendich, Mathematics, bendich@math.duke.edu

Project Results: The team began by tying specific TD Bank products and potential products to specific financial response variables in the Epsilon data. Then, using advanced statistical and machine-learning techniques, they built models that teased out specific predictor variables, both financial and non-financial, that best illuminated relationships in the dataset. Finally, they storyboarded several potential ways to use Amazon Alexa data, or similar IoT sources, to give precisely targeted information about the relationship between a customer and these predictor variables. They finished their project with a presentation to senior leadership at TD Bank.


Project Lead: Brian Walsh

Faculty Leads: Robert Calderbank, Emma Rasiel, Paul Bendich

Project Managers: Shai Gorksy, Brooke Durham

 


Related Projects

A large and growing trove of patient, clinical, and organizational data is collected as a part of the “Help Desk” program at Durham’s Lincoln Community Health Center. Help Desk is a group of student volunteers who connect with patients over the phone and help them navigate to community resources (like food assistance programs, legal aid, or employment centers). Data-driven approaches to identifying service gaps, understanding the patient population, and uncovering unseen trends are important for improving patient health and advocating for the necessity of these resources. Disparities in food security, economic stability, education, neighborhood and physical environment, community and social context, and access to the healthcare system are crucial social determinants of health, which studies indicate account for nearly 70% of all health outcomes.

A team of students who worked together for a semester in the Mission Driven Startups class will obtain and analyze data to build a predictive maintenance model for F-15E fighter jets from Seymour Johnson Air Force Base. Using data provided by the Base, the Data+ team will evaluate the relationship between unscheduled maintenance and external factors such as weather, sortie hours between repairs, and the failure frequency of aircraft components. These findings will then feed into a predictive maintenance model that improves Air Force crews' ability to anticipate maintenance needs, helping to minimize unscheduled aircraft downtime.
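The blurb does not say which model the team used. One common way to relate counts of unscheduled maintenance events to external factors is a Poisson regression with a log link, using flight time as exposure. The sketch below fits such a model by iteratively reweighted least squares (IRLS) on simulated data. The covariates (a standardized humidity index and component age), the coefficients, and the choice to treat sortie hours as an offset are all hypothetical assumptions for illustration.

```python
import numpy as np


def poisson_irls(X, y, offset, max_iter=100, tol=1e-10):
    """Poisson regression with log link and an offset, fit by IRLS.

    Model: log E[y] = offset + X @ beta. With offset = log(exposure),
    exp(beta) acts multiplicatively on the event rate per unit exposure."""
    mu = y + 0.5                        # standard starting values (avoid log(0))
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta - offset + (y - mu) / mu    # working response (offset removed)
        w = mu                              # working weights for the log link
        XtW = X.T * w                       # X' W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta + offset
        mu = np.exp(eta)
        if converged:
            break
    return beta


# Hypothetical aircraft-month records (simulated, not Base data).
rng = np.random.default_rng(7)
n = 2000
sortie_hours = rng.uniform(20, 60, n)     # exposure: hours flown in the period
humidity = rng.normal(size=n)             # hypothetical standardized weather index
component_age = rng.uniform(0, 10, n)     # hypothetical years since overhaul
X = np.column_stack([np.ones(n), humidity, component_age])
beta_true = np.array([-3.5, 0.3, 0.1])
rate = np.exp(X @ beta_true)              # unscheduled events per sortie hour
events = rng.poisson(rate * sortie_hours)

beta_hat = poisson_irls(X, events.astype(float), np.log(sortie_hours))
print("estimated coefficients:", np.round(beta_hat, 3))
```

Using log(sortie hours) as an offset treats flight time as exposure, so each exp(beta) reads as a multiplicative change in the maintenance rate per hour flown; whether hours belong in the model as exposure or as an ordinary covariate is a choice the real data would have to settle.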

 

Faculty Lead: Dr. Emma Rasiel

Client Lead: Lt. Devon Burger

Project Manager: Vignesh Kumaresan

Many of the phenomena that data scientists seek to analyze are spatially or temporally correlated. Examples include political elections, contaminant transfer, disease spread, housing markets, and the weather. A key question is how to incorporate spatial correlation into models of such phenomena.

 

In this project, we study the impact of environmental attributes (such as greenness, tree cover, and temperature), along with socio-demographic and home characteristics, on housing prices by developing a model that accounts for the spatial autocorrelation of the response variable. To this end, we introduce a test to diagnose spatial autocorrelation and explain how to integrate spatial autocorrelation into a regression model.
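The paragraph does not name the diagnostic test. Moran's I is one widely used statistic for spatial autocorrelation, so the sketch below assumes it; the project may have used a different test. It computes global Moran's I for a set of values given a spatial weights matrix W and attaches a one-sided permutation p-value. The neighbor layout (units on a line) and the toy data are hypothetical stand-ins for homes and their neighbors.

```python
import numpy as np


def path_weights(n):
    """Binary contiguity weights for n units on a line (hypothetical layout):
    unit i neighbors units i - 1 and i + 1."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W


def morans_i(y, W):
    """Global Moran's I of values y under an n x n weights matrix W (zero diagonal)."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    n = y.size
    z = y - y.mean()          # deviations from the mean
    s0 = W.sum()              # total weight
    num = z @ W @ z           # weighted cross-products between neighbors
    den = z @ z               # total sum of squares
    return (n / s0) * num / den


def moran_permutation_test(y, W, n_perm=999, seed=0):
    """One-sided permutation test for positive spatial autocorrelation.

    Shuffling y over locations destroys any spatial pattern; the pseudo
    p-value is the share of shuffles at least as clustered as the data."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = morans_i(y, W)
    null = np.array([morans_i(rng.permutation(y), W) for _ in range(n_perm)])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


W = path_weights(10)
trend = np.arange(10)                # smooth gradient: neighbors are similar
alternating = np.array([0, 1] * 5)   # neighbors always differ

I_trend, p_trend = moran_permutation_test(trend, W)
print(f"trend: I = {I_trend:.3f}, p = {p_trend:.3f}")
print(f"alternating: I = {morans_i(alternating, W):.3f}")
```

Here the smooth gradient gives I of about 0.78, well above the null expectation of -1/(n-1), while the alternating pattern gives I = -1. With real housing data, W would be built from the homes' locations (for example, k-nearest neighbors or distance bands), which is a separate modeling choice.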

 

 

In this data exploration, students are given data from remote-sensing, census, and Zillow sources. They are tasked with regressing real-estate estimates on environmental amenities and other control variables, using models that either include or omit spatial autocorrelation.
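To make the comparison between models with and without spatial information concrete, here is a sketch of one standard way to build spatial autocorrelation of the response into a regression: a spatial lag model, y = rho*W*y + X*beta + e, estimated by two-stage least squares (2SLS) with the spatially lagged covariates WX and W^2X as instruments. This is an assumed choice for illustration, not necessarily the model the project used (maximum likelihood is a common alternative). The grid neighbor structure, covariates, and coefficients are all simulated.

```python
import numpy as np


def grid_weights(side):
    """Row-standardized rook-contiguity weights for a side x side grid.

    The grid is a hypothetical stand-in for the neighbor structure among
    homes; each row of the returned W sums to 1."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[i, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def ols(X, y):
    """Ordinary least squares that ignores spatial structure."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def spatial_lag_2sls(X, y, W):
    """Estimate y = rho * W y + X beta + e by two-stage least squares.

    W y is endogenous, so it is instrumented with W X and W^2 X built from
    the non-intercept columns of X (column 0 is assumed to be the intercept).
    Returns (rho, beta)."""
    Xs = X[:, 1:]
    H = np.column_stack([X, W @ Xs, W @ (W @ Xs)])     # instruments
    Z = np.column_stack([W @ y, X])                    # regressors
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]   # stage 1: project Z onto H
    coef = np.linalg.lstsq(Z_hat, y, rcond=None)[0]    # stage 2
    return coef[0], coef[1:]


# Simulated data from a known spatial lag process (all values hypothetical).
rng = np.random.default_rng(42)
side, rho_true = 20, 0.6
beta_true = np.array([1.0, 2.0, -1.0])   # intercept and two hypothetical covariates
W = grid_weights(side)
n = side * side
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eps = rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)

rho_hat, beta_hat = spatial_lag_2sls(X, y, W)
print("OLS (no spatial term):", np.round(ols(X, y), 3))
print("2SLS rho:", round(rho_hat, 3), "beta:", np.round(beta_hat, 3))
```

Because Wy is correlated with the covariates, OLS that simply omits it is generally biased, while the 2SLS fit should land near the simulated rho and beta. In a spatial lag model, beta is also not the full marginal effect of a covariate: the total effect includes feedback among neighbors through (I - rho*W)^(-1).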