Topology + Music

Project Summary

This data expedition introduced students to “sliding windows and persistence,” a technique for time series data that turns a one-dimensional time series into a geometric curve in high dimensions and then quantitatively analyzes hybrid geometric/topological properties of the resulting curve, such as “loopiness” and “wiggliness.”
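The sliding-window step can be sketched in a few lines of Python/NumPy. This sketch assumes the common delay-embedding form, in which each window stacks `dim` samples spaced `tau` apart into one point. The function name `sliding_window` and its parameters are illustrative and are not taken from the expedition materials.

```python
import numpy as np

def sliding_window(x, dim, tau=1):
    """Turn a 1D series into a curve in R^dim (hypothetical helper).

    Row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]], so a series of
    length N becomes N - (dim - 1) * tau points in R^dim.
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series is too short for this window")
    # Index matrix: each row holds the sample indices of one window
    idx = np.arange(n_points)[:, None] + tau * np.arange(dim)[None, :]
    return x[idx]

# A periodic signal traces out a closed loop in the embedding space
t = np.linspace(0, 4 * np.pi, 200)
X = sliding_window(np.cos(t), dim=10, tau=2)
print(X.shape)  # (182, 10)
```

Because the cosine is periodic, consecutive windows eventually repeat, so the points travel around a closed curve. That closed curve is the kind of “loopiness” that persistence is later used to measure.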

Contact
Paul Bendich
bendich@math.duke.edu

Graduate student: Chris Tralie

Faculty instructors: Paul Bendich and Lisa Huettel

Course: Math 412

Students in this Data Expedition project:

  • Learned a 1D time series analysis technique for complex data
  • Used the “Loop Ditty” software Chris Tralie created to visualize geometric curves that represent the music, projected down to 3D
  • Quantitatively analyzed where vocals occurred in music by looking at loops, and showed that classical music is “smoother” than rock music
  • Got a gentle introduction to topological data analysis techniques for analyzing the curve in its true high-dimensional embedding
  • Discovered trade-offs between analyzing data visually after projection and analyzing data with more abstract tools in high dimensions
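The page does not say how “wiggliness” or smoothness was quantified. As one hypothetical stand-in, the sketch below scores a curve by its mean turning angle, which is the average angle between consecutive segments. The name `mean_turning_angle` is illustrative, and the measure used in the course may well differ.

```python
import numpy as np

def mean_turning_angle(P):
    """Average angle (radians) between consecutive segments of a polyline.

    Hypothetical smoothness score: 0 for a straight line, larger when the
    curve changes direction more sharply from step to step. Assumes no two
    consecutive points coincide (a zero-length segment would divide by zero).
    """
    P = np.asarray(P, dtype=float)
    V = np.diff(P, axis=0)                          # segment vectors
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit tangents
    cosines = np.clip(np.sum(V[:-1] * V[1:], axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cosines)))

# A smooth loop versus the same loop with added noise
theta = 2 * np.pi * np.arange(101) / 100
smooth = np.column_stack([np.cos(theta), np.sin(theta)])
rng = np.random.default_rng(0)
noisy = smooth + rng.normal(scale=0.1, size=smooth.shape)
print(mean_turning_angle(smooth))  # about 0.0628, i.e. 2*pi/100
print(mean_turning_angle(noisy) > mean_turning_angle(smooth))  # True
```

Under this assumed score, a curve that changes direction abruptly from sample to sample rates as “wigglier” than a smooth one. Comparing songs this way would also depend on the sampling rate and the audio features used to build the curve.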

In this expedition, musical audio was the 1D time series in question, which offers a fun and relatable way to explore these complicated time series analysis algorithms. Musical audio data is high-dimensional (44,100 samples per second), noisy, complex, and highly repetitive. Its repetitive nature and its global/local descriptions have been of particular interest in my personal research on music information retrieval. As a result, my data expedition was more about exploring “complex data” than “big data.” Most of my effort in curating the data therefore went into a web-based graphical user interface, which I called “Loop Ditty” (http://www.loopditty.net). Loop Ditty turns music into visual curves so that students can visually explore and manipulate complex patterns in the sound that would be difficult to glean from the waveform alone. Figure 1 shows a screenshot from the Loop Ditty software.
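Loop Ditty's exact pipeline is not described here. The expedition summary says only that the curves are projected down to 3D. Principal component analysis (PCA) is a common way to make such a projection, and the sketch below assumes it. The helper `project_to_3d` is hypothetical. It also reports the fraction of variance the 3D view keeps, which is one way to gauge how much a projection might hide.

```python
import numpy as np

def project_to_3d(X):
    """Project points onto their top three principal components (PCA via SVD).

    Returns the 3D coordinates and the fraction of total variance they keep.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ Vt[:3].T
    kept = float((s[:3] ** 2).sum() / (s ** 2).sum())
    return coords, kept

# Example: a 10-D sliding-window curve of a cosine (window spacing tau = 2)
t = np.linspace(0, 4 * np.pi, 200)
x = np.cos(t)
X = np.stack([x[i:i + 180] for i in range(0, 20, 2)], axis=1)
coords, kept = project_to_3d(X)
print(coords.shape, round(kept, 6))  # (180, 3) 1.0
```

For a pure cosine, the window vectors all lie in a 2D plane, so three components keep essentially all of the variance. Curves built from real music may not be this simple. A low `kept` value warns that the 3D picture may hide structure, which is one side of the trade-off between visual and high-dimensional analysis.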

In addition to visual exploration in 3D, students used tools from topological data analysis to quantitatively analyze the musical curves in high dimensions. Their task was to make observations in both domains.
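To make “loopiness” concrete, here is a small, self-contained sketch of 1-dimensional persistent homology. It assumes a Vietoris-Rips filtration (a common choice) with Z/2 coefficients and uses the standard boundary-matrix column reduction. This is an illustration only. Because it enumerates every triangle, it only suits a few dozen points, and practical work would use a dedicated library; the page does not name the specific tools used in the course. The name `rips_h1` is hypothetical.

```python
import itertools
import numpy as np

def rips_h1(X):
    """Finite H1 intervals (birth, death) of the Vietoris-Rips filtration of X.

    Hypothetical helper using Z/2 coefficients and the standard column
    reduction of the boundary matrix. It enumerates every triangle, so it is
    only practical for a few dozen points. Intervals are returned longest first.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Each simplex is (filtration value, dimension, sorted vertex tuple)
    simplices = [(0.0, 0, (i,)) for i in range(n)]
    simplices += [(D[i, j], 1, (i, j))
                  for i, j in itertools.combinations(range(n), 2)]
    simplices += [(max(D[i, j], D[i, k], D[j, k]), 2, (i, j, k))
                  for i, j, k in itertools.combinations(range(n), 3)]
    # Sorting by (value, dimension) places every face before its cofaces
    simplices.sort(key=lambda s: (s[0], s[1]))
    index = {verts: pos for pos, (_, _, verts) in enumerate(simplices)}

    owner = {}  # pivot row -> reduced column whose lowest entry is that row
    intervals = []
    for value, dim, verts in simplices:
        if dim == 0:
            continue  # vertices have an empty boundary
        # Boundary over Z/2 as a set of row indices (the codimension-1 faces)
        column = {index[face] for face in itertools.combinations(verts, dim)}
        while column and max(column) in owner:
            column ^= owner[max(column)]  # add an earlier column mod 2
        if column:
            low = max(column)
            owner[low] = column
            if dim == 2:
                # A triangle that kills an edge's cycle ends an H1 interval
                birth = simplices[low][0]
                if value > birth:
                    intervals.append((float(birth), float(value)))
    return sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True)

# Sample cos at 20 points per period (plus 5 extra). A delay of a quarter
# period (tau = 5) maps each window [cos(t), cos(t + pi/2)] onto the unit circle.
x = np.cos(2 * np.pi * np.arange(25) / 20)
X = np.column_stack([x[:20], x[5:]])
intervals = rips_h1(X)
birth, death = intervals[0]
print(f"most persistent loop: born {birth:.3f}, dies {death:.3f}")
```

A long interval indicates a loop that survives across many scales. Short intervals usually reflect noise or sampling. For the circle above, a single interval dominates and all the others have essentially zero length.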

Read the report to learn more (PDF).
