Topology + Music

Project Summary

This data expedition introduced students to “sliding windows and persistence” on time series data. The technique first turns a one-dimensional time series into a geometric curve in high dimensions, then quantitatively analyzes hybrid geometric/topological properties of the resulting curve, such as “loopiness” and “wiggliness.”

Contact
Paul Bendich
bendich@math.duke.edu

Graduate student: Chris Tralie

Faculty instructors: Paul Bendich and Lisa Huettel

Course: Math 412

Students in this Data Expedition project:

  • Learned a 1D time series analysis technique for complex data
  • Used the “Loop Ditty” software Chris Tralie created to visualize geometric curves that represent the music, projected down to 3D
  • Quantitatively analyzed where vocals occurred in music by looking at loops, and observed that classical music is “smoother” than rock music
  • Got a safe introduction to topological data analysis techniques for analyzing the curve in its true high dimensional embedding
  • Discovered trade-offs between analyzing data visually after projection and analyzing data with more abstract tools in high dimensions

In this expedition, the 1D time series in question was musical audio data, which made for a fun and relatable way to explore these complicated time series analysis algorithms. Musical audio data is high-dimensional (44100 samples per second), noisy, complex, and highly repetitive. The repetitive nature of music and its global/local descriptions have been of particular interest in my personal research on music information retrieval. As a result, my data expedition was more about exploring “complex data” than “big data.” Most of my data curation effort therefore went into a web-based graphical user interface, which I called “Loop Ditty” (http://www.loopditty.net). It turns music into visual curves so that students can visually explore and manipulate complex patterns in the sound that would be difficult to glean from the waveform alone. Figure 1 shows a screenshot from the Loop Ditty software.
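The sliding window step that turns a 1D signal into a curve can be sketched in a few lines. This is a minimal illustration, not the Loop Ditty implementation: the function name `sliding_window`, its parameters `dim`, `tau`, and `step`, and the toy cosine input are all assumptions made for the example. Real audio would typically be summarized into features before being windowed.

```python
import math

def sliding_window(signal, dim, tau=1, step=1):
    """Map a 1D sequence to a list of points in R^dim.

    Point i is (signal[s], signal[s + tau], ..., signal[s + (dim - 1) * tau])
    with s = i * step. All names here are hypothetical, chosen for this sketch.
    """
    if dim < 1 or tau < 1 or step < 1:
        raise ValueError("dim, tau, and step must be positive integers")
    span = (dim - 1) * tau  # distance from first to last sample in a window
    return [
        [signal[start + k * tau] for k in range(dim)]
        for start in range(0, len(signal) - span, step)
    ]

# Toy input (an assumption, not real music): three periods of a pure cosine.
period = 40
signal = [math.cos(2 * math.pi * n / period) for n in range(3 * period)]

curve = sliding_window(signal, dim=20, tau=1)

# A signal that repeats every `period` samples gives a curve that returns
# to its starting point after `period` steps, so the curve closes into a loop.
gap_after_full_period = math.dist(curve[0], curve[period])
gap_after_half_period = math.dist(curve[0], curve[period // 2])
print(len(curve), len(curve[0]))
print(gap_after_full_period, gap_after_half_period)
```

In this toy case the curve comes back to where it started after one full period and is far from its start after half a period. That is the sense in which repetition in the signal shows up as a loop in the curve.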

In addition to visual exploration in 3D, students also used tools from topological data analysis to quantitatively analyze the musical curves in high dimensions. Their task was to make observations in both domains.
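To give a flavor of how persistence attaches numbers to the shape of a point cloud, here is a small sketch of its simplest case. It computes 0-dimensional persistence (the scales at which separate clusters merge) for a Vietoris-Rips filtration using a union-find structure. It does not compute the 1-dimensional persistence that measures loops, which requires a dedicated library. The function name `h0_death_times` and the sample points are assumptions made for illustration.

```python
import math
from itertools import combinations

def h0_death_times(points):
    """Return the scales at which connected components merge, in ascending order.

    Every point starts as its own component at scale 0. Edges are added from
    shortest to longest, and each edge that joins two different components
    records one "death". The death scale is taken to be the edge length here;
    some conventions use half of that value.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths

# Hypothetical point cloud: a tight group of three points and a tight pair,
# separated by a large gap.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
deaths = h0_death_times(points)
print(deaths)
```

Small death values correspond to points that are close together, while one large value signals two well-separated clusters. Persistence for loops follows the same birth-and-death idea, but tracks holes in the curve rather than components.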

Read the report to learn more (PDF).

Related Projects

KC and Patrick led two hands-on data workshops for ENVIRON 335: Drones in Marine Biology, Ecology, and Conservation. These labs were intended to introduce students to examples of how drones are currently being used as a remote sensing tool to monitor marine megafauna and their environments, and how machine learning can be used to efficiently analyze remote sensing datasets. The first lab specifically focused on how drones are being used to collect aerial images of whales to measure changes in body condition to help monitor populations. Students were introduced to the methods for making accurate measurements and then received an opportunity to measure whales themselves. The second lab then introduced analysis methods using computer vision and deep neural networks to detect, count, and measure objects of interest in remote sensing data. This work provided students in the environmental sciences an introduction to new techniques in machine learning and remote sensing that can be powerful multipliers of effort when analyzing large environmental datasets.

This two-week teaching module in an introductory-level undergraduate course invites students to explore the power of Twitter in shaping public discourse. The project supplements the close-reading methods that are central to the humanities with large-scale social media analysis. This exercise challenges students to consider how applying visualization techniques to a dataset too vast for manual apprehension might enable them to identify for granular inspection smaller subsets of data and individual tweets—as well as to determine what factors do not lend themselves to close-reading at all. Employing an original dataset of almost one million tweets focused on the contested 2018 Florida midterm elections, students develop skills in using visualization software, generating research questions, and creating novel visualizations to answer those questions. They then evaluate and compare the affordances of large-scale data analytics with investigation of individual tweets, and draw on their findings to debate the role of social media in shaping public conversations surrounding major national events. This project was developed as a collaboration among the English Department (Emma Davenport and Astrid Giugni), Math Department (Hubert Bray), Duke University Library (Eric Monson), and Trinity Technology Services (Brian Norberg).

Understanding how to generate, analyze, and work with datasets in the humanities is often difficult without learning how to code or program. In humanities-centered courses, we often privilege close reading or qualitative analysis over other methods of knowing, but learning some new quantitative techniques better prepares students to tackle new forms of reading. This class will work with data from the HathiTrust to develop ideas about how large groups and different discourse communities thought of queens of antiquity like Cleopatra and Dido.

Please refer to https://sites.duke.edu/queensofantiquity/ for more information.