Time Series in Finance

Project Summary

Matt and Ken led two labs for the engineering section of STA 111/130, an introductory course in probability and statistics. They wrote the lab assignments to bridge the gap between introductory linear regression, which is usually taught with a static, complete dataset, and time series analysis, which is rarely covered in introductory courses.

Themes and Categories

Graduate students: Matt Johnson and Ken McAlinn 

Faculty instructor: David Banks

Course: STA 130 Probability and Statistics in Engineering 


Over the course of the two modules, students:

  • Learned introductory topics in finance, including returns, log returns, and the Fama-French three-factor model for explaining market returns,
  • Learned about and applied autoregressive time series models,
  • Applied linear regression to a new topic, and
  • Applied new functionality in the MATLAB programming language.
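The return definitions underlying the first bullet are simple arithmetic. The labs used MATLAB; as a minimal illustrative sketch (in Python here so it is self-contained), the simple return is the percentage change in price, while the log return is the log of the price ratio, and the two nearly coincide for small moves:

```python
import math

def simple_return(p_prev, p_curr):
    # Simple (arithmetic) return between two consecutive closing prices.
    return p_curr / p_prev - 1.0

def log_return(p_prev, p_curr):
    # Log (continuously compounded) return; approximately equal to the
    # simple return when the price change is small, since log(1+r) ~ r.
    return math.log(p_curr / p_prev)

prices = [100.0, 101.0, 99.5]
simple = [simple_return(a, b) for a, b in zip(prices, prices[1:])]
logs = [log_return(a, b) for a, b in zip(prices, prices[1:])]
```

A convenient property of log returns, used throughout time series work, is that they add across days, whereas simple returns compound multiplicatively.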

Notably, because these topics were presented in a lab setting, little if any time was spent on lecture. Instead, the material was presented through the labs themselves, with students deriving concepts and properties on their own as part of the assignment (with help from Matt and Ken when necessary). Students applied the methods above to a dataset containing daily returns for the Standard & Poor’s 500 Index along with the returns of the three factors in the Fama-French three-factor model (the original dataset, consisting of tick-level data for Japanese stocks, was judged too unwieldy to introduce, clean, and analyze in a single class session). The main topics covered were those important to fitting a linear regression, including:

  • Identifying predictors,
  • Checking for collinearity among predictors,
  • Removing unnecessary predictors using forward selection.

These topics were then related to the autocorrelation and partial autocorrelation functions used in time series analysis, especially with regard to autoregressive models.
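The connection can be made concrete with a short simulation (a Python sketch with arbitrary parameters, not the labs' MATLAB code): for an AR(1) process the autocorrelation function decays geometrically as phi**k, while the partial autocorrelation is essentially zero beyond lag 1:

```python
import numpy as np

def acf(x, nlags):
    # Sample autocorrelation function out to `nlags` lags.
    x = np.asarray(x, float)
    x = x - x.mean()
    denom = float(x @ x)
    return [1.0] + [float(x[:-k] @ x[k:]) / denom for k in range(1, nlags + 1)]

# Simulate an AR(1) process: y[t] = phi * y[t-1] + noise.
rng = np.random.default_rng(1)
n, phi = 5000, 0.8
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

r = acf(y, 3)  # decays roughly as phi**k: about 0.8, 0.64, 0.51

# Lag-2 partial autocorrelation: the coefficient on y[t-2] after
# controlling for y[t-1]; it is near zero for an AR(1) process.
A = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
beta, *_ = np.linalg.lstsq(A, y[2:], rcond=None)
pacf2 = float(beta[2])
```

This mirrors the regression view from the labs: the partial autocorrelation at lag k is just the coefficient on the lag-k term in a regression on all lags up to k, which is why forward selection over lagged predictors recovers the order of an autoregressive model.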

Data Description

For Data Expeditions, we have collected two datasets.

The first consists of daily closing prices of the S&P 500 from 1/4/1995 to 12/31/2014, along with daily market factors.

The columns are:

  • Date: Trading date, ranging from 1/4/1995 to 12/31/2014.
  • Adj Close: The adjusted closing price of the S&P 500 (adjustments account for dividends, distributions, corporate actions, etc. that occur before the open of the next day). For our purposes, this is the same as the closing price (the closing price is usually the last price at which a stock trades that day).
  • Mkt-RF: Market - Risk Free, compares the performance of the (risky) market as a whole to a risk-free asset (daily returns).
  • SMB: Small Minus Big, a factor portfolio that measures the performance of small cap stocks relative to large cap stocks (daily returns). When SMB goes up, small cap outperforms large cap.
  • HML: High Minus Low, a factor portfolio that measures the performance of “value” stocks compared to “growth” stocks (daily returns).
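With these columns, the three-factor regression fits an asset's daily excess return as an intercept (alpha) plus loadings on Mkt-RF, SMB, and HML. The sketch below (Python, using synthetic stand-in series rather than the actual dataset) shows the shape of that fit; the loading values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
# Synthetic stand-ins for the three daily factor columns (in percent).
mkt_rf = rng.normal(0.03, 1.0, n)
smb = rng.normal(0.0, 0.5, n)
hml = rng.normal(0.0, 0.5, n)
factors = np.column_stack([mkt_rf, smb, hml])

# Simulated daily excess returns of an asset with known factor loadings.
true_loadings = np.array([1.1, 0.3, -0.2])
excess = 0.01 + factors @ true_loadings + rng.normal(0.0, 0.3, n)

# The three-factor regression: alpha plus loadings on Mkt-RF, SMB, HML.
X = np.column_stack([np.ones(n), factors])
alpha, b_mkt, b_smb, b_hml = np.linalg.lstsq(X, excess, rcond=None)[0]
```

Run on the real dataset, the same regression asks whether the S&P 500's returns are explained by the three factors, with any leftover alpha interpreted as unexplained performance.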

The second dataset contains high-frequency trading data for two stocks on the Japanese stock market, Sony (6758) and Toyota (7203), for January 2012. Each .csv file contains the trading data for one day; the files are named YYYYMMDD ####.csv, where #### denotes the stock number.

In each file, there are five columns: the first is the stock number, the second the hour and minute, the third the second, the fourth the tenths and hundredths of a second, and the fifth the price. Every time an order is made on the market (either buy or sell), the time and price are recorded.
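Reassembling the four time fields into a single timestamp is the first cleaning step this layout requires. A hedged Python sketch under the column description above (the exact digit formatting of each field is an assumption, and the example row is hypothetical):

```python
import datetime as dt

def date_from_filename(name):
    # Filenames look like "YYYYMMDD ####.csv"; the date is the first token.
    return dt.datetime.strptime(name.split()[0], "%Y%m%d").date()

def parse_tick_row(row, trade_date):
    # Turn one row into (stock_code, timestamp, price). Field layout per
    # the description above: stock number, hour-and-minute, second,
    # hundredths of a second, price.
    code, hhmm, ss, frac, price = row
    hhmm = int(hhmm)
    ts = dt.datetime.combine(
        trade_date,
        # hundredths of a second -> microseconds
        dt.time(hhmm // 100, hhmm % 100, int(ss), int(frac) * 10_000),
    )
    return int(code), ts, float(price)

# Hypothetical example: a Sony (6758) trade at 09:01:15.27.
trade_date = date_from_filename("20120104 6758.csv")
code, ts, price = parse_tick_row(["6758", "901", "15", "27", "1350"], trade_date)
```

Once every row carries a full timestamp, the per-trade records can be resampled to a regular grid (e.g., one-minute prices), which is the form the time series methods above expect.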

Related Projects

KC and Patrick led two hands-on data workshops for ENVIRON 335: Drones in Marine Biology, Ecology, and Conservation. These labs were intended to introduce students to examples of how drones are currently being used as a remote sensing tool to monitor marine megafauna and their environments, and how machine learning can be used to efficiently analyze remote sensing datasets. The first lab specifically focused on how drones are being used to collect aerial images of whales to measure changes in body condition to help monitor populations. Students were introduced to the methods for making accurate measurements and then received an opportunity to measure whales themselves. The second lab then introduced analysis methods using computer vision and deep neural networks to detect, count, and measure objects of interest in remote sensing data. This work provided students in the environmental sciences an introduction to new techniques in machine learning and remote sensing that can be powerful multipliers of effort when analyzing large environmental datasets.

This two-week teaching module in an introductory-level undergraduate course invites students to explore the power of Twitter in shaping public discourse. The project supplements the close-reading methods that are central to the humanities with large-scale social media analysis. This exercise challenges students to consider how applying visualization techniques to a dataset too vast for manual apprehension might enable them to identify for granular inspection smaller subsets of data and individual tweets—as well as to determine what factors do not lend themselves to close-reading at all. Employing an original dataset of almost one million tweets focused on the contested 2018 Florida midterm elections, students develop skills in using visualization software, generating research questions, and creating novel visualizations to answer those questions. They then evaluate and compare the affordances of large-scale data analytics with investigation of individual tweets, and draw on their findings to debate the role of social media in shaping public conversations surrounding major national events. This project was developed as a collaboration among the English Department (Emma Davenport and Astrid Giugni), Math Department (Hubert Bray), Duke University Library (Eric Monson), and Trinity Technology Services (Brian Norberg).

Understanding how to generate, analyze, and work with datasets in the humanities is often difficult without learning how to code. In humanities-centered courses, we often privilege close reading or qualitative analysis over other ways of knowing, but learning new quantitative techniques better prepares students to tackle new forms of reading. This class will work with data from the HathiTrust to develop ideas for how large groups and different discourse communities thought about queens of antiquity like Cleopatra and Dido.

Please refer to https://sites.duke.edu/queensofantiquity/ for more information.