Time Series in Finance

Project Summary

Matt and Ken led two labs for the engineering section of STA 111/130, an introductory course in statistics and probability. They wrote the lab assignments to bridge the gap between introductory linear regression, which is usually taught with a static, complete dataset, and time series analysis, which introductory courses rarely cover.

Graduate students: Matt Johnson and Ken McAlinn 

Faculty instructor: David Banks

Course: STA 130 Probability and Statistics in Engineering 

Summary

Over the course of the two modules, students:

  • Learned introductory topics in finance, including returns, log returns, and the Fama and French Three Factor Model for explaining market returns,
  • Learned about and applied autoregressive time series models,
  • Applied linear regression to a new topic, and
  • Applied new functionality in the MATLAB programming language.
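As an illustration of the first topic, here is a minimal sketch of simple and log returns. It is written in Python rather than the MATLAB used in the labs, the function names are hypothetical, and the prices are made up for the example.

```python
import math

def simple_returns(prices):
    """Simple return for each consecutive pair: P_t / P_{t-1} - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]

def log_returns(prices):
    """Log return for each consecutive pair: log(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 99.96, 101.0]

simple = simple_returns(prices)
logs = log_returns(prices)

# Log returns add up over time: their sum is the log of total growth.
total_log_return = sum(logs)
```

One reason log returns are convenient is visible in the last line: the multi-day log return is just the sum of the daily ones, which is not true of simple returns.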

Because these topics were presented in a lab setting, little if any time was spent lecturing. Instead, students worked through the material in the lab itself, deriving concepts and properties on their own as part of the assignment, with help from Matt and Ken when necessary. Students applied the above methods to a dataset containing daily returns for the Standard & Poor’s 500 Index as well as the returns for the three factors in the Fama and French Three Factor Model. (The original dataset, tick-level data for Japanese stocks, was judged too overwhelming to introduce, clean, and analyze in a single class session.) The main topics covered were those important to fitting a linear regression, including:

  • Identifying predictors,
  • Checking for collinearity among predictors, and
  • Leaving out unnecessary predictors using forward selection.
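The collinearity check and forward selection listed above can be sketched as follows. This is a Python illustration on synthetic data, not the MATLAB code from the labs. The stopping rule (stop when the best candidate reduces the residual sum of squares by less than 5%) is an assumption chosen for simplicity, and the function names are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, min_improvement=0.05):
    """Greedily add the predictor that lowers RSS most; stop when the
    relative improvement falls below min_improvement (assumed rule)."""
    remaining = list(range(X.shape[1]))
    chosen = []
    best = rss(X[:, chosen], y)  # intercept-only model to start
    while remaining:
        scores = {j: rss(X[:, chosen + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if (best - scores[j_best]) / best < min_improvement:
            break
        chosen.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return chosen

# Synthetic data: column 2 is nearly a copy of column 0, column 3 is noise.
rng = np.random.default_rng(0)
n = 1000
x0, x1, x3 = rng.standard_normal((3, n))
x2 = x0 + 0.01 * rng.standard_normal(n)
X = np.column_stack([x0, x1, x2, x3])
y = 2.0 * x0 - x1 + 0.5 * rng.standard_normal(n)

corr = np.corrcoef(X, rowvar=False)  # large off-diagonal entries flag collinearity
chosen = forward_selection(X, y)
```

Because columns 0 and 2 carry almost the same information, the procedure keeps only one of them, which is the practical consequence of collinearity that the correlation matrix warns about.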

These topics were then related to the autocorrelation and partial autocorrelation functions used in time series analysis, especially for autoregressive time series.
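The link between regression and autoregressive models can be made concrete with a short Python sketch (again, not the labs' MATLAB code). Here an AR(p) model is fit by ordinary least squares on lagged copies of the series, and the partial autocorrelation at lag k is taken as the last coefficient of an AR(k) fit. The simulated series, its coefficient of 0.6, and all function names are assumptions made for the example.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return [float(x[k:] @ x[: len(x) - k] / denom) for k in range(1, max_lag + 1)]

def fit_ar(x, p):
    """Regress x_t on an intercept and x_{t-1}, ..., x_{t-p}.
    Returns [intercept, phi_1, ..., phi_p]."""
    x = np.asarray(x, dtype=float)
    lags = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    A = np.column_stack([np.ones(len(x) - p), lags])
    coef, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
    return coef

def sample_pacf(x, max_lag):
    """Partial autocorrelation at lag k = last coefficient of an AR(k) fit."""
    return [float(fit_ar(x, k)[-1]) for k in range(1, max_lag + 1)]

# Simulate an AR(1) series x_t = 0.6 * x_{t-1} + noise (illustrative values).
rng = np.random.default_rng(1)
n = 5000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + eps[t]

acf = sample_acf(x, 3)
pacf = sample_pacf(x, 3)
phi_hat = float(fit_ar(x, 1)[1])
```

For an AR(1) series the autocorrelation decays gradually across lags while the partial autocorrelation is near zero after lag 1, so the partial autocorrelation plays a role similar to predictor selection: it suggests how many lags are worth including.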

Data Description

For Data Expeditions, we have collected two datasets.

The first consists of daily closing prices of the S&P 500 from 1/4/1995 to 12/31/2014, together with daily market factors.

Its columns are:

  • Date: Trading date, ranging from 1/4/1995 to 12/31/2014.
  • Adj Close: The adjusted closing price of the S&P 500 (adjustments account for dividends, distributions, corporate actions, etc. that occur before the open of the next day). For our purposes, this is the same as the closing price (the closing price is usually the last price at which a stock trades that day).
  • Mkt-RF: Market - Risk Free, compares the performance of the (risky) market as a whole to a risk-free asset (daily returns).
  • SMB: Small Minus Big, a factor portfolio that measures the performance of small cap stocks relative to large cap stocks (daily returns). When SMB goes up, small cap outperforms large cap.
  • HML: High Minus Low, a factor portfolio that measures the performance of “value” stocks compared to “growth” stocks (daily returns).
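For reference, the three-factor model is usually written as the regression below, using the factor columns just described. The symbols are notation chosen here rather than taken from the lab assignments: R_t is the return being explained, R_{f,t} is the risk-free rate, and the betas are the regression coefficients.

```latex
R_t - R_{f,t} = \alpha + \beta_{M}\,(\text{Mkt-RF})_t + \beta_{S}\,\text{SMB}_t + \beta_{H}\,\text{HML}_t + \varepsilon_t
```

With this dataset, the return computed from Adj Close would presumably play the role of R_t; that pairing is an assumption, since the summary does not spell out the exact regression used in the labs.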

The second dataset contains high-frequency trading data for two stocks on the Japanese stock market, Sony (6758) and Toyota (7203), for January 2012. Each .csv file contains the trading data for one day; files are named YYYYMMDD ####.csv, where #### denotes the stock number.

Each file has five columns: the stock number, the hour and minute, the second, the tenths and hundredths of a second, and the price. Every time an order is made on the market (either buy or sell), the time and price are recorded.
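A minimal Python sketch of reading one such file is shown below. Several details are assumptions, because the description above does not fix them: that the hour-and-minute column is an HHMM integer, that the fourth column is an integer from 0 to 99 counting hundredths of a second, and that files have no header row. The sample rows and prices are made up, and the function names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

def parse_tick_row(row, trade_date):
    """Turn one five-column row into (stock_number, timestamp, price)."""
    stock, hhmm, sec, centi, price = row
    hhmm = int(hhmm)  # assumed HHMM layout, e.g. 0900 for 9:00
    midnight = datetime.combine(trade_date, datetime.min.time())
    timestamp = midnight + timedelta(
        hours=hhmm // 100,
        minutes=hhmm % 100,
        seconds=int(sec),
        milliseconds=10 * int(centi),  # assumed hundredths of a second
    )
    return stock.strip(), timestamp, float(price)

def read_ticks(lines, yyyymmdd):
    """Parse all rows of one day's file; the date comes from the file name."""
    trade_date = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    return [parse_tick_row(row, trade_date) for row in csv.reader(lines) if row]

# Made-up rows standing in for the contents of one daily file.
sample = io.StringIO("6758,0900,0,12,1400\n6758,0900,3,5,1401\n")
ticks = read_ticks(sample, "20120104")
```

In practice `read_ticks` would be given an open file handle and the first eight characters of the file name; combining the three time columns into one timestamp makes it straightforward to sort ticks or compute time gaps between orders.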
