Application of Linear Algebra in Infants’ Autism Detection

Project Summary

Dr. Guillermo Sapiro, a professor in the Pratt School of Engineering at Duke University, conducts ongoing autism research. Using image processing, he aims to program a computer to detect whether babies (around 8 to 14 months of age) display signs of autism. Such early detection enables doctors to train these babies, while their brain plasticity is high, to behave in ways that counter the behavioral limitations autism imposes, thus allowing these babies to act more normally as they grow up.

Paul Bendich

Graduate students: Edward Kim, Hyunsoo Kim, and Zhuoqing Chang 

  • An unusual blinking pattern is a possible sign of autism.
  • Goal: Develop a method to detect the eye state (open/closed) from a 30x30-pixel eye image.
  • Method: Use 7,000 training eye images to learn the relationship between each pixel and the eye state; then test the system on 3,000 new images.
  • Students learn a method for solving an overdetermined system (using the Moore-Penrose pseudoinverse to find the least-squares solution).
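
As a rough illustration of the least-squares step in the last bullet, the sketch below stacks the training images as rows of a matrix X (one row per image, one column per pixel), collects the labels +1 (open) and -1 (closed) in a vector y, and computes the weights as w = pinv(X) y. The data are random stand-ins, scaled down from 7,000 x 900 so the example runs quickly. The names fit_weights, X, y, and w are hypothetical and are not taken from the lecture notes.

```python
import numpy as np


def fit_weights(X, y):
    """Return the least-squares weights w that minimize ||X w - y||^2.

    X has one row per training image and one column per pixel, so with
    more images than pixels the system X w = y is overdetermined.
    """
    return np.linalg.pinv(X) @ y


# Synthetic stand-in data (assumption: the real data would be 7,000 x 900).
rng = np.random.default_rng(0)
n_train, n_pixels = 200, 9
X = rng.integers(0, 256, size=(n_train, n_pixels)).astype(float)  # pixel values 0..255
y = rng.choice([-1.0, 1.0], size=n_train)                         # +1 open, -1 closed

w = fit_weights(X, y)
print(w.shape)  # one weight per pixel
```

Because there are more equations (images) than unknowns (pixel weights), an exact solution generally does not exist; the pseudoinverse instead gives the weights with the smallest total squared error.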


One of the behavioral cues Dr. Sapiro looks for is the blinking of the babies’ eyes. This portion of the lecture notes focuses on building an image-processing algorithm to detect blinking, in particular to determine whether an eye is open or closed, by applying concepts from linear algebra.

The input to the algorithm is a picture of an eye. Each input image consists of 30 (width) x 30 (height) = 900 pixels. Each pixel carries a numerical value in the range 0 to 255, which represents the combined light intensity detected by the red, blue, and green light receptor channels of a camera. The significance of each pixel varies. The image of an open eye will have two major dark edges around the eye with a large pupil in the middle. The image of a closed eye, on the other hand, will have only one dark edge and no pupil. There is a pattern associated with each of these two categories of images.

The challenge for the algorithm is to determine the pattern from the pixel data with a high probability of success. In our model, this is achieved with the aid of properly chosen weights for the individual pixels. The decision between an open and a closed eye is then made in the following simple way. The algorithm multiplies each pixel’s numerical value by that pixel’s weight and then sums these products over all pixels. If the sum is positive, the algorithm outputs the integer 1 and the eye is determined to be open. If the sum is negative, the algorithm outputs the integer -1 and the eye is determined to be closed.
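
The decision rule described above can be sketched as a dot product followed by a sign check. This is a minimal illustration, not the project's actual code: the function name classify_eye is hypothetical, and returning 0 when the sum is exactly zero is an assumption, since the text does not say how that case is handled.

```python
import numpy as np


def classify_eye(pixels, weights):
    """Return 1 (open) if the weighted pixel sum is positive, -1 (closed) if negative."""
    # Flatten the 30x30 image into a 900-vector and take the weighted sum.
    total = float(np.dot(np.ravel(pixels), np.ravel(weights)))
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # assumption: the source does not specify the zero-sum case


# Toy example with made-up values: a uniform gray image and uniform weights.
image = np.full((30, 30), 100.0)
print(classify_eye(image, np.full(900, 0.01)))   # prints 1
print(classify_eye(image, np.full(900, -0.01)))  # prints -1
```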

Read more about this project (PDF).
