Compressed Nonnegative Matrix Factorization Is Fast and Accurate

Project Summary

Nonnegative matrix factorization (NMF) has an established reputation as a useful data analysis technique in numerous applications. However, its usage in practical situations is undergoing challenges in recent years.The fundamental factor to this is the increasingly growing size of the datasets available and needed in the information sciences. To address this, in this work we propose to use structured random compression, that is, random projections that exploit the data structure, for two NMF variants: classical and separable. In separable NMF (SNMF) the left factors are a subset of the columns of the input matrix. We present suitable formulations for each problem, dealing with different representative algorithms within each one.

Themes and Categories
Contact
Mariano Tepper
Electrical and Computer Engineering
mariano.tepper@duke.edu

We show that the resulting compressed techniques are faster than their uncompressed variants, vastly reduce memory demands, and do not encompass any significant deterioration in performance. The proposed structured random projections for SNMF allow to deal with arbitrarily shaped large matrices, beyond the standard limit of tall-and-skinny matrices, granting access to very efficient computations in this general setting. We accompany the algorithmic presentation with theoretical foundations and numerous and diverse examples, showing the suitability of the proposed approaches.

See full figure and text (PDF).

Related People

Related Projects

Liuyi Zhu (Computer Science, Math), Gilad Amitai (Masters, Statistics), Raphael Kim (Computer Science, Mechanical Engineering), and Andreas Badea (East Chapel Hill High School) spent ten weeks streamlining and automating the process of electronically rejuvenating medieval artwork. They used a 14th-century altarpiece by Francescussio Ghissi as a working example.

Selen Berkman (ECE, CompSci), Sammy Garland (Math), and Aaron VanSteinberg (CompSci, English) spent ten weeks undertaking a data-driven analysis of the representation of women in film and in the film industry, with special attention to a metric called the Bechdel Test. They worked with data from a number of sources, including fivethirtyeight.com and the-numbers.com.

Students in the Performance and Technology Class create a series of performances that explore the interface between society and our machines. With the theme of the cloud to guide them, they have created increasingly complex art using digital media, microcontrollers, and motion tracking. Their work will be on display at the Duke Choreolab 2016.