Duke TRIPODS was established in 2019 with a grant from the National Science Foundation to further our understanding of foundational principles in data science and to identify opportunities for innovation. Since then, as many as 25 faculty members, along with graduate students and postdoctoral fellows, have been working across disciplines to develop new tools and methods and to foster dialogue about data science throughout North Carolina’s “Research Triangle.”
Project collaborators come from Duke departments including Computer Science, Electrical and Computer Engineering, Mathematics, and Statistical Science, and the team anticipates forging additional partnerships through engagement with the Rhodes Information Initiative at Duke and the regional Statistical and Mathematical Sciences Institute (SAMSI).
Themes
This initiative is part of the NSF’s Harnessing the Data Revolution (HDR) Big Idea activity, and all research and outreach efforts will focus on the three key themes below. Duke scientists have already produced significant scholarship on these topics, as reflected in the recent publications and media coverage listed under each theme.
i. Scalable Algorithms with Uncertainty for Data Science
- Estimating Normalizing Constants for Log-Concave Distributions: Algorithms and Lower Bounds
- Efficient Posterior Sampling for High-Dimensional Imbalanced Logistic Regression
- Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
- Diffusion Based Gaussian Process Regression via Heat Kernel Reconstruction
- Approximating Posteriors with High-Dimensional Nuisance Parameters via Integrated Rotated Gaussian Approximation
- Fast Moment Estimation for Generalized Latent Dirichlet Models
- Gaussian Mixture Models for Stochastic Block Models with Non-Vanishing Noise
- Online Algorithms for Rent-or-Buy with Expert Advice
- Dynamic Set Cover: Improved Algorithms and Lower Bounds
- Multi-Unit Supply-Monotone Auctions with Bayesian Valuation
- Elastic Caching
- Scaling Limit of the Stein Variational Gradient Descent: The Mean Field Regime
ii. Data Science at the Human-Machine Interface
- Machine Learning to Screen for Autism in Children
- New Research Aims to Open the ‘Black Box’ of Computer Vision
- Quantifying Gerrymandering in North Carolina
- Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis
- A New Fully Automated Approach for Aligning and Comparing Shapes
- Duke-Volvo Team Develops First LiDAR System Capable of Long-range Detection and Classification
- This Looks Like That: Deep Learning for Interpretable Image Recognition
- Development and Assessment of Fully Automated and Globally Transitive Geometric Morphometric Methods, With Application to a Biological Comparative Dataset With High Interspecific Variation
- Algorithms to automatically quantify the geometric similarity of anatomical surfaces
- Learning Optimized Risk Scores
iii. Fundamental Limits of Data Science
- The Geometry of Community Detection via the MMSE Matrix
- The All-or-Nothing Phenomenon in Sparse Linear Regression
- All-or-Nothing Phenomena: From Single-Letter to High Dimensions
- Gibbs Posterior Convergence and the Thermodynamic Formalism
- How Many Directions Determine a Shape and other Sufficiency Results for Two Topological Transforms
People
PIs
Sayan Mukherjee, PI
Professor of Statistical Science, Mathematics and Computer Science
Robert Calderbank, Co-PI
Director of the Rhodes Information Initiative at Duke
Charles S. Sydnor Distinguished Professor of Computer Science, Professor of Mathematics and Professor of Electrical and Computer Engineering
Cynthia Rudin, Co-PI
Professor of Computer Science
Prediction Analysis Lab
Jianfeng Lu, Co-PI
Associate Professor of Mathematics and Chemistry
Rong Ge, Co-PI
Assistant Professor of Computer Science