Themes
This initiative is part of the NSF’s Harnessing the Data Revolution (HDR) Big Idea activity, and all research and outreach efforts will focus on the three key themes below. Duke scientists have already produced significant scholarship on these topics, as reflected in the lists of recent publications and media coverage.
i. Scalable Algorithms with Uncertainty
In most application areas, accurately quantifying uncertainty is of paramount importance when conducting machine learning and statistical inference.
This is particularly true for high-dimensional and complex data, as the data collection process is inherently prone to uncertainty. Most methodology developed for these data focuses on producing point estimates without any characterization of uncertainty. Failing to quantify uncertainty creates a risk of over-interpretation and contributes to the replicability crisis in science.
- Estimating Normalizing Constants for Log-Concave Distributions: Algorithms and Lower Bounds
- Efficient Posterior Sampling for High-Dimensional Imbalanced Logistic Regression
- Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
- Diffusion Based Gaussian Process Regression via Heat Kernel Reconstruction
- Approximating Posteriors with High-Dimensional Nuisance Parameters via Integrated Rotated Gaussian Approximation
- Fast Moment Estimation for Generalized Latent Dirichlet Models
- Gaussian Mixture Models for Stochastic Block Models with Non-Vanishing Noise
- Online Algorithms for Rent-or-Buy with Expert Advice
- Dynamic Set Cover: Improved Algorithms and Lower Bounds
- Multi-Unit Supply-Monotone Auctions with Bayesian Valuation
- Elastic Caching
- Scaling Limit of the Stein Variational Gradient Descent: The Mean Field Regime
ii. Human Machine Interface
Data analytic and artificial intelligence (AI) systems increasingly have a direct impact on society and human lives. Two technology-based factors are driving this influence. The first is the application of data analytics to high-stakes domains at both the individual and the societal level, for example automated bail and parole decisions and the data analytics used to form gerrymandered voting districts, respectively. The second is the ubiquity and increasing complexity of sensing technologies, from social media to wearable devices to 3D imaging, and their impact on data analysis, from causal inference to medical applications such as clinical trials.
- Machine Learning to Screen for Autism in Children (WIRED Magazine)
- New Research Aims to Open the ‘Black Box’ of Computer Vision
- Quantifying Gerrymandering in North Carolina
- Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis
- A New Fully Automated Approach for Aligning and Comparing Shapes
- Duke-Volvo Team Develops First LiDAR System Capable of Long-range Detection and Classification
- This Looks Like That: Deep Learning for Interpretable Image Recognition
- Development and Assessment of Fully Automated and Globally Transitive Geometric Morphometric Methods, With Application to a Biological Comparative Dataset With High Interspecific Variation
- Algorithms to Automatically Quantify the Geometric Similarity of Anatomical Surfaces
- Learning Optimized Risk Scores
iii. Fundamental Limits
This theme studies the fundamental limits of the algorithms and models proposed in the previous themes. The types of analysis we consider are lower and upper bounds on conditions with theoretical guarantees, minimax rates for estimators, and approximation theory results for the complexity or expressibility of function classes induced by AI algorithms. Specifically, the challenges we consider include: (a) the fundamental limits of robust optimization with uncertain inputs; (b) characterizing the statistical and approximation power of deep neural network architectures; and (c) the fundamental limits of causal inference in observational studies.
- The Geometry of Community Detection via the MMSE Matrix
- The All-or-Nothing Phenomenon in Sparse Linear Regression
- All-or-Nothing Phenomena: From Single-Letter to High Dimensions
- Gibbs Posterior Convergence and the Thermodynamic Formalism
- How Many Directions Determine a Shape and other Sufficiency Results for Two Topological Transforms