Under Review · Journal of Business & Economic Statistics

Data-Adaptive Inference
for Structural Sparsity

FDR control via Transformational Lasso & Stability Knockoffs under arbitrary transformations

Han Su · Qingyang Sun · Yinfei Kong · Gaorong Li

Beijing Normal Univ · Duke Univ · CSU Fullerton

Overview

Talk Outline

01

Background, Structural Types & Existing Methods

02

Problem Formulation & GLasso

03

TLasso, StatKnock & SATLasso Methods

04

Theoretical Properties (5 Theorems)

05

Simulation: Estimation & FDR Control

06

Alzheimer's Disease Application & Conclusion

01 · Background

From Traditional to Structural Sparsity

  • Traditional sparsity: only a few covariates have nonzero $\beta_j$ — standard Lasso recovers the support $S_\beta$
  • Structural sparsity: sparsity on transformations $\gamma = D\beta$ — the structure of $\beta$ is encoded by matrix $D$
  • Graph viewpoint: $D$ is defined via graph difference operators $\Delta_G^{(k)}$ over an undirected graph $G = (V, E)$ where $V = [p]$ vertices represent covariates

Examples of $D$:

• $D = I_p$: traditional variable selection

• $D = \Delta_G^{(1)}$: fused Lasso / change point detection

• $D = \Delta_G^{(2)}$: trend filtering / piecewise linear detection

• $D = [\Delta_G^{(1)T}, \Delta_G^{(2)T}]^T$: simultaneous testing of multiple structures

01 · Structural Types

Three Types of Structural Sparsity

Type 1: Piecewise Constant

D = ΔG(1) γ sparse: βi - βj = 0 for most edges

Incidence operator. On chain graphs this becomes fused Lasso for change point detection.

Type 2: Piecewise Linear

D = ΔG(2) γ sparse: local graph curvature is zero at most vertices

Second-order graph difference. It detects kinks in piecewise linear trends.

Type 3: Mixed / Arbitrary

D = [ΔG(1)T, ΔG(2)T]T test jumps and curvature together

Stack multiple operators for chain, lattice, and general graphs.

01 · Existing Methods

Current Approaches & Limitations

  • BH procedure: works only when $D = I_p$ and $X$ is orthogonal
  • BY procedure: handles dependency in $X$ but requires computing p-values — limited to specific $D$
  • GKnockoff: requires $\text{rank}(D) = m \leq p$ — fails when transformations exceed covariates
  • Split Knockoff: requires $m + p \leq n$ — very restrictive in high dimensions

What's missing for arbitrary $D_{m \times p}$:

• No method handles $m \gg n$ (ultra-high dimensional $\gamma$)

• No finite-sample FDR guarantees for general transformation matrices

• No support for higher-order structures (piecewise linear, polynomial)

• No framework for testing multiple operators simultaneously

01 · Our Contribution

A Unified Solution

TLassomake arbitrary $D$ estimable
StatKnockadd finite-sample FDR
SATLassoscreen ultra-high $m$
Recyclecombine screening + testing
EstimatorTLasso

Makes structural sparsity estimable for arbitrary $D$ via lifting and noise injection.

TestingStatKnock

Adds finite-sample FDR control on the TLasso-projected design.

ScaleSATLasso

Screens ultra-high dimensional $m$ while keeping true transformation signals.

Final stepRecycle

Screen in $\gamma$-space, then reuse $\beta$ information for post-screening FDR.

01 · Application Context

Motivating Example: Alzheimer's Brain MRI

  • Goal: identify structural brain changes related to memory decline in AD patients
  • Model (5.1): $D = I_p$ — detect abnormal atrophy/expansion in individual brain regions
  • Model (5.2): $D = \Delta_G^{(1)}$ — detect atypical connectivity between neighboring regions during AD progression
  • Model (5.3): $D = \Delta_G^{(2)}$ — detect regions whose effects deviate from the average of their adjacent regions

Data: 65 AD patients from ADNI database, T1-weighted MRI segmented via MRICloud into 279 brain regions (Level 5), ADNI-MEM memory score as response.

Challenge: $n = 65$, $p = 279$, and under Model (5.2) $m = 378$ edges — a genuine high-dimensional structural sparsity problem requiring FDR control.

02 · Setup

Problem Formulation

$$Y = X^T\beta + \varepsilon, \quad \gamma = D\beta$$
Data$Y, X$
Structure$D \in \mathbb{R}^{m \times p}$
Targetsparse $\gamma = D\beta$
Error Control$\text{FDR} \leq q$
Sparsity object$\gamma = D\beta$

We select structural changes, not just raw coefficients.

Active set$S = \{j:\gamma_j \neq 0\}$

The goal is to recover true transformed signals.

Error goal$\text{FDR} \le q$

Discover signals while controlling false discoveries.

$$\min_{\beta,\gamma} \frac{1}{2n}\|Y-X\beta\|_2^2 + \lambda\|\gamma\|_1,\quad \text{s.t. }\gamma=D\beta$$
03 · Method

TLasso — Core Idea

Idea: ordinary GLasso forces γ to live in the lower-dimensional space Col(D). TLasso lifts the model, then projects away nuisance directions so γ behaves like a standard Lasso target.

ObstacleConstrained $\gamma$

$\gamma=D\beta$ lives in Col$(D)$, so it is not a free $m$-vector when rank$(D)

Step 1Inject noise

Augment $D$ to full row rank: $D_+=[D,N]$

Step 2Lift model

Work with $\omega_+=(\omega^T,\omega_N^T)^T$ so $\gamma=D_+\omega_+$.

Step 3Project

Remove nuisance directions and solve standard Lasso directly in $\gamma$.

$$\min_{\omega_+,\gamma} \frac{1}{2n}\|Y - X_+\omega_+\|_2^2 + \lambda\|\gamma\|_1, \quad \text{s.t. } \gamma = D_+\omega_+$$
03 · Algorithm

TLasso Algorithm

Step 1

Construct $N$ via QR decomposition on rank-$r$ matrix $D$; form $D_+ = [D, N]$ with full row rank $m$

Step 2

Generate auxiliary noise $X_N$; obtain augmented design $X_+ = [X, X_N]$

Step 3

Apply projection technique to eliminate nuisance parameters: transform $(X_+, Y)$ to $(\tilde{X}, \tilde{Y})$

Step 4

Solve standard Lasso on projected data: $\hat{\gamma} = \arg\min_\gamma \frac{1}{2n}\|\tilde{Y} - \tilde{X}\gamma\|_2^2 + \lambda\|\gamma\|_1$

03 · FDR Control

StatKnock — Stability Knockoffs

Step 1: Knockoff Construction

Construct transformational knockoff matrix $\widetilde{X}$ based on projected design $\tilde{X}$ from TLasso, satisfying pairwise exchangeability: swapping $\tilde{X}_j$ and $\widetilde{X}_j$ for null $j$ does not change the joint distribution $[\tilde{X}, \widetilde{X}]$.

Step 2: Stability Selection via 'AND' Rule

Subsample data $L = 100$ times; for each subsample, run TLasso on both $[\tilde{X}, \widetilde{X}]$ and $[\widetilde{X}, \tilde{X}]$. Compute importance $W_j$ = probability that feature $j$ is selected in both halves — more powerful than LSM/LCD statistics.

Step 3: FDR Threshold

Apply knockoff filter: $\hat{S} = \{j : W_j \geq T\}$ where $T = \min\{t : \frac{|\{j: W_j \leq -t\}|}{|\{j: W_j \geq t\}| \vee 1} \leq q\}$, ensuring finite-sample FDR $\leq q$.

03 · Ultra-High Dimensions

SATLasso + Post-Screening FDR

SATLasso Screening

  • Adds $\ell_2$-penalty $\alpha\|\omega_N\|_2^2$ on noise to form ATLasso — controls spurious correlations in ultra-high dimensions
  • Sequential partial $\ell_1$-penalty: at each step, selects the most significant $\gamma_j$'s from the remaining candidates
  • GBIC stopping rule with complexity penalty $\eta > 2 - \log_m n'$ prevents overfitting
  • Sherman-Morrison recursive update avoids repeated matrix inversion

"$\gamma$-Screening & $\beta$-Recycling"

  • Two-stage procedure: first screen $\gamma$ via SATLasso, then apply StatKnock on screened set
  • Key advantage: screens on $\gamma$ (transformation) directly instead of $\beta$ (covariates)
  • Does not require $\beta$ to be sparse — critical since $\beta$ is typically dense in structural sparsity
  • Competitors (BY, GKnockoff, SplitKnockoff) screen on $\beta$ and lose power when $\beta$ is dense
04 · Theory

TLasso Theoretical Guarantees

Theorem 1 (Oracle Inequality). With $\lambda \geq (2\sigma/\kappa)\sqrt{2\log m/n}$: (a) TLasso support $\hat{S} \subseteq S$ — no false positives; (b) $\ell_\infty$ bound $\|\hat{\gamma}_S - \gamma_S\|_\infty \leq C_n\lambda$ — estimation error vanishes at rate $\sqrt{\log m / n}$.

Theorem 2 (Selection & Sign Consistency). Under minimal signal $\min_{j \in S}|\gamma_j| \geq \varsigma_n\sqrt{\log m/n}$, TLasso achieves $P(\hat{S} = S) \to 1$ — recovers the exact active set with correct signs asymptotically.

Theorem 3 (Pairwise Exchangeability). The constructed knockoffs satisfy: (a) $[\tilde{X}, \widetilde{X}]_{\text{swap}(S^c)} \overset{d}{=} [\tilde{X}, \widetilde{X}]$; (b) $Y \perp \widetilde{X}_{S^c} \mid \tilde{X}$ — enabling the knockoff filter to control FDR without knowing null distribution.

Key insight: all conditions (minimal eigenvalue, mutual incoherence) are imposed on the projected design $\tilde{X}$, which is constructed algorithmically — not on the raw design $X$.

04 · Theory

FDR Control & Screening Guarantees

Theorem 4 (Finite-Sample FDR). For any $q \in (0,1)$, StatKnock controls the modified FDR: $\text{mFDR}(\hat{S}) = E\left[\frac{|\hat{S} \cap S^c|}{|\hat{S}| + 1/q}\right] \leq q$. With threshold $T_1$, also controls the usual $\text{FDR}(\hat{S}) \leq q$.

Theorem 5 (Power Guarantee). If the base procedure $\Phi$ (TLasso) achieves selection consistency, then $\text{Power}(\hat{S}) = E[|\hat{S} \cap S| / |S|] \to 1$ as $n \to \infty$ — StatKnock discovers all true signals asymptotically.

Theorem 6 (Sure Screening). SATLasso satisfies $P(\hat{S}_\gamma \supseteq S) \to 1$ — retains all true signals while reducing dimensionality from $m = O(e^{n^\xi})$ to $O(n)$, enabling subsequent StatKnock application.

05 · Simulations

TLasso Estimation Performance

$n=300, p=150, m_1=20, A=0.5$ — Chain graph $G_1$ with $D=\Delta_{G_1}^{(1)}$ (piecewise constant)

MetricTLasso①TLasso②TLasso③GenLassoSplitLasso
$\ell_2$ error (lower is better)0.9470.9560.9421.3262.148
True Pos.19.7419.7719.7619.953.00
Model Size75.9476.6275.37147.573.14

Red numbers mark the three TLasso variants, which have the smallest estimation error among competing estimators.

TLasso achieves smallest $\ell_2$ error with compact model size. GenLasso overfits; SplitLasso too conservative. Results robust across 3 noise mechanisms.

05 · Simulations

SATLasso-StatKnock FDR & Power

Key Findings

FDR controlledAll methods stay near the nominal level $q=0.2$.
Power stableSATLasso-StatKnock keeps power high as $\beta$ becomes denser.
Dimension robustThe advantage grows when $m$ increases to 5000.
Average Power
Higher is better, after FDR control at $q=0.2$.
SATLasso0.92
BY0.48
Split0.31

Settings Tested

GraphsChain $G_1$ and $15\times10$ lattice $G_2$.
Dimensions$(n,m)=(500,1000)$ and $(500,5000)$.
CompetitorsBY, GKnockoff, and SplitKnockoff.
06 · Real Data

Alzheimer's Disease Brain MRI

Model 5.1$D=I_p$

Find abnormal atrophy/expansion in hippocampus, amygdala, and cingulate regions.

Model 5.2$D=\Delta_G^{(1)}$

Find atypical edges, including cingulate and frontal connections.

Model 5.3$D=\Delta_G^{(2)}$

Find higher-order deviations in corpus callosum genu and precuneus regions.

Data: 65 AD patients, 279 brain regions (Level 5), ADNI-MEM memory score, $L = 100$ subsamples for StatKnock

Graph: regions connected if they share the same Level 3 anatomical parent, yielding $m = 378$ edges for $\Delta_G^{(1)}$

Methods: SATLasso-StatKnock vs BY vs SplitKnockoff at FDR level $q = 0.2$

06 · Findings

AD Analysis Results

Limbicvalidated biomarkers
Frontaladditional discoveries
Symmetrybilateral pattern
Powerless conservative
01Limbic confirmed

Hippocampus, amygdala, and cingulate regions match established AD biomarkers.

02Beyond limbic

Additional frontal and corpus callosum signals appear under SATLasso-StatKnock.

03Bilateral pattern

Selected regions and connections are balanced across hemispheres.

04Less conservative

BY and SplitKnockoff miss higher-order structures found by Model (5.3).

Conclusion

Summary & Future Work

TLasso

Oracle inequality, selection & sign consistency for arbitrary $D$

StatKnock

Finite-sample FDR control via stability knockoffs

SATLasso

Sure screening for ultra-high dimensional $m$

AD Application

Novel insights into brain structural changes in Alzheimer's

Future directions: Adaptive FDR-controlled screening · Debiased TLasso for global testing · Extension to e-values & FWER control

Thank You — Questions?