Under Review · Journal of Business & Economic Statistics

Data-Adaptive Inference
for Structural Sparsity

FDR control via Transformational Lasso & Stability Knockoffs under arbitrary transformations

Han Su · Qingyang Sun · Yinfei Kong · Gaorong Li

Beijing Normal Univ · Duke Univ · CSU Fullerton

Overview

Talk Outline

01

Background, Structural Types & Existing Methods

02

Problem Formulation & GLasso

03

TLasso, StatKnock & SATLasso Methods

04

Theoretical Properties (5 Theorems)

05

Simulation: Estimation & FDR Control

06

Alzheimer's Disease Application & Conclusion

01 · Background

From Traditional to Structural Sparsity

Traditional sparsity: only a few covariates have nonzero $\beta_j$ — standard Lasso recovers the support $S_\beta$
Structural sparsity: sparsity on transformations $\gamma = D\beta$ — the structure of $\beta$ is encoded by matrix $D$
Graph viewpoint: $D$ is defined via graph difference operators $\Delta_G^{(k)}$ over an undirected graph $G = (V, E)$ where $V = [p]$ vertices represent covariates

Examples of $D$:

• $D = I_p$: traditional variable selection

• $D = \Delta_G^{(1)}$: fused Lasso / change point detection

• $D = \Delta_G^{(2)}$: trend filtering / piecewise linear detection

• $D = [\Delta_G^{(1)T}, \Delta_G^{(2)T}]^T$: simultaneous testing of multiple structures

01 · Structural Types

Three Types of Structural Sparsity

Type 1: Piecewise Constant

D = Δ_G⁽¹⁾ γ sparse: β_i - β_j = 0 for most edges

Incidence operator. On chain graphs this becomes fused Lasso for change point detection.

Type 2: Piecewise Linear

D = Δ_G⁽²⁾ γ sparse: local graph curvature is zero at most vertices

Second-order graph difference. It detects kinks in piecewise linear trends.

Type 3: Mixed / Arbitrary

D = [Δ_G^(1)T, Δ_G^(2)T]^T test jumps and curvature together

Stack multiple operators for chain, lattice, and general graphs.

01 · Existing Methods

Current Approaches & Limitations

BH procedure: works only when $D = I_p$ and $X$ is orthogonal
BY procedure: handles dependency in $X$ but requires computing p-values — limited to specific $D$
GKnockoff: requires $\text{rank}(D) = m \leq p$ — fails when transformations exceed covariates
Split Knockoff: requires $m + p \leq n$ — very restrictive in high dimensions

What's missing for arbitrary $D_{m \times p}$:

• No method handles $m \gg n$ (ultra-high dimensional $\gamma$)

• No finite-sample FDR guarantees for general transformation matrices

• No support for higher-order structures (piecewise linear, polynomial)

• No framework for testing multiple operators simultaneously

01 · Our Contribution

A Unified Solution

TLassomake arbitrary $D$ estimable

StatKnockadd finite-sample FDR

SATLassoscreen ultra-high $m$

Recyclecombine screening + testing

EstimatorTLasso

Makes structural sparsity estimable for arbitrary $D$ via lifting and noise injection.

TestingStatKnock

Adds finite-sample FDR control on the TLasso-projected design.

ScaleSATLasso

Screens ultra-high dimensional $m$ while keeping true transformation signals.

Final stepRecycle

Screen in $\gamma$-space, then reuse $\beta$ information for post-screening FDR.

01 · Application Context

Motivating Example: Alzheimer's Brain MRI

Goal: identify structural brain changes related to memory decline in AD patients
Model (5.1): $D = I_p$ — detect abnormal atrophy/expansion in individual brain regions
Model (5.2): $D = \Delta_G^{(1)}$ — detect atypical connectivity between neighboring regions during AD progression
Model (5.3): $D = \Delta_G^{(2)}$ — detect regions whose effects deviate from the average of their adjacent regions

Data: 65 AD patients from ADNI database, T1-weighted MRI segmented via MRICloud into 279 brain regions (Level 5), ADNI-MEM memory score as response.

Challenge: $n = 65$, $p = 279$, and under Model (5.2) $m = 378$ edges — a genuine high-dimensional structural sparsity problem requiring FDR control.

02 · Setup

Problem Formulation

Y = X^T\beta + \varepsilon, \quad \gamma = D\beta

Data$Y, X$

Structure$D \in \mathbb{R}^{m \times p}$

Targetsparse $\gamma = D\beta$

Error Control$\text{FDR} \leq q$

Sparsity object$\gamma = D\beta$

We select structural changes, not just raw coefficients.

Active set$S = \{j:\gamma_j \neq 0\}$

The goal is to recover true transformed signals.

Error goal$\text{FDR} \le q$

Discover signals while controlling false discoveries.

\min_{\beta,\gamma} \frac{1}{2n}\|Y-X\beta\|_2^2 + \lambda\|\gamma\|_1,\quad \text{s.t. }\gamma=D\beta

03 · Method

TLasso — Core Idea

Idea: ordinary GLasso forces γ to live in the lower-dimensional space Col(D). TLasso lifts the model, then projects away nuisance directions so γ behaves like a standard Lasso target.

ObstacleConstrained $\gamma$

$\gamma=D\beta$ lives in Col$(D)$, so it is not a free $m$-vector when rank$(D)

Step 1Inject noise

Augment $D$ to full row rank: $D_+=[D,N]$

Step 2Lift model

Work with $\omega_+=(\omega^T,\omega_N^T)^T$ so $\gamma=D_+\omega_+$.

Step 3Project

Remove nuisance directions and solve standard Lasso directly in $\gamma$.

\min_{\omega_+,\gamma} \frac{1}{2n}\|Y - X_+\omega_+\|_2^2 + \lambda\|\gamma\|_1, \quad \text{s.t. } \gamma = D_+\omega_+

03 · Algorithm

TLasso Algorithm

Step 1

Construct $N$ via QR decomposition on rank-$r$ matrix $D$; form $D_+ = [D, N]$ with full row rank $m$

Step 2

Generate auxiliary noise $X_N$; obtain augmented design $X_+ = [X, X_N]$

Step 3

Apply projection technique to eliminate nuisance parameters: transform $(X_+, Y)$ to $(\tilde{X}, \tilde{Y})$

Step 4

Solve standard Lasso on projected data: $\hat{\gamma} = \arg\min_\gamma \frac{1}{2n}\|\tilde{Y} - \tilde{X}\gamma\|_2^2 + \lambda\|\gamma\|_1$

03 · FDR Control

StatKnock — Stability Knockoffs

Step 1: Knockoff Construction

Construct transformational knockoff matrix $\widetilde{X}$ based on projected design $\tilde{X}$ from TLasso, satisfying pairwise exchangeability: swapping $\tilde{X}_j$ and $\widetilde{X}_j$ for null $j$ does not change the joint distribution $[\tilde{X}, \widetilde{X}]$.

Step 2: Stability Selection via 'AND' Rule

Subsample data $L = 100$ times; for each subsample, run TLasso on both $[\tilde{X}, \widetilde{X}]$ and $[\widetilde{X}, \tilde{X}]$. Compute importance $W_j$ = probability that feature $j$ is selected in both halves — more powerful than LSM/LCD statistics.

Step 3: FDR Threshold

Apply knockoff filter: $\hat{S} = \{j : W_j \geq T\}$ where $T = \min\{t : \frac{|\{j: W_j \leq -t\}|}{|\{j: W_j \geq t\}| \vee 1} \leq q\}$, ensuring finite-sample FDR $\leq q$.

03 · Ultra-High Dimensions

SATLasso + Post-Screening FDR

SATLasso Screening

Adds $\ell_2$-penalty $\alpha\|\omega_N\|_2^2$ on noise to form ATLasso — controls spurious correlations in ultra-high dimensions
Sequential partial $\ell_1$-penalty: at each step, selects the most significant $\gamma_j$'s from the remaining candidates
GBIC stopping rule with complexity penalty $\eta > 2 - \log_m n'$ prevents overfitting
Sherman-Morrison recursive update avoids repeated matrix inversion

"$\gamma$-Screening & $\beta$-Recycling"

Two-stage procedure: first screen $\gamma$ via SATLasso, then apply StatKnock on screened set
Key advantage: screens on $\gamma$ (transformation) directly instead of $\beta$ (covariates)
Does not require $\beta$ to be sparse — critical since $\beta$ is typically dense in structural sparsity
Competitors (BY, GKnockoff, SplitKnockoff) screen on $\beta$ and lose power when $\beta$ is dense

04 · Theory

TLasso Theoretical Guarantees

Theorem 1 (Oracle Inequality). With $\lambda \geq (2\sigma/\kappa)\sqrt{2\log m/n}$: (a) TLasso support $\hat{S} \subseteq S$ — no false positives; (b) $\ell_\infty$ bound $\|\hat{\gamma}_S - \gamma_S\|_\infty \leq C_n\lambda$ — estimation error vanishes at rate $\sqrt{\log m / n}$.

Theorem 2 (Selection & Sign Consistency). Under minimal signal $\min_{j \in S}|\gamma_j| \geq \varsigma_n\sqrt{\log m/n}$, TLasso achieves $P(\hat{S} = S) \to 1$ — recovers the exact active set with correct signs asymptotically.

Theorem 3 (Pairwise Exchangeability). The constructed knockoffs satisfy: (a) $[\tilde{X}, \widetilde{X}]_{\text{swap}(S^c)} \overset{d}{=} [\tilde{X}, \widetilde{X}]$; (b) $Y \perp \widetilde{X}_{S^c} \mid \tilde{X}$ — enabling the knockoff filter to control FDR without knowing null distribution.

Key insight: all conditions (minimal eigenvalue, mutual incoherence) are imposed on the projected design $\tilde{X}$, which is constructed algorithmically — not on the raw design $X$.

04 · Theory

FDR Control & Screening Guarantees

Theorem 4 (Finite-Sample FDR). For any $q \in (0,1)$, StatKnock controls the modified FDR: $\text{mFDR}(\hat{S}) = E\left[\frac{|\hat{S} \cap S^c|}{|\hat{S}| + 1/q}\right] \leq q$. With threshold $T_1$, also controls the usual $\text{FDR}(\hat{S}) \leq q$.

Theorem 5 (Power Guarantee). If the base procedure $\Phi$ (TLasso) achieves selection consistency, then $\text{Power}(\hat{S}) = E[|\hat{S} \cap S| / |S|] \to 1$ as $n \to \infty$ — StatKnock discovers all true signals asymptotically.

Theorem 6 (Sure Screening). SATLasso satisfies $P(\hat{S}_\gamma \supseteq S) \to 1$ — retains all true signals while reducing dimensionality from $m = O(e^{n^\xi})$ to $O(n)$, enabling subsequent StatKnock application.

05 · Simulations

TLasso Estimation Performance

$n=300, p=150, m_1=20, A=0.5$ — Chain graph $G_1$ with $D=\Delta_{G_1}^{(1)}$ (piecewise constant)

Metric	TLasso①	TLasso②	TLasso③	GenLasso	SplitLasso
$\ell_2$ error (lower is better)	0.947	0.956	0.942	1.326	2.148
True Pos.	19.74	19.77	19.76	19.95	3.00
Model Size	75.94	76.62	75.37	147.57	3.14

Red numbers mark the three TLasso variants, which have the smallest estimation error among competing estimators.

TLasso achieves smallest $\ell_2$ error with compact model size. GenLasso overfits; SplitLasso too conservative. Results robust across 3 noise mechanisms.

05 · Simulations

SATLasso-StatKnock FDR & Power

Key Findings

FDR controlledAll methods stay near the nominal level $q=0.2$.

Power stableSATLasso-StatKnock keeps power high as $\beta$ becomes denser.

Dimension robustThe advantage grows when $m$ increases to 5000.

Average Power

Higher is better, after FDR control at $q=0.2$.

SATLasso0.92

BY0.48

Split0.31

Settings Tested

GraphsChain $G_1$ and $15\times10$ lattice $G_2$.

Dimensions$(n,m)=(500,1000)$ and $(500,5000)$.

CompetitorsBY, GKnockoff, and SplitKnockoff.

06 · Real Data

Alzheimer's Disease Brain MRI

Model 5.1$D=I_p$

Find abnormal atrophy/expansion in hippocampus, amygdala, and cingulate regions.

Model 5.2$D=\Delta_G^{(1)}$

Find atypical edges, including cingulate and frontal connections.

Model 5.3$D=\Delta_G^{(2)}$

Find higher-order deviations in corpus callosum genu and precuneus regions.

Data: 65 AD patients, 279 brain regions (Level 5), ADNI-MEM memory score, $L = 100$ subsamples for StatKnock

Graph: regions connected if they share the same Level 3 anatomical parent, yielding $m = 378$ edges for $\Delta_G^{(1)}$

Methods: SATLasso-StatKnock vs BY vs SplitKnockoff at FDR level $q = 0.2$

06 · Findings

AD Analysis Results

Limbicvalidated biomarkers

Frontaladditional discoveries

Symmetrybilateral pattern

Powerless conservative

01Limbic confirmed

Hippocampus, amygdala, and cingulate regions match established AD biomarkers.

02Beyond limbic

Additional frontal and corpus callosum signals appear under SATLasso-StatKnock.

03Bilateral pattern

Selected regions and connections are balanced across hemispheres.

04Less conservative

BY and SplitKnockoff miss higher-order structures found by Model (5.3).

Conclusion

Summary & Future Work

TLasso

Oracle inequality, selection & sign consistency for arbitrary $D$

StatKnock

Finite-sample FDR control via stability knockoffs

SATLasso

Sure screening for ultra-high dimensional $m$

AD Application

Novel insights into brain structural changes in Alzheimer's

Future directions: Adaptive FDR-controlled screening · Debiased TLasso for global testing · Extension to e-values & FWER control

Thank You — Questions?

Data-Adaptive Inferencefor Structural Sparsity