Z0-Inf: Zeroth Order Approximation for Data Influence

About

A critical aspect of analyzing and improving modern machine learning systems lies in understanding how individual training examples influence a model's predictive behavior. Estimating this influence enables critical applications, including data selection and model debugging; in particular, self-influence, which quantifies the influence of a training point on itself, has found many uses in data quality assessment and outlier detection. Existing methods for measuring data influence, however, are often impractical for large models due to low accuracy or prohibitive computational costs: most approaches either provide poor approximations or rely on gradients and inverse-Hessian computations that remain challenging to scale. In this work, we introduce a highly efficient zeroth-order approximation for estimating the influence of training data that requires only a fraction of the time and memory footprint of prior methods. Notably, our method relies solely on loss values of intermediate checkpoints on the training and test data, along with the checkpoints themselves, making it broadly applicable even when the loss function of interest is non-differentiable. Beyond its computational efficiency, our approach achieves superior accuracy in estimating self-influence and comparable or improved accuracy in estimating train-test influence for fine-tuned large language models, enabling scalable and practical analysis of how training data shapes model behavior.
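The abstract says the method needs only per-checkpoint loss values and the checkpoints themselves, but it does not state the estimator. The sketch below is therefore an illustrative assumption, not the authors' formula. It uses the first-order fact that the loss change between two consecutive checkpoints approximates the gradient projected onto the parameter update, so multiplying two examples' loss changes and dividing by the squared update norm gives a gradient-free stand-in for a gradient inner product along that direction. The function names, array layouts, and the unweighted sum over checkpoints are all hypothetical choices.

```python
import numpy as np


def zeroth_order_influence(train_losses, test_losses, checkpoints):
    """Gradient-free influence sketch (an assumption, not the paper's estimator).

    train_losses: (T+1, n_train) loss of each training example at each checkpoint
    test_losses:  (T+1, n_test)  loss of each test example at each checkpoint
    checkpoints:  (T+1, d)       flattened parameters of each checkpoint
    Returns an (n_train, n_test) matrix of influence scores.
    """
    train_losses = np.asarray(train_losses, dtype=float)
    test_losses = np.asarray(test_losses, dtype=float)
    checkpoints = np.asarray(checkpoints, dtype=float)

    # Loss change of every example between consecutive checkpoints.
    d_train = np.diff(train_losses, axis=0)  # (T, n_train)
    d_test = np.diff(test_losses, axis=0)    # (T, n_test)

    # Squared norm of each parameter update; skip steps with no movement.
    step_sq = np.sum(np.diff(checkpoints, axis=0) ** 2, axis=1)  # (T,)
    keep = step_sq > 0
    weights = 1.0 / step_sq[keep]

    # Sum over checkpoints of (train loss change * test loss change) / ||update||^2.
    return np.einsum("t,ti,tj->ij", weights, d_train[keep], d_test[keep])


def zeroth_order_self_influence(train_losses, checkpoints):
    """Same sketch with each training example scored against itself."""
    train_losses = np.asarray(train_losses, dtype=float)
    checkpoints = np.asarray(checkpoints, dtype=float)
    d_train = np.diff(train_losses, axis=0)
    step_sq = np.sum(np.diff(checkpoints, axis=0) ** 2, axis=1)
    keep = step_sq > 0
    return np.einsum("t,ti,ti->i", 1.0 / step_sq[keep], d_train[keep], d_train[keep])
```

Because only forward-pass losses enter the computation, a sketch like this works with non-differentiable losses as well, which matches the applicability claim in the abstract. How checkpoints are chosen and weighted in the actual method is not specified here.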

Narine Kokhlikyan, Kamalika Chaudhuri, Saeed Mahloujifar • 2025

Related benchmarks

Task	Dataset	Metric	Result
Recall	Finance-Medical Dataset (test)	Top-5 auPRC	28.65	37
Backdoor Attack Task Recall	WebQuestion howdy (test)	Top-5 auPRC	0.3394	30
Junk Data Detection	Brain Rot (test)	Top-5 auPRC	48.87	30
Junk Data Detection	Brain Rot Predict Future (test)	auPRC (Top 5)	46.04	30
Predict Future	Finance–Medical Dataset	Top-5 auPRC	39.75	30
Backdoor Attack Predict Future	Howdy!	Top-5 auPRC	39.91	29
Data Attribution	Brain Rot Study Evaluation Suite	Brain Rot	35.1	28
Backdoor Attack Task Predict Future	WebQuestion Howdy (Alpaca-howdy-52K distribution) (test)	Top-5 auPRC	36.01	12
Backdoor Attack Task Recall	WebQuestion (test)	Top 5 auPRC	0.3701	12
High-quality data selection	Brain Rot (test)	Top 5 auPRC	0.4959	12

Showing 10 of 16 rows
