A Model for Each Sample: Crazy?

Ben Lengerich
4 min read · Jan 2


Why Sample-Specific Models?

Most ML systems (including modern deep learning) assume that a single model can summarize complex processes, as long as the model has enough representational power. However, this assumption breaks down when the underlying processes differ between samples (as in heterogeneous processes like cancer genomics, or time-varying processes like the stock market).

Can we go to the other extreme? What if we could instead estimate a different model for each sample?! If we could assign every sample its own model parameters, we would immediately have:

  • Sample Embeddings: Sample-specific models serve as meaningful embeddings that represent the underlying process (not just the input or output data).
  • Data-Driven Clustering: We estimate as many models as there are data points, and these models automatically form as many clusters as there are distinct underlying processes in the dataset.
  • Interpretability: Because each model only has to work for a single sample, the models can come from a very simple model class (e.g., logistic regression).
  • Interpretability at any granularity: We can automatically zoom in or out, from the population level to the individual level, to understand the process at whatever resolution a question demands.

However, estimating a different model for each sample is (really) tough! Models are high-dimensional beasts; how can we estimate one effectively from a single sample? The answer is that we must devise ways to share statistical power between models. Personalized Regression does this by simultaneously estimating all of the personalized models in a soft multitask problem.

What is Personalized Regression?

Personalized Regression is our method to estimate sample-specific models as a soft multitask learning problem. In this view, every sample is a task, and we use different model parameters for each task (sample). We make this a soft multitask problem by encouraging similar tasks (samples) to have similar model parameters.
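As a toy illustration of this soft multitask view (everything here, from the synthetic data to the similarity kernel and coupling weight, is an assumption for demonstration, not the paper's exact objective):

```python
import numpy as np

# Toy setup: every sample i gets its own weight vector theta[i], and a
# coupling penalty encourages similar samples to have similar parameters.
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

theta = np.zeros((n, p))  # one parameter vector per sample (task)

# Assumed sample-similarity kernel (Gaussian on the inputs, for illustration).
S = np.exp(-np.square(X[:, None] - X[None, :]).sum(-1))

def objective(theta, lam=1.0):
    # Per-sample squared error: each model only has to fit its own sample.
    fit = np.square((X * theta).sum(axis=1) - y).sum()
    # Soft coupling: pairs with large similarity S[i, j] are pulled together.
    diffs = np.square(theta[:, None] - theta[None, :]).sum(-1)
    couple = (S * diffs).sum()
    return fit + lam * couple
```

The key point is that the coupling term is soft: unlike hard clustering or a single shared model, each sample is free to deviate when its data demands it.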

As shown in Figure 1, personalized regression estimates different model parameters for each sample. Personalized Regression does not make any parametric assumptions about how the model parameters are generated, which makes it much more flexible than other frameworks for sample-specific models. As a result, Personalized Regression can follow the nonlinear structure of many real-world datasets.

We first introduced Personalized Regression in our ISMB 2018 paper and made substantial improvements in our NeurIPS 2019 paper.

Figure 1: Sample-specific models are a structured way to generate personalized parameters.

How does Low-Rank Personalized Regression work?

Low-Rank Personalized Regression (as introduced in our NeurIPS 2019 paper) has two main strategies:

  • Distance-Matching Regularization: To make estimation of sample-specific models feasible, we need some information about how the samples relate to each other. In Personalized Regression, we assume that we are given some covariates, but we aren’t told how these covariates relate to the personalization process. Thus, we must simultaneously (1) learn a distance metric over the covariates which tells us how the samples are related, and (2) induce this sample relatedness in the sample-specific models. This is precisely what distance-matching regularization does: it matches pairwise distances measured in covariate space with pairwise distances measured in model parameter space.
  • Low-Rank Collection: The collection of sample-specific models is constrained to be low-rank. We do this by generating each sample-specific model parameter vector as the product of a shared dictionary of models and a sample-specific loading vector: θ^(i) = Q^T Z^(i). We can adjust the dimensionality of Z to increase or decrease the flexibility of the personalization.
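The two strategies above can be sketched in a few lines of NumPy. Q, Z, θ, and 𝜙 follow the post’s notation; the Gaussian covariates and the specific weighted-Euclidean distance forms are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 10, 5, 2           # n samples, p model dims, k = rank of personalization
Q = rng.normal(size=(k, p))  # shared dictionary of k "atom" models
Z = rng.normal(size=(n, k))  # sample-specific loading vectors
U = rng.normal(size=(n, 4))  # covariates describing each sample
phi = np.ones(4)             # learnable weights of the covariate distance metric

# Low-rank collection: theta[i] = Q^T Z^(i), i.e. each personalized model
# is a mixture of the k dictionary atoms.
theta = Z @ Q

def distance_matching(theta, U, phi):
    """Penalize mismatch between covariate distances and parameter distances."""
    d_cov = np.sqrt((phi * np.square(U[:, None] - U[None, :])).sum(-1))
    d_par = np.sqrt(np.square(theta[:, None] - theta[None, :]).sum(-1))
    return np.square(d_cov - d_par).sum()
```

Because `theta` is generated from `Z @ Q`, the whole collection of n models lives in a k-dimensional subspace, which is what makes estimating n models statistically feasible.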

Overall, this produces the full loss function:

Loss function for personalized regression

where 𝜙 parameterizes the distance metric over covariates,

Loss function with distance-matching regularization

and 𝜓 is a generic regularizer (e.g., an ℓ2 penalty) on the sample-specific model parameters, and D is the distance-matching regularizer:

Distance-matching regularization
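Since the rendered equations above did not survive extraction, here is a hedged LaTeX sketch of the objective. The symbols follow the post; the exact weighting and distance forms are assumptions, so consult the NeurIPS 2019 paper for the precise version:

```latex
% Illustrative sketch, not the paper's exact objective.
% Each personalized model is generated by the low-rank collection:
\theta^{(i)} = Q^\top Z^{(i)}

% Overall loss: per-sample prediction error, distance-matching
% regularization (weight \gamma assumed), and a generic regularizer \psi:
\min_{Q,\, Z,\, \phi} \;
  \sum_{i=1}^{n} \ell\!\big(f(x^{(i)};\, Q^\top Z^{(i)}),\, y^{(i)}\big)
  \; + \; \gamma\, D_{\phi}(Z, U) \; + \; \psi(Q, Z)

% Distance matching: covariate distances (under the learned metric \rho_\phi)
% should agree with distances between personalized parameters:
D_{\phi}(Z, U) = \sum_{i < j}
  \Big( \rho_{\phi}\big(u^{(i)}, u^{(j)}\big)
        - \big\lVert \theta^{(i)} - \theta^{(j)} \big\rVert \Big)^{2}
```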

Personalized models are optimized by initializing all models at a population estimator, and then allowing each personalized model to relax to its optimal position. See below for the optimization of the personalized models (with their final positions visualized in Figure 1):
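A minimal sketch of this optimization schedule, using ordinary least squares as a stand-in population estimator and omitting the coupling and regularization terms for brevity (all sizes and the learning rate are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 2
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0]) + 0.05 * rng.normal(size=n)

# 1) Fit a population estimator (ordinary least squares, as a stand-in).
theta_pop, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Initialize every personalized model at the population solution.
theta = np.tile(theta_pop, (n, 1))

# 3) Let each model relax toward its own sample with small gradient steps.
lr = 0.01
for _ in range(200):
    resid = (X * theta).sum(axis=1) - y  # per-sample residual
    theta -= lr * resid[:, None] * X     # gradient of per-sample squared error
```

Starting every model at the population estimate is what keeps the relaxation stable: models only drift away from the shared solution when their own sample pulls them.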

Optimization of personalized parameters, ending in Figure 1

For more information on Personalized Regression, please check out:


