It has been hypothesized that neural networks with similar architectures, trained on similar data, learn shared representations relevant to the learning task. We build on this idea by extending a conceptual framework in which representations learned by models trained on the same data can be expressed as linear combinations of a universal set of basis features. These basis features underlie the learning task itself and remain consistent across models, regardless of scale. From this framework, we propose the Linear Representation Transferability (LRT) Hypothesis: that there exists an affine transformation between the representation spaces of different models.
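To make the hypothesis concrete, here is a minimal sketch (not from the paper; all names and dimensions are illustrative) of what it would mean in practice: if two models' representations are linear functions of the same underlying basis features, then an affine map fitted by least squares from one model's activations to the other's should achieve near-zero reconstruction error.

```python
import numpy as np

# Hypothetical setup: H_a and H_b are activations from two models on the
# same n inputs (shapes n x d_a and n x d_b). Under the LRT hypothesis
# there is an affine map x -> x @ W + b taking A's space to B's.
rng = np.random.default_rng(0)
n, k, d_a, d_b = 500, 8, 16, 24

# Simulate data consistent with the hypothesis: a shared set of k basis
# features Z, with each model's representations affine in Z.
Z = rng.normal(size=(n, k))                  # universal basis features
H_a = Z @ rng.normal(size=(k, d_a)) + 0.1    # model A representations
H_b = Z @ rng.normal(size=(k, d_b)) - 0.3    # model B representations

# Fit the affine map by least squares: append a bias column to H_a.
X = np.hstack([H_a, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X, H_b, rcond=None)
W, b = coef[:-1], coef[-1]

# If the affine map exists, relative reconstruction error is ~0.
err = np.linalg.norm(X @ coef - H_b) / np.linalg.norm(H_b)
print(f"relative error: {err:.2e}")
```

With real networks the fit would be approximate rather than exact, and the size of the residual is one way to measure how well the hypothesis holds for a given pair of models.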