It has been hypothesized that neural networks with similar architectures trained on similar
data learn shared representations relevant to the learning task. We build on this idea by
extending a conceptual framework in which representations learned by models trained
on the same data can be expressed as linear combinations of a universal set of basis
features. These basis features underlie the learning task itself and remain consistent across
models, regardless of scale. From this framework, we propose the Linear Representation
Transferability (LRT) Hypothesis: there exists an affine transformation between
the representation spaces of different models.
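The core implication can be sketched numerically: if two models' representations are linear combinations of the same basis features, an affine map from one representation space to the other exists and can be recovered by least squares. The following is a minimal synthetic illustration, not the paper's method; all shapes, variable names, and the use of NumPy's `lstsq` are assumptions for the sketch.

```python
import numpy as np

# Synthetic setup: both models mix a shared set of k basis features Z,
# so H_A = Z @ W_A and H_B = Z @ W_B + b_B. Under the LRT Hypothesis,
# an affine map h_B ≈ h_A @ M + c should then exist between the spaces.
rng = np.random.default_rng(0)
n, k, dA, dB = 500, 8, 16, 32          # samples, basis features, model dims
Z = rng.normal(size=(n, k))            # shared basis-feature activations
W_A = rng.normal(size=(k, dA))         # model A's mixing of basis features
W_B = rng.normal(size=(k, dB))         # model B's mixing of basis features
b_B = rng.normal(size=dB)              # model B's offset
H_A = Z @ W_A                          # representations in model A's space
H_B = Z @ W_B + b_B                    # representations in model B's space

# Fit the affine map by least squares, appending a bias column to H_A.
X = np.hstack([H_A, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X, H_B, rcond=None)
M, c = coef[:-1], coef[-1]

# Near-zero residual: model A's representations transfer affinely to model B's.
residual = np.linalg.norm(H_A @ M + c - H_B) / np.linalg.norm(H_B)
print(f"relative residual: {residual:.2e}")
```

Because both representation sets are generated from the same basis activations, the fitted map recovers the transfer essentially exactly; real models would exhibit a small but nonzero residual.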