Generalization in deep learning remains poorly understood, as neural networks fall outside the framework of classical statistical learning theory. To make progress on this question, research has focused on controlled tasks, such as modular addition, as testbeds for generalization. On this task, models exhibit grokking, i.e., a delayed onset of generalization after training loss has converged. Prior work has identified empirical regularities in learned representations associated with this transition, but the mapping between representation structure and generalization behavior remains empirical and descriptive: we lack a predictive theory of why and when generalization occurs. In this work, we provide such a predictive theory for the modular addition task. We introduce the notion of a canonical representation of a task: the representation, determined by the target function prior to training, that is needed for perfect generalization. For modular addition, the canonical representation can be derived explicitly from the group structure of the task. We then define representational deviation in terms of the alignment of the learned representation with the canonical representation. From this, we derive that generalization up to a chosen margin requires the representational deviation to fall below a threshold. Finally, we provide a set of reproducible experiments that empirically confirm these findings, and we propose a regularizer to accelerate the grokking transition.
**Matthieu Tehenan is a PhD candidate in the NLIP Group, Department of Computer Science & Technology, supervised by Prof Andreas Vlachos.**
