We build neural networks in a modular and programmatic way using software libraries like PyTorch and JAX. But optimization theory has not caught up to the flexibility of this paradigm, and practical advances in neural net optimization remain largely heuristics-driven. In this talk we argue that, if we are to treat deep learning rigorously, we must build our optimization theory programmatically and in lockstep with the neural network itself. To instantiate this idea, we propose the "modular norm", a norm on the weight space of general neural architectures. The modular norm is built by stitching together norms on the individual tensor spaces as the architecture is assembled. It has several applications: automatic Lipschitz certificates for general architectures, in both the weights and the inputs; automatic learning-rate transfer across scale; and, most recently, a duality theory for the modular norm that leads to dualized optimizers such as Muon, which have set speed records for training transformers. We are building the theory of the modular norm into a software library called Modula to ease the development and deployment of rigorous deep learning algorithms; you can find out more at https://modula.systems/.
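
To make the dualization idea concrete, below is a minimal sketch of the core step behind a Muon-style update on a single weight matrix: mapping the raw gradient to the steepest-descent direction under the spectral norm, which for a gradient with reduced SVD G = U S V^T is U V^T. The function name, toy loss, and learning rate here are illustrative only; Muon itself approximates this map with Newton-Schulz iterations rather than an exact SVD and adds further details such as momentum.

```python
import torch

def dualize_spectral(grad: torch.Tensor) -> torch.Tensor:
    """Map a matrix gradient to the steepest-descent direction under the spectral norm.

    For grad with reduced SVD U S V^T, that direction is U V^T (all singular
    values set to one). An exact SVD is used here for clarity.
    """
    U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    return U @ Vh

# Illustrative usage on a single linear layer's weight matrix.
W = torch.randn(256, 128, requires_grad=True)
loss = (W @ torch.randn(128, 32)).square().mean()
loss.backward()

with torch.no_grad():
    W -= 0.02 * dualize_spectral(W.grad)  # learning rate chosen arbitrarily
```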