Deep convolutional neural networks (CNNs) have emerged as the state of the art for predicting neural activity in visual cortex. While such models outperform classical linear-nonlinear and wavelet-based representations, it remains unclear which computations they approximate. Here, we tested divisive normalization (DN) for its ability to predict spiking responses to natural images. We developed a model that learns the pool of normalizing neurons and the magnitude of their contribution end-to-end from data. In macaque primary visual cortex (V1), we found that our interpretable model outperformed linear-nonlinear and wavelet-based feature representations and almost closed the gap to high-performing black-box models. Surprisingly, within the classical receptive field, oriented features were normalized preferentially by features with similar orientations rather than non-specifically, as currently assumed. Our work provides a new, quantitatively interpretable, and high-performing model of V1 that is applicable to arbitrary images, refining our view of gain control within the classical receptive field.
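
To make the notion of divisive normalization concrete, the following is a minimal sketch of the generic textbook form, in which each channel's driving input is divided by a weighted sum over a pool of channels. It is not the model fitted in this work: the function name `divisive_normalization`, the parameters `sigma` and `exponent`, and the hand-set weight matrix are all illustrative assumptions. In the model described above, the pool weights would be learned from data rather than fixed.

```python
def divisive_normalization(drive, weights, sigma=1.0, exponent=2.0):
    """Generic divisive normalization (hypothetical helper, not the paper's code).

    drive:   non-negative feature activations, one per channel
    weights: weights[i][j] >= 0 sets how strongly channel j normalizes channel i
    sigma:   semi-saturation constant that keeps the denominator positive
    """
    # Raise each driving input to the chosen exponent.
    powered = [d ** exponent for d in drive]
    responses = []
    for i, numerator in enumerate(powered):
        # Weighted sum over the normalization pool of channel i.
        pool = sum(w * p for w, p in zip(weights[i], powered))
        responses.append(numerator / (sigma ** exponent + pool))
    return responses


# Two channels with equal drive. Channel 0 is normalized only by itself,
# channel 1 by both channels, so channel 1 is suppressed more strongly.
drive = [2.0, 2.0]
weights = [[1.0, 0.0],
           [1.0, 1.0]]
responses = divisive_normalization(drive, weights)
```

In this toy setting, a feature-specific pool corresponds to a weight matrix whose entries depend on the similarity between channels, whereas a non-specific pool corresponds to equal weights across all channels.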