Identifying the footprints of selection in coding sequences can shed light on the importance and function of individual sites. Two types of methods have been used for this purpose: classically, analyses of the ratio of nonsynonymous to synonymous substitution rates, and, more recently, approaches based on amino acid profiles. In both cases, inference is typically performed within a maximum likelihood framework. This has two limitations: inference is computationally costly, and it is restricted to probabilistic models of sequence evolution whose likelihood function remains tractable to compute. In this presentation, we will show how we use deep learning to overcome both limitations. We will present the transformer-based architecture of our models, along with results on simulated data for detecting positive selection, relaxed selection, and changes in the direction of selection.