Phylogenetic inference aims at reconstructing the binary tree describing the evolution of a set of sequences descending from a common ancestor.
The high computational cost of state-of-the-art maximum likelihood and Bayesian inference methods limits their usability under realistic evolutionary models.
Harnessing recent advances in likelihood-free inference and geometric deep learning, we introduce Phyloformer, a fast and accurate method for evolutionary distance estimation and phylogenetic reconstruction.
Sampling many trees and sequences under an evolutionary model, we train a neural network to learn a function that predicts the former from the latter.
Under a commonly used model of protein sequence evolution and with GPU acceleration, Phyloformer outpaces fast distance methods while matching maximum likelihood accuracy on both simulated and empirical data.
Under more complex models, some of which include dependencies between sites, it outperforms other methods.
Our results pave the way for the adoption of sophisticated realistic models for phylogenetic inference.
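
As an illustration of the simulation step summarized above, the following minimal sketch generates one training example: a random binary tree, sequences evolved along its branches, and the true pairwise leaf distances that a network could be trained to predict. It rests on several assumptions that do not come from this work: the function names (`random_tree`, `evolve`, `path_length`, `simulate_example`) are hypothetical, the tree sampler and the exponential branch lengths are arbitrary choices, and the equal-rates substitution model over 20 amino acids is a simple stand-in for the protein evolution models referred to in the text. No network training is shown.

```python
import itertools
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_tree(n_leaves, rng, mean_branch=0.1):
    """Build a random rooted binary tree by repeatedly joining two subtrees.

    Returns (root, parent), where parent maps each non-root node id to
    (parent_id, branch_length). Leaves are numbered 0..n_leaves-1.
    """
    parent = {}
    active = list(range(n_leaves))
    next_id = n_leaves
    while len(active) > 1:
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        # Assumption: exponentially distributed branch lengths
        parent[a] = (next_id, rng.expovariate(1.0 / mean_branch))
        parent[b] = (next_id, rng.expovariate(1.0 / mean_branch))
        active.append(next_id)
        next_id += 1
    return active[0], parent


def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t (equal-rates model)."""
    k = len(AMINO_ACIDS)
    # Probability that a site ends in a different state after time t
    p_change = (k - 1) / k * (1.0 - math.exp(-k / (k - 1) * t))
    out = []
    for s in seq:
        if rng.random() < p_change:
            out.append(rng.choice([x for x in AMINO_ACIDS if x != s]))
        else:
            out.append(s)
    return "".join(out)


def path_length(i, j, parent):
    """Sum of branch lengths on the tree path between nodes i and j."""
    dist_from_i = {i: 0.0}
    node, d = i, 0.0
    while node in parent:  # walk from i up to the root
        up, length = parent[node]
        node, d = up, d + length
        dist_from_i[node] = d
    node, d = j, 0.0
    while node not in dist_from_i:  # walk from j up to the common ancestor
        up, length = parent[node]
        node, d = up, d + length
    return d + dist_from_i[node]


def simulate_example(n_leaves, seq_len, rng):
    """Return (leaf_sequences, pairwise_distances) for one simulated tree."""
    root, parent = random_tree(n_leaves, rng)
    seqs = {root: "".join(rng.choice(AMINO_ACIDS) for _ in range(seq_len))}
    # Internal nodes always have larger ids than their children, so visiting
    # ids in decreasing order processes every parent before its children.
    for node in sorted(parent, reverse=True):
        up, length = parent[node]
        seqs[node] = evolve(seqs[up], length, rng)
    leaves = [seqs[i] for i in range(n_leaves)]
    dist = {
        (i, j): path_length(i, j, parent)
        for i, j in itertools.combinations(range(n_leaves), 2)
    }
    return leaves, dist
```

Calling `simulate_example(5, 50, random.Random(0))` yields five aligned sequences of length 50 together with the ten true pairwise distances; repeating the call many times would give a training set of (sequences, distances) pairs under the assumed model.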