How does DeepL work?

By the DeepL Team

We are frequently asked why DeepL Translator often works better than competing systems from major tech companies. There are several reasons for this. Like most translation systems, DeepL Translator translates texts using artificial neural networks, which are trained on many millions of translated texts. However, our researchers have been able to improve on the overall neural network methodology in four main areas.

Network architecture

It is well known that most publicly available translation systems are direct modifications of the Transformer architecture. DeepL's neural networks also contain parts of this architecture, such as attention mechanisms. However, there are significant differences in the topology of our networks that lead to a substantial improvement in translation quality over the public research state of the art. These differences become clearly visible when we internally train our architectures and the best known Transformer architectures on the same data and compare the results.
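DeepL has not published the topology of its networks, but the attention mechanism mentioned above is public and well documented. Purely as a point of reference, here is a minimal NumPy sketch of scaled dot-product attention, the core building block of the standard Transformer; all names and shapes are illustrative.

```python
# Illustration only: this is the standard Transformer attention mechanism,
# not DeepL's unpublished network topology.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_model)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to keep values stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of the value vectors.
    return weights @ V

# Toy usage: 4 token positions, model width 8 (self-attention, so Q = K = V).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```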

Training data

Most of our direct competitors are major tech companies that have spent many years developing web crawlers, which gives them a distinct advantage in the sheer amount of available training data. We, on the other hand, place great emphasis on the targeted acquisition of special training data that helps our networks achieve higher translation quality. Among other things, we have developed special crawlers that automatically find translations on the internet and assess their quality.
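The actual crawlers and quality models are of course not public. Purely to illustrate the idea, the hypothetical filter below applies a few common-sense heuristics (length bounds, length ratio, copy detection) to decide whether a crawled sentence pair looks like a plausible translation; every function name and threshold here is an assumption, not DeepL's method.

```python
# Hypothetical sketch of filtering crawled sentence pairs before they
# enter a training corpus. Real quality assessment is far more involved.
def keep_pair(src: str, tgt: str,
              min_len: int = 3, max_len: int = 200,
              max_ratio: float = 2.0) -> bool:
    s, t = src.split(), tgt.split()
    if not (min_len <= len(s) <= max_len and min_len <= len(t) <= max_len):
        return False  # too short or too long to be a reliable example
    if len(s) / len(t) > max_ratio or len(t) / len(s) > max_ratio:
        return False  # lengths too dissimilar to be translations of each other
    if src.strip() == tgt.strip():
        return False  # untranslated copy, a common kind of crawl noise
    return True

pairs = [("Das ist ein Test .", "This is a test ."),
         ("Hallo Welt", "Hallo Welt")]  # the second pair is an untranslated copy
clean = [p for p in pairs if keep_pair(*p)]
print(clean)  # only the genuine translation pair survives
```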

Training methodology

In public research, translation networks are usually trained with supervised learning. The network is repeatedly shown examples from the training data and compares its own translations with the reference translations. If there are discrepancies, the weights of the network are adjusted accordingly. When training our neural networks, we also draw on techniques from other areas of machine learning, which yields further significant improvements.
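To make the supervised-learning loop described above concrete, here is a toy NumPy sketch: a linear model stands in for a translation network, and it is trained by repeatedly comparing its output with the reference and adjusting its weights to shrink the discrepancy. The data, model, and learning rate are all illustrative assumptions.

```python
# Minimal sketch of supervised learning: compare the model's output with the
# reference, then adjust the weights to reduce the discrepancy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # "source" examples
true_W = rng.normal(size=(4, 2))
Y = X @ true_W                  # "reference translations"

W = np.zeros((4, 2))            # the model's trainable weights
lr = 0.1
for step in range(200):
    pred = X @ W                # the model's own output
    err = pred - Y              # discrepancy from the reference
    grad = X.T @ err / len(X)   # gradient of the mean squared error
    W -= lr * grad              # adjust the weights accordingly

print(np.abs(W - true_W).max())  # near zero: the model has fit the data
```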

Network size

Like our largest competitors, we now train translation networks with many billions of parameters. These networks are so large that they can only be trained in a distributed fashion on very large dedicated compute clusters. In our research, however, we place great importance on using the networks' parameters very efficiently. This is how we achieve comparable translation quality even with smaller, faster networks, and why we can offer very high translation quality to users of our free service as well.
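To give a feel for the scale involved, the back-of-the-envelope calculation below counts the parameters of a standard Transformer stack (not DeepL's unpublished architecture); the configuration is a hypothetical one, chosen only to show how the count reaches billions.

```python
# Rough parameter count for a standard Transformer stack, as an illustration
# of why modern translation networks reach billions of parameters.
def transformer_params(d_model: int, n_layers: int, vocab: int,
                       ffn_mult: int = 4) -> int:
    attention = 4 * d_model * d_model       # Q, K, V and output projections
    ffn = 2 * ffn_mult * d_model * d_model  # two feed-forward matrices
    embeddings = vocab * d_model            # token embedding table
    return n_layers * (attention + ffn) + embeddings

# Hypothetical large configuration: width 4096, 64 layers, 128k vocabulary.
print(f"{transformer_params(4096, 64, 128_000):,}")  # ~13.4 billion parameters
```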

Of course, we are always on the lookout for excellent mathematicians and computer scientists who would like to help drive this development, further improve DeepL Translator, and break down language barriers around the world. If you have experience with mathematics and neural network training, and it fulfills you to work on a product that is used worldwide for free, then please apply to DeepL!