The Transformer Architecture

Learn about the inner workings of LLMs.

Let’s start with a basic question: How can a computer understand and generate text? Over the years, we’ve relied on various neural network structures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to tackle language problems. Then a new architecture arrived: the transformer. It revolutionized the field so dramatically that most cutting-edge large language models (LLMs) today, including GPT, BERT, and T5, are built with some variation of the transformer.

That said, it’s important to note that not all state-of-the-art LLMs use the same transformer layout (the short sketch after this list shows one way to see the difference in code):

  • GPT (Generative Pre-trained Transformer) is primarily decoder-only.

  • BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only model.

  • T5 (Text-to-Text Transfer Transformer), like many other text-to-text models, uses a full encoder-decoder architecture.
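To make the distinction concrete, here is a minimal sketch that loads one small checkpoint from each family and inspects it. It assumes the Hugging Face transformers library and the specific checkpoints gpt2, bert-base-uncased, and t5-small; the lesson itself doesn’t prescribe any particular toolkit, so treat this purely as an illustration.

```python
# Minimal sketch: compare transformer layouts via Hugging Face `transformers`.
# Assumptions: the `transformers` library is installed and the checkpoints
# below can be downloaded. Checkpoints and attributes are illustrative only.
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Decoder-only: GPT-2, a small member of the GPT family (causal language model).
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
print(type(gpt2).__name__, gpt2.config.is_encoder_decoder)  # GPT2LMHeadModel False

# Encoder-only: BERT produces contextual embeddings; there is no decoder stack.
bert = AutoModel.from_pretrained("bert-base-uncased")
print(type(bert).__name__, bert.config.is_encoder_decoder)  # BertModel False

# Encoder-decoder: T5 pairs an encoder with a decoder for text-to-text tasks.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
print(type(t5).__name__, t5.config.is_encoder_decoder)      # T5ForConditionalGeneration True
```

The printed class names and the `is_encoder_decoder` flag reflect the layout each family uses: only T5 reports a combined encoder-decoder configuration, while GPT-2 and BERT are single-stack models used for generation and representation, respectively.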
