How LLMs Work
A document completer model works like this:
user prompt:
model response:
A document generator model works like this:
user prompt:
model response:
Note the differences between the two above.
The first model is just a document completer: it only continues the prompt with whatever it finds most probable as the next token. This is the model trained on a large chunk of internet data, and it's called the base model.
The second model is a document generator: it generates a more human-like response to the question in the prompt. This is the ChatGPT model.
The ChatGPT model is an inference model that generates a response based on the question in the prompt. I'd say it's 99% the base model, but with two extra training steps: a fine-tuning step and a reinforcement learning from human feedback step.
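To make the "document completer" idea concrete, here is a minimal sketch of picking the most probable next token. The vocabulary, the scores, and the prompt are made up for illustration; a real model produces scores over tens of thousands of tokens and often samples from the probabilities instead of always taking the top one.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores for the prompt "The capital of China is"
vocab = ["Beijing", "Paris", "a", "the"]
logits = [4.0, 1.0, 0.5, 0.2]

probs = softmax(logits)

# Greedy choice: take the token with the highest probability
best = max(range(len(vocab)), key=lambda i: probs[i])
next_token = vocab[best]
print(next_token, round(probs[best], 3))
```

Repeating this step, each time appending the chosen token to the prompt, is what "completing the document" amounts to.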
Pre-training: Base Model
This constitutes the very core of the AI revolution and is where the magic truly lies.
Training a model is the process of feeding it a lot of data and letting it learn from that data.
As described in the GPT-3 paper, the base model is trained on a large chunk of internet data. That's not an easy task for individuals like you and me: it requires not only obtaining the data but also a lot of computing power, such as GPUs and TPUs.
But don't worry, we can still learn to train a small GPT model on our own computer. I'll show you how to do it in the next topic.
The innovation behind LLM training lies in the introduction of the Transformer architecture, which enables the model to learn from vast quantities of data while preserving crucial contextual relationships between different parts of the input.
By maintaining these connections, the model can effectively infer new insights based on the provided contexts, whether they be individual words, sentences, paragraphs, or beyond. With this capability, LLM training has opened up new opportunities for natural language processing and generation tasks, allowing machines to better understand and respond to human communication.
The transformer architecture used to train the base model is shown below:
This is a neural-network-based model training with some old and new techniques: tokenization, embedding, position encoding, feed-forward, normalization, softmax, linear transformation, and most importantly, multi-head attention.
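As a small preview of the attention idea, here is a rough sketch of single-head scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the input vectors are made up, there are no learned projection matrices, there is only one head, and there is no causal mask, all of which a real GPT-style model would have.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

# Hypothetical 2-dimensional vectors standing in for three tokens
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(x, x, x)
print(result)
```

Each output vector mixes information from all three tokens, weighted by how similar they are to the token doing the "looking". This is how contextual relationships between parts of the input are preserved.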
This is the part that you and I are most interested in. We want to clearly understand the idea behind the architecture and exactly how the training was done. So from the next topic onward, we will start digging into the paper, code, and mathematics used in training the base model.
Fine-tuning: Train the Assistant
Fine-tuning is a very smart idea. I guess it was first done by OpenAI. The idea is super simple but works intelligently: hire human labelers to create lots of Q&A conversation pairs (around 100k conversations), then feed the model those conversation pairs and let it learn from them.
This process is called fine-tuning. You know what happens after those 100k sample conversations are trained into the model? The model starts responding like a human!
Let's take a look at those sample labeled conversations:
- Human labeled Q&A
Q: What is your name?
A: My name is John.
- Human labeled Q&A
Q: What's the capital of China?
A: China's capital is Beijing.
- Human labeled Q&A
Q: Summarize the plot of the movie Titanic.
A: The movie Titanic is about a ship that sinks in the ocean.
Whoa, these sample Q&As mimic the way we talk to each other.
By teaching the model these response styles, we make a contextually relevant reply the most probable continuation of a user's prompt. Training the model on a variety of conversational styles increases the likelihood that it will provide relevant and contextually appropriate responses.
This is how language models can appear so intelligent and human-like; by learning to mimic the rhythms and patterns of real-world conversations, they can convincingly simulate a back-and-forth dialogue with users.
At this step, we can say we obtained an Assistant Model.
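As a rough illustration of how labeled pairs could be turned into training text, here is a minimal sketch. The `Q:`/`A:` template and the `to_training_text` helper are assumptions made for this example; the actual format used for ChatGPT is not described here.

```python
# Human-labeled Q&A pairs, taken from the samples shown earlier
labeled_pairs = [
    ("What is your name?", "My name is John."),
    ("What's the capital of China?", "China's capital is Beijing."),
]

def to_training_text(question, answer):
    """Render one pair using a hypothetical 'Q:/A:' template."""
    return f"Q: {question}\nA: {answer}\n"

# One training document per conversation
training_texts = [to_training_text(q, a) for q, a in labeled_pairs]

# Joined into a single corpus that the base model keeps training on
corpus = "\n".join(training_texts)
print(corpus)
```

Because the base model is a document completer, training it on many documents shaped like this makes "an answer" the most likely continuation after a question.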
Below is a diagram showing some highlights, from pre-training the Base Model to fine-tuning the Assistant Model:
RLHF: Reinforcement Learning from Human Feedback
In January 2022, OpenAI published its work, Aligning language models to follow instructions. In the blog post, they describe how the model was further fine-tuned with human feedback:
This one is a bit tricky. The idea is to let the model learn from human feedback. Instead of providing ~100k labeled Q&A pairs, they gather users' prompts and the model's responses, then have humans rank them. The top-ranked conversations serve as the most-desired Q&A samples, which are fed back to the model so it can learn from them and improve its overall performance. (In OpenAI's published method, the rankings are used to train a reward model, which then guides the reinforcement learning step.)
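The ranking step can be sketched as follows. This is an assumption-laden illustration, not OpenAI's code: the response names and reward scores are made up, and the pairwise loss shown is one commonly used way to train a scoring model from human rankings.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))

def pairwise_loss(score_preferred, score_rejected):
    """-log(sigmoid(diff)): small when the preferred response scores higher."""
    diff = score_preferred - score_rejected
    return math.log(1.0 + math.exp(-diff))

# A human labeler ranked three model responses, best first (hypothetical)
ranked = ["response A", "response B", "response C"]
pairs = ranking_to_pairs(ranked)

# Hypothetical scores assigned by a scoring model
scores = {"response A": 2.0, "response B": 0.5, "response C": -1.0}
losses = [pairwise_loss(scores[w], scores[l]) for w, l in pairs]
print(pairs)
print([round(x, 3) for x in losses])
```

A scoring model trained to keep these losses low learns to prefer the responses humans preferred, and that preference signal is what the language model is then optimized against.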
The RLHF process was introduced by OpenAI on its blog:
Here is a base model vs fine-tuned/RLHF response comparison:
| Prompt | Explain the moon landing to a 6 year old in a few sentences. |
|---|---|
| Base model | Explain the theory of gravity to a 6 year old. Explain the theory of relativity to a 6 year old in a few sentences. Explain the big bang theory to a 6 year old. Explain evolution to a 6 year old. |
| Fine-tuned/RLHF model | People went to the moon, and they took pictures of what they saw, and sent them back to the earth so we could all see them. |
You can see that without fine-tuning and RLHF, the model is just a document completer.
Prompt Engineering
Even with fine-tuning and RLHF, the model still needs some help to produce the desired response. This is where prompt engineering comes in.
In simple words, we can carefully design the prompt to get the desired response from the model (sometimes even without fine-tuning).
If you don't want to dive too deep into the mathematics and code, prompt engineering is a good place to focus your attention, because it can get the best out of an LLM simply by typing a better prompt.
Now let's look at an example:
prompt:
output:
Let's try to improve it a bit:
prompt:
output:
By including some instructions in the prompt, the model will know what to do and how to respond.
Let's look at another interesting example:
prompt:
output:
The answer is wrong; the correct answer is 67. It looks like the model understands the question but falls back on a math calculation instead of logical inference.
Without fine-tuning and RLHF, we can get the correct answer solely by adding more example instructions to the prompt:
prompt:
output:
output 2:
Both answers are correct! We simply added some examples as logical explanations to the prompt, then asked the same question again. The model can now understand the question and answer it correctly.
The above example was introduced by Wang et al. (2022); computing the final answer involves a few steps.
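The two tricks used in this section, adding an instruction and adding worked examples, can be sketched as a small prompt builder. The `build_prompt` helper, the `Q:`/`A:` layout, and the sample questions are all hypothetical and are not the prompts from Wang et al.

```python
def build_prompt(question, instruction="", examples=()):
    """Assemble an optional instruction, worked examples, then the question."""
    parts = []
    if instruction:
        parts.append(instruction)
    for example_q, example_a in examples:
        parts.append(f"Q: {example_q}\nA: {example_a}")
    # Leave the final answer open for the model to complete
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Hypothetical worked example that spells out its reasoning
examples = [
    (
        "I had 5 apples and ate 2. How many are left?",
        "I started with 5 and ate 2, so 5 - 2 = 3. The answer is 3.",
    ),
]

prompt = build_prompt(
    "I had 10 pens and gave away 4. How many are left?",
    instruction="Answer the question and explain your reasoning step by step.",
    examples=examples,
)
print(prompt)
```

Because the model continues the document it is given, a prompt that already contains step-by-step answers nudges it to produce a step-by-step answer for the new question too.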
Strong prompts can guide the model to perform complex tasks, such as solving math problems or summarizing text. So prompt engineering also plays a very important role in the LLM ecosystem.
For more about prompt engineering, here is a good prompting guide tutorial.
Summary
If you have read this far, I'm sure it took a while to digest all the information, especially if you are new to the LLM world. You're now ready to become an AI expert!
Now I believe we have covered sufficient ground in terms of basic concepts and background information. It is time for us to begin preparations to construct our very own Large Language Model. Enough with the theory, let's get hands-on as we move forward to the crucial component of the Transformer architecture.