Fine-tuning LLMs
From April until December 2024, I explored how you go about fine-tuning a 7B base model to handle chat. I started by training a smaller model locally, then worked out how to train in cloud computing environments, including multi-GPU training and training on machines where even a server-grade H100 GPU wasn't big enough to train the model.
Here are the posts in this series:
- Messing around with fine-tuning LLMs (27 April 2024). In the first post in the series, I scope out the task and fine-tune a 0.5B model on my own machine.
- Messing around with fine-tuning LLMs, part 2 -- to the cloud! (28 April 2024). Next, I take a look at cloud GPU providers and pick Lambda Labs. As a sanity check, I replicate my fine-tune of the 0.5B model on a single-GPU instance there.
- Messing around with fine-tuning LLMs, part 3 -- moar GPUs (15 May 2024). I then work out how to train the 0.5B model faster by using multiple GPUs in parallel.
- Messing around with fine-tuning LLMs, part 4 -- training cross-GPU (21 May 2024). The first successful fine-tune of a 7B model -- but I have to offload the optimizer to the CPU. I'll need to find out why.
- Messing around with fine-tuning LLMs, part 5 -- exploring memory usage (5 July 2024). Some initial local experiments into memory usage for the 0.5B model, to get some ideas as to why I had to offload the optimizer.
- Messing around with fine-tuning LLMs, part 6 -- measuring memory usage more systematically (10 July 2024). Measuring memory usage more systematically for the 0.5B model, also locally, to find out how it behaves with different sequence lengths (there's a minimal sketch of this kind of measurement after this list).
- Messing around with fine-tuning LLMs, part 7 -- detailed memory usage across sequence lengths for an 8B model (16 August 2024). Making similar measurements of memory usage at different sequence lengths, this time for an 8B model.
- Messing around with fine-tuning LLMs, part 8 -- detailed memory usage across batch sizes (25 August 2024). Measuring the effect of batch size on memory usage, with a sidetrack looking into Liger Kernel, a new and easy-to-use replacement for the default CUDA kernels used for training that promises (and delivers) better memory usage and performance; there's a rough sketch of how it's enabled after this list.
- Messing around with fine-tuning LLMs, part 9 -- gradient checkpointing (3 September 2024). Investigating how gradient checkpointing works, in the hope that it might allow me to trade off GPU processing for memory usage and get a larger batch size (meaning that each training iteration would be slower, but the overall training run would take less time). Sadly, those hopes were dashed. There's a minimal sketch of switching it on after this list.
- Messing around with fine-tuning LLMs, part 10 -- finally training the model! (22 December 2024). The last in the series -- a deep dive into fine-tuning the 8B parameter LLM on instruction data, exploring memory usage, training strategies, and model deployment to Hugging Face.
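A few sketches for the curious, since the summaries above only name the techniques. First, the memory measurements in parts 5 to 8 essentially boiled down to running a single training step at each sequence length and recording the peak memory that CUDA reports. A minimal sketch of that idea -- the model name, batch size and sequence lengths here are placeholders rather than the exact setup from the posts:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model -- the local experiments used a small 0.5B model.
model_name = "Qwen/Qwen1.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).cuda()
optimizer = torch.optim.AdamW(model.parameters())

for seq_len in (256, 512, 1024, 2048):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    # One dummy training step at this sequence length.
    input_ids = torch.randint(0, tokenizer.vocab_size, (1, seq_len), device="cuda")
    loss = model(input_ids=input_ids, labels=input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"seq_len={seq_len}: peak allocated {peak_gib:.2f} GiB")
```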
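The Liger Kernel change from part 8 is essentially a one-line patch applied before the model is loaded. Something like the following, assuming a Llama-architecture model -- the checkpoint name is a placeholder, and it's worth checking the Liger Kernel docs for the patch function matching your architecture:

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Patch the Hugging Face Llama modelling code with Liger's fused kernels
# before the model is instantiated.
apply_liger_kernel_to_llama()

# Placeholder checkpoint -- any Llama-architecture model should pick up the patch.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
# ...then train as normal; the patched kernels reduce peak memory usage.
```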
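And the gradient checkpointing experiment from part 9 is also only a couple of lines with Hugging Face Transformers, which is part of why it looked so promising before the numbers came in. A minimal sketch (again, the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

# Recompute activations during the backward pass instead of storing them all,
# trading extra compute per step for lower memory -- which in principle
# frees up room for a larger batch size.
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is incompatible with checkpointing
```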