I wanted to build on what I'd learned in chapter 6 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)". That chapter takes the LLM that we've built and turns it into a spam/ham classifier. I wanted to see how easy it would be to take another LLM -- say, one from Hugging Face -- and do the same "decapitation" trick on it: removing the output head and replacing it with a small linear layer that outputs class logits.
Turns out it was really easy! I used Qwen/Qwen3-0.6B-Base, and you can see the code here.
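
Concretely, the decapitation looks something like this. This is a minimal sketch rather than the exact code from the repo, and it assumes the Hugging Face class exposes its output head as lm_head, which Qwen's causal-LM class does:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

num_classes = 2  # spam vs. ham

# Swap the LM head (hidden_size -> vocab_size) for a small linear
# layer that emits one logit per class instead.
model.lm_head = torch.nn.Linear(
    model.config.hidden_size, num_classes, bias=False
)

# As in the book's recipe, freeze the pretrained weights and train
# only the new head (plus, optionally, the last transformer block).
for param in model.parameters():
    param.requires_grad = False
for param in model.lm_head.parameters():
    param.requires_grad = True
```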
The only real difference between our normal PyTorch LLMs and one based on Hugging Face is the return value when you call the model: instead of a raw logits tensor, you get a ModelOutput object that bundles the logits together with other things. But it has a logits field to get at the raw output, and with that one update the code works largely unchanged.
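
Continuing from the sketch above, that tweak is just:

```python
inputs = tokenizer("You have won a free cruise!", return_tensors="pt")
outputs = model(**inputs)   # a ModelOutput, not a raw tensor
logits = outputs.logits     # shape: (batch, seq_len, num_classes)

# As in chapter 6, classify from the logits of the last token:
predicted = torch.argmax(logits[:, -1, :], dim=-1)
```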
The only other change I needed to make was the padding token: the book's code hard-codes 50256 (GPT-2's end-of-text token), and I swapped that for tokenizer.pad_token_id.
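
Continuing the sketch, that looks like this (the EOS fallback is my own addition, for tokenizers that don't define a pad token):

```python
pad_token_id = tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = tokenizer.eos_token_id

# e.g. right-padding a tokenized example out to a fixed length:
ids = tokenizer.encode("free pills, act now")
max_length = 128
padded = ids + [pad_token_id] * (max_length - len(ids))
```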
ChatGPT wrote a nice, detailed README for it, so hopefully it's a useful standalone artifact.