WheelSonnet (GPT-2 from Scratch)
124M & 355M GPT-2 from scratch in PyTorch — sentiment, paraphrase detection, sonnet generation, and ONNX export.
WheelSonnet is a from-scratch implementation of the GPT-2 architecture (124M and 355M parameters) in PyTorch. It is fine-tuned for sentiment classification (SST-5 and CFIMDB), paraphrase detection (Quora), and Shakespearean sonnet generation, and also covers instruction tuning and ONNX export.
Implementation Details
- Multi-head causal self-attention, positional embeddings, Pre-LN transformer layers, residual streams, and a custom AdamW optimizer with decoupled weight decay.
- Reproduces core results from “Language Models are Unsupervised Multitask Learners” (Radford et al., 2019).
- Includes gradient-flow tests, optimizer correctness assertions, and numerical-stability sanity checks throughout training.
- Models published on HuggingFace: nabin2004/WheelSonnet2-355M, nabin2004/WheelSonnet2-355M-it, nabin2004/WheenSonnet-355M-onnx.
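
As an illustration of the "decoupled weight decay" mentioned in the list above, here is a minimal, dependency-free sketch of a single AdamW update on plain Python floats. It is not the project's optimizer: the function name adamw_step, the flat-list parameter layout, and the default hyperparameters are assumptions made for this example only. The point it shows is that the decay term is applied to the weight directly, rather than being added to the gradient before the moment estimates are computed.

```python
import math


def adamw_step(params, grads, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """Hypothetical helper: one AdamW update over flat lists of floats.

    t is the 1-based step count. Returns (new_params, new_m, new_v).
    """
    beta1, beta2 = betas
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction for the zero-initialised moments.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Adam step uses gradient statistics only; the weight decay term
        # is decoupled and scales the old weight directly.
        adam_update = lr * m_hat / (math.sqrt(v_hat) + eps)
        decay_update = lr * weight_decay * p
        new_params.append(p - adam_update - decay_update)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

With a zero gradient and zero-initialised moments, the Adam term vanishes and each weight simply shrinks by a factor of (1 - lr * weight_decay). This is one simple property an optimizer correctness assertion could check.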
Links: GitHub · HuggingFace Models