Vision Transformer & Swin Transformer
From-scratch PyTorch implementations of the Vision Transformer (ViT) and Swin Transformer papers.
Vision Transformer (ViT)
From-scratch PyTorch implementation of “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” (Dosovitskiy et al., 2020). Implements patch embedding, multi-head self-attention, and a classification head.
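As a rough illustration of the patch embedding step, here is a minimal sketch. It is not the repository's code: the class name `PatchEmbedding`, its argument names, and the default sizes are assumptions chosen for this example. It uses the common approach of a strided convolution to split the image into non-overlapping patches and project each one, then prepends a learnable class token and adds learnable position embeddings.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Hypothetical patch embedding layer (names and defaults are assumptions)."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel == stride == patch_size is equivalent to cutting
        # the image into non-overlapping patches and applying a linear layer.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable class token and position embeddings (one per token).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):
        batch = x.shape[0]
        x = self.proj(x)                   # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)   # (B, N, D)
        cls = self.cls_token.expand(batch, -1, -1)
        x = torch.cat([cls, x], dim=1)     # (B, N + 1, D)
        return x + self.pos_embed


if __name__ == "__main__":
    tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
    print(tokens.shape)
```

With the defaults above, a 224x224 image yields 14 x 14 = 196 patch tokens plus one class token. The resulting sequence is what the multi-head self-attention blocks would consume, and the class token's final state is what a classification head would typically read.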
Links: GitHub
Swin Transformer
From-scratch PyTorch implementation of “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows” (Liu et al., 2021). Implements shifted window attention, patch merging, and hierarchical feature maps.
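To make the shifted-window idea concrete, here is a minimal sketch of window partitioning and the cyclic shift. It is not the repository's code: the helper names `window_partition`, `window_reverse`, and `cyclic_shift` are assumptions for this example, and the attention mask that Swin applies to shifted windows is omitted. The feature map is assumed to be laid out as (batch, height, width, channels) with height and width divisible by the window size.

```python
import torch


def window_partition(x, window_size):
    """Split a (B, H, W, C) map into (B * num_windows, ws, ws, C) windows."""
    b, h, w, c = x.shape
    x = x.reshape(b, h // window_size, window_size,
                  w // window_size, window_size, c)
    # Bring the two window-grid axes together, then the two in-window axes.
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, c)


def window_reverse(windows, window_size, h, w):
    """Inverse of window_partition: merge windows back into (B, H, W, C)."""
    b = windows.shape[0] // ((h // window_size) * (w // window_size))
    x = windows.reshape(b, h // window_size, w // window_size,
                        window_size, window_size, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)


def cyclic_shift(x, shift):
    """Roll the map up and to the left by `shift` pixels (negative = undo)."""
    return torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))


if __name__ == "__main__":
    feat = torch.randn(2, 8, 8, 96)
    shifted = cyclic_shift(feat, shift=2)       # shift by window_size // 2
    windows = window_partition(shifted, 4)      # attention would run per window
    restored = cyclic_shift(window_reverse(windows, 4, 8, 8), shift=-2)
    print(windows.shape, torch.equal(restored, feat))
```

Alternating between unshifted and shifted partitions lets information flow across window boundaries while attention cost stays linear in the number of windows. A full block would also mask attention between tokens that only became neighbours because of the roll.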
Links: GitHub