1. Personal Projects
1) From-scratch PyTorch Implementations of AI papers
| Year | Paper | Contents |
|---|---|---|
| Vision | | |
| 2014 | VAE (Kingma and Welling) | [] Training on MNIST [] Visualizing Encoder output [] Visualizing Decoder output [] Reconstructing image |
| 2015 | CAM (Zhou et al.) | [] Applying GoogLeNet [] Generating 'Class Activation Map' [] Generating bounding box |
| 2016 | Gatys et al. | [] Experimenting on input image size [] Experimenting on VGGNet-19 with Batch normalization [] Applying VGGNet-19 |
| | YOLO (Redmon et al.) | [] Model architecture [] Visualizing ground truth on grid [] Visualizing model output [] Visualizing class probability map [] Loss function [] Training on VOC 2012 |
| | DCGAN (Radford et al.) | [] Training on CelebA at 64 x 64 [] Sampling [] Interpolating in latent space [] Training on CelebA at 32 x 32 |
| | Noroozi et al. | [] Model architecture [] Chromatic aberration [] Permutation set |
| | Zhang et al. | [] Visualizing empirical probability distribution [] Model architecture [] Loss function [] Training |
| 2014, 2017 | Conditional GAN (Mirza et al.) + WGAN-GP (Gulrajani et al.) | [] Training on MNIST |
| 2016, 2017 | VQ-VAE (Oord et al.) + PixelCNN (Oord et al.) | [] Training on Fashion MNIST [] Training on CIFAR-10 [] Sampling |
| 2017 | Pix2Pix (Isola et al.) | [] Experimenting on image mean and std [] Experimenting on nn.InstanceNorm2d() [] Training on Google Maps [] Training on Facades [] Higher-resolution input image |
| | CycleGAN (Zhu et al.) | [] Experimenting on random image pairing [] Experimenting on LSGANs [] Training on monet2photo [] Training on vangogh2photo [] Training on cezanne2photo [] Training on ukiyoe2photo [] Training on horse2zebra [] Training on summer2winter_yosemite |
| 2018 | PGGAN (Karras et al.) | [] Experimenting on image mean and std [] Training on CelebA-HQ at 512 x 512 [] Sampling |
| | DeepLabv3 (Chen et al.) | [] Training on VOC 2012 [] Predicting on VOC 2012 validation set [] Average mIoU [] Visualizing model output |
| | RotNet (Gidaris et al.) | [] Visualizing Attention map |
| | StarGAN (Choi et al.) | [] Model architecture |
| 2020 | STEFANN (Roy et al.) | [] FANnet architecture [] Colornet architecture [] Training FANnet on Google Fonts [] Custom Google Fonts dataset [] Average SSIM [] Training Colornet |
| | DDPM (Ho et al.) | [] Training on CelebA at 32 x 32 [] Training on CelebA at 64 x 64 [] Visualizing denoising process [] Sampling using linear interpolation [] Sampling using coarse-to-fine interpolation |
| | DDIM (Song et al.) | [] Normal sampling [] Sampling using spherical linear interpolation [] Sampling using grid interpolation [] Truncated normal |
| | ViT (Dosovitskiy et al.) | [] Training on CIFAR-10 [] Training on CIFAR-100 [] Visualizing Attention map using Attention Roll-out [] Visualizing position embedding similarity [] Interpolating position embedding [] CutOut [] CutMix [] Hide-and-Seek |
| | SimCLR (Chen et al.) | [] Normalized temperature-scaled cross entropy loss [] Data augmentation [] Pixel intensity histogram |
| | DETR (Carion et al.) | [] Model architecture [] Bipartite matching & loss [] Batch normalization freezing [] Training on COCO 2017 |
| 2021 | Improved DDPM (Nichol and Dhariwal) | [] Cosine diffusion schedule |
| | Classifier-Guidance (Dhariwal and Nichol) | [] Training on CIFAR-10 [] AdaGN [] BigGAN Upsample/Downsample [] Improved DDPM sampling [] Conditional/Unconditional models [] Super-resolution model [] Interpolation |
| | ILVR (Choi et al.) | [] Sampling using single reference [] Sampling using various downsampling factors [] Sampling using various conditioning ranges |
| | SDEdit (Meng et al.) | [] User input stroke simulation [] Applying to CelebA at 64 x 64 [] Total repeats [] VE SDEdit [] Sampling from scribble [] Image editing only on masked regions |
| | MAE (He et al.) | [] Model architecture for self-supervised pre-training [] Model architecture for classification [] Self-supervised pre-training on ImageNet-1K [] Fine-tuning on ImageNet-1K [] Linear probing |
| | Copy-Paste (Ghiasi et al.) | [] COCO dataset processing [] Large scale jittering [] Copy-Paste (within mini-batch) [] Visualizing data [] Gaussian filter |
| | ViViT (Arnab et al.) | [] 'Spatio-temporal attention' architecture [] 'Factorised encoder' architecture [] 'Factorised self-attention' architecture |
| 2022 | CFG (Ho and Salimans) | |
| Language | | |
| 2017 | Transformer (Vaswani et al.) | [] Model architecture [] Visualizing position encoding |
| 2019 | BERT (Devlin et al.) | [] Model architecture [] Masked language modeling [] BookCorpus data processing [] SQuAD data processing [] SWAG data processing |
| | Sentence-BERT (Reimers and Gurevych) | [] Classification loss [] Regression loss [] Contrastive loss [] STSb data processing [] WikiSection data processing [] NLI data processing |
| | RoBERTa (Liu et al.) | [] BookCorpus data processing [] Masked language modeling [] BookCorpus data processing ('SEGMENT-PAIR' + NSP) [] BookCorpus data processing ('SENTENCE-PAIR' + NSP) [] BookCorpus data processing ('FULL-SENTENCES') [] BookCorpus data processing ('DOC-SENTENCES') |
| 2021 | Swin Transformer (Liu et al.) | [] Patch partition [] Patch merging [] Relative position bias [] Feature map padding [] Self-attention in non-overlapped windows [] Shifted Window based Self-Attention |
| 2024 | RoPE (Su et al.) | [] Rotary Positional Embedding |
| Vision-Language | | |
| 2021 | CLIP (Radford et al.) | [] Training on Flickr8k + Flickr30k [] Zero-shot classification on ImageNet1k (mini) [] Linear classification on ImageNet1k (mini) |
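
A few of the building blocks listed above are small enough to sketch in isolation. The cosine diffusion schedule from Improved DDPM (Nichol and Dhariwal) defines the noise levels from a squared-cosine curve on the cumulative product ᾱ(t), then derives the per-step betas from consecutive ratios. This is a minimal NumPy sketch of that schedule, not the repository's code; the function names and the default `s = 0.008` offset follow the paper.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    # f(t) = cos^2(((t/T + s) / (1 + s)) * pi/2), normalized so alpha_bar(0) = 1.
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f / f[0]

def cosine_betas(T, s=0.008, max_beta=0.999):
    # beta_t = 1 - alpha_bar(t) / alpha_bar(t-1), clipped near t = T
    # to avoid a singularity at the end of the schedule.
    ab = cosine_alpha_bar(T, s)
    betas = 1.0 - ab[1:] / ab[:-1]
    return np.clip(betas, 0.0, max_beta)

betas = cosine_betas(1000)
```

Compared with a linear beta schedule, the cosine curve destroys information more slowly at the start of the forward process, which the paper reports helps log-likelihood.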
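
The SimCLR entry lists the normalized temperature-scaled cross entropy (NT-Xent) loss. The sketch below is an illustrative NumPy version under my own naming (`nt_xent`, `tau`), not the PyTorch training code: embeddings of two augmented views are L2-normalized, all pairwise cosine similarities are scaled by a temperature, self-similarities are masked out, and each sample's positive is the other view of the same image.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    # z1, z2: (N, d) embeddings of two augmented views of the same N images.
    N = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / tau                                # (2N, 2N) scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs from the softmax
    # The positive for row i is its other view: index (i + N) mod 2N.
    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])
    logits = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * N), pos].mean()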
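
The RoPE entry's single item, rotary positional embedding, rotates each consecutive pair of feature dimensions by a position-dependent angle so that query–key dot products depend only on relative position. This is a minimal NumPy sketch of that rotation (the `rope` helper and pairing convention are mine, not RoFormer's code); the frequencies follow the paper's θ_i = base^(−2i/d) with base 10000.

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, dim) with dim even. Rotate each pair (x[:, 2i], x[:, 2i+1])
    # at position m by angle m * base^(-2i/dim).
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)      # theta_i = base^(-2i/dim)
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is a pure 2-D rotation, position 0 is left unchanged and vector norms are preserved, which is what makes the rotated dot products a function of relative offsets only.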