GPT-2 Model from scratch (untrained)
I will share my understanding of building an LLM from scratch. The code, figures, and information come from the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka; I have added code from Chapters 3 and 4.

First, I check whether a GPU is available, then check the package versions inside the Jupyter environment. The environment was set up following the book and its supporting GitHub repo (and YouTube video as well). I will also empty any existing cache memory.

In [4]:
    import torch
    torch.cuda.is_available()

Out[4]:
    True

In [2]:
    from importlib.metadata import version
    print("matplotlib version:", version("matplotlib"))
    print("torch version:", version("torch"))
    print("tiktoken version:", version("tiktoken"))

    matplotlib version: 3.10.1
    torch version: 2.6.0
    tiktoken version: 0.9.0...
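The setup steps described above (version check, GPU check, emptying the cache) can be combined into one defensive sketch. The helper names `package_versions` and `pick_device_and_clear_cache` are hypothetical and do not come from the book; the sketch assumes only the standard library plus, optionally, PyTorch, and it falls back to the CPU when PyTorch or a GPU is missing.

```python
import gc
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of the book's code.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pick_device_and_clear_cache():
    """Return "cuda" or "cpu", releasing cached GPU memory when possible.

    Hypothetical helper for illustration; not part of the book's code.
    """
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch installed we can only report the CPU.
        return "cpu"

    gc.collect()  # drop unreachable Python objects that may still hold tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return unused cached blocks to the GPU driver
        return "cuda"
    return "cpu"


versions = package_versions(["matplotlib", "torch", "tiktoken"])
device = pick_device_and_clear_cache()
print("versions:", versions)
print("device:", device)
```

Note that `torch.cuda.empty_cache()` only releases memory that PyTorch has cached but is no longer using; tensors that are still referenced stay allocated, which is why the sketch runs `gc.collect()` first.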