Let's reproduce GPT-2 (124M) | Andrej Karpathy | Jun 9, 2024

www.youtube.com


ericjmorey@programming.dev to Machine Learning@programming.dev · 5 months ago


Video description:

We reproduce the GPT-2 (124M) from scratch.
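For reference, "124M" is the smallest GPT-2 configuration: 12 transformer blocks, 12 attention heads, 768-dimensional embeddings, a 1024-token context and a 50,257-token BPE vocabulary. The sketch below is illustrative, not code from the video; `GPTConfig` and `count_params` are hypothetical names. It assumes GPT-2's layout: learned positional embeddings, biases on every linear layer and LayerNorm, a 4x MLP expansion, and an output head whose weights are tied to the token embedding. Under those assumptions, the count works out to about 124.4M.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    block_size: int = 1024   # maximum context length in tokens
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    n_layer: int = 12        # number of transformer blocks
    n_head: int = 12         # attention heads (splits n_embd, adds no parameters)
    n_embd: int = 768        # width of embeddings / residual stream


def count_params(cfg: GPTConfig) -> int:
    """Count parameters, assuming the LM head is tied to the token embedding."""
    d = cfg.n_embd
    tok_emb = cfg.vocab_size * d   # token embedding (shared with the output head)
    pos_emb = cfg.block_size * d   # learned positional embedding
    layernorm = 2 * d              # LayerNorm weight + bias
    attn_qkv = d * 3 * d + 3 * d   # fused query/key/value projection
    attn_proj = d * d + d          # attention output projection
    mlp_fc = d * 4 * d + 4 * d     # MLP up-projection (4x expansion)
    mlp_proj = 4 * d * d + d       # MLP down-projection
    per_block = 2 * layernorm + attn_qkv + attn_proj + mlp_fc + mlp_proj
    # embeddings + all blocks + the final LayerNorm before the output head
    return tok_emb + pos_emb + cfg.n_layer * per_block + layernorm


print(count_params(GPTConfig()))  # 124439808
```

Dropping the weight tying would add another 50257 × 768 ≈ 38.6M parameters for a separate output matrix.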

This video covers the whole process:

First we build the GPT-2 network. Then we optimize its training to be really fast, set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, hit run, and come back the next morning to see our results and enjoy some amusing model generations.
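As an example of a hyperparameter taken from the GPT-3 paper: its ~125M "Small" model uses a peak learning rate of 6e-4, a linear warmup over the first 375M tokens, and cosine decay down to 10% of the peak. The sketch below is a minimal version of such a schedule, not necessarily the exact code from the video. `WARMUP_STEPS` and `MAX_STEPS` are assumptions that convert a ~375M-token warmup and a ~10B-token run into steps of 2**19 (about 0.5M) tokens each, matching the paper's 0.5M-token batch size for that model.

```python
import math

MAX_LR = 6e-4           # GPT-3 paper's peak learning rate for its ~125M "Small" model
MIN_LR = MAX_LR * 0.1   # cosine decay floor at 10% of the peak
WARMUP_STEPS = 715      # assumption: ~375M warmup tokens / 2**19 tokens per step
MAX_STEPS = 19073       # assumption: ~10B training tokens / 2**19 tokens per step


def get_lr(step: int) -> float:
    # 1) linear warmup: ramp up to MAX_LR over the first WARMUP_STEPS steps
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    # 2) past the decay horizon, hold at the floor
    if step > MAX_STEPS:
        return MIN_LR
    # 3) in between, cosine decay from MAX_LR down to MIN_LR
    decay_ratio = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # falls from 1 to 0
    return MIN_LR + coeff * (MAX_LR - MIN_LR)


for step in (0, 714, 715, 10000, 19073, 25000):
    print(step, f"{get_lr(step):.3e}")
```

In a PyTorch loop, the returned value would typically be written into each `param_group["lr"]` of the optimizer before calling `optimizer.step()`.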

Keep in mind that in some places this video builds on knowledge from earlier videos in the Zero to Hero playlist (see my channel). You could also see this video as building my nanoGPT repo; by the end, the code is about 90% similar to it.
