Andrej Karpathy Recreates GPT From Scratch in Just 243 Lines of Python Code

Written by: Mane Sachin

Andrej Karpathy has rolled out a small experimental project that’s getting a lot of attention for one simple reason — it makes GPT models look surprisingly understandable. The project, called microGPT, shows that the core of a generative pre-trained transformer can fit into just 243 lines of plain Python.

There’s no PyTorch or TensorFlow behind it. Not even NumPy. The code relies only on standard Python and basic mathematical operations. That’s intentional. Karpathy has said the goal was to capture the full algorithmic essence of a GPT-style model without layering on the engineering built for speed or scale.
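For readers who have only ever seen these operations through NumPy or PyTorch, it may help to see what "standard Python and basic mathematical operations" looks like in practice. Here is a minimal sketch (an illustration in the same spirit, not code taken from microGPT) of a matrix-vector product and a softmax built from nothing but lists and the math module:

```python
import math

def matvec(W, x):
    # Matrix-vector product over plain lists: W is a list of rows,
    # x is a list of floats. Framework ops reduce to loops like this.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(logits):
    # Numerically stable softmax using only the math module.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(matvec([[0.2, -0.1], [0.5, 0.3]], [1.0, 2.0]))
```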

What’s interesting is that nothing essential feels missing. The project includes a character-level tokenizer, token and positional embeddings, and multi-head self-attention tied together with residual connections. Instead of traditional layer normalization, it uses RMS normalization. A small automatic differentiation engine computes gradients during training, and parameters are updated using the Adam optimizer. Once trained, the model generates text step by step, predicting one token at a time.
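None of those pieces actually requires framework machinery. As a rough illustration (a minimal sketch, not the project's own code, with `model` standing in as a hypothetical next-token predictor), a character-level tokenizer, RMS normalization, and the token-by-token generation loop can each be written in a few lines of standard Python:

```python
import math
import random

# Character-level tokenizer: the vocabulary is just the set of characters.
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

def rms_norm(x, gain, eps=1e-5):
    # RMS normalization: divide by the root-mean-square of the activations.
    # Unlike LayerNorm, it rescales magnitude without centering the mean.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def generate(model, context, n_tokens):
    # Autoregressive decoding: repeatedly predict a distribution over the
    # next token, sample from it, and append the sample to the context.
    tokens = list(context)
    for _ in range(n_tokens):
        probs = model(tokens)  # hypothetical: returns next-token probabilities
        tokens.append(random.choices(range(len(probs)), weights=probs)[0])
    return tokens
```

The real file layers embeddings, attention, the autograd engine, and the Adam update on top of primitives like these, but the flavor is the same throughout: everything is visible, nothing is hidden behind a library call.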

For many developers, the appeal lies in how readable the entire file is. It’s not buried under thousands of lines of framework code. You can scroll through it and actually follow what’s happening. That accessibility has sparked a wave of appreciation online, especially among people trying to understand how large language models work beneath the surface.

Karpathy has long been known for breaking complex AI topics into digestible explanations. Beyond his roles in AI research and leadership, he has built a reputation for hands-on teaching and deep technical walkthroughs. microGPT feels like a continuation of that style: not a commercial product, not a performance benchmark, but a clarity exercise.

As language models continue to grow in size and capability, projects like this bring the focus back to fundamentals. They remind the community that behind the scale and infrastructure, the core ideas remain grounded in relatively concise logic. Sometimes, shrinking something down reveals more than scaling it up.
