A little contextual primer

I often find that new ML architectures and methods are communicated in ways that are far more math-heavy than necessary for the average MLE, especially as you step out of the core line of Transformers/LLMs/RLHF (which has some really great science communicators!). My goal is to distill both cutting-edge and foundational ML concepts into straightforward yet sufficiently rigorous explanations, with pointers to papers if you want to dig deeper into the underlying theory.

If you have any feedback or corrections, I would greatly appreciate an email at [email protected]!

Concepts that are probably relevant to product builders

This section is for you if you are a product-facing ML engineer or software engineer looking for ways to make these LLMs reliable enough for your product use case.

ML Methods Explained Simply

This section is for you if you are interested in the science behind what is making these magical models go brrr. Where a minimal (but correct) implementation isn't already available online, the goal is to also provide one as a code sample, similar to Karpathy's nanoGPT.

Research Directions Explained Practically