Models
New models, versions, and benchmark context
Embeddings vs generation: two things models do
Embeddings and generation are different jobs. Knowing which one your problem needs is the fastest way to a system that actually works.
The cost of a token: how model pricing works
Model bills are measured in tokens, not words or requests. Understanding what a token is, and which ones you pay for, is how you keep costs predictable.
Context windows explained: tokens, attention, and where long context breaks
A bigger context window is not the same as better memory. Here is what a context window really is, why long inputs degrade, and how to design around it.
What a "frontier model" actually means — and why benchmarks mislead
"Frontier model" is a moving label, not a spec. Here is what it really points to, why leaderboard scores rarely tell you what you need, and how to choose well anyway.
How large language models are trained, in plain language
Training a language model happens in stages, not one magic step. Here is what each stage does, in plain language, and why the order matters.
Open-weight vs open-source models: the real difference
The two terms get used as synonyms and they are not. What you can download, inspect, and reuse differs sharply, and it affects what you are allowed to do.
Why models have knowledge cutoffs
A model's knowledge stops at a date because its training data ends there and its weights are frozen once training finishes. Here is why that happens and how tools work around it.
Multimodal models: what "it can see" really means
When a model "sees" an image, it is not looking the way you do. Here is how multimodal models actually work, what that enables, and where they quietly fail.
Tokens and tokenization: why models see text strangely
Models don't read letters or words — they read tokens. Understanding that one fact explains spelling slips, odd costs, and why context limits work as they do.
Open vs closed models: how to choose for a real project
Open weights or a hosted API? The right answer depends on control, cost, and risk — not ideology. Here is a framework that survives contact with production.
Reasoning models: what "thinking" tokens do
Reasoning models work through a problem before answering. That hidden working costs time and tokens, and it pays off only on the right kind of task.
What model "parameters" actually are
"Billions of parameters" gets quoted like horsepower. Here is what a parameter really is, why the count matters, and why bigger isn't automatically better.
Quantization and distillation: making models smaller
Two different ways to shrink a model: one stores its numbers at lower precision, the other trains a smaller model to imitate a larger one. Here is how each works and when to reach for it.
Mixture-of-experts models, explained simply
Mixture-of-experts lets a model be huge yet cheap to run by using only a slice of itself per input. Here is the idea, plainly, and why it matters.
Temperature, top-p, and sampling: controlling model output
Temperature and top-p decide how a model picks its next word. Knowing what each one really does lets you dial output from rigid to creative on purpose.
Why two runs of the same prompt differ
Send the same prompt twice and you often get two different answers. That is by design, not a bug, and knowing why tells you when to control it.
Small models, big jobs: when on-device beats the cloud
The biggest model is rarely the right one. Here is why small, on-device models win whole classes of jobs — and how to tell when yours is one of them.