Models

New models, versions, and benchmark context

Embeddings vs generation: two things models do

Embeddings and generation are different jobs. Knowing which one your problem needs is the fastest way to a system that actually works.

#embeddings #generation #retrieval

06-15 11:41·7 min read

models

The cost of a token: how model pricing works

Model bills are measured in tokens, not words or requests. Understanding what a token is, and which ones you pay for, is how you keep costs predictable.

#tokens #pricing #cost-management

06-12 15:45·7 min read

models

Context windows explained: tokens, attention, and where long context breaks

A bigger context window is not the same as better memory. Here is what a context window really is, why long inputs degrade, and how to design around it.

#context-window #tokens #attention

06-02 10:06·7 min read

models

What a "frontier model" actually means — and why benchmarks mislead

"Frontier model" is a moving label, not a spec. Here is what it really points to, why leaderboard scores rarely tell you what you need, and how to choose well anyway.

#frontier-models #benchmarks #evaluation

06-01 19:11·7 min read

models

How large language models are trained, in plain language

Training a language model happens in stages, not one magic step. Here is what each stage does, in plain language, and why the order matters.

#training #pretraining #fine-tuning

06-01 12:06·7 min read

models

Open-weight vs open-source models: the real difference

The two terms get used as synonyms and they are not. What you can download, inspect, and reuse differs sharply, and it affects what you are allowed to do.

#open-weight #open-source #licensing

05-29 16:50·7 min read

models

Why models have knowledge cutoffs

A model's knowledge stops at a date because its knowledge is frozen at training time. Here is why that happens and how tools work around it.

#knowledge-cutoff #training-data #retrieval

05-25 16:26·7 min read

models

Multimodal models: what "it can see" really means

When a model "sees" an image, it is not looking the way you do. Here is how multimodal models actually work, what that enables, and where they quietly fail.

#multimodal #vision #image-understanding

05-22 12:04·7 min read

models

Tokens and tokenization: why models see text strangely

Models don't read letters or words — they read tokens. Understanding that one fact explains spelling slips, odd costs, and why context limits work as they do.

#tokens #tokenization #context-window

05-14 16:37·7 min read

models

Open vs closed models: how to choose for a real project

Open weights or a hosted API? The right answer depends on control, cost, and risk — not ideology. Here is a framework that survives contact with production.

#open-weights #model-selection #deployment

05-11 14:31·7 min read

models

Reasoning models: what "thinking" tokens do

Reasoning models work through a problem before answering. That hidden working costs time and tokens, and pays off only on the right kind of task.

#reasoning-models #thinking-tokens #inference

04-29 14:40·7 min read

models

What model "parameters" actually are

"Billions of parameters" gets quoted like horsepower. Here is what a parameter really is, why the count matters, and why bigger isn't automatically better.

#parameters #model-size #weights

04-21 18:59·7 min read

models

Quantization and distillation: making models smaller

Two different ways to shrink a model: one changes its numbers, the other trains a smaller copy. Here is how each works and when to reach for it.

#quantization #distillation #model-compression

04-12 16:37·7 min read

models

Mixture-of-experts models, explained simply

Mixture-of-experts lets a model be huge yet cheap to run by using only a slice of itself per input. Here is the idea, plainly, and why it matters.

#mixture-of-experts #architecture #efficiency

04-11 13:35·7 min read

models

Temperature, top-p, and sampling: controlling model output

Temperature and top-p decide how a model picks its next word. Knowing what each one really does lets you dial output from rigid to creative on purpose.

#sampling #temperature #top-p

04-06 09:43·7 min read

models

Why two runs of the same prompt differ

Send the same prompt twice and you often get two different answers. That is by design, not a bug, and knowing why tells you when to control it.

#sampling #temperature #determinism

04-04 15:31·7 min read

models

Small models, big jobs: when on-device beats the cloud

The biggest model is rarely the right one. Here is why small, on-device models win whole classes of jobs — and how to tell when yours is one of them.

#small-models #on-device #edge-ai

04-01 12:28·7 min read