
Your Brain, But Made of Math: How AI Actually Works (No PhD Required)


Okay, real talk.

You use AI every single day. It autocompletes your texts. It recommends your next binge-watch. It writes emails for you. It passed the bar exam.

And yet — if someone asked you how it actually works, you'd probably wave your hand and say something like "...machine learning?" and hope they don't ask a follow-up question.

No shame. Most people who use AI can't explain it. Most articles that explain AI are written for people who already understand it.

This one is different. I'm going to explain exactly how AI works — using pizza, dogs, and a game show. By the end, you'll actually get it. Not "I kind of get it" get it. Actually get it.

Let's go.


The Big Idea (Seriously, Just One Idea)

Here is the entire secret behind modern AI, in one sentence:

AI finds patterns in massive amounts of data, and uses those patterns to make predictions.

That's it. Everything else — the neural networks, the transformers, the billions of parameters — is just the machinery behind this one idea.

Your brain does the same thing, actually. You've seen so many dogs in your life that when you see a new animal, your brain instantly says "dog!" without you consciously thinking about ears, fur, tail, and four legs. You pattern-matched from experience.

AI does exactly this — just with numbers instead of neurons, and training data instead of childhood experiences.


Part 1: First, Everything Becomes a Number

Computers are, at their absolute core, number-processing machines. They don't speak English. They don't see images. They don't hear music.

Everything — everything — has to be converted into numbers before a computer can touch it.

So the very first thing AI does is convert the world into math.

Words Become Coordinates

When you give AI a word like "king", it doesn't store it as letters. It stores it as a list of hundreds of numbers — something like:

"king"  →  [0.71, -0.23, 0.88, 0.14, -0.56, ...]
"queen" →  [0.70, -0.21, 0.86, 0.91, -0.54, ...]
"pizza" →  [-0.42, 0.88, -0.31, 0.05, 0.73, ...]

These lists of numbers are called embeddings, and they're genuinely magical.

Words that mean similar things end up with similar numbers. King and queen are close. Pizza is far away. The AI has learned — just by reading a lot of text — that some words belong in the same neighbourhood.

Here's where it gets wild. You can do math with these word-numbers:

King − Man + Woman = Queen

Subtract "man-ness", add "woman-ness", and you land on queen. The concept of gender is encoded as a geometric direction in this number space. Nobody told the AI this. It figured it out on its own from reading billions of sentences.

That should genuinely blow your mind a little.
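To make this concrete, here's a toy sketch in plain Python. The vectors are made-up 3-number stand-ins (real embeddings have hundreds of dimensions, learned from data), but the arithmetic is exactly the kind described above:

```python
from math import sqrt

# Toy 3-number "embeddings". Real models use hundreds of dimensions,
# and these particular values are invented for illustration.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "pizza": [-0.7, 0.3, 0.2],
}

def cosine(a, b):
    """Similarity between two vectors: 1.0 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# king - man + woman: subtract "man-ness", add "woman-ness"
result = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Which known word is the result closest to?
best = max(vectors, key=lambda word: cosine(vectors[word], result))
print(best)  # → queen
```

With these toy numbers the nearest neighbour of king − man + woman really is "queen". In a real embedding space the same trick works because the model learned the geometry itself.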


Part 2: The Network That Learns

Now that words are numbers, we need something to process those numbers and produce useful outputs. That something is called a neural network.

The World's Worst Analogy (That Actually Works)

Imagine a massive game of telephone — but instead of people whispering words, you have thousands of tiny calculators passing numbers to each other.

Each calculator (called a neuron) does one dumb simple thing:

  1. Takes in some numbers
  2. Multiplies each by a "weight" (how important is this number?)
  3. Adds them all up
  4. Asks: "Is this worth passing forward?" (the activation function)
  5. Passes a result to the next layer of calculators

output = activate( w₁×input₁ + w₂×input₂ + w₃×input₃ + bias )

One neuron is stupid. But stack millions of them in layers — with each layer feeding into the next — and something remarkable happens. The network starts to understand things.
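Those five steps fit in a few lines of Python. The weights and inputs below are arbitrary numbers just to watch one neuron fire (here using a sigmoid as the activation; real networks use various activation functions):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum, then an activation function."""
    # Steps 1-3: multiply each input by its weight and add everything up
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 4: the sigmoid squashes the result into 0..1, deciding how
    # strongly this signal is "worth passing forward"
    return 1 / (1 + math.exp(-total))

# Made-up numbers, purely to see it run
out = neuron([0.5, 0.8, 0.2], weights=[0.9, -0.4, 0.3], bias=0.1)
print(out)  # a value between 0 and 1, passed on to the next layer
```

That's the whole unit. Training is just the process of nudging `weights` and `bias` until millions of these produce useful outputs together.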

Why Layers Matter

Think about how you recognise a face.

You don't consciously think: "I see two circular shapes, above a triangular protrusion, above a curved horizontal line, which means — face!" You just... see a face.

But your visual cortex actually does process it layer by layer:

  • Layer 1: detects edges and contrasts
  • Layer 2: groups edges into shapes
  • Layer 3: groups shapes into features (eyes, nose, mouth)
  • Layer 4: recognises the whole face

Neural networks do the exact same thing. Each layer handles a different level of abstraction. And nobody designed those layers — the network figures them out on its own.


Part 3: How the Network Actually Learns

Here's the question everyone should ask but usually doesn't: how does the network figure out the right weights?

The answer involves three ideas, and I promise they're not as scary as they sound.

Idea 1: Measure How Wrong You Are

Imagine you're learning to throw darts. You throw one. It lands 30cm from the bullseye. That distance — your wrongness — is what mathematicians call the loss.

In AI, after every prediction, the model calculates its loss: how far was my answer from the right answer? The goal of training is to make this number as small as possible.

Idea 2: Figure Out Which Direction to Improve

Now here's the clever part. The model needs to know: if I adjust this particular weight up or down, does the loss get smaller?

This is calculated using something called a gradient — essentially the slope of the loss. If the slope goes down to the left, you adjust left. If it goes down to the right, you go right.

This process is called gradient descent, and it's the core algorithm behind all of machine learning. The model takes tiny steps, over and over, always in the direction that reduces its mistakes.

Think of it like being blindfolded on a hilly landscape, trying to find the lowest valley. You can't see, but you can feel the slope under your feet. You always step downhill. Eventually, you find the bottom.
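Here's that hill-descent in miniature, for a "model" with a single weight. The loss function is invented for illustration (its valley bottom sits at w = 3), but the update rule is the real gradient descent step:

```python
# A minimal sketch of gradient descent on a one-weight "model".
# This toy loss is lowest at w = 3; the model starts far away
# and feels its way downhill.
def loss(w):
    return (w - 3) ** 2

def gradient(w):
    return 2 * (w - 3)  # the slope of the loss at w

w = 10.0    # arbitrary starting weight
lr = 0.1    # learning rate: how big each downhill step is

for step in range(100):
    w -= lr * gradient(w)  # step in the direction that reduces the loss

print(round(w, 3))  # → 3.0, the bottom of the valley
```

Real training does exactly this, except the landscape has billions of dimensions instead of one.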

Idea 3: Trace the Blame Backward (Backpropagation)

Here's a practical problem: a neural network might have billions of weights. When it makes a mistake, how do you figure out which weights were responsible — and by how much?

The answer is backpropagation — an elegant algorithm that traces the error backward through the network, layer by layer, distributing blame proportionally to each weight's contribution.

It uses a calculus trick called the chain rule. You don't need to understand the maths. Just know that it's the reason training is even possible at scale — because without it, updating billions of weights would be computationally impossible.

Every response you've ever gotten from an AI chatbot exists because backpropagation quietly ran billions of times to build it.
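You can see the chain rule at work in a tiny two-weight "network", y = w2 × (w1 × x). All the numbers are invented; the point is how the error flows backward, multiplying by each local slope as it goes:

```python
# A tiny chain-rule sketch: how much did each weight contribute
# to the error? All values here are made up for illustration.
x, target = 2.0, 10.0
w1, w2 = 1.0, 3.0

# Forward pass
h = w1 * x              # hidden value
y = w2 * h              # prediction
loss = (y - target) ** 2

# Backward pass: trace the error back, layer by layer
dloss_dy = 2 * (y - target)    # how the loss changes if y changes
dloss_dw2 = dloss_dy * h       # w2's share of the blame
dloss_dh = dloss_dy * w2       # pass the error back through w2
dloss_dw1 = dloss_dh * x       # w1's share of the blame

print(dloss_dw1, dloss_dw2)    # now gradient descent knows how to nudge each weight
```

Backpropagation is this exact bookkeeping, automated across billions of weights instead of two.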


Part 4: The Transformer — The Invention That Changed Everything

In 2017, a team at Google published a paper called "Attention Is All You Need."

It introduced an architecture called the Transformer, and it made almost everything that came before it obsolete. GPT, Claude, Gemini, Llama — they're all Transformers.

So what makes it special?

The Old Problem: Reading One Word at a Time

Before Transformers, AI read text the same way a slow reader does: one word at a time, left to right, trying to hold earlier context in memory.

The problem? Long sentences broke it. By the time it reached the end of a paragraph, it had mostly forgotten the beginning. Like trying to remember the start of this sentence by the time you reach... well, you know.

The New Idea: Look at Everything at Once

The Transformer's genius is that it lets every word look at every other word simultaneously, deciding which ones actually matter for understanding.

This mechanism is called self-attention, and here's the intuition:

Imagine you're reading this sentence: "The trophy didn't fit in the bag because it was too big."

What does "it" refer to? The trophy, or the bag?

You instantly know it's the trophy — because you compared "it" to both options, considered the logic of "too big to fit," and landed on the right answer.

Self-attention lets AI do the same thing. For every word, it asks:

  • "What am I looking for?" (a Query vector)
  • "What do I represent?" (a Key vector)
  • "What do I contribute?" (a Value vector)

Then it calculates attention scores between all pairs of words, and updates each word's meaning based on how much it should "attend to" every other word.

The result: the word "it" gets correctly linked to "trophy" because the model learns, from millions of examples, that size arguments refer to the object trying to fit, not the container.

Attention = softmax( QKᵀ / √d ) × V

Don't stress about the formula. The concept is: every word updates its meaning based on its relationship to every other word, all at once.

This is why Transformers are so good at language. Context isn't an afterthought — it's baked into every single representation.
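If you're curious, the whole formula fits in a screenful of plain Python. The Q, K, and V matrices below are made-up 2-number vectors for three tokens (real models learn them, in hundreds of dimensions), but the computation mirrors the formula above:

```python
import math

# A from-scratch sketch of softmax(QKᵀ/√d)·V for three tokens.
# These Query/Key/Value vectors are invented for illustration.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
d = 2  # vector size, used for the √d scaling

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

output = []
for q in Q:  # every token looks at every other token
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    weights = softmax(scores)  # how much to attend to each token
    # New representation: a weighted blend of everyone's Value vectors
    output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])

for row in output:
    print([round(x, 2) for x in row])
```

Each output row is a blend of all three Value vectors, weighted by attention. That blending, done in parallel for every token, is the entire trick.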


Part 5: Training a Language Model (The Pizza Analogy)

Okay. So we have a Transformer. Now how do we actually train it to talk, reason, and write?

The Training Task Is Embarrassingly Simple

Here it is: given some text, predict the next word.

"I want a large pepperoni ___"  →  "pizza"

That's literally it. The model reads an incomplete sentence, guesses the next word, gets told the right answer, and adjusts its weights to be slightly less wrong next time.

Now do this with 500 billion sentences.

To get good at predicting the next word in a physics paper, you need to understand physics. To predict the next line of code, you need to understand programming. To predict the punchline of a joke, you need to understand humour.

By training on enough text, the model is forced to build an internal representation of how the world works. It has no choice. The only way to predict language well is to understand the underlying reality that language describes.

This is why GPT-4 can explain quantum mechanics, translate poetry, and debug your Python script — not because anyone taught it those things directly, but because they're all latent in human writing, and the model absorbed them.
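The training objective is so simple you can build a cartoon version by counting. This sketch just tallies which word follows which in a tiny made-up corpus; a real model replaces the counting with a trained Transformer, but the task is the same:

```python
from collections import Counter, defaultdict

# "Predict the next word" in miniature: count continuations,
# then guess the most common one. The corpus is invented.
corpus = (
    "i want a large pepperoni pizza . "
    "i want a large soda . "
    "i want a large pepperoni pizza ."
).split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    return follows[word].most_common(1)[0][0]

print(predict_next("pepperoni"))  # → pizza
```

The gap between this and GPT-4 is that counting can't generalise to sentences it has never seen, while a Transformer's learned representations can.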

From Weird Internet Brain to Helpful Assistant

Here's a dirty secret: a freshly trained language model is kind of unhinged.

It's read everything on the internet — the brilliant and the bizarre, the careful and the conspiratorial. Left alone, it might respond to "how are you?" with a Reddit rant or a product review. Technically coherent. Practically useless.

To fix this, it goes through fine-tuning — training on curated examples of good, helpful conversations.

Then it goes through something called RLHF — Reinforcement Learning from Human Feedback, which is the secret sauce behind ChatGPT, Claude, and their cousins.

Here's how RLHF works, in plain English:

  1. The model generates several different responses to the same question
  2. Human raters read them and rank which ones are better
  3. A second AI — the reward model — learns from those human rankings
  4. The main AI is trained to produce outputs that score highly on the reward model

Basically, the AI learns to write things that humans prefer to read. It gets a score for being helpful, honest, and harmless — and it optimises for that score through thousands of iterations.
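Here's a cartoon of that last step: generate candidates, score them with a reward model, keep the preferred one. The "reward model" below is a hypothetical hand-written scoring rule, purely to show the shape of the loop; the real one is itself a trained network:

```python
# A cartoon of the RLHF selection step. The candidates and the
# scoring rule are invented for illustration.
candidates = [
    "lol idk google it",
    "Great question! Hardy perennials like lavender make lovely gifts.",
    "BUY MY PRODUCT NOW!!!",
]

def reward_model(response):
    """Stand-in scorer: prefers substantial, polite, non-shouty answers."""
    score = len(response.split())               # some substance
    score -= 5 * response.count("!")            # penalise shouting
    score += 5 if response[0].isupper() else 0  # crude politeness proxy
    return score

best = max(candidates, key=reward_model)
print(best)  # the helpful answer wins
```

In real RLHF the model's weights are then updated so that high-scoring responses become more likely, not merely selected after the fact.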

Think of it like this: the pre-trained model is a very well-read person who has no social skills. RLHF is charm school.


Part 6: What Actually Happens When You Type a Message

You type: "What's a good gift for my mum who likes gardening?"

Here is exactly what happens, step by step:

Step 1 — Tokenisation

Your sentence gets broken into tokens — chunks of text roughly the size of syllables or short words. "gardening" might become "garden" + "ing". Each token gets a number ID.

Step 2 — Embedding

Each token ID gets swapped out for its embedding vector — that list of hundreds of numbers encoding its meaning.

Step 3 — Positional Encoding

The model adds a signal to each vector indicating its position in the sentence. Without this, the model can't tell "dog bites man" from "man bites dog."

Step 4 — The Transformer Layers

The sequence flows through dozens (sometimes hundreds) of Transformer layers. In each layer, every token attends to every other token, updating its representation. By the final layer, the word "mum" knows it's related to "gardening," "gift," and "likes."

Step 5 — Sampling

The model outputs a probability distribution over its entire vocabulary — maybe 50,000 possible next tokens. It doesn't always pick the most likely one. There's intentional randomness, controlled by a parameter called temperature.

  • High temperature → more creative, surprising, sometimes weird
  • Low temperature → more predictable, safe, repetitive
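Temperature is easy to see in code. The raw scores (logits) below are made-up numbers for three hypothetical next tokens; dividing them by the temperature before converting to probabilities is the standard trick:

```python
import math
import random

# A sketch of temperature sampling. These logits are invented.
logits = {"pizza": 4.0, "salad": 2.5, "pineapple": 1.0}

def sample(logits, temperature):
    # Divide by temperature before softmax: low T sharpens the
    # distribution, high T flattens it.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

print([sample(logits, 0.2) for _ in range(5)])  # low T: mostly "pizza"
print([sample(logits, 2.0) for _ in range(5)])  # high T: more variety
```

At very low temperature the top token wins almost every time; crank it up and the long tail of unlikely tokens starts getting picked.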

Step 6 — The Loop

The model generates one token, appends it to the input, and runs the whole process again. One token at a time, until the response is complete.

That streaming effect you see — where the AI types character by character — that's not a gimmick. It's literally generating one token, then the next, then the next. You're watching the model think in real time.


Part 7: The Weirdest Part — Nobody Fully Understands Why It Works So Well

Here's something researchers will openly admit: large language models are better than they should be.

At a certain scale — enough parameters, enough data, enough compute — models start doing things nobody trained them to do.

Small models can't do multi-step arithmetic. But cross some invisible threshold of scale, and suddenly: they can. Nobody added an "arithmetic module." It just... appeared.

This is called emergence — the appearance of new capabilities that weren't explicitly trained for, arising spontaneously from scale.

It's happened with:

  • Reasoning through multi-step problems
  • Translating languages the model saw very little of
  • Writing working code in programming languages barely represented in training
  • Understanding jokes, sarcasm, and irony

The honest answer to "why does this happen?" is: we don't fully know yet. It's one of the most active research questions in the field.

There's something deeply humbling about building a system, watching it exceed your expectations in unexpected ways, and not being entirely sure why.


Part 8: What AI Cannot Do (And Why)

Because this wouldn't be honest without it.

AI doesn't know things — it predicts plausible text. There's a difference. It can generate a convincing-sounding explanation of a historical event that is completely, confidently wrong. This is called hallucination, and it's a fundamental limitation of the architecture.

AI has no memory between conversations. Every time you start a new chat, it starts completely fresh. It has no idea who you are. The "memory" some chatbots offer is an engineering workaround — they store your past chats and feed them back in as context.

AI doesn't reason from first principles the way humans do. It pattern-matches. Most of the time that's indistinguishable from reasoning. Sometimes — on genuinely novel problems it has no pattern for — it fails in ways that feel embarrassingly dumb.

AI's knowledge has a cutoff date. It learned from data collected up to a certain point. It has no idea what happened after that, unless you tell it or it has access to search tools.

Knowing these limits isn't pessimism. It's wisdom. Use AI for what it's extraordinary at — synthesising information, writing, coding, brainstorming, explaining — and stay alert to where it stumbles.


The Part Where It All Comes Together

Let's recap the whole journey:

| Step | What Happens |
|------|--------------|
| 1 | Everything gets converted to numbers (embeddings) |
| 2 | Numbers flow through layers of neurons |
| 3 | Predictions are compared to correct answers (loss) |
| 4 | Weights are adjusted to reduce error (gradient descent + backprop) |
| 5 | Self-attention lets every word understand its full context |
| 6 | Billions of sentences later, the model understands the world |
| 7 | RLHF shapes raw prediction into helpful conversation |
| 8 | You type a message, one token at a time comes back |

The whole thing is — at its core — very fancy pattern matching, trained on the collected writing of humanity, shaped by human feedback.

It's not magic. But it's also not not magic.


One Last Thing

The next time someone says "I don't get how AI works," you can tell them:

"It reads everything humans have ever written. It learns to predict what word comes next. It does this so many times, on so much text, that it accidentally learns to understand the world. Then humans teach it manners."

That's it. That's the whole thing.

And yet from that — from numbers, weights, attention, and prediction — we've built something that can tutor students, write software, assist doctors, translate between 100 languages, and have a conversation that feels, just a little bit, like talking to something that gets you.

How strange. How wonderful. How worth understanding.


Written by Mohit Pujari
