How I’d Explain It
May 2, 2025

Machine Learning Through First Principles, Not Buzzwords

#Machine Learning
#First Principles
#Learning Notes

Start from the outside, not the core

okay… let’s not jump into ML directly. first, some very basic stuff.

learning ML in a linear, bottom-up way is very difficult, so don’t do that. instead, take a top-down approach, like peeling an onion: first understand what’s on the outside, then slowly go deeper.


A simple example that everyone understands

tell me something. if 1 kg mangoes cost ₹50, then 10 kg mangoes will cost ₹500, right?

easy.

but wait… how did you calculate that? the unitary method? sure. but here’s the interesting part: why does the unitary method even exist?

because over many examples and experiences, people noticed a pattern:

  • 1 → 50
  • 2 → 100
  • 3 → 150
  • 4 → 200
  • 5 → 250

and someone, somewhere, realized — this thing is always true.

so we created a formula. a formula that takes an input, gives an output, and never lies.

and because it’s always true, we don’t even call it a prediction. prediction means maybe true, maybe not. this is just… correct.
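here’s that always-true rule as a tiny bit of python. the names are mine, just for illustration:

```python
# the mango rule as code: a fixed, exact formula, not a prediction
PRICE_PER_KG = 50  # rupees, from the observed pattern 1 -> 50, 2 -> 100, ...

def cost(quantity_kg):
    """exact cost: proportional to quantity, never wrong for this shop"""
    return PRICE_PER_KG * quantity_kg
```

notice there’s no guessing anywhere. the formula is the whole story.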


Where did the formula actually come from?

but have you ever thought about how they actually came up with that formula?

here’s a way to see it.

  • take a graph sheet
  • quantity on X-axis
  • cost on Y-axis
  • plot all the points

now draw a line passing through them.

that line, right there, is the graph of a function. why? because now, for any value of X, it can tell you Y accurately.

so if you can find a pattern, map it, see how X is related to Y, you can measure that relationship and write a formula for it.
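a minimal sketch of “measuring the relationship” from those plotted points, using rise over run (the variable names are mine):

```python
# recover the formula from the plotted points: measure rise over run
points = [(1, 50), (2, 100), (3, 150), (4, 200), (5, 250)]

(x0, y0), (xn, yn) = points[0], points[-1]
slope = (yn - y0) / (xn - x0)  # how much Y grows per unit of X

def formula(x):
    return slope * x  # the line through the points, written as a function
```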


When life stops being clean and linear

now all this works great when life is clean. straight lines. perfect patterns. no mess.

but real life isn’t like that.

imagine this. you go across a city. you check many fruit stalls. everyone sells mangoes at a different price.

now:

  • keep 1 kg constant
  • X-axis → stall names
  • Y-axis → price

plot all those points.

look at the graph. can you draw one straight line touching all points? nope.

so what do you do?


Two ways to deal with messy data

Option 1: touch every single point

draw a line that touches every single point. it’ll be zig-zag, twisted, ugly. yes, it fits past data perfectly. but can you predict the next point? no. that much accuracy is actually useless.

Option 2: approximate the trend

draw a line that touches most points, passes close to the rest, captures the overall trend. you miss some points, but those points fall near the line anyway. now you can take a new stall, estimate its price, be approximately correct.

and that is prediction.
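here’s a hedged sketch of option 2. the stall prices are invented, and the line is a plain least-squares fit, one common way (not the only way) to draw a trend line:

```python
# hypothetical 1 kg prices at six stalls; the numbers are invented
stalls = [0, 1, 2, 3, 4, 5]
prices = [48, 52, 55, 50, 58, 56]

# least-squares line: touches few points exactly, stays close to all of them
n = len(stalls)
mean_x = sum(stalls) / n
mean_y = sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(stalls, prices)) \
        / sum((x - mean_x) ** 2 for x in stalls)
intercept = mean_y - slope * mean_x

def predict(stall_index):
    return intercept + slope * stall_index  # approximately correct, by design
```

it misses most points, but never by much. that’s the trade we wanted.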


This is where Machine Learning starts

and this is where machine learning starts.

ML is basically an algorithm that somehow figures out how to draw the best possible line or curve that passes close to all points. there are many such algorithms. each one separates data in a different way.


From single X to many variables

now when you go a bit higher, it’s not just X and Y anymore. it becomes:

  • X1
  • X2
  • X3
  • X4

these are called variables.

example: predicting house price.

  • Y → price
  • X → area

plot it — the prediction isn’t great.

so what do we do?

we add more Xs:

  • number of rooms
  • floor
  • location
  • age of building

X1, X2, X3… more context means better prediction.
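a sketch of the many-X idea. every weight below is made up for illustration; in a real system an algorithm would learn them from past sales:

```python
# price as a weighted sum of features; all weights here are invented
def predict_price(area_sqft, rooms, floor, age_years):
    bias = 20000.0
    return (bias
            + 120.0 * area_sqft    # bigger area, higher price
            + 5000.0 * rooms       # each extra room adds value
            + 1500.0 * floor       # higher floors cost a bit more
            - 800.0 * age_years)   # older buildings lose value
```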


When visualization breaks, math takes over

but here’s the catch. you can only visualize:

  • 2 variables
  • max 3 if you stretch it

after that, the brain gives up.

so we stop trying to see it and start calculating. this is where linear algebra helps. even if we can’t visualize it, the math still works. and we can trust the result.
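a minimal sketch of why the math keeps working past 3 dimensions: the prediction is just a dot product, and a dot product doesn’t care how many variables there are. all numbers below are made up:

```python
# in many dimensions the prediction is just a dot product; no picture needed
def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))

weights = [0.5, -1.2, 3.0, 0.7]   # one weight per variable X1..X4 (made up)
features = [2.0, 1.0, 0.5, 4.0]   # one measurement per variable (made up)
prediction = dot(weights, features)
```

add a fifth variable, a fiftieth, a five-hundredth: same one line of math.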

so ML isn’t magic. it’s pattern finding, approximation, learning from past data, predicting future values. that’s the theory.


Same idea, different angle

now let’s look at the same thing from another angle.

ML is just a thing labeller

that’s it.

different ML algos separate data in different ways. that’s why sometimes we choose different algos, tweak them, or custom-build one to suit our dataset better. what comes out of this separation is what we wanted all along — a model.


Model = recipe

a model is just like a recipe. you now know what ingredients are needed and what process to follow. once you have the recipe, you can cook the same dish any number of times.


Why instructions fail and examples work

now think about this. you look at a cat and instantly say — that’s a cat.

but how?

something happened in your mind. some internal process grouped all those features and labelled it cat.

but if someone asks you to write exact instructions to identify a cat, you can’t. you know it, but you can’t explain it step by step. and if you can’t explain it, you can’t write code for it.

so what do we do?

we don’t give instructions. we give examples.

we say:

  • here are thousands of cat images
  • here are thousands of non-cat images
  • now you figure it out

and the machine does. somehow. not magic. just learning from examples.
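here’s a toy version of “give examples, not instructions”: a nearest-neighbour labeller. the feature numbers are invented stand-ins, not real image data:

```python
# learning from examples, not instructions: label a new thing the same
# as its closest known example (features here are invented stand-ins)
examples = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.1, 0.2), "not-cat"),
    ((0.2, 0.1), "not-cat"),
]

def classify(features):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(examples, key=lambda ex: sq_dist(ex[0], features))[1]
```

nobody wrote a cat rule. the labelled examples are the rule.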

this is powerful because it’s a new programming paradigm. there are problems where instructions don’t exist and recipes are ineffable. ML or AI is about automating the ineffable.


Think of ML as a black box

think of ML like a black box. you send inputs and examples, it sends back labels, decisions, predictions.

now imagine this black box is free. what would you use it for?

  • driving → self-driving cars
  • speech → speech to text
  • text → chatbots
  • images → image classifiers

same box, different data.


Why data quality and testing matter

but examples matter. bad examples are dangerous.

so we:

  • train
  • validate
  • test

and we set a metric — a bar the model must cross before we trust it.

when results come:

  • compare them
  • don’t blindly accept them

if it fails:

  • clean the data
  • add more examples
  • try again

but wait — what if the model just memorized the test?

so we test again on new data. because ML is not about memorization. it’s about generalization.
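the whole discipline, sketched on toy data (the quantities and prices below are invented, roughly ₹50 per kg with a little noise):

```python
# split the data, set a bar in advance, only then decide whether to trust
data = [(1, 52), (2, 99), (3, 151), (4, 203), (5, 248),
        (6, 301), (7, 349), (8, 402), (9, 450), (10, 499)]

train, test = data[:7], data[7:]  # hold the last examples out, unseen

# "train": estimate the per-kg rate from training examples only
rate = sum(y / x for x, y in train) / len(train)

# "test": measure the worst error on unseen data against the bar
worst_error = max(abs(y - rate * x) for x, y in test)
BAR = 10.0  # the model must stay within 10 rupees before we trust it
trusted = worst_error < BAR
```

the point isn’t this particular model. it’s that the bar was set before the results came in, and the test data was never seen during training.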


Ask carefully

and remember this.

ML will give you exactly what you asked for, not what you hoped for. so ask carefully. be precise. be thoughtful.

the world is generating more data than ever before. this is your opportunity to make use of that data in your own way.


ML vs traditional programming (core difference)

people talk about ML like it’s magic. it’s not. bluntly speaking, it’s just a thing labeller.

information → recipe → answer

the real difference between traditional programming and ML is this:

rules + data → traditional programming → answer

answers + data → ML → rules

in traditional programming, we handcraft the recipe. in ML, we let algorithms stitch the recipe together using examples.
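the two arrows above, as code. the “rule” here is just the mango rate again, and both sides are toy illustrations:

```python
# traditional programming: a human writes the rule (rules + data -> answers)
def traditional(quantity):
    return 50 * quantity

# ML direction: start from answers and let the machine recover the rule
examples = [(1, 50), (2, 100), (3, 150)]  # answers + data
learned_rate = sum(y / x for x, y in examples) / len(examples)

def learned(quantity):
    return learned_rate * quantity  # the stitched-together recipe
```

same answers in the end, but the recipes came from opposite directions.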


Why ML is powerful (and dangerous)

why use ML? if you just want to repeat old answers, you don’t need ML. ML succeeds on new examples.

but be careful.

ML can also find patterns that aren’t actually there, just like humans do. so the solution is the same as real life. don’t trust assumptions unless they survive testing somewhere else. test on new data.


Algorithms, shapes, and selection

we can separate data using:

  • lines
  • curves
  • complex shapes

different algos do this differently.

if multiple algos work:

  • test them
  • compare them
  • choose the one that performs better on unseen data

what comes out of this separation is the model.


What’s inside the black box?

what’s inside the black box?

math. lots of math. many algorithms.

and when people say AI today, most of the time they actually mean deep learning: models for tasks we can do but can’t explain, like recognizing a cat.


The final discipline

train. validate. test. make sure your test data catches memorization.

and that’s it.

that’s machine learning — without the glamour.
