Tensors and Transformers, From Scratch
I’ve always admired the ability to explain things simply. I always, always took everything apart. To the great detriment of my computer-tinkering grandfather, who would buy a new computer or gadget only for me to promptly disassemble it.
It’s 2024 and I’m now teasing apart complexity so I can explain it to others and to myself. To understand its story and marvel at all its Lego pieces that bring ideas together.
Why do they call it a Tensor?
I always loved geometry. Just as a drawn picture tries to capture a feeling, so too does a Tensor capture the geometry of something: the geometry of a triangle, say, or of several triangles put together. A Tensor ends up looking like a matrix, similar to how a picture ends up looking like colored lines on paper. Both are representations of “something” in another form.
Tensors are geometric. They’re a beautiful way of separating the geometry of something from the particular set of numbers we use to represent it, and saying that tensors are arrays is like saying nouns are ordered sequences of letters.
Random Tensor:
tensor([[0.1602, 0.6000, 0.4126],
        [0.5558, 0.0912, 0.3004]])
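For context, here’s a minimal PyTorch sketch of how a random tensor like the one above gets created (the exact values will differ on every run):

```python
import torch

# Create a 2x3 tensor of random values drawn uniformly from [0, 1)
rand_tensor = torch.rand(2, 3)
print(rand_tensor)
# tensor([[0.1602, 0.6000, 0.4126],
#         [0.5558, 0.0912, 0.3004]])  # your values will differ
```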
Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more are comprehensively described in the PyTorch documentation.
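A quick, non-exhaustive sketch of a few of those operations (the variable names here are just for illustration):

```python
import torch

x = torch.rand(2, 3)

# Arithmetic: element-wise addition and scaling
y = x + 1.0
z = x * 2.0

# Matrix manipulation: transpose, indexing, slicing
xt = x.T              # 3x2 view of the 2x3 tensor
first_row = x[0]      # indexing a row
second_col = x[:, 1]  # slicing out a column

# Linear algebra: matrix multiplication (2x3 @ 3x2 -> 2x2)
product = x @ xt
```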
Each of these operations can be run on the GPU, typically at higher speeds than on a CPU.
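For example, moving a tensor onto the GPU is a single call; this sketch assumes a CUDA device and falls back to the CPU if none is available:

```python
import torch

# Use the GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.rand(2, 3)
x = x.to(device)   # move the tensor to the chosen device
print(x.device)    # e.g. cuda:0, or cpu if no GPU is present
```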
☎️ Enter The Matrix
This topic can get pretty deep. Oddly enough, though the Matrix is a human-created idea, it lets us explore some deep relationships when we rethink what “data” is.
"Unfortunately, no one can be told what the Matrix is. You have to see it for yourself."