Discussion (4 comments)
Using the simple computational task of 10-digit addition, this post shows how to build a minimal Transformer from scratch. To build a deeper understanding of the internals of LLMs (large language models), it walks through implementing and training a lightweight model with everything inessential stripped away.
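For readers who want a feel for the task itself: here is a minimal sketch of generating 10-digit addition examples as text pairs. This is a hypothetical data format for illustration; the post's actual tokenization and prompt layout may differ.

```python
import random

def make_example(n_digits=10):
    """Generate one addition example as (prompt, answer) strings.

    Hypothetical format: the model would see "a+b=" and be trained
    to emit the digits of the sum. The real post may tokenize
    differently (e.g. per-digit tokens, padding, reversed output).
    """
    a = random.randint(0, 10**n_digits - 1)
    b = random.randint(0, 10**n_digits - 1)
    return f"{a}+{b}=", str(a + b)

random.seed(0)
prompt, answer = make_example()
print(prompt, answer)
```

Even this toy generator shows why the task is a nice probe: the data distribution is trivial to sample exhaustively, so any failure is attributable to the model, not the dataset.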
Looks like a Tiny Analytic transformer. An RNN is arguably a better choice if you're going to handwire an architecture to mechanically do addition. Learning is about discovering the patterns and algorithms from data; wiring a machine to follow a procedure defeats that purpose.
I somewhat feel that using floating point arithmetic for what should be a symbol manipulation exercise is cheating. The deserialisation technique is interesting enough that I'm not really upset, though.
The Codex solution reversed the digit order, which makes sense for keeping the carry logic easy, but it is less clean.
That's the approach I'd have gone with. I've long been an advocate of little-endian numerical representations. That said, if there's a maximum number of digits, it's straightforward to implement the circuitry needed to calculate the most-significant digit of the result in one go; and I somehow doubt the AI-generated solution really took advantage of the tricks that little-endian allows.
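The carry trick being discussed can be sketched in a few lines. This is an illustration of why least-significant-digit-first output is convenient for an autoregressive model (each emitted digit depends only on digits already seen), not a reconstruction of the actual Codex solution.

```python
from itertools import zip_longest

def add_little_endian(a_digits, b_digits):
    """Add two numbers given as little-endian digit lists
    (least-significant digit first). The carry propagates in the
    same direction the digits are emitted, so no lookahead is
    ever needed -- the property that makes reversed output easy
    for a left-to-right decoder to learn.
    """
    out, carry = [], 0
    for da, db in zip_longest(a_digits, b_digits, fillvalue=0):
        s = da + db + carry
        out.append(s % 10)
        carry = s // 10
    if carry:
        out.append(carry)
    return out

# 473 + 58 = 531, i.e. [3, 7, 4] + [8, 5] -> [1, 3, 5]
print(add_little_endian([3, 7, 4], [8, 5]))  # -> [1, 3, 5]
```

With big-endian (most-significant-first) output, by contrast, the first digit emitted can depend on a carry chain running through every lower digit, which is exactly the long-range dependency the reversal avoids.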
At some point I set Claude Code on some debugging. To my surprise, I don't recall it actually solving any of the bugs; it seemed much more concerned with "correcting" the funky things I was intentionally doing.
It baffles me that somebody capable of this kind of work would find this surprising. The process that allows LLMs to find bugs in code is the same process that entreats them to "correct" such creativity: their understanding of the world begins and ends at statistical plausibility, and they cannot truly comprehend things (though they can do a very good job of pretending, given sufficient training data).