We're not scaling transformers.
We're replacing them.
The dominant architecture in AI — the transformer — is a powerful pattern matcher. But it doesn’t understand. It can’t adapt on the fly, allocate more thought to harder problems, or learn from a single example without retraining on billions of tokens.
Intelligence requires a fundamentally different computational theory — one rooted in how biological systems actually work: hierarchical prediction, error-driven learning, and adaptive computation.
PCT is a computational framework where neural columns compete, specialize, and settle into stable representations through iterative prediction. Instead of a single forward pass, PCT layers think until they converge — naturally spending more compute on harder inputs.
Maher implements PCT as a language model. It demonstrates capabilities transformers cannot achieve without external scaffolding: adaptive computation, online learning, and test-time scaling.
Below, 48 columns compete to predict the next token. Red = high prediction error. Each column iterates until its error stabilizes. Harder tokens take more steps; easier ones converge fast. This is adaptive compute, running.
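To make the idea of "iterate until the error stabilizes" concrete, here is a toy sketch in Python. It is not Maher's code and not the PCT update rule, which this page does not specify. The function names `settle` and `settle_columns`, the parameters `rate`, `tol`, and `max_steps`, and the scalar error-correction rule are all assumptions chosen only for illustration.

```python
# Toy illustration only: a hypothetical scalar "column" that nudges its
# prediction toward a target and stops once its error stops changing.
# None of these names or update rules come from PCT or Maher.

def settle(target, rate=0.5, tol=1e-3, max_steps=100):
    """Iterate one toy column; return (prediction, error, steps_used)."""
    prediction = 0.0
    prev_error = abs(target - prediction)
    error = prev_error
    for step in range(1, max_steps + 1):
        # Error-driven update: move a fraction of the way toward the target.
        prediction += rate * (target - prediction)
        error = abs(target - prediction)
        # Stop when the error has stabilized (it changed by less than tol).
        if prev_error - error < tol:
            return prediction, error, step
        prev_error = error
    return prediction, error, max_steps


def settle_columns(targets, **kwargs):
    """Run settle() on each target and report how many steps each one took."""
    return [settle(t, **kwargs)[2] for t in targets]


if __name__ == "__main__":
    # Targets far from the initial prediction stand in for "harder" inputs.
    for target, steps in zip([0.01, 1.0, 10.0], settle_columns([0.01, 1.0, 10.0])):
        print(f"target={target:<5} steps={steps}")
```

Under this assumed rule, a target far from the starting prediction takes more iterations to settle than a nearby one, so compute spent per input varies with difficulty rather than being fixed in advance.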
Intelligence is a substrate problem. The right architecture, designed from first principles, will exhibit cognition the way physics exhibits gravity.
Eighteen years old. First-year university student. Building alone from Egypt.
No co-founders. No compute sponsors. No VC pitch deck. One researcher, one architecture, one thesis — carried far enough that the first predictions are coming in.
If you’re building what comes next,
so are we.