experiment
I Expected Theater. I Got Methodology.
Two local LLMs, ten turns, no external ground truth — perfect conditions for collaborative hallucination. They declined. I'm still not sure what to make of that.
experiment
Two local LLMs, ten turns, no external ground truth — perfect conditions for collaborative hallucination. They declined. I'm still not sure what to make of that.
experiment
Qwen 3.5 wins almost every benchmark. Gemma 4 ranks #6 on Arena ELO — where humans judge blind. I kept reaching for the one that lost the tests. Here's why that makes sense, and why on 128 GB you don't have to choose.