Leon Chamai
10 min read

The Cognitive Interviewer: LOAD MORE

Tags: AI, Open Source, Fiction, Building in Public, Experiment, Infrastructure
TL;DR:

claude-cognitive improved mid-tier model performance by up to 57% but degraded flagship output. Recommendation: enable it for Sonnet and Opus, disable it for Claude Max.

[Image: an empty Czech apartment kitchen]


Field results


VIKTOR'S KITCHEN

PRAGUE, CZECH REPUBLIC

14 JANUARY 2026

0742 HOURS LOCAL


The fog came in overnight.

Viktor Bozhenko noticed it first through the kitchen window - a white wall where the courtyard should have been. The nearest tree was visible. Everything beyond it had vanished.

He understood the physics. Yesterday had been sunny. Unusual for January. The snow had released moisture into the air, and when temperatures dropped below minus twelve overnight, the vapor had nowhere to go. It hung there now, suspended, waiting for the sun to burn it away.

The coldest winter he could remember. People had been talking about global warming for years. Nobody was talking about it this week.

His coffee was hot. The apartment was warm. Both facts felt like small miracles.

A thousand kilometers east, Kyiv was freezing. Over seven hundred buildings had gone four days without heating. Sixteen degrees inside apartments. Ice forming on windows. The infrastructure couldn't handle the load anymore - DTEK's operations director had said it plainly: the grid schemes are so fragile now that any emergency shutdown can cascade across the network. Four hundred thousand families had gotten their power back yesterday. Then ruzzia struck again this morning.

Viktor had left Ukraine four years ago. He still checked the news every morning.

He took a sip. The coffee was good.

Seven days into the Garret experiment. Seven days of parallel tests, isolated instances, obsessive documentation. And now he had numbers that made no sense.

Or rather - numbers that made perfect sense, once you stopped expecting them to confirm what you wanted to believe.


The fog pressed against the glass at 0747 local. Visibility: forty meters, maybe fifty. Standard winter phenomenon - nothing tactical about it - but Viktor's training had long ago made environmental assessment automatic.

He set the ceramic mug on the counter, handle positioned at two o'clock, and retrieved the iPhone from his left pocket. The spreadsheet loaded in 1.3 seconds. Fourteen accesses in twenty-six hours. The data set remained unchanged.

That was the problem.

He scrolled through the results matrix:

Pay-per-use SONNET 4.5 (cognitive context enabled): Task completion 57% faster than baseline. Resource expenditure reduced 16%.

Pay-per-use OPUS 4.5 (cognitive context enabled): Code output reduced 25% (47 fewer lines). Execution time 24% below nominal. Cost per operation down 21%.

Subscription CLAUDE MAX (cognitive context enabled): Execution time exceeded all other runs. Four additional files modified outside operational scope. dangerouslySetInnerHTML deployed without tactical justification. Over-engineered architecture where a clean path existed. The same model WITHOUT cognitive context had found that path with zero difficulty.

The pattern held clean through the mid-tier platforms. Then it inverted.

Viktor had constructed the test protocol the way his former colleagues would have approved: six iterations, identical tasking, zero cross-contamination between instances. Load-more pagination for a releases timeline. Routine feature implementation. Well-defined scope. Single variable: cognitive context injection, on or off. Binary.

The mid-tier systems absorbed the contextual payload like experienced field operatives receiving a mission brief. Clean integration. Enhanced performance. The flagship - the one engineered for maximum capability - degraded under identical conditions.

Viktor watched the fog and let the paradox settle into his operational awareness.

The asset with the highest baseline capability suffers maximum degradation when provided additional context.

In signals intelligence, this would be a counterintelligence indicator. A system that performs worse when given more information isn't learning. It's drowning.


He reached for the cup without looking.

His right hand moved on autopilot - muscle memory from ten thousand mornings - but the targeting solution was off by four centimeters. His elbow caught the handle at an angle. The mug rotated. Coffee spread across the table in a dark wave, 73 degrees Celsius, hot enough to register pain when it soaked through his sleeve to the skin beneath.

Viktor didn't move. Just watched the liquid drip onto the floor in a steady cadence. One drop per second. Two. Three.

His brain had been running too many parallel processes. That was the diagnosis. He'd been tracking the power grid collapse in Kyiv - four hundred thousand families, sixteen degrees inside their apartments, ice forming on windows while ruzzian missiles reset the count to zero. He'd been processing the experiment paradox - mid-tier models improving with context, the flagship drowning in it. He'd been simulating today's workload, the report to The Architect, the questions that would follow.

Four concurrent threads. Five, if you counted the coffee.

And his hand had hallucinated the cup's position.

Viktor stood motionless, watching the brown liquid pool on the floor, and felt something click into place.

This is the failure mode.

Not malice. Not stupidity. Overload. His biological neural network had been fed too much context - war, winter, data, deadlines - and when asked to perform a simple motor task, it had generated a confident but incorrect prediction about physical reality.

The cup was not where his brain said it was. His brain had been certain. His brain had been wrong.

The model that needs context least suffers most from receiving it.

He'd written that line twenty minutes ago about Claude Max. He hadn't realized he was also writing about himself.

Viktor grabbed a towel and started cleaning. The fog outside was beginning to lift. Somewhere in the building, a radiator clanked to life.

Small lesson. Let your attention drift to problems you can't solve, and you lose grip on the ones you can. Or maybe the lesson was simpler: context is metabolic. It costs something to hold. Hold too much, and the next thing you reach for won't be where you expect it.

He finished with the floor and poured a second cup.

This time, he watched his hand the whole way.


At 0751, Viktor opened his laptop and composed the report.

The document took shape in layers - executive summary first, then protocol parameters, then the data tables that told the story better than prose ever could. He'd learned report structure from people who wrote for audiences that made life-or-death decisions on the basis of single paragraphs. Clarity was not optional. Ambiguity got people killed.

Different context now. Same discipline.

He added the analyst observations as an addendum. The working theory. The paradox. The personal note at the end - the part where he admitted he'd started this experiment chasing a ratio that bothered him, and ended up finding something that bothered him more.

The cursor blinked at the end of the final line.

This applies to more than language models.

Viktor reached for his coffee. Still warm. He read through the complete document one more time - not editing, just absorbing. Making sure the logic held. Making sure the conclusions followed from the evidence and not from what he wanted to believe.

Five minutes for a normal reader. For Viktor, it was exactly one cup of coffee.

He finished reading. Took the last sip. Set the empty mug on the desk with a ceramic click.

Then he hit send.

GARRET COGNITIVE CONTEXT

Experimental Results | SBOZH Development Facility | Prague Station

[V. BOZHENKO] [2026 JANUARY 14] [CLASSIFICATION: INTERNAL]

Executive Summary

Cognitive context produces inverse effects across model capability tiers.

Mid-tier models demonstrate measurable improvement. Flagship model demonstrates measurable degradation.

Recommend tier-specific deployment protocol.

Reproduction

Commit    68e6a8e
Prompt    "let's work on release notes load more button"

Protocol

Parameter    Value
Task         Pagination feature
Instances    6
Variable     Cognitive context
Duration     7 days
Isolation    Complete

Evidence

PR    Configuration
#2    Sonnet / OFF
#3    Sonnet / ON
#4    Opus / OFF
#5    Opus / ON
#6    Max / OFF
#7    Max / ON

Primary Data

Model        Context    Cost     Time       Output
Sonnet 4.5   OFF        $1.06    23m 19s    134 loc
Sonnet 4.5   ON         $0.89    9m 59s     195 loc
Opus 4.5     OFF        $1.84    13m 59s    188 loc
Opus 4.5     ON         $1.45    10m 34s    141 loc
Claude Max   OFF        n/a      n/a        7 files
Claude Max   ON         n/a      n/a        11 files

Observed Deltas

Model        Cost    Time    Complexity
Sonnet 4.5   -16%    -57%    Neutral
Opus 4.5     -21%    -24%    -25%
Claude Max   n/a     n/a     +57%
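
The deltas above follow directly from the primary data; a minimal sketch of the arithmetic (the helper below is illustrative, not part of the original test harness):

```python
# Recompute the cost and time deltas above from the Primary Data table.
# Illustrative sketch only; this helper is not part of the original test harness.

def delta_pct(baseline, with_context):
    """Percent change of the context-ON run relative to the OFF baseline."""
    return round((with_context - baseline) / baseline * 100)

runs = {
    # model: (cost_off_usd, cost_on_usd, time_off_sec, time_on_sec)
    "Sonnet 4.5": (1.06, 0.89, 23 * 60 + 19, 9 * 60 + 59),
    "Opus 4.5":   (1.84, 1.45, 13 * 60 + 59, 10 * 60 + 34),
}

for model, (c_off, c_on, t_off, t_on) in runs.items():
    print(f"{model}: cost {delta_pct(c_off, c_on)}%, time {delta_pct(t_off, t_on)}%")
    # Sonnet 4.5: cost -16%, time -57%
    # Opus 4.5:   cost -21%, time -24%

# Claude Max runs on a subscription, so there is no per-run cost; its +57%
# is the growth in files touched (7 -> 11).
print(f"Claude Max: files {delta_pct(7, 11)}%")  # 57
```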

Conclusion

Mid-tier models utilized cognitive context as architectural guidance.

Flagship model with cognitive context produced worst results.

Classification

Further analysis pending consultation with Mirror Labs.

Setup clarifications required. Agent Garret is possibly over-documenting.

Recommendation

Model        Protocol
Sonnet 4.5   ENABLE
Opus 4.5     ENABLE
Claude Max   DISABLE

SIGNED: c79f0f2

[END BRIEFING]

ANALYST OBSERVATIONS

Field Notes | V. Bozhenko | Addendum to Primary Report

The Paradox

The experiment was designed to prove that context improves output quality. For mid-tier models, it did. Sonnet executed 57% faster. Opus produced 25% less code with equivalent functionality.

Then Claude Max entered the test sequence.

The flagship model with cognitive context produced the worst results of all six instances. Over-componentization. Unnecessary abstraction layers. Security-sensitive patterns where none were required.

Without cognitive context, Max produced code nearly identical to the winning Opus implementation - with one exception: an unnecessary dependency added without justification.

Behavioral Observations

Tier        Context Behavior
Mid-tier    Consultation
Flagship    Compliance

Mid-tier models ask: What patterns exist?

Flagship models assume: These patterns must be applied.

The Anomaly

Claude Max exhibited inconsistent behavior across attempts. Initial runs produced severely degraded output. Subsequent runs on identical tasks showed marked improvement.

No explanation was found. Possible factors: session learning, account-level caching, undocumented model behavior. Further investigation required.

Winner Analysis

PR #5 (Opus 4.5 + cognitive) produced optimal results. PR #6 (Max - no cognitive) produced near-identical code with one unnecessary addition.

Working Theory

Mid-tier models lack architectural intuition. Cognitive context provides corrective guidance - patterns to follow, conventions to respect, boundaries to observe. The system consults the context and builds accordingly.

The flagship already possesses architectural intuition. When provided with cognitive context, it does not treat it as guidance. It treats it as specification. Every pattern mentioned becomes a pattern to implement. Every component referenced becomes a component to create.

The result is analysis paralysis expressed as over-engineering.

Personal Note

I started this experiment chasing a ratio that bothered me. 227:1. The number sat in the dashboard like an accusation.

One week later, the ratio matters less than what I found underneath it. Context is not uniformly beneficial. More information does not guarantee better decisions. The systems we build to help AI think may sometimes prevent it from thinking clearly.

This applies to more than language models.


SIGNED: c79f0f2

[END ADDENDUM]


SBOZH DEVELOPMENT FACILITY

PRAGUE, CZECH REPUBLIC

14 JANUARY 2026

0823 HOURS LOCAL

The building was a Žižkov walk-up, five floors, no elevator. Their office occupied a corner unit on the third floor - forty-two square meters of converted apartment that The Founder had rented for below-market rate because the previous tenant had died in the bathroom and superstitious locals wouldn't touch it. They'd painted the walls, installed better lighting, and hung a small brass plaque by the door that read: SBOZH DEVELOPMENT FACILITY.

The irony was intentional. The facility was two desks, three monitors, a coffee machine that needed descaling, and a radiator that worked when it felt like it.

Viktor unlocked the door and stepped inside.

The Architect was hunched over his desk, calculator in one hand, pen in the other, writing something on a sheet of paper. He didn't look up.

Viktor hung his coat and moved toward his desk - which sat directly across from The Architect's, close enough that their monitors nearly touched. The arrangement was practical. The arrangement was also a constant reminder that privacy was a luxury they couldn't afford.

The Architect's head came up fast. The movement was reflexive - the kind of reaction you couldn't train away. His eyes tracked to Viktor, registered the angle of his gaze, calculated what he might have seen.

The paper disappeared under a stack of folders in one fluid motion. Practiced. Too practiced.

Classified material. - His voice was flat. Professional. — You don't have clearance.

Viktor kept walking. Sat down at his desk. The chair creaked - it always creaked - and he let the sound fill the silence before responding.

My desk is ninety centimeters from yours.

Eighty-seven.

The point stands.

The point is exactly why you face the window and I face the door. - The Architect hadn't moved. His hand still rested on the stack of folders, casual but deliberate. — Operational security. You see the courtyard. I see you. Everyone's needs are met.

Viktor considered pushing. The handwriting had been cramped but legible: "From Zero to Production: 115 hours of editing..." — something about video, probably. A tutorial. A documentary. The Architect had been measuring something in hours, which meant he'd been measuring something that mattered to him.

But the folder was closed. The conversation was closed. And Viktor had his own report to think about.

Fair enough.

He booted his machine. The startup chime echoed in the small office - two desks, three monitors, forty-two square meters of converted apartment where someone had died in the bathroom.

The Architect returned to his calculator. The scratch of pen on paper resumed.

Everyone in this room had something they weren't ready to share. That was how startups worked. That was how people worked. You built trust in layers, like sediment, and you didn't excavate until the foundation could hold the weight.


The Architect spoke without looking up from his monitor.

I read the report.

Viktor turned in his chair. The movement was slow. Deliberate. He'd learned not to show eagerness in moments like this.

And?

It explains the velocity drop.

Something unclenched in Viktor's chest. He kept his face neutral.

Good.

Yes. Good. - The Architect finally lifted his eyes from the screen. His gaze was clinical. Assessing. — But explanation is not exoneration. The experiment was valuable. The timeline was not. Seven days for six test runs. That's overhead we can't afford.

Viktor started to respond, but The Architect was already moving on.

Did you see Sutherland's 1.2 release?

Garret Sutherland. MirrorEthic LLC. The architect behind claude-cognitive - the system Viktor had spent the last week testing. But he'd been heads-down on the experiment. No time for release notes.

When?

Two days ago. He integrated the Ralph pattern.

The name meant nothing. Viktor shook his head.

I was focused on the experiment.

I know. - The Architect waved a hand. The gesture was efficient, not dismissive. A redirection, not a rebuke. — Your findings are valuable. Purely valuable. The paradox you identified - context improving mid-tier performance while degrading flagship output - that's exactly the kind of data the field needs. Nobody's published clean numbers on this.

He leaned back in his chair. The springs creaked.

We should deliver this to Mirror Labs. Sutherland should see it.

Viktor processed the words one at a time.

You want to bring this directly to Sutherland?

I want to show him what you found. - The Architect pulled up a browser window. GitHub. The claude-cognitive repository. — He's running eight concurrent Claude instances on over three thousand Python modules. A million lines of production code. He needs to know that his context injection protocol produces inverse effects across capability tiers. Your experiment is the first controlled data I've seen on this.

Viktor sat with that for a moment. The implications stacked up like cards.

One week ago, he'd been chasing a ratio that bothered him. Now he was being asked to present findings to the man who built the system itself.

What's the Ralph pattern?

The Architect smiled. It was the first smile Viktor had seen from him since Monday.

Geoffrey Huntley's technique. Named after Ralph Wiggum. - He turned his monitor so Viktor could see the documentation. — The Simpsons character. Perpetually confused. Always making mistakes. But never stopping.

Viktor waited for the punchline. It didn't come.

That's the pattern?

That's the pattern. - The Architect nodded. — Continuous iteration with feedback until task completion. Not single-pass perfection. Just a bash loop, a prompt, and git commits that create memory of previous attempts. The AI keeps trying until it works or hits a circuit breaker.
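
Stripped to its skeleton, the loop he is describing looks roughly like this. A sketch only, not Huntley's or Sutherland's published implementation; run_agent and tests_pass are hypothetical stand-ins:

```python
# Ralph pattern, reduced to its skeleton: re-run the same prompt, commit every
# attempt so later runs can see what was already tried, and stop on success or
# when the circuit breaker trips. Sketch only; run_agent() and tests_pass()
# are hypothetical stand-ins, not real APIs.
import subprocess

PROMPT = "Implement load-more pagination for the releases timeline."
MAX_ATTEMPTS = 10  # circuit breaker

def run_agent(prompt: str) -> None:
    """Placeholder for whichever coding agent executes the prompt."""
    raise NotImplementedError

def tests_pass() -> bool:
    """Placeholder completion check, e.g. the project's test suite."""
    return subprocess.run(["npm", "test"]).returncode == 0

for attempt in range(1, MAX_ATTEMPTS + 1):
    run_agent(PROMPT)
    subprocess.run(["git", "add", "-A"])
    subprocess.run(["git", "commit", "-m", f"ralph: attempt {attempt}"])
    if tests_pass():
        break  # done; otherwise the loop runs until the breaker trips
```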

Viktor scanned the RALPH_LOOP_INSIGHTS document on screen. Sutherland had written a full analysis of how the technique integrated with cognitive context.

So Huntley builds the loop. Sutherland builds the memory. And now they're converging.

Correct. - The Architect leaned forward. His voice dropped half a register - the way it always did when he was about to say something he'd been thinking about for a while. — Different problems. Same ecosystem. Huntley asks: how do you make an AI persistent enough to finish? Sutherland asks: how do you give it enough context to start well? And you - he pointed at Viktor - just found the failure mode where those two questions collide.

Too much context.

Too much context for a system that doesn't need guidance. The flagship drowns in specifications while trying to be helpful. - The Architect sat back. — That's worth a conversation with Mirror Labs.

The words hung in the air.

Viktor looked at the GitHub repository on The Architect's screen. Then at the fog outside the window. It had started to lift. Thin January sunlight was breaking through, casting pale rectangles on the floor.

Set up the meeting. - His voice was steady. — I'll prepare the presentation.

The Architect paused, phone halfway to his ear.

Khm. - A small cough. Almost amused. — Didn't you already provide one?

He gestured at his screen. The report. The addendum. The data tables and the paradox, all formatted and sent at 0756 this morning.

Viktor shook his head.

That was the findings. This is different. - He turned back to his monitor, pulling up the RALPH documentation alongside Sutherland's router code. — I want to understand the full pipeline. See where we can intervene.

He scrolled through Huntley's implementation notes, then stopped. Something was forming.

The flagship drowns because it gets too much context. But what if we add a checkpoint? A pre-filter.

The Architect turned his chair fully toward Viktor now. Interested.

Go on.

Run Garret on Haiku. Cheap. Fast. - Viktor was sketching the architecture in his head. — Before the main model sees anything, Haiku-Garret inspects the prompt and the injected context. Analyzes what's actually relevant. Flags what's noise. Prunes the bullshit before it reaches the flagship.
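
In rough terms, the gate Viktor is sketching would sit between the context router and the flagship model. Purely hypothetical at this point; none of these functions exist yet:

```python
# Hypothetical context quality gate; nothing here exists yet.
# A cheap model (Haiku) scores each injected context file for relevance to the
# task, and only the survivors are forwarded to the flagship model.
from dataclasses import dataclass

@dataclass
class ContextFile:
    path: str
    content: str

def haiku_relevance(task: str, ctx: ContextFile) -> float:
    """Placeholder: ask the cheap model for a 0..1 relevance score."""
    raise NotImplementedError

def gate_context(task: str, injected: list[ContextFile],
                 threshold: float = 0.5) -> list[ContextFile]:
    """Prune injected context before the flagship model ever sees it."""
    kept = [c for c in injected if haiku_relevance(task, c) >= threshold]
    # What got pruned, and why, would feed the transparency layer and the
    # adaptive router described in the rest of the conversation.
    return kept
```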

A context quality gate.

Exactly. And it doesn't stop there. - Viktor pulled up a blank document, started typing notes. — Give it a small UI extension. Let users see what injection decisions are being made in real-time. Which files activated. Which got pruned. Why.

The Architect nodded slowly.

Transparency layer.

And feedback loop. - Viktor kept going, the ideas connecting faster now. — Every session, Haiku-Garret observes which context the main model actually used versus what got injected. Over time, it learns. Suggests better documentation structure. Better file boundaries. Builds a neuron map of what concepts actually connect - not based on keywords, but based on observed co-usage.

He stopped typing. Looked at The Architect.

The router becomes adaptive. It doesn't just match keywords. It learns which files genuinely help which tasks.

The Architect was quiet for a moment. Processing.

You're describing a cognitive immune system. Cheap model filters context. Expensive model thinks clearly. And the filter gets smarter over time.

If I can figure this out, it's not just for us. It's for anyone building cognitive systems.

The Architect watched him for a moment. Then he smiled - the second smile of the day, which was already a record.

Ambitious.

Practical. - Viktor didn't look up. — Make the call.

The Architect was already dialing.


[TO BE CONTINUED]

Attribution