Mission Brief · 001 · by Łukasz Graliński

10× is not a constant.
It's a function of project size.

Your client built that module solo with Claude in an evening. Your team spent a quarter on the same scope. They're not slow — the math is different. Drag the slider and see what scale does to AI leverage.

9.2×
Boost at step 1 · MVP
4.4×
Boost at step 10 · scale-up
1.6×
Boost at step 30 · enterprise
Cockpit · interactive

Drag the slider. See the decay.

Same engineer. Same tools. Different codebase. The X-factor is not a marketing number — it is AI efficiency × Human factor, and both terms shrink as complexity grows. Tune the inputs to match your team's reality.

Medium · 50–250k LOC · 4–8 devs · step 10
4.4×
Realized productivity boost
AI efficiency (model usefulness): 74%
× Human factor (coordination + review + integration): 60%
= Final: 44% of theoretical 10×
Tune the model
Set the per-step rates once; the slider applies them as a geometric sequence.
AI = 97% × 97% × 97% × … × 97% = 73.7%
HF = 95% × 95% × 95% × … × 95% = 59.9%
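The two products above can be reproduced in a few lines. This is a minimal sketch, assuming each slider step multiplies in one more factor of the per-step rate and that the result is scaled against the theoretical 10×. The function name `realized_boost` and its parameter names are hypothetical, chosen for illustration, and are not taken from the cockpit itself.

```python
def realized_boost(step, ai_rate=0.97, hf_rate=0.95, theoretical_max=10.0):
    """Return (ai_efficiency, human_factor, boost) after `step` scale steps.

    Assumption: each step multiplies AI efficiency by `ai_rate` and the
    human factor by `hf_rate`, i.e. a plain geometric sequence.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    ai_efficiency = ai_rate ** step    # 97% x 97% x ... (step factors)
    human_factor = hf_rate ** step     # 95% x 95% x ... (step factors)
    boost = theoretical_max * ai_efficiency * human_factor
    return ai_efficiency, human_factor, boost


if __name__ == "__main__":
    for step in (1, 10):
        ai, hf, boost = realized_boost(step)
        print(f"step {step:>2}: AI {ai:.1%} x HF {hf:.1%} -> {boost:.1f}x")
```

With the default 97% and 95% rates, step 1 comes out at 9.2× and step 10 at 4.4× (73.7% × 59.9%), matching the figures shown for those steps. Changing `ai_rate` or `hf_rate` is the code equivalent of tuning the two inputs.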
Measured, not modeled. Pulled from 53 production codebases we built or co-built between 2024 and 2026: greenfield MVPs, scale-ups, legacy rescues. Same toolchain, different terrain.
Boost = shipped value. We track reviewed, merged, deployed work, not lines generated. Code thrown away, rewritten, or rolled back does not count.
Why the curve bends. Below ~30k LOC AI compounds the engineer. Above ~150k LOC structural forces pull against it: duplicated logic, context rot, review load.
Typical scope at this size
B2B SaaS product, fintech dashboard, e-commerce platform
Model by Łukasz Graliński · Pirxey
Geometric decay curve
Base factors multiplied step-by-step as project scale grows
Scale: 0 → 10× · Micro · Small · Medium · Large · Enterprise

No assumptions. We measured the boost across 53 products we shipped over the last two years — from solo prototypes to multi-team platforms. Drag and see for yourself.

[Chart: realized boost (0× to 10×) by project scale, from Micro through Small, Medium, and Large to Enterprise. Labeled points: 9.2× (greenfield MVP), 4.4×, 0.9× (legacy enterprise).]
Solo vs Team · the gap

Why your client ships in an evening, and we ship in a quarter.

The honest answer to "why can't your team match what I did at home?" is not "they're worse than you". It is that you and the model worked on a fundamentally different problem. Six concrete differences — all measurable.

01 · Solo vs. Team

Empty repo vs. living codebase.

Solo: The model sees 100% of the project: it fits in context. Every helper it writes is correct by definition: there is nothing to conflict with.
Team: On a 250k LOC repo the model loads ~6–11% of relevant context. It rebuilds the validator that already exists 3 folders away. Volume goes up. Real progress doesn't.
Empty terrain is a feature of weekend projects, not a Claude capability.
02 · Solo vs. Team

No review burden. No reviewers.

Solo: You ship straight to your own laptop. The review is "does it run?", answered in seconds.
Team: Every block has to be read, traced, security-reviewed, QA-tested. Faros AI: high-AI teams open 47% more PRs, but review time goes up 91% and throughput stays flat.
The bottleneck moves from typing to verifying.
03 · Solo vs. Team

You + Claude vs. 12 humans + Claude.

Solo: Zero coordination tax. One head holds the spec. Decisions take seconds.
Team: 4 devs + design + BA + PO + DM + QA + security. Half the gain is eaten by alignment meetings, async handoffs, and "wait, did you talk to X?"
Brooks's Law didn't go away. AI doesn't reduce the cost of agreement.
04 · Solo vs. Team

2 integrations vs. 47 integrations.

Solo: You hit OpenAI and Stripe. Both well-documented. Both have official SDKs.
Team: Production talks to 47 systems. Half have undocumented quirks. Some require a phone call to the vendor. The model can't see any of this from a prompt.
The work nobody saw on the demo is most of the work.
05 · Solo vs. Team

No compliance. No audit. No data migration.

Solo: Demo data. No SOC2. No GDPR scrutiny. No PII. No "what happens to 12M existing rows?"
Team: Real customer data. Audit trails. Regulator deadlines. Zero-downtime migrations. This work has nothing to do with how fast the model types.
Production has constraints that prompts can't see.
06 · Solo vs. Team

vercel deploy vs. blue-green across 3 envs.

Solo: One command. One environment. If it breaks at 3am, you fix it at 3am. No SLA.
Team: Dev → staging → prod, blue-green, feature flags, canary, rollback plan, observability, on-call rotation, 99.95% SLA. Each one is real engineering, not a prompt.
Shipping a prototype and operating a platform are different jobs.
Why · the six forces

What's actually shrinking the boost?

Six measurable forces — each backed by 2025–2026 research. Together they explain the curve you just dragged through.

01 · +38% redundant logic

Silent duplication inflates the diff

On small projects there is little to duplicate, so AI looks clean. On large codebases it often cannot see what already exists — so it rebuilds the same helper, validator, or fetch wrapper three folders away. Output volume goes up. Real progress does not.

— Pirxey internal benchmark · 53 products · 2024–2026
02 · 2.4× longer fixes

Duplication compounds context rot

Duplication was always a tax. With AI it becomes a multiplier: two copies of the same logic mean two stale fragments in context, two reviewers tracing two call graphs, and two places where the next bug fix has to land.

— Pirxey post-mortems · 17 SaaS rescues
03 · 6–11% context coverage

AI does not know what it does not know

On real platforms the model loads fragments — a file here, a snippet there — and answers with full confidence. The output reads authoritative, but the relevant constraint often lived in a file the model never opened.

— Pirxey context-coverage audit · Q4 2025
04 · Review grows 1.7× faster

More generated code means more review burden

Volume is cheap to produce and expensive to vet. Every generated block has to be read, traced, and understood before it ships — because one confidently wrong block in a critical path is enough. The bottleneck moves from typing to review.

— Pirxey PR telemetry · 2025
05 · −2% to +4%

Why seniors on familiar code see ~0 speedup

Our working hypothesis: AI excels at isolated, low-context tasks — a new screen, a clean integration, a one-off script. It struggles where senior developers spend the hard hours: cross-module bugs and product logic that require holding the system in your head.

— Pirxey field study · 14 senior engineers
06 · 73% decay recoverable

AI is powerful — when the system is designed for it

The answer is not less AI. It is better boundaries. Keep context contained inside isolated modules, even if only at the logic layer. Plan the architecture, brainstorm with AI, challenge its proposals, cut the bad ones, develop the good ones. Soft skills matter: a developer who cannot articulate the need clearly will not vibe-code a great product.

— Pirxey delivery playbook · 2024–2026
Evidence · receipts

The numbers — straight from the source.

We didn't make this curve up. Each figure below maps either to public research at industry scale or to our own delivery telemetry, and every card names its source.

19%
AI made experienced devs slower, not faster.
METR's 2025 randomized controlled trial: 16 senior open-source developers, 246 real tasks, mature codebases (avg. 1M+ LOC, 22k+ stars). With AI tools allowed, tasks took 19% longer.
METR (2025)
39pt
Perception gap between felt speed and real speed.
Devs predicted AI would speed them up by 24%. After finishing, they still believed AI sped them up ~20%. Reality: 19% slower. Self-reports overstate AI value by ~39 percentage points.
arXiv 2507.09089
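The 39-point figure is simple arithmetic on the numbers quoted in this card. A small sketch, using only those values; the function name `perception_gap` and the sign convention (positive means faster, negative means slower) are assumptions made for illustration.

```python
def perception_gap(perceived_speedup_pct, measured_speedup_pct):
    """Gap in percentage points between felt and measured speedup.

    Convention (assumed here): positive = faster, negative = slower.
    """
    return perceived_speedup_pct - measured_speedup_pct


# Values quoted in the card above.
forecast = 24.0    # speedup developers predicted before the tasks
post_hoc = 20.0    # speedup developers still believed in afterwards
measured = -19.0   # tasks actually took 19% longer with AI

if __name__ == "__main__":
    print(f"after-the-fact gap: {perception_gap(post_hoc, measured):.0f} pt")
    print(f"forecast gap:       {perception_gap(forecast, measured):.0f} pt")
```

The after-the-fact belief (20% faster) against the measured result (19% slower) gives the 39-point gap. Measured against the original forecast, the same calculation gives 43 points.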
47%
More PRs in. Same throughput out.
Faros AI telemetry across 10k+ devs / 1,255 teams: high-AI teams open 47% more pull requests/day and PRs are 154% larger — but review time is up 91%, bugs up 9%, DORA throughput flat.
Faros AI — Lab vs Reality
0%
Senior devs on familiar code: no measurable speedup.
This matches our hypothesis: AI helps most when the task is isolated and low-context. On familiar, mature code senior developers already know the hidden constraints — and spend the extra time checking whether AI missed them.
MIT / Microsoft field evidence + Pirxey hypothesis
53
products measured, not assumed.
Pirxey's curve is based on two years of delivery telemetry across 53 products: shipped scope, PR review load, rework, rollback rate, duplicated logic, and realized throughput. We do not count generated code as value until it survives review and production.
Pirxey internal benchmark, 2024–2026
Soft
skills became the hard skill.
The bottleneck has moved upstream: naming the constraint, describing the product intent, and knowing when to challenge the model. Great AI development is not letting the model run loose — it is designing the box it can safely be smart inside.
Pirxey delivery teams
Mission control standing by

Want to know your project's real X-factor before we start?

We'll size your codebase, score your team's review and integration capacity, and give you a Pirxey number for your specific mission. Free. No slide deck.

Pirxey · Aleja Grunwaldzka 472, 80-309 Gdańsk, Poland · 130+ engineers · 100+ missions delivered