Original title: Lines of Code Got a Better Publicist
Article
The post argues that software teams have shifted from judging engineers by shipped impact to rewarding AI volume claims, such as the percentage of code generated by AI or the number of lines written per day. The author says this pattern is easy to advertise but weak as an outcome metric. It contrasts these claims with the Copilot-era claim of 55% faster task completion, which was at least falsifiable, and with a year of later claims from vendors and studies that mostly report adoption intensity rather than clear productivity gains. It cites Cui et al. as the strongest supporting evidence: a study of about 5,000 developers that found a 26% increase in completed tasks. It also notes GitClear’s data on rising code churn and declining refactoring, and METR’s early finding that experienced developers were 19% slower with AI. METR later retreated to an inconclusive result on self-reported speed gains, because developers who now rely on AI cannot reliably estimate how long a task would have taken without it. The article also references an NBER survey of roughly 6,000 executives in which about 69% report AI adoption but most firms see little measurable productivity change, with expected organizational gains across studies of around 10%. It points out that AI maturity frameworks are internally inconsistent and mostly measure usage. It critiques Anthropic’s “8x code shipped” claim by pairing it with a randomized trial that found lower code comprehension and no significant productivity increase, arguing that an excellent product and a weak measurement claim can coexist. The post then ties these metrics to corporate decisions. It says layoffs at companies such as Block and Atlassian are often justified with productivity narratives, even though there is no transparent evidence that the affected staff were idle or that customers saw any improvement. It concludes that AI adoption should be measured through reliable engineering outcomes like DORA, MT
Comments
Commenters generally support the critique that line counts are vanity metrics often disconnected from customer value. Several cite vendor announcements that foreground line volume without describing shipped outcomes. Some challenge the framing, arguing that firms should be neither praised nor punished purely for how fast they adopt AI. They add that unchanged management structures, roadmaps, and staffing models can absorb any coding gains before those gains show up as a visible productivity lift. Several point to distorted incentives, including executive pressure, overhiring, investor optics, and Goodhart’s law, and debate whether layoffs reflect AI efficiency or preexisting cost-cutting agendas. Others raise concerns from safety-critical domains such as flight and medical systems, where deterministic assurance requirements limit the adoption of LLM-generated production code. Recurring concerns include noise in self-reported measurements, the persistence of the old lines-of-code (LoC) culture, and review and testing bottlenecks that remain key constraints on throughput. A few commenters question the “must adopt now” urgency and call for clearer standards or audits to separate useful AI-assisted development from hype-driven narratives.
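The article recommends judging AI adoption by engineering outcomes such as DORA metrics rather than by line counts. That recommendation can be made concrete with a minimal sketch, under stated assumptions. The `Deployment` record, its field names, and the `dora_metrics` helper are hypothetical illustrations. Real teams would pull these events from their own CI/CD and incident tooling, and exact metric definitions vary between organizations. Time to restore is simplified here as the gap between a failed deployment and its recovery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real pipelines store this data differently.
    deployed_at: datetime                   # when the change reached production
    commit_at: datetime                     # when the change was first committed
    failed: bool = False                    # did this deployment cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict[str, Optional[float]]:
    """Rough DORA-style metrics over a reporting window (simplified definitions)."""
    if not deploys:
        raise ValueError("no deployments in window")
    if window_days <= 0:
        raise ValueError("window_days must be positive")

    # Lead time for changes: commit to production, in hours.
    lead_times = [_hours(d.deployed_at - d.commit_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Simplification: restore time is measured from the failed deploy itself.
    restore_times = [
        _hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        # None when no failed deployment has a recorded recovery.
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    sample = [
        Deployment(deployed_at=t0 + timedelta(hours=4), commit_at=t0),
        Deployment(
            deployed_at=t0 + timedelta(days=1, hours=8),
            commit_at=t0 + timedelta(days=1),
            failed=True,
            restored_at=t0 + timedelta(days=1, hours=10),
        ),
        Deployment(deployed_at=t0 + timedelta(days=2, hours=6), commit_at=t0 + timedelta(days=2)),
    ]
    print(dora_metrics(sample, window_days=7))
```

None of the inputs count lines of code or the share of code written by AI. These numbers only improve if changes reach production faster and break it less often, which is the distinction the article draws between volume and outcome.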