HR Technology Strategy

The Skill Atrophy Trap.

AI copilots deliver genuine productivity gains – 12% more tasks, 25% faster, measurably higher quality. But the first rigorous experimental evidence shows that workers become 19% less accurate on complex tasks outside AI's capability zone, endoscopists lose 20% of their detection ability after routine AI use, and the cognitive science explaining why has been staring at us for decades. Organizations optimizing for today's speed metrics may be systematically eroding the human judgment they'll need to supervise AI tomorrow.

March 22, 2026
20 min read

Key Takeaways

  • A preregistered experiment with 758 BCG consultants found that AI users completed 12.2% more tasks, finished them 25.1% faster, and produced output rated 40%+ higher in quality – but were 19% less likely to produce correct solutions on complex tasks outside AI’s capability frontier. It is the first experimental evidence that productivity gains and skill degradation are two sides of the same coin.

  • The Lancet published the first clinical evidence of AI-induced deskilling: endoscopists who routinely used AI-assisted detection saw their adenoma detection rate drop 20% when working without it, meaning doctors who relied on AI became measurably worse at catching precancerous growths.

  • The cognitive science mechanism is well-established: the generation effect shows that producing your own answers, even wrong ones, creates 0.40 to 1.34 standard deviations better retention than passively receiving correct answers – and AI tools bypass exactly this process every time they produce a first draft.

  • Gartner predicts 50% of global organizations will require AI-free skills assessments by the end of 2026, but only 17% of organizations currently rate their AI implementation as successful and two-thirds haven’t prepared employees to work alongside AI at all – the structural conditions for skill atrophy are already in place.



AI makes you faster today and less capable tomorrow – and the evidence is no longer theoretical

A senior HR business partner hasn’t written a compensation analysis from scratch in eight months. Her AI copilot generates them in minutes. When it hallucinates a market rate for a niche robotics engineering role, she almost doesn’t catch it. The number looks right. It sits in the range she’d expect. It isn’t right. She only notices because of a passing comment from a recruiter three weeks ago about that specialty spiking. She flags it, fixes it, moves on. But one question stays with her: how many times in the last eight months has she not caught something she would have spotted a year ago?

She is not imagining things. The first rigorous experimental evidence for what she’s experiencing arrived in March 2026, and it confirms what many knowledge workers have felt but couldn’t prove: AI tools that make you faster at routine work are simultaneously making you worse at the hard work. Not theoretically. Measurably. And the cognitive science explaining why has been sitting in the research literature for decades, waiting for someone to connect it to the most rapid tool adoption in the history of white-collar work.

The Productivity Evidence Is Real

Let’s start with what’s true, because this is not an anti-AI argument.

Ethan Mollick and Fabrizio Dell’Acqua’s landmark study, published in Organization Science in March 2026, deployed AI tools to 758 consultants at Boston Consulting Group in a preregistered field experiment – the gold standard of organizational research. The results on productivity were unambiguous. Consultants with AI access completed 12.2% more tasks, finished them 25.1% faster, and produced output rated 40%+ higher in quality. These numbers held across experience levels, across task types, and across both the “AI only” and “AI plus prompt engineering training” conditions.

This is real. Organizations that have seen similar gains in their own deployments are not hallucinating. AI copilots genuinely accelerate knowledge work on tasks within what Mollick calls the “frontier” – the boundary of what current AI models can handle reliably. Inside the frontier, the productivity story is as good as the vendors say it is. Maybe better.

The problem is what happens at the edge of that frontier, and beyond it.

The 19% Problem: What Happens Outside the Frontier

Mollick’s experiment didn’t just measure speed and output quality. It deliberately included tasks that sat outside AI’s capability zone – problems that looked similar to within-frontier tasks but required the kind of judgment, contextual reasoning, and domain expertise that current models handle poorly. These were the tasks where a consultant’s actual skill mattered most.

On those tasks, consultants who had been using AI were 19% less likely to produce correct solutions than those who hadn’t.

Read that again. The same population that was 12% more productive and 25% faster on routine work became significantly worse at the complex work that most justified their expertise and their compensation. And the effect held regardless of whether they’d received prompt engineering training. Knowing how to use AI well didn’t prevent the skill degradation. Using AI at all was sufficient.

Mollick himself has described the mechanism with characteristic bluntness: “As soon as the AI model is good enough, everyone tends to fall asleep at the wheel – they stop paying attention to what the AI can do and can’t do, and they don’t check the results.” The researchers had deliberately embedded tasks that AI couldn’t handle but that produced plausible-looking output. Consultants with AI access didn’t just fail to catch the errors. They made more errors on these tasks than the control group, because they had shifted from generating analysis to reviewing AI output – and AI output that’s wrong in sophisticated ways is harder to catch than a blank page.

This is the jagged frontier in action. AI capability isn’t a smooth line that recedes evenly as the technology improves. It’s a ragged, unpredictable boundary where AI excels at one task and fails catastrophically at an adjacent one that looks nearly identical. The peaks of AI performance are getting higher. But the valleys – where AI fails and human judgment is the only safety net – aren’t filling in at the same rate. They’re shifting, reshaping, and becoming harder to see precisely because the peaks are so impressive.

The First Clinical Evidence: When Doctors Lose Their Edge

If the BCG study established that skill degradation happens in consulting, the ACCEPT trial established that it happens where the stakes are life and death.

Published in The Lancet Gastroenterology & Hepatology in 2025, the ACCEPT trial was a multicentre observational study across four Polish endoscopy centres involving 1,443 patients. The researchers tracked adenoma detection rates – the ability to spot precancerous growths during colonoscopies – among endoscopists who had been using AI-assisted detection systems and then returned to working without them.

The adenoma detection rate dropped from 28.4% to 22.4% – six percentage points, a roughly 20% relative decline. Doctors who had been using AI assistance became measurably worse at the core diagnostic skill that defines their specialty. Not because they were less experienced. Not because they were less motivated. Because the AI had been handling the detection work, and the perceptual skill required to spot subtle, flat polyps had degraded from disuse.

This matters beyond gastroenterology for a reason that should keep every CHRO and CIO awake at night: if routine AI use degrades diagnostic skill in medicine – a field with years of specialized training, continuous professional development requirements, high stakes per decision, and a culture of clinical rigor – then it will degrade skills in HR, finance, legal, procurement, and every other knowledge domain where the training is shorter, the feedback loops are longer, and nobody is tracking whether practitioners can still do the work without their tools.

The endoscopist in Kraków who missed that polyp didn’t become a worse doctor overnight. She became a doctor who stopped practicing the skill the AI was handling. That distinction matters because it means the degradation is invisible until the AI isn’t available – a system outage, a novel edge case, a task that falls into one of those valleys on the jagged frontier. And by then, you’ve already lost the capability you needed.

The Workslop Tax: Why Individual Gains Don’t Become Organizational Gains

Here is where the math gets uncomfortable.

If AI makes individual workers 12% more productive and 25% faster, organizations should be seeing massive returns. They aren’t. The MIT Media Lab reported in 2025 that 95% of organizations see no measurable return on investment from generative AI technologies. That’s not a rounding error. That’s a structural disconnect between individual productivity and organizational outcomes.

Part of the explanation comes from BetterUp Labs and the Stanford Social Media Lab, who surveyed 1,150 full-time U.S. desk workers in September 2025. They found that 41% had encountered what researchers call “workslop” – AI-generated output that looks polished on the surface but lacks substance, accuracy, or relevance underneath. Each instance of workslop required nearly two hours of rework to identify and fix. The estimated cost: $186 per worker per month, or roughly $9 million per year for a 10,000-person organization.
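Those two figures only reconcile if the monthly cost accrues to the 41% of workers who encounter workslop rather than to the full headcount – the survey write-up doesn’t spell out the aggregation, so treat that as our assumption. A quick back-of-envelope check:

```python
# Back-of-envelope check of the workslop figures.
# Assumption (ours, not stated in the survey): the $186/month cost
# applies only to the 41% of workers who encounter workslop.
headcount = 10_000
incidence = 0.41            # share of workers encountering workslop
monthly_cost_usd = 186      # rework cost per affected worker per month

annual_cost = headcount * incidence * monthly_cost_usd * 12
print(f"${annual_cost:,.0f} per year")   # -> $9,151,200, roughly $9 million
```

Applied to the full headcount instead, the same $186 would imply over $22 million a year – so the aggregation assumption matters if you adapt this arithmetic to your own organization.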

The workslop tax explains part of the ROI gap. But the skill atrophy thesis explains a deeper piece: someone has to catch what AI gets wrong. That supervisory function requires exactly the kind of domain expertise, critical judgment, and pattern recognition that routine AI use degrades. When the people checking AI output are themselves becoming less capable of spotting errors – because they’ve been relying on AI for the same kind of analytical work – you get a degrading feedback loop. AI produces more output. Humans catch fewer errors. The errors that slip through create rework. The rework consumes the productivity gains. And the organization wonders why the dashboard says everyone is faster but the quarterly results don’t reflect it.

Microsoft Research documented the cognitive mechanism in a CHI 2025 study of 319 knowledge workers. Higher confidence in AI tools was associated with reduced critical thinking effort. Workers who trusted GenAI the most were the least likely to question its output. The shift was behavioral, not just attitudinal: workers moved from generating original analysis to verifying AI analysis, and verification is a cognitively thinner activity than generation. You catch less because you’re doing less cognitive work, and you’re doing less cognitive work because the tool has trained you to expect it to be right.

“Forgetting How to Think”: What 80,000 Users Are Telling Us

The workers experiencing this aren’t unaware of it. They’re alarmed.

In March 2026, Anthropic published findings from the largest AI user study ever conducted – 80,508 Claude users across 159 countries and 70 languages. The headline finding was that 89% identified at least one major concern about AI. But the data point that should command HR attention is more specific: 16% of respondents independently cited “cognitive decline – losing the ability to think critically” as a top concern. These weren’t researchers or policymakers speculating about risks. These were daily users describing what they were experiencing.

The study revealed what the researchers called a dependency paradox. Users who valued AI most for emotional and thinking support – the people most deeply integrated with the tool, most reliant on it for cognitively demanding work – were three times more likely to fear becoming dependent on it. The people using AI most intensively are the ones sounding the alarm about what it’s doing to them. They can feel the skill erosion. They just can’t stop it, because the productivity benefits are immediate and the skill costs are deferred.

This is the behavioral signature of a dependency loop. The short-term reward (faster, easier work) reinforces the behavior (using AI for everything), while the long-term cost (degraded capability) accumulates invisibly until a moment of crisis reveals it. If that pattern sounds familiar, it’s because dependency loops are among the most studied phenomena in behavioral science. The difference here is that the dependency is organizational, not just individual. When an entire team or function becomes dependent on AI for cognitive work, the collective capacity to catch AI failures degrades in lockstep.

The Cognitive Science: Why This Was Entirely Predictable

The mechanism behind skill atrophy isn’t mysterious. Cognitive science identified it decades before anyone imagined AI copilots.

The generation effect is one of the most replicated findings in learning research. It shows that producing your own answers – even incorrect ones – creates dramatically better retention and deeper understanding than passively receiving correct answers. The effect sizes are large: 0.40 to 1.34 standard deviations, depending on the task and the delay before testing. And critically, the benefit grows over time. At longer retention intervals, self-generated knowledge shows effect sizes of 0.64 to 1.34 standard deviations compared to information that was simply received. The struggle of producing your own analysis isn’t a cost to be optimized away. It’s the mechanism through which expertise develops.
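For readers who don’t work in effect sizes daily: these are standardized mean differences (Cohen’s d), the gap between the average score of learners who generated answers and those who merely received them, divided by the pooled standard deviation:

```latex
d = \frac{\bar{x}_{\text{generated}} - \bar{x}_{\text{received}}}{s_{\text{pooled}}}
```

By the usual benchmarks, 0.40 is a solid medium effect and anything above 0.80 is large; assuming roughly normal score distributions, a d of 1.0 means the average learner who generated their own answers outperforms about 84% of those who passively received the correct ones.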

AI tools do the exact opposite. They produce first drafts, generate analyses, write recommendations, and deliver answers – bypassing the cognitive effort that creates durable skill. Every time a knowledge worker accepts an AI-generated first draft instead of writing their own, they skip the generative processing that would have strengthened their understanding of the domain. One skipped draft doesn’t matter. A thousand skipped drafts over twelve months add up to the 19% accuracy gap Mollick measured.

The pattern has analogs in other domains with longer observational data. A longitudinal study published in Nature Scientific Reports tracked GPS users over three years and found a negative correlation (r = -0.22) between GPS use and spatial memory, with steeper hippocampal-dependent memory decline among heavy users. The parallel is almost too clean: a tool that handles a judgment-requiring cognitive task (navigation) reliably enough that users stop practicing the underlying skill (spatial reasoning), resulting in measurable degradation of that skill over time.

The SBS Swiss Business School measured the relationship directly in knowledge work: a correlation of r = -0.75 (p<.001) between AI tool dependence and critical thinking scores. That’s a strong negative relationship. And younger workers – the ones who’ve grown up with AI tools and used them most naturally – showed stronger AI dependence and correspondingly lower critical thinking scores. The generation entering the workforce with the most AI fluency may also be entering it with the least practice in independent analytical reasoning.

The Apprenticeship Crisis: Who Will Supervise AI in 2030?

This brings us to Mollick’s most provocative claim for HR audiences, and the one with the longest tail of consequences.

White-collar expertise has always developed through an informal apprenticeship system. Junior analysts learn to build financial models by building bad financial models and having senior analysts tear them apart. First-year lawyers learn to draft contracts by drafting contracts that partners redline into oblivion. New consultants learn to structure client recommendations by watching their work get restructured, over and over, until the pattern becomes intuitive. The process is slow, expensive, and inefficient. It also works, because it forces novices through the generative struggle that the cognitive science says is essential for skill development.

AI is breaking that system from both ends. At the bottom, junior professionals use AI to produce output that looks senior. A first-year analyst submits a competitor analysis their manager calls the best first draft they’ve seen from someone at that level. The analyst doesn’t mention that Claude wrote 80% of it. The manager doesn’t probe, because the output is good and the deadline is met. Six months later, staffed on a project where AI tools are prohibited for data security reasons, the analyst stares at a blank slide. They know what a good analysis looks like – they’ve reviewed dozens the AI produced. They just can’t produce one themselves. They’ve never had to.

At the top, senior professionals use AI to speed through the review work that used to be the primary mechanism for transferring knowledge. When a partner AI-generates the first draft of a client presentation and hands it to a junior associate for “polishing,” the junior associate learns how to polish AI output, not how to construct an argument from evidence. The apprenticeship has been hollowed out.

Mollick has named this directly: “The big education crisis caused by AI is not going to be in schools, but after graduation, as white-collar work is secretly based on an apprenticeship system that will break.” Harvard Business School research reinforces the point: AI boosts novice productivity but cannot turn novices into experts. The tools that enable performance are inhibiting development.

The Copilot data from GitHub makes the software engineering version of this visible: AI now writes 46% of the average developer’s code, with figures reaching 61% in Java. An ACM study of computer science students using Copilot found they completed tasks 34.9% faster with 50% more progress – but reported not understanding how or why the AI suggestions worked. Their workflow shifted from “read, understand, implement” to “prompt, accept, implement.” The understanding step, where expertise forms, was optimized out.

Now project forward five years. If today’s junior professionals aren’t developing expertise through practice – because AI handles the practice – who will possess the judgment required to supervise AI in 2030? Who will catch the hallucinated market rate, the flawed legal analysis, the missed polyp? The supervisory capacity an organization needs tomorrow is built by the apprenticeship experiences it invests in today. And right now, most organizations are investing in speed.

The Organizational Response Gap

The data on what organizations are actually doing about this is not encouraging.

SHRM’s AI+HI Project 2026 found that only 17% of organizations describe their AI implementation as “highly successful.” Two-thirds report that their organizations haven’t proactively prepared employees to work alongside AI. Only 35% of leaders feel they’ve prepared employees effectively for AI roles. The gap between deployment velocity and workforce readiness is the structural condition that makes skill atrophy not just possible but inevitable. You can’t maintain skills you aren’t deliberately protecting in an environment that rewards speed over capability.

IDC projects that 90%+ of global enterprises will face critical AI skills shortages by 2026, with sustained gaps threatening $5.5 trillion in losses from delayed products, missed revenue, and impaired competitiveness. Only one-third of employees received any AI training in the past year. The skills most at risk aren’t AI skills – they’re the human skills that AI is quietly degrading: critical thinking, unassisted analytical reasoning, domain judgment, and the ability to produce original work without a copilot.

Gartner’s prediction that 50% of global organizations will require AI-free skills assessments by the end of 2026 signals that the market has begun pricing in the risk. But an assessment without a maintenance strategy is just a measurement of the damage already done. Knowing that your senior analysts can’t produce unassisted work at their pre-AI level doesn’t help if you haven’t built the workflow structures that prevent the degradation in the first place.

Building a Skill Maintenance Architecture

The evidence points to five interventions that address the mechanism, not just the symptom.

Map the frontier, task by task. Mollick’s core contribution is the insight that AI capability is jagged, not uniform. Every function – HR, finance, legal, IT – needs to identify where AI is reliable and where it fails in their specific domain. Deploy AI aggressively for productivity on within-frontier tasks. Deliberately preserve human practice on tasks at the frontier’s edge and beyond it. This isn’t intuitive, because the tasks near the edge are the ones where AI output looks most convincingly correct. That’s exactly why human practice on those tasks is most critical to maintain.
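One lightweight way to make that map operational is a task-level policy registry recording each task’s frontier zone and what it implies for AI use and assessment. A minimal sketch – the task names, zones, and fields below are illustrative, not drawn from the study:

```python
from dataclasses import dataclass
from enum import Enum

class FrontierZone(Enum):
    INSIDE = "inside"     # AI reliable here: deploy aggressively
    EDGE = "edge"         # output looks right, errors slip in: humans draft first
    OUTSIDE = "outside"   # AI unreliable: human-generated, AI for critique at most

@dataclass
class TaskPolicy:
    task: str
    zone: FrontierZone
    ai_first_draft: bool          # may AI produce the first draft?
    in_ai_free_assessment: bool   # include in periodic unassisted checkpoints?

# Hypothetical HR examples; every function would map its own tasks.
registry = [
    TaskPolicy("job-description drafting", FrontierZone.INSIDE, True, False),
    TaskPolicy("niche-role compensation benchmarking", FrontierZone.EDGE, False, True),
    TaskPolicy("org-design tradeoff analysis", FrontierZone.OUTSIDE, False, True),
]

for p in registry:
    print(f"{p.task}: zone={p.zone.value}, AI first draft allowed={p.ai_first_draft}")
```

The point of writing the policy down is that it becomes reviewable: when the frontier shifts – and it will – you re-zone tasks explicitly instead of letting individual habits drift.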

Preserve the generation effect. The most actionable finding from the cognitive science: don’t let AI produce first drafts for skill-critical work. Have humans draft first, then use AI for editing, enhancement, and expansion. The cognitive cost of producing a rough first draft is precisely what builds the expertise that enables someone to catch errors in a polished AI draft. Mollick models this in his own workflow – generating his own analysis before checking it against AI output, rather than starting from AI output and trying to verify it. The sequence matters. Starting from your own thinking builds skill. Starting from AI’s thinking degrades it.

Redesign apprenticeship for the AI era. Mollick’s proposal deserves serious consideration: make “managing the machine” the new apprenticeship. Instead of having junior professionals use AI to produce expert-level output, have them use AI as a sparring partner – generating their own work, comparing it to AI output, analyzing where and why they differ, and building critical judgment through that comparison. The AI becomes a training environment rather than a production shortcut. This requires deliberate design and managerial commitment. It’s slower than letting juniors use AI as a crutch. It’s also the only approach consistent with actually developing the next generation of experts.

Institute AI-free assessment checkpoints. Following Gartner’s prediction, build periodic unassisted evaluations into professional development. Not as a Luddite gesture or a gotcha exercise, but as a diagnostic tool for skill maintenance, the way pilots are periodically required to demonstrate manual flying proficiency. If the assessment reveals degradation, that’s a signal to adjust the workflow, not to blame the individual. The goal is visibility into skill health at the organizational level, so that degradation is caught before it reaches the supervisory layer.

Measure skill health, not just productivity. The current dashboard tracks tasks completed, time to completion, and output quality. None of those metrics capture whether the human supervisory layer is maintaining the capability required to catch what AI gets wrong. Organizations need leading indicators of skill health: error detection rates on deliberately seeded test cases, unassisted performance benchmarks over time, and qualitative assessment of analytical depth in human-generated work. If you only measure speed, you’ll optimize for speed. And speed without judgment is how organizations produce confident, polished, wrong answers at scale.
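As a sketch of what the seeded-test-case indicator could look like – the cadence, threshold, and numbers here are hypothetical, not from any of the studies above – the core logic is just a detection rate tracked against a baseline:

```python
def detection_rate(seeded: int, caught: int) -> float:
    """Share of deliberately planted errors a reviewer catches."""
    return caught / seeded

# Hypothetical quarterly checkpoints for one reviewer, 10 seeded errors each
history = [
    ("2025-Q2", detection_rate(10, 9)),
    ("2025-Q3", detection_rate(10, 8)),
    ("2025-Q4", detection_rate(10, 6)),
]

baseline = history[0][1]
for quarter, rate in history[1:]:
    # Flag a sustained drop of more than 20% from baseline; the threshold
    # is an assumption to tune per role and per risk level.
    if rate < 0.8 * baseline:
        print(f"{quarter}: detection rate {rate:.0%} -- review the workflow, not the person")
```

In this toy run, 2025-Q4 gets flagged at 60% against a 90% baseline – exactly the kind of early signal the productivity dashboard never surfaces.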

The Real Question

Every organization deploying AI is making an implicit bet about the relationship between productivity and capability. One path treats them as aligned: AI makes workers faster, faster workers are more productive, more productive workers are more valuable. This path leads to maximum AI deployment, minimum friction, and an assumption that human skill will maintain itself through some combination of residual practice and good intentions.

The other path recognizes what the evidence now shows – that productivity and capability can diverge, that the tools producing today’s speed gains are consuming tomorrow’s judgment capacity, and that maintaining human expertise in an AI-augmented environment requires the same deliberate, structural investment that maintaining any critical asset requires. This path is slower. It’s more expensive in the short term. It requires organizational design work that no vendor can sell you.

The consultants in Mollick’s experiment who were 25% faster and 19% less accurate didn’t know they were less accurate. The endoscopists who missed 20% more polyps didn’t feel less skilled. The junior analyst who can’t produce an unassisted analysis doesn’t realize what they never learned. Skill atrophy is silent until the moment it isn’t, and by then, the cost is measured in decisions that nobody in the room was equipped to question.

The organizations that navigate this well will be the ones that refuse to treat AI productivity and human capability as the same metric. They are not the same metric. They may, in fact, be inversely correlated. And the sooner HR and IT leaders internalize that finding, the sooner they can start building the architectures that preserve both.