npm Can't Fix This From the Outside — Blog | Filipe Brito Ferreira

Last week npm shipped 2FA-gated publishing. Eight months too late, two years behind PyPI. The pattern is structural: npm always patches last because the registry is closed source. The fix is three tiers, automated scanning, and opening the source.


I’d been watching the Shai-Hulud worm propagate for about an hour before I realised what it was actually doing. A compromised npm package, once installed, would unpack itself, run TruffleHog against the developer’s environment to scrape every token it could find, and then use those tokens to authenticate as the developer and publish backdoored versions of every other package that developer maintained. The first run took human effort. The second was automated. By November 2025 the second variant of the worm had moved execution from post-install to pre-install, compromised 796 npm packages totalling 20 million weekly downloads, and exposed 25,000 GitHub repositories across 350 maintainers. The worm didn’t need a person in the loop after the first compromise.

Eight days before Shai-Hulud first ran, the Qix maintainer was phished by a convincing 2FA-reset email sent from the look-alike domain npmjs.help. The attacker took the username, the password, and a live TOTP code, then pushed malicious versions of debug, chalk, and sixteen other utilities with a combined 2.6 billion weekly downloads. The payload was a browser-only crypto-wallet drainer. The community detected it inside two hours. Both attacks shipped through the same registry, both unreviewed, both ungated.

Last week, on 22 May 2026, GitHub announced staged publishing and 2FA-gated install controls for npm. Eight months after Qix. Two years and four months after PyPI made 2FA mandatory for every maintainer. The fix isn’t wrong. It’s late, and it always will be, because the registry that needs fixing isn’t open source.

Yes, We’ve Heard This Before

Every registry has had compromises. PyPI shipped malicious packages in 2017 and again in 2022. RubyGems lost a maintainer to credential theft in 2020. Maven Central has been a phishing target since the Log4Shell era. The maintainer-as-sole-defender model is how open-source registries have always worked, and for two decades it worked well enough.

So why is npm different? Two things changed. The install count and the automation.

In March 2016 a developer un-published a package called left-pad, eleven lines that prepended characters to a string, and it broke Babel, React, and a third of npm builds globally. It was a single-maintainer decision. The blast radius was already too big for a single-maintainer decision to be sane. npm Inc reverted the un-publish a few hours later and changed the policy so a package with active dependents couldn’t be removed unilaterally. That was the first time the registry treated a package as critical infrastructure. It was also one of the only times.

The 2025 incidents are the next data point in the same series. Shai-Hulud didn’t depend on a person. It took the first stolen token and rolled itself across the rest of the maintainer’s portfolio at machine speed. The 25,000 affected GitHub repositories weren’t 25,000 separate compromises. They were one compromise that travelled. The blast radius isn’t even maintainer-scale any more. It’s tooling-scale, because the worm is now part of the tooling.

The shape repeats. I wrote last year about the games industry collapse: short-term thinking, growth-at-all-costs, no investment in longevity. The registry layer is now living the same arc, and the new thing is the asymmetry. Each year we add more dependencies per project, each year more of the security work falls on volunteers paid in goodwill, and each year the cost of the next compromise rises faster than the registry’s defence budget. The Qix payload took two hours to detect because Socket and StepSecurity happened to be watching; in 2027 the worm will be obfuscated, and “two hours” will be a story we tell about the last cycle. Nothing about 2026 is fundamentally a new problem. The new thing is what the problem can do unsupervised, and how fast the defence budget is falling behind.

What npm Just Shipped (And Why It Isn’t Enough)

To be fair to GitHub, the registry team has been moving for a year. Trusted publishing via OIDC went GA in July 2025. Sigstore-backed provenance attestations attach automatically when you publish that way. Long-lived classic tokens were disabled in November and revoked entirely in December 2025, with the message to maintainers being: stop using shared secrets, switch to ephemeral OIDC tokens, or use granular tokens with scoped permissions and a 90-day lifetime cap.

The 22 May 2026 announcement extended that with staged publishing. In one sentence: a leaked CI token can no longer silently ship a malicious release without the package’s human maintainer approving the publish. The flow goes: your CI publishes a release to a staging slot, the registry waits for the human maintainer to approve via a 2FA challenge, only then does the package become installable. For trusted publishing setups, the OIDC configuration can be locked to stage-only, so a leaked workflow can’t bypass the human gate. It’s a sensible upgrade. It also means that, as of this week, the publishing flow that pushes a tier-1 dependency to two billion weekly installs is finally as scrutinised as the one PyPI shipped for every maintainer in January 2024.

That’s the problem. The fix isn’t wrong. The fix is two years and four months late.
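For readers who want the staged flow spelled out, here is a minimal sketch of it as a two-state machine. Everything in it is an assumption made for illustration: the names `StagedRelease`, `stageRelease`, `approve`, and `isInstallable`, and the `twoFactorVerified` flag standing in for a real 2FA challenge. It is not npm’s API or data model, only the shape of the gate as described above.

```typescript
// Hypothetical model of a staged publish; not npm's real API.
type ReleaseState = "staged" | "approved";

interface StagedRelease {
  name: string;
  version: string;
  state: ReleaseState;
  approvedBy?: string;
}

// CI can only ever create a release in the "staged" state.
function stageRelease(name: string, version: string): StagedRelease {
  return { name, version, state: "staged" };
}

// Promotion needs a human maintainer who has just passed a 2FA challenge.
// `twoFactorVerified` is a stand-in for that challenge's result.
function approve(
  release: StagedRelease,
  maintainer: string,
  twoFactorVerified: boolean,
): StagedRelease {
  if (release.state !== "staged") {
    throw new Error(`cannot approve a release in state "${release.state}"`);
  }
  if (!twoFactorVerified) {
    throw new Error("2FA challenge not satisfied; release stays staged");
  }
  return { ...release, state: "approved", approvedBy: maintainer };
}

// Only approved releases are visible to installers.
function isInstallable(release: StagedRelease): boolean {
  return release.state === "approved";
}
```

The point of the sketch is that a leaked CI credential can reach `stageRelease` but not `approve`, so nothing it publishes becomes installable on its own.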

PyPI made 2FA mandatory for every project owner on 1 January 2024. No install-count threshold, no opt-out, no warning period beyond the announcement. crates.io shipped trusted publishing for the Rust ecosystem in July 2025 and added a per-crate security advisory tab a few months later. Both ecosystems published joint disclosure protocols with the Python Software Foundation and the Rust Foundation taking the legal load. Neither saw a measurable drop in publishing rates.

npm took the same step in stages across ten months. Trusted publishing went GA about a year and a half after PyPI’s mandatory 2FA. Token revocation came nearly two years after it. Staged 2FA-gated publishing arrives in May 2026, after Qix and after Shai-Hulud, after two billion weekly downloads worth of compromise. The pattern isn’t “GitHub is slow”. The pattern is that every fix has to ship through one closed-source product team, on one release cadence, with one set of priorities, and the community that finds the holes has no way to upstream the fix. The next worm is already being written. The next fix is being roadmapped. The race is the structure, not the speed.

The Three-Tier Proposal

The proposal is simple to state and uncomfortable to defend. Tier the registry by install count. Apply the friction proportional to the blast radius. Stop pretending lodash and your weekend project need the same publishing rules.

Tier 1: foundational packages. Roughly 250 packages with more than 5 million weekly downloads. This is the set the rest of the ecosystem can’t function without: lodash, debug, chalk, semver, axios, react, react-dom, the typescript compiler, the eight or ten packages every modern toolchain depends on. Every new release of a tier-1 package goes through a human-reviewed publishing gate, App Store style. The review is structural: provenance attestation present, change diff against the previous release within a sane bound, install scripts inspected, maintainer signature verified against an established history. Apple’s App Store review team is the existing proof that a human team can process developer-facing review queues at scale, and that’s a feasibility precedent, not a security one. App Store review catches policy violations, not obfuscated payloads, and the App Store ships its share of stalkerware and crypto-drainers anyway. The tier-1 npm queue would be the size of an OpenJS Foundation working group, not an Apple-scale operation. Its reviewers need to read JavaScript and trace install scripts; large security backports get a fast-track so emergency CVE responses don’t sit behind routine review; ambiguous cases escalate to a published security-advisory queue. Staffing it is a budget decision, not a feasibility one.

Tier 2: high-impact packages. Roughly 5,000 packages with more than 1 million weekly downloads. No human review. What’s mandatory instead: 2FA-gated publishing (the staged flow that just shipped), Sigstore provenance attestation on every release, an automated malware scan against the build artefact, and a freeze period of a few hours between publish and install eligibility so the community detection that caught the Qix payload in under two hours actually has time to fire. Tier 2 is where the registry does the lifting after a one-time maintainer migration. Wiring up OIDC trusted publishing and provenance attestation is a weekend of work, not ten minutes. The registry should ship a bulletproof migration guide and a working GitHub Actions template before the threshold becomes mandatory, plus a rollback-and-support path for failed publishes outside business hours. Then leave the maintainer alone.

Tier 3: the long tail. The other three million packages. Status quo. Opt-in provenance, opt-in scanning, the same publishing flow that’s existed for a decade. The “anyone can publish” promise survives where the blast radius is small enough to justify it. A solo maintainer publishing their first npm package shouldn’t have to clear an Apple-style review queue. They also shouldn’t be one stolen token away from compromising a quarter of the internet.

The thresholds in this proposal are illustrative, not load-bearing. Socket, Snyk, and the OSSF have already published registry-wide install-count distributions; the natural breaks in that data should set tier boundaries, not the round numbers in this post.
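To make the tiering concrete, here is a rough sketch of the classification and the controls attached to each tier. The thresholds are the illustrative round numbers from this post. The function and field names are hypothetical, the three-hour freeze is my placeholder for “a few hours”, and having tier 1 inherit tier 2’s automated controls is an assumption the proposal doesn’t spell out.

```typescript
// Illustrative thresholds from this post; real cuts should come from the data.
const TIER_1_MIN_WEEKLY_DOWNLOADS = 5000000;
const TIER_2_MIN_WEEKLY_DOWNLOADS = 1000000;

type Tier = 1 | 2 | 3;

// "More than" the threshold, so a package sitting exactly on it stays in the lower tier.
function classifyTier(weeklyDownloads: number): Tier {
  if (weeklyDownloads > TIER_1_MIN_WEEKLY_DOWNLOADS) return 1;
  if (weeklyDownloads > TIER_2_MIN_WEEKLY_DOWNLOADS) return 2;
  return 3;
}

interface PublishPolicy {
  humanReview: boolean;
  twoFactorGatedPublish: boolean;
  provenanceRequired: boolean;
  malwareScan: boolean;
  freezeHours: number; // delay between publish and install eligibility
}

// Assumption: tier 1 keeps every tier-2 control and adds human review.
// Assumption: 3 hours stands in for "a few hours".
const POLICIES: Record<Tier, PublishPolicy> = {
  1: { humanReview: true, twoFactorGatedPublish: true, provenanceRequired: true, malwareScan: true, freezeHours: 3 },
  2: { humanReview: false, twoFactorGatedPublish: true, provenanceRequired: true, malwareScan: true, freezeHours: 3 },
  3: { humanReview: false, twoFactorGatedPublish: false, provenanceRequired: false, malwareScan: false, freezeHours: 0 },
};

function policyFor(weeklyDownloads: number): PublishPolicy {
  return POLICIES[classifyTier(weeklyDownloads)];
}

// A release becomes installable only once its freeze period has elapsed.
function isInstallEligible(publishedAt: Date, now: Date, freezeHours: number): boolean {
  const elapsedMs = now.getTime() - publishedAt.getTime();
  return elapsedMs >= freezeHours * 60 * 60 * 1000;
}
```

Keeping the thresholds as two named constants is deliberate: when the boundaries move, nothing else in the policy has to.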

Whatever the final cuts, publish them. Gate every transition with a notice period and a maintainer appeal path. No maintainer wakes up to find their package silently tier-bumped overnight. Adjust the thresholds quarterly based on real attack data. What’s not arguable is the tiering itself. Treating every package as equivalent has been the registry’s design choice since 2010, and every supply-chain attack of the last five years has exploited that choice. The reform is to stop treating identical-looking entries in a directory as if they carry identical risk.

The closest existing precedent is the Certificate Authority ecosystem. Anyone can run a CA in principle. In practice, a tiered trust model — root CAs reviewed by browser vendors, intermediate CAs operating under audited policies, end-entity certificates issued automatically — has kept the web’s PKI working for two decades. The registry needs the same shape: a small reviewed core, a larger automated middle, a permissive edge. None of that requires npm to stop being open. It requires npm to stop treating “open” as a synonym for “unreviewed at every layer”.

Scanning Is Solved, We Just Don’t Run It

The hardest objection to tier 2 is “automated scanning will miss things, what’s the point”. The literature disagrees, and has disagreed for four years.

AMALFI, an ML-based malicious-package detector from 2022, scored below 1% false positives on its evaluation set of real npm packages, at high recall, scanning a package in seconds. Whether those numbers hold against the adversarial distribution of packages actually submitted to npm in 2026 is an open question, but it’s the kind of open question that gets answered by running the scanner server-side at publish time, not by paywalling it inside a vendor product. ACME generates signatures from representative malicious-code fragments and scans the registry for new matches. A 2024 paper on LLM-based detection reported comparable precision at a lower inference cost than the ML pipeline. The OSSF Scorecard project catalogues repository-side dangerous-workflow patterns that catch the build-side half of supply-chain attacks.

Socket runs a commercial version of this stack as a paid product on top of the public registry. So does Snyk. So does Endor Labs. The detection capability is built, measured, and sold. The only place it doesn’t run is server-side at publish time on the registry itself.

The structural reason is that npm has no way to land community-built scanning rules. Socket’s rule set is Socket’s intellectual property. Snyk’s is Snyk’s. None of them upstream into the registry, because there’s nowhere upstream to land. A maintainer who writes a new heuristic for detecting credential-harvesting payloads after the next Shai-Hulud variant has the choice of publishing a blog post about it, selling it to one of the scanning vendors, or shouting into the void. The capability exists. The contribution path doesn’t.

Move the scanning server-side at publish time. Make the rules pluggable. Accept community PRs against the rule set the same way the Linux kernel accepts driver patches. The false-positive risk is real and the way you manage it is exactly the way every other critical-infra project manages it: a maintainership track, a review process, a deprecation pathway for noisy rules. The cost of running scanning at registry scale is now a rounding error against what GitHub already spends on the registry’s CDN. The hard problem isn’t compute. It’s governance, and governance lives upstream of the closed-source decision.
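Here is a minimal sketch of what “pluggable” could mean in practice: a small rule interface that a community PR could implement, and a runner the registry would call at publish time. The interface, the rule ID, and the regex heuristic are all hypothetical. The heuristic is deliberately crude and is not a claim about how AMALFI, ACME, or any vendor scanner works.

```typescript
// Hypothetical shape of a publish-time scan; names and heuristics are illustrative.
interface PackageArtefact {
  name: string;
  version: string;
  // Lifecycle scripts from package.json, e.g. { preinstall: "node setup.js" }.
  scripts: { [hook: string]: string | undefined };
}

interface Finding {
  ruleId: string;
  message: string;
}

interface ScanRule {
  id: string;
  check(pkg: PackageArtefact): Finding[];
}

const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];
const PIPED_DOWNLOAD_RULE_ID = "install-hook-piped-download";

// Example rule: flag install hooks that pipe a curl/wget download into a shell.
const pipedDownloadInInstallHook: ScanRule = {
  id: PIPED_DOWNLOAD_RULE_ID,
  check(pkg: PackageArtefact): Finding[] {
    const pattern = /\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b/;
    const findings: Finding[] = [];
    for (const hook of INSTALL_HOOKS) {
      const command = pkg.scripts[hook];
      if (command !== undefined && pattern.test(command)) {
        findings.push({
          ruleId: PIPED_DOWNLOAD_RULE_ID,
          message: `${hook} pipes a download into a shell`,
        });
      }
    }
    return findings;
  },
};

// The runner knows nothing about individual rules; it just collects findings.
function scanAtPublish(pkg: PackageArtefact, rules: ScanRule[]): Finding[] {
  const findings: Finding[] = [];
  for (const rule of rules) {
    findings.push(...rule.check(pkg));
  }
  return findings;
}
```

Because the runner only depends on the `ScanRule` interface, retiring a noisy rule means removing one entry from the list, which is the deprecation pathway in miniature.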

“But This Will Kill Open Source”

The honest counter-argument is that tiering, scanning, and review will gatekeep open source out of existence. The maintainer of event-stream was a single volunteer who handed it off to the wrong person because he was burned out, and the result was the 2018 cryptocurrency-wallet-targeting compromise that took weeks to clean up. The fear is that more friction means more maintainers walk away, more handoffs go wrong, and more “anyone can publish” turns into “almost nobody can”.

The fear is reasonable. The data doesn’t support it.

PyPI made 2FA mandatory for every project owner on 1 January 2024, with no install-count threshold and no opt-out. The PyPI publishing rate didn’t drop. Maintainer turnover didn’t spike. The transition cost the average maintainer about ten minutes to enrol a YubiKey, and that’s the specific evidence that 2FA enrolment doesn’t chase off volunteers. It is not evidence that an App Store-style review queue, an OIDC migration, or a provenance attestation gate would be equally tolerated. Each of those is a separate empirical question, and the tier system above is where those questions get asked at the right scale rather than as a single yes/no for the whole registry.

Manual review at tier 1 is a sharper objection, but the tier-1 set isn’t a lone-maintainer pipeline. lodash has eight maintainers and a Slack channel. React has Meta paying the bill. The typescript compiler is owned by Microsoft. The packages large enough to land in tier 1 are large enough that bus factor is already the bigger threat than reviewer friction. App Store review isn’t built for indie developers either, and there are about 30 million Apple developer accounts shipping under that model.

The tier 3 long tail is where the “anyone can publish” ethos lives, and the tier 3 long tail keeps its current rules. The proposal isn’t to make open source harder. It’s to stop pretending the foundational few hundred carry the same publishing risk as a personal weekend project. That’s the actual gatekeeping happening now: a registry that treats all submissions identically and lets a phishing email turn lodash into a wallet drainer. The current model already gates. It just gates with the lock pointing the wrong way.

The Closed-Source Problem

Two hours. That was the gap between the Qix malicious release hitting the registry and the community identifying it on GitHub. The registry’s own public-facing detection lagged that community signal by hours. Socket caught it. So did StepSecurity. So did half a dozen independent researchers running their own scanning pipelines. By the time the official npm response cycle got moving, the community had already mapped the payload, traced the phishing domain, and notified the maintainer.

Same shape for Shai-Hulud. Wiz, Datadog, Palo Alto Unit 42, and Microsoft Security Response each published full analyses within a day of the worm’s first run. The CISA alert went out on 23 September 2025. The first community-built mitigation scripts hit GitHub within hours. None of that detection capability lives inside the registry.

The reason it doesn’t is that the registry server is closed source. Verdaccio and Nandu exist as open-source private-mirror servers, which proves both the appetite and the technical feasibility. Neither is a drop-in npm replacement, because the npm Inc-operated registry has unique data, unique scale, and unique deployment plumbing that can’t be replicated externally without GitHub’s cooperation. So the contribution path stops at “fork our cache and run a private mirror”. It never reaches the public registry. The community is left building paid products on top, not patches underneath.

Source-availability alone doesn’t unblock governance — PyPI’s Warehouse codebase is public and PyPI still ships its own features on its own cadence. What open source does is make the policy conversation legible and create a fork option if a registry’s choices stall. The structural fix here is a community-governed contribution process layered on top of an open-sourced registry server, in the shape of a Linux Foundation or Rust Foundation working group rather than a single product team. The incentive question is real: the registry has been npm Inc’s moat since 2014, and the foundation model only works if its dominant commercial users (GitHub, Microsoft, the AWS and Cloudflare ecosystems that ship npm packages all day) accept the cost. That conversation isn’t dodged by calling for open source. It starts with GitHub publishing a registry-server licence and a public RFC process, and Microsoft accepting that a foundation-governed registry is worth more than an internal product team because the next worm crosses ecosystems and the on-call ends up at GitHub anyway.

The proprietary commercial layer (paid mirrors, enterprise integrations, the npm Inc operations team) keeps its commercial logic intact. The registry-server code that decides how a publish becomes installable becomes a public good, the way the Linux kernel is a public good underneath every commercial Linux distribution. The Shai-Hulud worm authors didn’t need npm’s cooperation to find the holes. The community that defended against them shouldn’t need npm’s cooperation to fix them.

What to Do on Monday Morning

None of the fixes here are revolutionary. They’re the basics every other critical-infrastructure ecosystem has already shipped. Skip the platitudes; here’s what to actually do on Monday morning.

For developers. Turn on --ignore-scripts for everything except your immediate dependencies, and audit which packages actually need install hooks. Pin minor versions on production-critical paths. Read the package-lock.json diff before you merge a Renovate PR; treat unsigned tier-2 packages as a flag the same way you’d treat a dependency from a maintainer you’ve never heard of.
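If reading a lockfile diff by eye sounds tedious, a small script can do the first pass. The sketch below assumes the `packages` map used by package-lock.json at lockfileVersion 2 or 3, including its `hasInstallScript` flag. The type and function names are mine, and it models only the fields it needs.

```typescript
// Minimal subset of the package-lock.json (lockfileVersion 2 or 3) shape.
interface LockEntry {
  version?: string;
  hasInstallScript?: boolean;
}

interface Lockfile {
  packages: { [path: string]: LockEntry | undefined };
}

interface LockChange {
  path: string;
  from: string | null; // null means the package is newly added
  to: string;
  hasInstallScript: boolean;
}

// Lists every package that is new or changed version between two lockfiles,
// and marks the ones that ship an install hook.
function diffLockfiles(before: Lockfile, after: Lockfile): LockChange[] {
  const changes: LockChange[] = [];
  for (const path of Object.keys(after.packages)) {
    if (path === "") continue; // the root project entry
    const next = after.packages[path];
    // Entries without a version (for example workspace links) are skipped.
    if (next === undefined || next.version === undefined) continue;
    const prev = before.packages[path];
    const from = prev?.version ?? null;
    if (from !== next.version) {
      changes.push({
        path,
        from,
        to: next.version,
        hasInstallScript: next.hasInstallScript === true,
      });
    }
  }
  return changes;
}
```

In practice `before` and `after` would come from `JSON.parse` on the lockfile at the base and head commits of the PR. Any entry where `hasInstallScript` is true is a package to look at before the merge, and a candidate for the install-hook audit.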

For OSS maintainers. Move to trusted publishing via OIDC this week, before npm makes it mandatory. Stop using long-lived tokens. Enable the May 2026 staged publishing flow for any package above the high-impact threshold. If your package has more than a million weekly downloads, you’re tier 2 whether the registry says so yet or not, and your publishing flow should match.

For security engineers. Run npm audit signatures in CI on every install, and reject builds where a tier-2 dependency lacks a verifiable Sigstore attestation. Maintain an internal block-list of packages that fail OSSF Scorecard checks. The capability is free, the rules are public, and the work that doesn’t get done here lands on the on-call engineer at 2 a.m.

For engineering leadership. Budget for a private mirror. Verdaccio, Nexus, or Artifactory; pick one, run it. Realistic cost is two to four engineer-weeks of setup plus a day a month of ongoing maintenance, a small fraction of the IBM 2024 cost-of-a-data-breach study’s $4.88M average, and well under the on-call cost of a single multi-hour registry outage. The upside is your production builds don’t break the next time the registry takes hours to roll a malicious package back, and when a malicious release does sneak through you’re rolling forward from a known-good cache rather than a public registry your security team can’t see into. Same argument as a database replica. You wouldn’t run production off a single Postgres instance either.

For npm Inc and GitHub. Tier the registry. Scan by default at publish time. Open-source the registry server. The “anyone can publish” promise lives at tier 3, where it belongs. The foundational few hundred carry too much load to keep that promise unconditional.

For PyPI, crates.io, Maven Central, and RubyGems. You’re ahead. Publish a joint disclosure protocol with named legal counsel. The next worm crosses ecosystems, because the maintainers do. The first incident that hits Python and Rust simultaneously will need a response plan that already exists, not one that’s being drafted while the worm spreads.

The maintainer was never supposed to be the last line of defence. The closed-source registry is why they still are. Two hours of community detection beat the registry’s own response on Qix. A week after the May 2026 update, the lock on the door is finally turned. The wall is still missing.