Rosa Del Mar

Issue 37 2026-02-06

Daily Brief

Dataset Sunset And Hard Removal

General
Sources: 1 • Confidence: Medium • Updated: 2026-02-06 16:59

Key takeaways

  • World Factbook pages were configured to return a 302 redirect to the closure announcement.
  • Until 2020 the CIA published annual ZIP file archives of the entire Factbook site, and those archives are available via the Internet Archive.
  • A December 10, 2020 Factbook 'What's New' entry reports Nepal and China agreed on Mount Everest's height as 8,848.86 meters, which the Factbook rounds to 8,849 meters and propagates throughout its database.
  • Archived versions of The World Factbook could have remained available with a banner noting it is no longer maintained instead of being removed.
  • A 384MB 2020 Factbook ZIP archive was extracted into the GitHub repository simonw/cia-world-factbook-2020 and published for browsing using GitHub Pages.

Sections

Dataset Sunset And Hard Removal

The CIA did not only stop updating The World Factbook; it removed the site and its historical archives, and it redirects every page to a closure notice, which breaks deep links. Without an official explanation, it is harder to judge how long similar public resources will remain available, which complicates contingency planning.

  • World Factbook pages were configured to return a 302 redirect to the closure announcement.
  • The CIA has not provided an explanation for why it decided to stop maintaining The World Factbook.
  • The CIA has sunset The World Factbook publication.
  • The CIA removed the entire World Factbook site, including archives of previous versions.
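
A blanket redirect of this kind is easy to miss in a link checker, because a client that follows the 302 ends up on a page that loads successfully. The sketch below classifies a single response without following the redirect. It is a minimal illustration, not the behavior of any particular tool: CLOSURE_URL and the example.invalid addresses are hypothetical placeholders, since the source does not give the real closure address.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical closure-notice address; the real one is not given in the source.
CLOSURE_URL = "https://example.invalid/closure-notice"

REDIRECT_CODES = {301, 302, 303, 307, 308}


def _same_page(a: str, b: str) -> bool:
    """Compare two URLs by host and path, ignoring case of host and a trailing slash."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.netloc.lower(), pa.path.rstrip("/")) == (
        pb.netloc.lower(),
        pb.path.rstrip("/"),
    )


def classify_response(requested_url: str, status: int, location: Optional[str]) -> str:
    """Classify one HTTP response for a previously cited deep link."""
    if status in REDIRECT_CODES and location:
        # A redirect to the closure notice means the original page is gone,
        # even though the redirect target itself loads without error.
        if _same_page(location, CLOSURE_URL) and not _same_page(requested_url, CLOSURE_URL):
            return "redirected-to-closure"
        return "redirected-elsewhere"
    if 200 <= status < 300:
        return "ok"
    return "error"
```

To feed this function real responses, the HTTP client would need redirect following turned off so that the 302 status and its Location header are visible to the caller.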

Preservation And Continuity Paths

Despite the removal, the dataset's public-domain status permits lawful mirroring and redistribution. Two practical continuity options exist: the official ZIP archives hosted by the Internet Archive (through 2020) and at least one GitHub-based mirror that restores browseability. Together they offer an immediate workaround for the disruption.

  • Until 2020 the CIA published annual ZIP file archives of the entire Factbook site, and those archives are available via the Internet Archive.
  • A 384MB 2020 Factbook ZIP archive was extracted into the GitHub repository simonw/cia-world-factbook-2020 and published for browsing using GitHub Pages.
  • The World Factbook has been in the public domain since it began.
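
One way to judge how complete a mirror is relative to an official ZIP archive is to compare per-file checksums. The sketch below assumes you already have the archive locally and can build the same kind of manifest for the mirror; the function names and any file paths used with them are hypothetical, and nothing here reflects how the existing mirror was actually produced.

```python
import hashlib
import io
import zipfile


def zip_manifest(source) -> dict:
    """Map each file path in a ZIP archive to the SHA-256 of its contents.

    `source` may be a filesystem path or a binary file-like object.
    """
    manifest = {}
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no content to hash
            manifest[info.filename] = hashlib.sha256(archive.read(info)).hexdigest()
    return manifest


def diff_manifests(official: dict, mirror: dict) -> dict:
    """Report files missing from, added to, or altered in the mirror."""
    shared = set(official) & set(mirror)
    return {
        "missing": sorted(set(official) - set(mirror)),
        "extra": sorted(set(mirror) - set(official)),
        "changed": sorted(p for p in shared if official[p] != mirror[p]),
    }
```

An empty result in all three lists would suggest the mirror matches the archive file for file; it says nothing about whether the archive itself captured the whole original site.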

Data Editorial Process And Precision Behavior

The corpus shows how one update was applied: the Factbook took a newly agreed measurement, rounded it, and propagated the rounded value throughout its database. The dataset therefore reflects editorial normalization choices that can affect downstream precision and change tracking.

  • A December 10, 2020 Factbook 'What's New' entry reports Nepal and China agreed on Mount Everest's height as 8,848.86 meters, which the Factbook rounds to 8,849 meters and propagates throughout its database.
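
The precision effect of that rounding can be worked through directly. The sketch below uses Python's decimal module and assumes round-half-up to the nearest meter; the source does not say which rounding rule the Factbook applied, and the second measurement is a hypothetical value included only to show how rounding can hide a change.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_meter(value: Decimal) -> Decimal:
    """Round a height in meters to the nearest whole meter (half-up, an assumption)."""
    return value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


agreed = Decimal("8848.86")          # height reported in the 'What's New' entry
published = round_to_meter(agreed)   # value the Factbook propagates: 8849
difference = published - agreed      # 0.14 meters introduced by rounding

# Hypothetical later measurement: it differs from `agreed` at the source,
# but the rounded value is unchanged, so a diff of published values
# would not register any update.
hypothetical_update = Decimal("8849.10")
hidden_change = round_to_meter(hypothetical_update) == published
```

Consumers that need the unrounded figure would have to take it from the original announcement rather than from the propagated database value.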

Sunsetting Best Practice Expectation

The corpus states the expectation that archives could have stayed online with a staleness banner, in contrast to the full removal that was observed. This is a normative benchmark, not evidence about why the removal occurred.

  • Archived versions of The World Factbook could have remained available with a banner noting it is no longer maintained instead of being removed.

Unknowns

  • What rationale (if any) did the CIA have for sunsetting The World Factbook and removing the site and archives?
  • Will the CIA restore any part of The World Factbook content (especially archives), or is the removal intended to be permanent?
  • Do official bulk archives exist for years after 2020, and if so where can they be obtained?
  • How durable and complete are third-party mirrors (e.g., GitHub Pages browsing) relative to the original site structure and datasets?
  • To what extent did the 302-redirect configuration break existing citations, automated scrapers, and downstream systems, and what remediation patterns are most effective?

Investor overlay

Read-throughs

  • Hard removal with universal redirects highlights operational risk for teams depending on public datasets, potentially increasing demand for resilient data sourcing, archiving, and link stability practices.
  • Public-domain status plus third-party mirrors suggests continuity can shift from official hosting to community redistribution, influencing how organizations assess provenance, update cadence, and compliance for reference datasets.
  • Rounding and propagation behavior in updates signals that editorial normalization can alter precision and change tracking, reinforcing the need for downstream validation and versioning when datasets are refreshed or mirrored.

What would confirm

  • Public reports of broken deep links, failed scrapers, or disrupted downstream systems tied to the 302 redirects, followed by remediation work such as rewriting citations or swapping to mirrors.
  • Publication of additional third-party mirrors or tools that restore browseability and bulk access, alongside evidence that these mirrors are being adopted as stopgap sources.
  • An official rationale from the CIA, or an official restoration path such as re-hosting archives or providing post-2020 bulk files, that clarifies permanence and access expectations.

What would kill

  • The CIA restores The World Factbook content or archives with stable deep links or a staleness banner, materially reducing reliance on third-party mirrors.
  • Official bulk archives for years after 2020 become available in an authoritative channel, lowering uncertainty about continuity and provenance.
  • Evidence emerges that the redirects did not materially impact citations, automated scrapers, or downstream systems, indicating limited practical disruption.
