Dataset Sunset And Hard Removal
Key takeaways
- World Factbook pages were configured to return a 302 redirect to the closure announcement.
- Until 2020 the CIA published annual ZIP file archives of the entire Factbook site, and those archives are available via the Internet Archive.
- A December 10, 2020 Factbook 'What's New' entry reports Nepal and China agreed on Mount Everest's height as 8,848.86 meters, which the Factbook rounds to 8,849 meters and propagates throughout its database.
- Archived versions of The World Factbook could have remained available with a banner noting it is no longer maintained instead of being removed.
- A 384MB 2020 Factbook ZIP archive was extracted into the GitHub repository simonw/cia-world-factbook-2020 and published for browsing using GitHub Pages.
Sections
Dataset Sunset And Hard Removal
The change goes beyond ending updates: the site and its historical archives were removed, and every page now redirects to a closure notice, which breaks deep links. With no official explanation, it is harder to judge how long similar public resources can be expected to last, and harder to plan for their loss.
- World Factbook pages were configured to return a 302 redirect to the closure announcement.
- The CIA has not provided an explanation for why it decided to stop maintaining The World Factbook.
- The CIA has sunset The World Factbook publication.
- The CIA removed the entire World Factbook site, including archives of previous versions.
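One way to see the effect of the redirect on existing deep links is to classify the responses a link checker records. The sketch below is a minimal illustration, not a description of how the site was actually configured: the function name, the verdict labels, and the `example.gov` URLs are all hypothetical, and it works on already-captured status codes and `Location` headers rather than making network requests.

```python
from typing import Optional
from urllib.parse import urljoin

# Status codes that indicate a redirect; 302 is the one reported in the source.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def classify_response(requested_url: str, status: int,
                      location: Optional[str], closure_url: str) -> str:
    """Label one recorded HTTP response for a previously working deep link."""
    if status in REDIRECT_STATUSES and location:
        # Location may be relative, so resolve it against the requested URL.
        target = urljoin(requested_url, location)
        # Ignore a trailing slash when comparing the two URLs.
        if target.rstrip("/") == closure_url.rstrip("/"):
            return "redirected-to-closure-notice"
        return "redirected-elsewhere"
    if 200 <= status < 300:
        return "ok"
    return "error"


# Hypothetical URLs, for illustration only.
CLOSURE = "https://example.gov/closure-notice/"
print(classify_response(
    "https://example.gov/factbook/countries/nepal/", 302, "/closure-notice/", CLOSURE
))
```

Because a 302 is nominally a temporary redirect, a checker built this way would do well to keep the original URL on record rather than overwrite it with the redirect target.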
Preservation And Continuity Paths
Because the Factbook is in the public domain, mirroring and redistributing it remains lawful despite the removal. Two continuity options already exist: the official annual ZIP archives (through 2020) hosted on the Internet Archive, and at least one GitHub-based mirror that makes the 2020 edition browsable again. Both offer an immediate workaround for the disruption.
- Until 2020 the CIA published annual ZIP file archives of the entire Factbook site, and those archives are available via the Internet Archive.
- A 384MB 2020 Factbook ZIP archive was extracted into the GitHub repository simonw/cia-world-factbook-2020 and published for browsing using GitHub Pages.
- The World Factbook has been in the public domain since it began.
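The mirroring step described above, unpacking an archived ZIP into a directory that a static host can serve, can be sketched as follows. This is an assumption-laden illustration rather than the method used for the simonw/cia-world-factbook-2020 repository: the helper names, the `docs` output directory, and the two sample pages are hypothetical, and a tiny in-memory archive stands in for the real 384MB file.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_source, dest_dir):
    """Extract every archive member under dest_dir and return the sorted
    relative paths of the HTML pages found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_source) as archive:
        archive.extractall(dest)
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*.html"))


def build_sample_archive():
    """Hypothetical stand-in for the real ZIP: two tiny pages in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("index.html", "<h1>Factbook 2020</h1>")
        archive.writestr("geos/np.html", "<h1>Nepal</h1>")
    buffer.seek(0)
    return buffer


with tempfile.TemporaryDirectory() as tmp:
    # "docs" is an assumed output folder name, not taken from the source.
    pages = extract_archive(build_sample_archive(), Path(tmp) / "docs")
    print(pages)
```

Listing the extracted pages gives a quick completeness check before publishing, which matters when the mirror is meant to reproduce the original site structure.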
Data Editorial Process And Precision Behavior
The corpus gives one example of how updates were applied: a newly agreed measurement was taken in, rounded, and the rounded value propagated throughout the database. The dataset therefore reflects editorial normalization choices that can affect downstream precision and change tracking.
- A December 10, 2020 Factbook 'What's New' entry reports Nepal and China agreed on Mount Everest's height as 8,848.86 meters, which the Factbook rounds to 8,849 meters and propagates throughout its database.
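The arithmetic behind the Everest entry can be worked through directly. The sketch below assumes round-half-up to whole meters; the source does not state which rounding rule the Factbook used, though any round-to-nearest rule gives the same result for this value. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_whole_meters(measured: str) -> int:
    """Round a decimal measurement, given as a string, to the nearest meter.

    Assumption: half-up rounding; the Factbook's actual rule is not documented
    in the source.
    """
    return int(Decimal(measured).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


agreed = Decimal("8848.86")            # height agreed by Nepal and China
published = round_to_whole_meters("8848.86")
difference = Decimal(published) - agreed

print(published)    # 8849
print(difference)   # 0.14
```

A downstream user who stores only the published figure loses 0.14 meters of precision and cannot tell from the value alone whether the change came from a new measurement or from editorial rounding.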
Sunsetting Best Practice Expectation
The corpus includes an expectation that archives could have remained online with a staleness banner, contrasting with the observed full removal. This functions as an explicit normative benchmark rather than evidence about why the removal occurred.
- Archived versions of The World Factbook could have remained available with a banner noting it is no longer maintained instead of being removed.
Unknowns
- What rationale (if any) did the CIA have for sunsetting The World Factbook and removing the site and archives?
- Will the CIA restore any part of The World Factbook content (especially archives), or is the removal intended to be permanent?
- Do official bulk archives exist for years after 2020, and if so where can they be obtained?
- How durable and complete are third-party mirrors (e.g., GitHub Pages browsing) relative to the original site structure and datasets?
- To what extent did the 302-redirect configuration break existing citations, automated scrapers, and downstream systems, and what remediation patterns are most effective?