Here’s why MuckRock and POGO had to archive FOIAonline

Transparency organizations publish nearly 34,000 documents after officials failed to ensure information would remain accessible.

This analysis was produced in partnership with the Project on Government Oversight.

Last month, the Environmental Protection Agency (EPA) dismantled a vital tool for transparency when it decommissioned FOIAonline.gov, an online resource that allowed the public to make and track Freedom of Information Act (FOIA) requests to over 20 federal agencies, and to view responsive documents. The EPA, which oversaw FOIAonline on behalf of participating agencies, claims to have fulfilled over 1.5 million requests and attracted 34,000 active registered users over the decade-plus that the portal was operating. But while the decommissioning of FOIAonline has been in the works for several years, it remains unclear when, if ever, government agencies will restore public access to these records. In the interim, POGO and MuckRock have partnered to host a publicly available archive of nearly 34,000 documents captured before FOIAonline was shuttered.

Two years ago, in October 2021, the EPA released a proposed communication plan for informing agencies, partners and the public about its intention to sunset FOIAonline. While the EPA serviced and operated FOIAonline, over 20 federal agencies, including the Federal Communications Commission (FCC), the National Archives and Records Administration (NARA) and the U.S. Department of Commerce, used it to process requests.

While the plan set deadlines for participating agencies to migrate their data from the system, it did not set any requirements for when agencies must have their own FOIA processing portals and reading rooms available, let alone when those resources would need to be populated with the data that was available on FOIAonline.

Even if the plan had included such guidelines, enforcing this kind of timetable would be outside the EPA’s purview. But no other office with broader responsibility for government transparency, for example the Office of Information Policy within the Department of Justice, appears to have stepped up to ensure continued or even future access to these records.

Moreover, despite a 1996 requirement that agencies publish documents commonly requested via FOIA in electronic reading rooms, MuckRock has noted that these public resources are inconsistent across agencies, both in terms of information quality and user experience. The lack of reliability across FOIA reading rooms reinforces the potential value of a central government portal like FOIAonline, or the newer FOIA.gov, where improvements to a single access point can immediately improve the FOIA process for all participating agencies.

In the final weeks leading up to the public shuttering of FOIAonline, once it became clear the government had no plan to restore access to these records in a timely fashion, POGO and MuckRock captured over 110 gigabytes of documents from seven agencies and posted them to DocumentCloud, MuckRock’s platform that allows journalists and researchers to organize and search primary source documents. Documents from four agencies — the EPA, the National Labor Relations Board, the General Services Administration, and the Defense Logistics Agency, the procurement and supply wing of the Department of Defense — represent the bulk of the archive and may be of significant public interest.

Now that these files have been uploaded to MuckRock’s servers, the public can leverage the full suite of analytical tools available in DocumentCloud. The documents have been processed using optical character recognition, which greatly enhances accessibility. It’s now possible to search within the text across the documents, similar to a Google search, and even use advanced search techniques.

In addition to preserving the FOIAonline archive on DocumentCloud, MuckRock is also making it available via IPFS and Filecoin, new technologies that help distribute hosting across multiple providers to ensure its long-term accessibility.

The new archive hosted by MuckRock, while substantial, is far from complete, and it represents only a fraction of the 1.5 million records requests fulfilled over the lifetime of FOIAonline. Several factors account for this. First, because of technical limitations on FOIAonline, POGO was unable to archive anything older than the 10,000 most recent requests. (A complementary effort by Ed Summers to archive information about FOIAonline preserved significant metadata about the requests, but not the files themselves.)

Second, several agencies that had participated in FOIAonline, including Customs and Border Protection and the Nuclear Regulatory Commission, migrated their data off the site before the sunset date. For these agencies, either by design or for technical reasons, many of the previously released documents were no longer hosted on FOIAonline by the date of our extraction. Finally, POGO also found instances where agencies participating in FOIAonline released records to requesters, but didn’t post any corresponding records on the site.

The records that we were able to recover also have another notable limitation: POGO reviewed hundreds of records posted to FOIAonline and found that the majority of them did not include the original FOIA request as part of the release, which limits our understanding of how and why the released documents matter.

To aid users in making sense of these documents, POGO and MuckRock have “tagged” them by agency and year and have provided the most recent FOIA request tracking number available, so that documents can be more easily matched to information available elsewhere, such as in available FOIA logs. The tagging system also empowers users to limit searches within the collection to just a specific agency they are interested in, such as searching for only EPA documents that mention “toxins” or only Navy documents that include the term “unidentified.”

Perhaps most important, POGO and MuckRock’s FOIAonline archive preserves this information outside of the government infrastructure, where it cannot be deleted or removed by the agencies without notice.

POGO and MuckRock invested time and resources to create this archive in order to prevent a wealth of public information from disappearing. Relying on homegrown solutions to problems created by federal agencies that had at least two years to implement a more practical solution is unsustainable, and it leaves the public with only a fraction of the information to which it is entitled. It also calls into question the government’s commitment to transparency and access to information at a time when verifiable information is of crucial importance.


Header image by Nora via Shutterstock.