In almost every FOIA policy discussion these days, as in virtually every other field, the topic of AI comes up, particularly the impact of Large Language Models (LLMs) like ChatGPT and Google Gemini.
There are predictions ranging from a moderately positive impact to AI making FOIA obsolete to understandable panic about automation overwhelming already strained systems. This weekend, I was invited to speak at Yale’s annual Access & Accountability conference to share my thoughts on the topic. It’s one we’ve been thinking about a lot since 2015, when we implemented simple machine learning bots to detect the types of responses we were getting from a given agency.
Before implementing that simple model, every incoming response from government agencies was hand-classified by a MuckRock staff member. It was tedious when we handled 10,000 communications, but since then, we’ve helped classify well over 300,000 incoming communications — rejections, copies of records, requests for more time, and many more — with a mix of automation and manual methods. AI has been essential for helping our human staffers spend more time doing what we do best — analyzing and understanding information — while spending less time doing tedious classification.
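For readers curious what that kind of automation can look like, here is a minimal, purely illustrative sketch of a response classifier: TF-IDF features feeding a logistic regression model. The categories, example texts and library choices are assumptions made for the example, not a description of MuckRock's actual system.

```python
# Illustrative sketch only: classify incoming agency communications by type
# using TF-IDF features and logistic regression. Labels and texts are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_texts = [
    "Your request is denied under exemption 7(A).",
    "We need an additional 10 business days to respond.",
    "Please find the responsive records attached.",
    "We require a $25 processing fee before proceeding.",
]
training_labels = ["rejection", "extension", "records", "fee"]

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
classifier.fit(training_texts, training_labels)

# Triage a new communication; low-confidence predictions still go to a human.
incoming = "Your request has been denied because the records are exempt."
predicted = classifier.predict([incoming])[0]
confidence = classifier.predict_proba([incoming]).max()
print(predicted, round(confidence, 2))
```

Even a small model like this can triage the obvious cases, leaving staff to review the ambiguous ones.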
But I also have serious concerns about AI's impact on the Freedom of Information Act. Some of the problems we are trying to solve are ultimately less about access and more about how, and even when, governments store, collect and use information.
A more detailed yet flawed understanding of what’s happening across the country
In 2009, when we were still drawing up plans for the project that would eventually become MuckRock, one of the things that drew us to public records was their universality. With few exceptions, public records can help you get information about your community at the local, state and federal levels, no matter which state you live in or what issues you care about. One early use case we imagined was helping people file requests for city budgets from communities across the country and comparing them in an apples-to-apples way. They are all budgets; how different could they be?
It turns out, incomparably different. Some places include benefits and retirement entitlements per department, while others account for them separately. Some combine public safety under one budget, while others break it down across police, fire, and medical services. Even in New Jersey, which has done a lot of work to institute, collate and publish standardized “user-friendly” budgets, the comparable cross-tabs hide a lot of differences and nuance in how communities actually work.
In scenarios like this, AI can be a helpful assistant for visualizing or doing some kinds of analysis, but when asked to do things the data doesn't properly support, like giving an apples-to-apples comparison, it will often project an air of confidence while those critical nuances are dropped or misinterpreted.
There are other cases where AI, carefully applied, will actually open up new avenues for understanding our communities. I’m optimistic that advances in machine learning will give new insights into policies and practices across America, turning troves of public records into tidy-looking data we can analyze, visualize and explore.
Already, a number of organizations have used MuckRock’s DocumentCloud service to host public meeting minutes and agendas from government bodies across the country, which provides an excellent testbed for this work. Through bulk analysis — ranging from simple N-Gram analysis to AI categorization — we have new ways to gauge how policies spread and ultimately impact people’s lives. Other organizations have used MuckRock’s request tools to file census-style requests collecting documents that detail how governments are balancing privacy, security and civil rights. Machine learning — aiding document classification, data extraction, and more — is already reshaping the kinds of questions researchers and journalists can ask and answer.
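As a rough illustration of the n-gram end of that spectrum, the sketch below counts the most common two-word phrases across a folder of meeting-minutes text files. The directory path and file format are placeholders for the example, not a real dataset or part of DocumentCloud's API.

```python
# Illustrative sketch: count bigrams across a folder of meeting-minutes text
# files to see which phrases recur across communities. Paths are placeholders.
import re
from collections import Counter
from pathlib import Path

def bigrams(text: str):
    """Yield adjacent word pairs from lowercased text."""
    words = re.findall(r"[a-z']+", text.lower())
    return zip(words, words[1:])

counts = Counter()
for path in Path("minutes/").glob("*.txt"):
    counts.update(bigrams(path.read_text(errors="ignore")))

# Print the 20 most common two-word phrases across all documents.
for (w1, w2), n in counts.most_common(20):
    print(f"{w1} {w2}: {n}")
```

Tracking when the same phrase starts appearing in different cities' minutes is one crude but useful way to watch policy language spread.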
But just like our complex municipal budgets, I expect these approaches will often miss or mislead on important issues. While AI holds tremendous potential to understand and shape better policy, much of what makes democracy work is distinctly intangible human relations, both with each other and with the places where we live. Trying to quantify these dynamics can often lead to unintended outcomes, blindspots and harms. I recommend Shannon Mattern’s A City Is Not a Computer: Other Urban Intelligences as an excellent primer on these complexities.
Strained transparency laws are pushed even further past the breaking point
We are already seeing agencies try to leverage AI thoughtfully to help address their backlogs, with the most commonly cited example being the State Department’s use of AI to speed up declassification reviews as well as provide faster responses to FOIA requests for previously released information. I think there’s potential for these kinds of initiatives, but we are also likely to see the distance between the public’s expectations and FOIA’s reality grow even wider.
One image from a 2010 presentation on FOIA by Michael Ravnitzky and Phil Lapsley has stuck with me a decade and a half later. It depicts Google’s ubiquitous search landing page, retitled to just “FOIA.” The presenters then go on to discuss the various differences between Google (virtually instant, search engine is a computer) and FOIA (takes years, search engine is a team of humans).
The slides get a laugh because the experiences are night and day apart. But the truth is that public expectations are set by modern search engines and other technologies that offer instant and expansive results, context and options, even when a request is poorly worded or half thought out. FOIA officers are left to bridge this impossible divide, while the public understandably expects the government to find a way to bring FOIA up to speed with the advances we have seen in almost every other realm of information access.
Since it hasn’t, we are already seeing the public try to leverage LLMs like ChatGPT to bridge the divide. An agency sends a baroque, legalistic denial? No worries, many requesters now just pop it into their favorite chat agent and ask for an appeal. It may or may not cite relevant or accurate arguments, but the response is instant and gives the requester a sense of agency in the process.
Deciphering appeals that cite irrelevant or non-existent case law or databases ties up already strained resources and grows backlogs even more, leading to more requesters turning towards these chat agents for any kind of leverage they can find.
AI partnerships risk further eroding what the public can learn
Over the last fifteen years, one of the most concerning trends in FOIA has been the expanding use of trade secret and other commercial exemptions. The Argus Leader case confirmed that the public has a vastly narrowed right to know how their tax dollars fund grocery and convenience chains through SNAP reimbursements, and a wide array of once routinely available information is now less accessible because it is maintained or administered by a non-governmental third party, or stored in a system or format that is considered proprietary.
As government agencies rush to embrace AI, I worry this trend will expand. Large Language Models are already a widely misunderstood black box, and technology companies, particularly those that focus on mastering government procurement, often claim that even the most basic details of how the services they operate work are trade secrets that cannot be disclosed lest they suffer irreparable commercial harm.
Transparency laws do not have enough safeguards to ensure the public can understand and assess how these tools are being rolled out, even as some government personnel will see the ability to deny more requests as a benefit, not a bug. Civil society, including journalists, researchers and the public, must be vigilant to ensure that the basic data on everything from arrests to public spending that we have today remains available tomorrow, no matter how it is processed or stored.
Join in shining a light on how AI is quietly reworking FOIA
AI and machine learning hold the potential to be useful tools for requesters and agencies, but recent history suggests that the default outcome is the public loses out on access. At MuckRock, we are working to continue to explore how these tools can be used for good — by journalists, the public and government agencies themselves — while minimizing the harm.
That starts with continuing to get a better sense of what these rollouts look like. Fortunately, that is something we can do today — and you can help. Earlier this month, we put out a request to help us review over a hundred FOIA annual reports to analyze and categorize how agencies are using AI within their FOIA processes. Thanks to the help of dozens of volunteers, we’ve gotten through over 90 percent of the reports, and you can help out by reviewing one of the reports that hasn’t been looked at yet.
We’re also exploring positive ways to use a range of automation and AI tools to help scale up our efforts to watchdog these kinds of programs. You can join the MuckRock FOIA Slack and our Data Liberation Project to help out, or, if you’re a journalist, you can try out the new AI tools we’ve added to DocumentCloud or participate in our ongoing research on how newsrooms are using these kinds of tools to improve their work.
The work being done on all these fronts will help shape what transparency, government and civic participation look like for years to come, and those of us who want a government working for the people need all the help we can get.
Header image by Collagery via Shutterstock; it is not licensed for reuse without prior authorization.