
Add-On case study: Uncovering legal expenses in local reporting

Explore how Lisa Rowan of Cardinal News utilized DocumentCloud’s powerful extraction tools to reveal the financial implications of a school board’s lawsuit.

Local reporters often find financial documents in public portals or in response to public records requests. The data inside can point to important stories to tell, but manually reviewing a bundle of documents that span days, weeks, months or even years to analyze spending behavior can be tedious. Tools like DocumentCloud’s Regex Extractor and Azure Table Extractor Add-Ons can help streamline this task, giving reporters more time to spend answering the key questions of their investigations.

Read ahead to see how Lisa Rowan of Cardinal News used these tools to track spending on legal fees after a local school board sued a parent—and how you can use these tools to boost your reporting, too.


I’m a reporter covering K-12 and higher education for Cardinal News, a website founded in 2021 that focuses on Southwest and Southside Virginia. In my 18 months at Cardinal, I’ve seen a lot of drama and heated discussion at local school board meetings. But only once have I heard about a school board suing a parent.

This unusual legal action prompted me to explore its financial implications.

Typically, when lawsuits involve school boards, it’s the other way around: a case gets filed by a disgruntled former employee or frustrated parent, putting the school board on the defensive. But in this instance, the school board in rural Bedford County, Virginia, was the plaintiff. To complicate matters, there was confusion about whether all the school board members were aware that a suit had been filed against a parent for allegedly harassing staff at his son’s school.

Some locals commenting in Facebook groups bristled at the idea of their tax dollars being used to pay for a lawsuit against a parent, regardless of whether the allegations were true. Talking about that aspect of this already-odd case led my editor and me to wonder what it was costing the school board to pursue this suit.

After reviewing more than a year’s worth of weekly financial statements, I was able to determine that the school division spent about $30,000 on the suit. While I was still reporting, the school board suddenly announced that it intended to dismiss the case.

Going through dozens of bills to find legal fees was no easy task. Here’s how it worked for me – and how DocumentCloud can help you complete a similar project.


I had come across the large amounts the school division was routinely paying to its attorneys by chance: I was reviewing a school board agenda on the BoardDocs platform and realized that each month, the board approved weekly lists of bills due to various vendors. Among them were regular payments to the law firm that was representing the board in its lawsuit against the parent.

I had two main questions:

  1. How much was the school division spending on legal bills for this particular situation?

  2. Was the division spending more on legal bills this year vs. last year?

I downloaded 74 billing statements the board had approved, but I wasn’t sure which ones listed payments due to the law firm. Going through them one by one sounded like a nightmare. I knew there had to be a better way to weed out the irrelevant bills.

I uploaded the statements to a DocumentCloud project. Then I ran the Regex Extractor Add-On on the project to find which documents mentioned the law firm by name. A regular expression, or Regex, is a way to search for specific patterns in text. In this case, the law firm’s name appears in capital letters in the text layer of the documents as SANDS ANDERSON PC, so that’s what I provided to get exact results. I could have written a regular expression that ignores case (so it would also catch Sands Anderson PC, for example), but that wasn’t necessary here, so I kept it simple. If you’d like to read a brief introduction to regular expressions, check out this Regex lesson presented at NICAR.
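To make the difference between an exact and a case-insensitive pattern concrete, here is a minimal Python sketch. It is not the Add-On's own code, and the sample lines are made up for illustration; real bill lists will look different.

```python
import re

# Made-up lines standing in for the text layer of a bill list (an assumption).
sample_lines = [
    "04/12 SANDS ANDERSON PC  LEGAL SERVICES  1,250.00",
    "04/12 Sands Anderson PC  Legal services  800.00",
    "04/12 ACME SUPPLY CO  PAPER  92.10",
]

# Exact pattern: matches only the all-caps spelling.
exact = re.compile(r"SANDS ANDERSON PC")

# Case-insensitive pattern: also tolerates extra spaces between the words.
any_case = re.compile(r"sands\s+anderson\s+pc", re.IGNORECASE)

exact_hits = [line for line in sample_lines if exact.search(line)]
any_case_hits = [line for line in sample_lines if any_case.search(line)]

print(len(exact_hits))     # 1
print(len(any_case_hits))  # 2
```

The exact pattern is the safer choice when the name always appears the same way. The looser pattern helps when the spelling or spacing varies between documents.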

The Regex Extractor Add-On narrowed down the collection from 74 documents to 30 that mentioned the law firm. It added a tag “SANDS ANDERSON” to each document that contained the matching string, which made it easy for me to filter out the documents that didn’t mention the law firm by name.
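The tag-then-filter idea can be sketched in a few lines of Python. This only illustrates the logic and is not how the Add-On is implemented; the function name `tag_matching_documents`, the file names and the sample text are all hypothetical.

```python
import re

def tag_matching_documents(documents, pattern, tag):
    """Return {title: [tag]} when the text matches the pattern, else {title: []}."""
    compiled = re.compile(pattern)
    return {
        title: ([tag] if compiled.search(text) else [])
        for title, text in documents.items()
    }

# Hypothetical documents: title -> extracted text.
documents = {
    "bills-2023-04-13.pdf": "CHECK 1001 SANDS ANDERSON PC 1,250.00",
    "bills-2023-04-20.pdf": "CHECK 1002 ACME SUPPLY CO 92.10",
}

tags = tag_matching_documents(documents, r"SANDS ANDERSON PC", "SANDS ANDERSON")

# Keep only the documents that received the tag.
relevant = [title for title, doc_tags in tags.items() if "SANDS ANDERSON" in doc_tags]
print(relevant)
```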

Knowing which documents to look at was helpful, but going through each document manually and compiling a spreadsheet of the payments would still take hours. I needed a way to extract all the payments as tabular data so I could easily calculate the total paid to the law firm.

That’s where the Azure Table Extractor came in. It is designed to extract structured data from PDFs into a more manageable file format like CSV, which you can open with your favorite spreadsheet program to filter and do calculations.

The Add-On saves you from copying and pasting these tables manually. For documents with handwritten text in tables or otherwise poor OCR, the table extraction is also more accurate than copying and pasting.

When the Add-On completed, I opened the zip file that it prompts you to download. It included 30 individual CSVs, one per document, which I merged into a single file for easier sorting. From there, I was able to pick out the law firm payments and start making sense of them.
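If you would rather script the merge than do it by hand, the sketch below shows one way to do it with Python's standard library. It assumes each CSV has `vendor` and `amount` columns, which is an assumption for illustration: the real extracted tables may use different headers or none at all. The helper names `merge_csv_files` and `total_for_vendor` and the sample files are hypothetical.

```python
import csv
import tempfile
from decimal import Decimal
from pathlib import Path

def merge_csv_files(paths):
    """Read every CSV into one list of rows, recording each row's source file."""
    merged = []
    for path in sorted(paths):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = Path(path).name
                merged.append(row)
    return merged

def total_for_vendor(rows, vendor):
    """Sum the 'amount' column for rows whose 'vendor' contains the given name."""
    total = Decimal("0")
    for row in rows:
        if vendor in row["vendor"].upper():
            # Strip dollar signs and thousands separators before converting.
            total += Decimal(row["amount"].replace("$", "").replace(",", ""))
    return total

# Demo with two made-up files standing in for the unzipped CSVs.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "bills-week-1.csv").write_text(
        'vendor,amount\nSANDS ANDERSON PC,"1,250.00"\nACME SUPPLY CO,92.10\n',
        encoding="utf-8",
    )
    Path(folder, "bills-week-2.csv").write_text(
        "vendor,amount\nSANDS ANDERSON PC,$800.00\n",
        encoding="utf-8",
    )
    rows = merge_csv_files(Path(folder).glob("*.csv"))

law_firm_total = total_for_vendor(rows, "SANDS ANDERSON")
print(law_firm_total)  # 2050.00
```

Using `Decimal` instead of floating-point numbers keeps dollar amounts exact, and the `source_file` column lets you trace any payment back to the statement it came from.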


With the information gleaned from the bill lists, I was able to submit a FOIA request that had a narrow scope and timeline in order to obtain the actual invoices from the law firm. Some of those invoices returned to us mentioned the defendant by name, allowing me to confirm what was spent specifically on that case.

The combination of the bill lists in CSV format and the FOIA request allowed us to clearly state the financial impact – so far – this case has had on the school division’s legal costs. I’m keeping up with the case and the board’s activity by monitoring the bill lists and planning ahead to request additional invoices related to the case every few months.

As I worked on this project, a few takeaways emerged that could benefit other journalists facing similar challenges:

  • If you’re a journalist with a technical challenge, chances are that others have run into similar roadblocks. Engaging with the community on the News Nerdery Slack helped me connect with the DocumentCloud team for insights.

  • The billing statements I downloaded already had good OCR applied, which allowed the Regex Extractor to sort them accurately. If your document set includes handwritten materials or isn’t as clean, consider using a more robust OCR engine before running the Regex Extractor to improve accuracy.

  • Using the Azure Table Extractor saved me a lot of time getting the financial data into a more manageable data format.

  • There wasn’t any “silver bullet” that came from this process, no single revelation that changed the trajectory of my reporting. In my case, while the payments to the law firm increased slightly over time, that increase couldn’t be directly connected to the $30,000 paid to the law firm for this suit against a parent.

While I don’t anticipate many situations exactly like this on my beat, I now know how to take large sets of bills or budget line items and put them into a format that’s easier to examine, especially if I want to look at an institution’s expenses for a category or vendor over a span of time.


While Lisa Rowan’s experience focused on tracking legal expenses for a school board, these data extraction tools have many other applications. For example, a team of reporters investigating police misconduct recently used the Regex Extractor Add-On to pull data from police misconduct reports, including officers’ names, their troops and the dates of misconduct. The reporters then used this data to sort and filter the documents on DocumentCloud for easier review. Stay tuned for a deeper dive into how this team used the tool, along with other user stories about DocumentCloud Add-Ons.