Our team has been hard at work to finish the year strong. These release notes cover a number of useful updates: DocumentCloud gets a new and improved GPT 3.5 Turbo Add-On, there’s an easier way to automate request submission, and a new version of the DocumentCloud Python library has been released.
You can help support our ongoing work by making a one-time or recurring donation.
Open Data
MuckRock’s editorial team has publicly released data and findings for our recent collaboration with the Cicero Independiente, “The Air We Breathe”.
The data includes information from the Illinois Air Emissions Inventory as well as the EPA’s Toxics Release Inventory.
The team used this data to understand air pollution in Cicero and the impact of Koppers, a century-old coal tar plant that neighbors the town. Follow all of our reporting by signing up for our newsletter.
MuckRock
MuckRock and DocumentCloud user Forest Gregg made an open-source tool, Periodic FOIA, to file periodic public records requests using the MuckRock platform. You can schedule the records requests as desired using the provided markdown templates. The tool even searches to see if a similar request has already been filed recently to avoid duplication. Forest included some example requests to get you started with the tool.
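To illustrate the idea of checking for a recent, similar request before filing, here is a minimal sketch. It is not Periodic FOIA's actual implementation: the `FiledRequest` record, the `is_recent_duplicate` helper, the 30-day window, the 0.85 similarity threshold, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher


@dataclass
class FiledRequest:
    """Hypothetical record of a previously filed request."""
    title: str
    agency: str
    filed_on: date


def is_recent_duplicate(title, agency, existing, today,
                        window_days=30, threshold=0.85):
    """Return True if a similar request went to the same agency recently.

    window_days and threshold are illustrative defaults, not values
    taken from Periodic FOIA.
    """
    cutoff = today - timedelta(days=window_days)
    for req in existing:
        # Ignore requests to other agencies and requests older than the window.
        if req.agency != agency or req.filed_on < cutoff:
            continue
        # Compare titles case-insensitively with a simple similarity ratio.
        ratio = SequenceMatcher(None, title.lower(), req.title.lower()).ratio()
        if ratio >= threshold:
            return True
    return False


# Hypothetical sample data.
existing = [
    FiledRequest("Monthly overtime records", "Example Agency", date(2023, 11, 1)),
]
today = date(2023, 11, 20)

if is_recent_duplicate("Monthly overtime records", "Example Agency", existing, today):
    print("Skipping: a similar request was filed recently.")
else:
    print("No recent duplicate found; safe to file.")
```

A real tool would more likely query the MuckRock platform for existing requests than keep a local list, and might compare the request body as well as the title.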
Periodic FOIA builds on another open-source library that recently saw some updates. Ben Welsh is the maintainer of the Python wrapper for the MuckRock API, python-muckrock. After recent updates, you can now use it to follow public records requests and file your own requests on MuckRock programmatically without structuring the API calls yourself.
When submitting a request using the MuckRock API, there is now a parameter that lets you specify which organization to file the request under. By default, the request is filed under your currently active organization.
MuckRock also has a new 403 page, which makes it easier to tell a permissions issue apart from a page that does not exist.
DocumentCloud
Add-Ons now have an instructions field separate from the description. This hides lengthy text from the browser menu. Add-Ons like Sidekick and GPT 3.5 Turbo Playground had lengthy descriptions. Now, instructions are in their own field and only get displayed in the Add-On dispatch view.
[Before and after screenshots]
Add-Ons
When testing a few Add-Ons, we discovered that wrapping the contents of a text field in single quotes, like so: 'test this ', caused the Add-On to fail. This rarely encountered issue has since been fixed.
Our GPT 3 Playground Add-On was upgraded to GPT 3.5 Turbo Playground. Access to GPT 3.5 Turbo Playground requires a professional or organizational account. To upgrade your DocumentCloud account, head over to the plan selection page.
The upgraded Add-On improves on its GPT-3 predecessor in the following ways:
- GPT 3.5 Turbo is an order of magnitude less expensive to run per page than GPT-3, while producing better results.
- GPT 3.5 Turbo has a context window four times larger than GPT-3's (16K tokens vs. 4K tokens), so it can analyze up to about 30 average-sized pages of text. This lets the model see more text from your documents and produce better results on prompts.
- The new Add-On has better error handling and rate-limiting controls for a better experience.
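As a rough check on the context window figures above, the page estimate can be worked out with simple arithmetic. This is a back-of-the-envelope sketch: the figure of about 500 tokens per page and the optional reserve for the prompt and response are assumptions for illustration, not numbers from the Add-On itself.

```python
# Assumption: an "average-sized" page is roughly 500 tokens.
TOKENS_PER_PAGE = 500


def pages_that_fit(context_tokens, tokens_per_page=TOKENS_PER_PAGE,
                   reserved_tokens=0):
    """Estimate how many whole pages fit in a context window.

    reserved_tokens sets aside room for the prompt and the model's reply,
    which share the same context window as the document text.
    """
    usable = max(context_tokens - reserved_tokens, 0)
    return usable // tokens_per_page


# Round figures for the 4K and 16K context windows.
print(pages_that_fit(4_000))                           # 8 pages
print(pages_that_fit(16_000))                          # 32 pages
print(pages_that_fit(16_000, reserved_tokens=1_000))   # 30 pages
```

Actual token counts vary with the language and layout of each document, so treat these numbers as an estimate only.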
Our Azure Document Intelligence OCR Add-On now works on private documents.
Our Google Cloud Vision OCR received a bug fix for documents with empty pages.
python-documentcloud
Our Python wrapper for the DocumentCloud API, python-documentcloud, has a new release, 4.0.1. The new release includes:
- Minor code styling changes for tests.
- Pylint and Black now run on the tests directory on commits and PRs, so code quality checks cover the entire repository.
- Old references to Python 2 code (mostly print statements) in the documentation have been replaced with their Python 3 equivalents.
- The documentation has moved back to its original theme, which has fixed a sidebar overflow issue.
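For readers unfamiliar with the print change mentioned above, here is a generic illustration of the difference. It is not taken from the python-documentcloud documentation; the `describe` helper and the sample title are hypothetical.

```python
# Python 2 style, which is a syntax error in Python 3:
#     print "Title:", title
#
# Python 3 equivalent: print is a function and needs parentheses.


def describe(title, page_count):
    """Format a short description of a document (hypothetical helper)."""
    return f"{title} ({page_count} pages)"


title = "Example document"  # hypothetical sample value
print("Title:", title)
print(describe(title, 12))
```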
MuckRock Accounts
MuckRock’s Accounts tool also has a new 403 page.