Model Card — CAPTCHA Solver

System / Product: SCOTUS email subscription CAPTCHA solver
Model Type: Out-of-the-box audio transcription
Version: v1.0.1
Date: 2025-12-09
Owner: Morgan + State Content Team
Status: [x] Production [ ] Deprecated

1. Purpose

1.1 What this model does

Transcribes the audio version of SCOTUS email subscription CAPTCHAs to allow us to know the instant a docket is updated.

1.2 Intended users

This model is intended solely for CourtListener backend use in the SCOTUS email subscription task.

1.3 Scope and limitations

This model has only been verified to work for SCOTUS audio CAPTCHAs, which consist of alphanumeric characters read out using the NATO phonetic pronunciation.
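
To illustrate what a spelled-out CAPTCHA looks like once transcribed, here is a minimal sketch that maps NATO code words back to characters. The transcript format, the handling of digits, the uppercase output, and the function name `decode_spelled_transcript` are all assumptions for illustration; the actual transcripts and any post-processing used in production are not described in this card.

```python
import re

# ICAO/NATO spelling alphabet, plus spellings a transcript might plausibly use.
NATO_WORDS = {
    "alfa": "A", "alpha": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I",
    "juliett": "J", "juliet": "J", "kilo": "K", "lima": "L", "mike": "M",
    "november": "N", "oscar": "O", "papa": "P", "quebec": "Q", "romeo": "R",
    "sierra": "S", "tango": "T", "uniform": "U", "victor": "V",
    "whiskey": "W", "xray": "X", "yankee": "Y", "zulu": "Z",
}

# Assumption: digits may arrive as numerals or as English number words.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}


def decode_spelled_transcript(transcript: str) -> str:
    """Convert a spelled-out transcript into its characters (hypothetical helper)."""
    normalized = transcript.lower().replace("x-ray", "xray")
    tokens = re.findall(r"[a-z0-9]+", normalized)
    chars = []
    for token in tokens:
        if token in NATO_WORDS:
            chars.append(NATO_WORDS[token])
        elif token in DIGIT_WORDS:
            chars.append(DIGIT_WORDS[token])
        elif token.isdigit() or len(token) == 1:
            # Numerals and single letters are passed through unchanged.
            chars.append(token.upper())
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return "".join(chars)
```

Raising on unrecognized tokens, rather than skipping them, keeps a garbled transcript from silently producing a shorter candidate.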

2. Base Model

Field	Details
Base model name	`whisper-1`
Version / checkpoint	1
Provider	OpenAI
License	OpenAI license
Link to production model location	https://developers.openai.com/api/docs/models/whisper-1

2.1 Model Modifications

None. `whisper-1` is used out of the box: no continued pretraining, finetuning, or other weight modifications were applied, and no prompt is supplied beyond the audio input (see Section 3.3).

3. Data

3.1 Pretraining Dataset (if applicable — continued pretraining only)

N/A

3.2 Finetuning Dataset (if applicable — finetuned models only)

N/A

3.3 Prompt Design and Versioning (if applicable — generative models only)

No additional prompting beyond the audio input.

3.4 Validation and Test Dataset

Model testing and comparison were performed using 100 CAPTCHA images and 100 CAPTCHA audio files fetched directly from the SCOTUS subscription page.

For the 2026-04-28 update, testing was performed using 200 audio samples fetched directly from the SCOTUS CAPTCHA page. See CL#7266 for a detailed analysis.

3.5 Label Documentation (if applicable — traditional ML and finetuned models only)

N/A

4. Training & Evaluation

4.1 Design Decisions and Rationale

See CL#6928 for details.

See CL#7266 for 2026-04-28 update analysis.

4.2 Metrics and Evaluation

See CL#6928 for details.

See CL#7266 for 2026-04-28 update analysis.

4.3 Failure Analysis

The model was not tested on CAPTCHAs other than those required for SCOTUS email subscription as of November 2025, and may fail if used in other scenarios or if SCOTUS subscription CAPTCHAs change.

It was also discovered that the transcription models sometimes output the wrong number of characters, which necessitated the 2026-04-28 update. The update switched the model from gpt-4o-transcribe to whisper-1 and added a language="en" parameter to the API call. Together, these changes lowered the rate of invalid transcriptions (anything other than five characters) in testing from 16% to 4%. Combined with retry logic, this should bring the rate of invalid solution candidates below 0.1%.
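
The API call and the five-character validity check described above might look roughly like the following sketch. It assumes `client` is an `openai.OpenAI` instance from the official Python SDK and `audio_file` is an open binary file. The function names, and the choice to also require ASCII alphanumerics, are illustrative and not taken from the CourtListener code. Whatever step turns the raw transcript into a candidate string is out of scope here.

```python
EXPECTED_LENGTH = 5  # a valid solution candidate has exactly five characters


def transcribe_captcha(client, audio_file) -> str:
    """Return the raw transcript for one CAPTCHA audio file.

    `client` is assumed to be an `openai.OpenAI` instance; it is passed in
    so that this sketch does not import the SDK itself.
    """
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",  # the parameter added in the 2026-04-28 update
    )
    return response.text


def is_valid_candidate(candidate: str) -> bool:
    """Check the only property that can be verified before submission."""
    return (
        len(candidate) == EXPECTED_LENGTH
        and candidate.isascii()
        and candidate.isalnum()
    )
```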

5. Known Limitations

Specificity: The model was chosen to solve exactly one kind of CAPTCHA on exactly one site. Any changes by SCOTUS to the CAPTCHA or by Free Law to the context it's deployed in have the potential to break it.
Hallucinations: The transcription model sometimes generates incorrect output. Logic is in place to generate a new audio CAPTCHA and retry transcription when the wrong number of characters is generated, but any other error cannot be detected until the solution candidate is submitted to the CAPTCHA validation endpoint.
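
The retry behavior described under Hallucinations can be sketched as follows. This is not the production implementation: `fetch_audio` and `transcribe` are hypothetical callables standing in for the real CAPTCHA download and transcription steps, and the default of three attempts is an assumption.

```python
from typing import Callable, Optional


def solve_with_retries(
    fetch_audio: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    max_attempts: int = 3,  # assumed value; the real limit is not stated here
    expected_length: int = 5,
) -> Optional[str]:
    """Return the first candidate of the expected length, or None.

    Each attempt fetches a fresh audio CAPTCHA, since a rejected
    transcription is retried against a new challenge.
    """
    for _ in range(max_attempts):
        audio = fetch_audio()
        candidate = transcribe(audio).strip()
        if len(candidate) == expected_length:
            return candidate
    return None
```

A length check only catches malformed candidates; a five-character candidate with a wrong character still has to be rejected by the validation endpoint. If attempts fail independently at the 4% invalid rate reported in Section 4.3, two attempts would leave 0.04^2 = 0.16% of cases without a valid candidate and three would leave 0.04^3 = 0.0064%, so under that assumption at least three attempts are needed to stay below 0.1%.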

6. Deployment and Monitoring

6.1 Deployment Setup and Data Dependencies

The model is deployed in the CourtListener repository and used in the SCOTUS email subscription task.

6.2 Monitoring Plan

Logging is in place to monitor successful and failed solution attempts. Failed attempts will be surfaced in Sentry.
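
As a rough illustration of this plan, the sketch below logs successes at INFO and failures at ERROR using the standard library. It assumes the project relies on Sentry's Python logging integration, which by default reports ERROR-level log records as events. The logger name and the `record_attempt` helper are hypothetical.

```python
import logging

logger = logging.getLogger("scotus_captcha")  # hypothetical logger name


def record_attempt(success: bool, candidate: str) -> int:
    """Log one solution attempt and return the log level that was used."""
    level = logging.INFO if success else logging.ERROR
    outcome = "accepted" if success else "rejected"
    logger.log(level, "CAPTCHA solution %s (candidate=%r)", outcome, candidate)
    return level
```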

6.3 Re-evaluation Criteria and Process

If the model begins consistently failing to solve CAPTCHAs, it will be deactivated and the evaluation process will be repeated to determine a new approach.

7. Version History

Version	Date	Author	Summary of changes
v1.0	2026-04-08	Morgan Bennet	Initial release
v1.0.1	2026-04-28	Morgan Bennet	Switched from gpt-4o-transcribe to whisper-1 and added language="en" to reduce invalid transcriptions

8. Contacts

Role	Name	Contact
Model owner	Morgan Bennet	morgan@free.law
AI team contact	Rachel Gao	rachel@free.law
Legal / compliance (if applicable)
