Foundation AI helps Stockwell Harris streamline its document intake using Extract Filer

Foundation AI is an Artificial Intelligence Solutions Provider. We help organizations process, manage, and leverage their unstructured data to automate labor-intensive tasks, make better data-driven decisions, and drive real business value.

Stockwell, Harris, Woolverton, & Helphrey is a leading California Workers’ Compensation defense firm. Since its founding in 1957, the firm has devoted its practice to meeting the legal needs of the California risk management industry. Its clients include self-insured employers, insurance companies, and third-party administrators.


Goals

  • To automatically ingest document packages directly from Stockwell's batch scanner and fax system and split them into their separate documents.
  • To determine which file in their Content Management System (CMS) each document belongs to (using AI to replicate the same lookup process that Stockwell's clerks used to perform manually).
  • To extract the requisite metadata (e.g., dates and names) and enter it into their CMS.
  • To index the document directly into the correct file folder (matter) in their CMS.

Approach

  • Uses OCR and object detection models to convert the PDF into a machine-readable format.
  • Employs Boosting Tree-based Classification models and CNNs to separate the consolidated file into its constituent documents and classify each document by type.
  • Applies multiple NLP models, including BERT and graph-based convolutional networks (GCNs), to extract relevant information from each document depending on its type.
  • Fuzzy matches each document based on the extracted information to the correct file (matter) in their CMS.
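
The fuzzy-matching step in the last bullet can be illustrated with a minimal sketch. It assumes that string similarity from Python's standard difflib module is an acceptable stand-in for the production matcher, which is not described here. The matter records, field names, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical CMS matters; IDs and field names are illustrative only.
MATTERS = [
    {"matter_id": "M-1001", "case_name": "John Doe v. Acme Corp",
     "claim_number": "WC-2020-55871"},
    {"matter_id": "M-1002", "case_name": "Jane Roe v. Beta Inc",
     "claim_number": "WC-2019-10422"},
]


def normalize(text):
    """Lowercase the text and drop all whitespace before comparing."""
    return "".join(text.lower().split())


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_matter(extracted, matters, threshold=0.8):
    """Return the best-matching matter, or None if no score reaches the threshold.

    The score is the average similarity over the fields present in both
    the extracted data and the matter record.
    """
    best, best_score = None, 0.0
    for matter in matters:
        scores = [similarity(value, matter[key])
                  for key, value in extracted.items() if key in matter]
        score = sum(scores) / len(scores) if scores else 0.0
        if score > best_score:
            best, best_score = matter, score
    return best if best_score >= threshold else None
```

With this sketch, an extraction containing OCR noise, such as a case name of "Jon Doe v Acme Corp" and a claim number of "WC-2020-5S871", still resolves to the hypothetical matter M-1001, while an unrelated claim number returns no match and can be routed to a person for review.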

Results

  • Decreased Lead Time: 50%
  • Increased Ingestion Speed: 80%
  • Increased Accuracy: up to 98%
  • Cost Savings: Over $400,000 / year
  • Operations staff can now work remotely

Background

Stockwell Harris went paperless years ago and provided attorneys as well as paralegals with remote access to its electronic case management software (CMS). However, going paperless brought on a barrage of new operational activities to sort, separate, and index the 10,000 pages/day of inbound mail and fax documents into the CMS.

Most incoming documents are unstructured and rarely indicate the firm’s internal file number, so the process started with manually querying the CMS to find the correct file, using whatever descriptive information staff could locate on the document, such as names, case numbers, and insurance claim IDs. After locating the correct file, staff manually scanned the document one page at a time, uploaded it, and keyed the title and other requisite information into the CMS. To keep up with this manual process, the firm was forced to expand the department through a staffing agency.

Challenge

Despite this additional staffing, many files were still being indexed inconsistently or incorrectly. Because the process was manual and repetitive, it still took several days for incoming documents to reach the firm's system.

Stockwell Harris approached Foundation AI to integrate ExtractWC (the Workers’ Compensation version of Extract Filer) into its technical infrastructure.

Foundation AI configured ExtractWC to:

  • Automatically ingest document packages directly from Stockwell's batch scanner and fax system and split them into their separate documents

  • Determine which file in the CMS each document belongs to (using AI to replicate the same lookup process that Stockwell's clerks used to perform manually)

  • Extract the requisite metadata (e.g., dates and names) and enter it into Stockwell Harris’ CMS

  • Index the document directly into the correct file folder in the CMS

Solution

Data Used

Stockwell Harris’ data consists of over 20 different document types, both structured and unstructured. These document types include Orders, Subpoenas, Emails, Correspondence, and AME and PQME documents. Because Workers’ Compensation defense touches the healthcare and insurance verticals as well as the legal one, our system needs to be able to process documents from all three domains. Out of the box, ExtractWC is trained to recognize and process every document type that Workers’ Compensation firms encounter.

Methodology

When Stockwell Harris receives mail, staff unpack it, batch it, and load it into the scanner. The scanner places the scanned files into a directory, which is automatically synced into the ExtractWC application. New files received via email can be dropped into this folder or uploaded directly to the application.
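
As a rough illustration of the intake step, the sketch below polls a scanner output folder for PDFs that have not been seen before. It is not how ExtractWC's sync is implemented, which is not described here. The function name, the in-memory `seen` set, and the polling approach are assumptions made for the example.

```python
from pathlib import Path


def find_new_files(watch_dir, seen):
    """Return PDFs in watch_dir whose names are not yet in `seen`.

    `seen` is a set of file names that is updated in place, so calling
    the function repeatedly only reports each file once.
    """
    new_files = []
    for path in sorted(Path(watch_dir).iterdir()):
        # Compare the suffix case-insensitively: scanners often write ".PDF"
        if path.is_file() and path.suffix.lower() == ".pdf":
            if path.name not in seen:
                seen.add(path.name)
                new_files.append(path)
    return new_files
```

A scheduler could call `find_new_files` every few seconds and hand each returned path to the processing pipeline. A production system would more likely persist the set of processed files rather than keep it in memory.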

As the first processing step, ExtractWC splits each file (a PDF that may contain more than one document) into pages and converts each page into a high-resolution image. It then applies image preprocessing techniques, including denoising and binarization, so that the OCR engine can recognize text even where it is obscured by a watermark or an overlapping image. The preprocessed images are run through multiple OCR models to extract their text.

ExtractWC then processes the OCRed text to identify the constituent documents in the file. It uses Boosting Tree-based Classification models and Computer Vision techniques based on Convolutional Neural Networks (CNNs) to identify the starting page of each document inside the consolidated file. Once the file has been divided into its constituent documents, the system classifies each document by type using Boosting Tree algorithms, based on the document's contents.

Multiple NLP models, including BERT and graph-based convolutional networks (GCNs), are then applied to understand each document's content and extract relevant information. The extracted information, such as the adjudication number and claim number, is used to fuzzy match the document to the correct Case Name and Matter ID in Stockwell Harris' downstream CMS.
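
The document-splitting step can be sketched as follows: once some model has decided which pages start a new document, consecutive pages are grouped under the most recent start page. The keyword check below is only a toy stand-in for the Boosting Tree and CNN models, and the function names are hypothetical.

```python
def split_into_documents(pages, is_start_page):
    """Group consecutive pages into documents.

    pages: list of per-page OCR text, in scan order.
    is_start_page: callable taking a page's text and returning True when
        the page begins a new document (a stand-in for the real classifier).
    """
    documents = []
    for page in pages:
        # The very first page always opens a document, even if the
        # classifier does not flag it.
        if not documents or is_start_page(page):
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


def toy_is_start_page(text):
    """Toy heuristic: a page starts a document if it opens with a known title."""
    return text.upper().startswith(("SUBPOENA", "ORDER"))
```

For a five-page scan whose first and third pages begin with "SUBPOENA" and "ORDER", this returns two documents of two and three pages respectively.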

Users have the option to update or correct the data extracted, to ensure that each document is being filed into the correct matter. These changes are then used as a feedback loop to improve our models’ classification and extraction accuracy.

Once the user has confirmed that the document has been split correctly and that the information extracted from it is correct, ExtractWC automatically renames each document based on the document type and the information extracted. For example, for a case belonging to John Doe, an AME document provided by Dr. Jane in December 2020 would be renamed JohnDoe_AME_Dr.Jane_202012.PDF. The system then automatically indexes each file into Stockwell Harris’ CMS based on each document’s type and extracted data. What used to be a manual process involving multiple applications is now done in a single tool and is largely automated.
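
The renaming convention in the example above can be reproduced with a short sketch. The function name and parameters are hypothetical, and the pattern (claimant, document type, provider, then year and month) is inferred from the single example given, so the actual rules may differ for other document types.

```python
from datetime import date


def build_filename(claimant, doc_type, provider, doc_date, extension="PDF"):
    """Build a file name like JohnDoe_AME_Dr.Jane_202012.PDF.

    Whitespace is removed from the claimant and provider names, and the
    date is reduced to a four-digit year followed by a two-digit month.
    """
    claimant_part = "".join(claimant.split())
    provider_part = "".join(provider.split())
    return f"{claimant_part}_{doc_type}_{provider_part}_{doc_date:%Y%m}.{extension}"
```

Calling `build_filename("John Doe", "AME", "Dr. Jane", date(2020, 12, 1))` yields the file name used in the example.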

Results

Within two weeks of deploying Extract, we were processing documents twice as fast and saving $8,000 per week.

George Woolverton,
Managing Partner

ExtractWC increases the speed and accuracy of document processing and information archiving, enabling Stockwell Harris to save time and money:

  • Document Volume: 10,000 pages a day

  • Decreased Lead Time: 50%

  • Increased Ingestion Speed: 80%

  • Increased Accuracy: up to 98%

  • Cost Savings: Over $400,000 / year

The best part is that I know exactly how many documents are getting processed every day, and I can monitor the staff no matter where we are.

Rosanna Renteria,
Office Manager

Because of the COVID-19 pandemic, Stockwell Harris was forced to transition to remote operations with very little notice. ExtractWC enabled its operations staff to seamlessly transition to a remote working environment. So long as staff members have access to a scanning device, they can perform all necessary actions remotely through Extract’s secure web-based interface.

Artificial Intelligence for the Real World
© 2021 Foundation AI