This repository is paired with H2O Document AI for sharing custom post-processing scripts, custom pipeline YAML configurations, and other common recipes.
Custom pipeline YAML configurations define the processing steps for DocAI scoring pipelines. Adjusting the settings in these files customizes the pipeline's behavior. See the example YAML files in the pipeline_config folder.
Customize Document AI pipeline output using Python post-processing scripts. These scripts refine model results to generate the final output.
The post_processor folder contains sample scripts to help users write their own custom post-processing scripts.
Different DocAI versions offer varying post-processing capabilities. To accommodate this:
- Browse the folder that matches your DocAI version. Versions later than v0.7 can use the post-processors for v0.7.
- Select the appropriate post-processing script for your needs.
- Use the script directly or modify it according to your needs.
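The folder-selection rule above can be sketched in a few lines of Python. This is only an illustration: the folder names in `AVAILABLE_FOLDERS` and the helpers `parse_version` and `pick_folder` are hypothetical and not part of this repository, so check the post_processor folder for the names that actually exist.

```python
import re

# Hypothetical folder names used for illustration only; the real
# post_processor folder may be organized differently.
AVAILABLE_FOLDERS = ["v0.5", "v0.6", "v0.7"]


def parse_version(text):
    """Turn a string such as 'v0.7.2' or '0.7' into a tuple like (0, 7, 2)."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def pick_folder(docai_version, folders=AVAILABLE_FOLDERS):
    """Return the newest folder whose version does not exceed docai_version."""
    target = parse_version(docai_version)
    candidates = [f for f in folders if parse_version(f) <= target]
    if not candidates:
        raise ValueError(f"no post-processor folder for DocAI {docai_version}")
    # Versions newer than the latest folder fall back to that latest folder.
    return max(candidates, key=parse_version)
```

With these assumed folder names, `pick_folder("v0.9.1")` falls back to the v0.7 folder, which mirrors the rule that later versions reuse the v0.7 post-processors.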
The notes folder contains sample notebooks that guide users in tuning their own custom post-processing scripts. The notebooks show how to tune post-processing parameters, such as how tokens are grouped together, how line items are extracted, and how template methods are used, along with other common post-processing tasks.
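To give a feel for what tuning a token-grouping parameter means, here is a generic sketch that merges tokens into text lines by vertical position. It is not the method used by the notebooks or by DocAI: the token layout (dicts with `text`, `x`, `y` keys), the function `group_tokens_into_lines`, and the `y_tolerance` parameter are all assumptions made for illustration.

```python
def group_tokens_into_lines(tokens, y_tolerance=5.0):
    """Group tokens into lines when neighboring tokens are vertically close.

    Each token is assumed to be a dict with 'text', 'x', and 'y' keys,
    where 'x' and 'y' are page coordinates of the token.
    """
    lines = []
    # Visit tokens top to bottom, then left to right.
    for token in sorted(tokens, key=lambda t: (t["y"], t["x"])):
        # Join the current line if this token is within the tolerance of
        # the previously added token; otherwise start a new line.
        if lines and abs(token["y"] - lines[-1][-1]["y"]) <= y_tolerance:
            lines[-1].append(token)
        else:
            lines.append([token])
    # Within each line, order tokens left to right and join their text.
    return [
        " ".join(t["text"] for t in sorted(line, key=lambda t: t["x"]))
        for line in lines
    ]


sample_tokens = [
    {"text": "Total", "x": 10, "y": 100},
    {"text": "42.00", "x": 200, "y": 102},
    {"text": "Invoice", "x": 10, "y": 20},
]
```

In this sketch, a larger `y_tolerance` merges slightly misaligned tokens into one line, while a smaller value splits them apart, which is the kind of trade-off a tuning notebook would explore.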
The scripts folder contains various scripts that ease the burden of testing frontend functionality, including pipeline benchmarking, pipeline deletion, and deletion of document sets.