
Document Filters is an SDK for applications like content indexing, e-discovery, data migration, and feeding data into AI/ML models by extracting data from unstructured sources. It gives the ability to perform deep inspection, data extraction, output manipulation, and conversion for virtually any type of document, in any programming language.


Hyland/DocumentFilters


Hyland Document Filters

Providing developers with everything needed for file inspection, extraction, and transformation, in one powerful software development kit (SDK).

Home Page | Documentation | Samples | Release Notes | Security Hub | Blog


Hyland’s Document Filters SDK equips software developers with powerful tools to embed rich document processing capabilities into their applications. It is ideal for enabling file inspection, data extraction, content manipulation, and document conversion across a wide array of document types and programming languages.

Key Features

  • Deep Content Inspection: Identify and extract data from documents, emails, archives, and container formats, analyzing all associated text and metadata.
  • Rendering and Annotation: Render high-definition content in a web-safe format with tools for redaction, annotations, and more.
  • Content Transformation: Export and convert content for use in other locations, replicate original files, and combine pages from different documents to create packets.
  • Advanced Filtering Platform: Deploy across 31 software platforms and architectures, supporting nearly any programming language and over 600 file formats.

Applications

Document Filters can be applied to:

  • Enhancing AI/ML models by structuring unstructured data
  • Content indexing
  • E-discovery
  • Data migration

Getting Started

In this repository, you'll find shared libraries and DLLs for releases since version 23.2. For SDK installers, samples, and documentation, visit the Hyland Community website.

Additional Resources

Document Filters Blog | Document Filters Security Hub


Getting Started

Document Filters is callable from C#, Java, Python, C/C++ or any language that supports calling C APIs.
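Because the SDK is built on a C API, any language with a foreign-function interface can reach it. The sketch below illustrates that general mechanism only, using Python's standard ctypes module to call strlen from the platform C runtime as a stand-in. It does not load or call Document Filters, and the helper names load_c_runtime and c_strlen are hypothetical; the SDK's own library name, functions, and signatures are covered in the API Documentation.

```python
import ctypes
import ctypes.util
import os


def load_c_runtime() -> ctypes.CDLL:
    """Load the platform C runtime as a stand-in for any C-API shared library."""
    if os.name == "nt":
        return ctypes.CDLL("msvcrt")
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the current process, which include the C runtime.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def c_strlen(text: str) -> int:
    """Call the C function strlen after declaring its argument and return types."""
    libc = load_c_runtime()
    libc.strlen.argtypes = [ctypes.c_char_p]  # const char *
    libc.strlen.restype = ctypes.c_size_t     # size_t
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("Document Filters"))
```

Declaring argtypes and restype before the call matters for any C API: without them, ctypes assumes int-sized arguments and return values, which can silently truncate pointers and sizes on 64-bit platforms.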

You can try Document Filters without a license key, in a feature-limited evaluation. See Document Filters Evaluation for details.

To use Document Filters without feature limitations, you will need either an evaluation license key or a full license key. You can request an evaluation license key from Hyland Software by selecting Request a free trial.

C# | Python | Java | C++

Documentation

  • Getting Started contains details of integrating Document Filters with your language of choice.

  • API Documentation covers the details of calling Document Filters, from the low-level C API through to the object-oriented language bindings.

  • Platforms, Formats and More has the stats on where you can run Document Filters, and what you can process with it.

Samples

This repo contains samples that demonstrate using Document Filters for different use cases in different languages.

Task | Projects
Extract text from a file | The ConvertDocument sample demonstrates extracting plain text from over 600 file types.
Extract files from a container | ExtractSubfile or ConvertDocuments shows how to extract sub-files from archives, containers, or other file types.
Convert a file to PDF | ConvertDocumentToPDF and ConvertDocumentWithComments demonstrate rendering input files to create new PDF renditions.
Combine multiple files into one | CombineDocuments sample shows how multiple documents can be combined into a single output.
Apply markup and annotations | MarkupAnnotationsDemo demonstrates the markup API and modifying pages in the final output.
Create a redacted version of a document | RedactionDemo demonstrates redacting content while rendering to a canvas, removing text and images from the output.

Check more samples here: C# | Java | Python | C++

License

Sample Code: MIT License | Release Binaries: Commercial License

Here you'll find a collection of sample code that is covered under the MIT License. This means that you are free to use, modify, and distribute the sample code in accordance with the terms specified in the MIT License.

However, please note that the release binaries provided in this repository are governed by a different commercial license. These binaries are intended for users who require pre-compiled and ready-to-use versions of our software. The commercial license grants specific permissions and restrictions for the use of the release binaries.

Please see the LICENSE for details.
