- FastAPI backend for serving GLiNER models (NER).
- Gradio frontend (optional) for interactive use.
- Prometheus metrics endpoint (`/metrics`).
- Configurable via YAML, CLI, or environment variables.
- Docker and Docker Compose support.
- ONNX inference support (including quantized models).
- API key authentication (optional).
- Custom metrics port and enable/disable option for Prometheus metrics.
For detailed documentation, see DeepWiki.
You can try a live demo of the GLiNER API container in its Huggingface Space: GLiNER API Demo.
It uses a minimally changed image so that it works in the Huggingface Space environment.
You can either build the container yourself or use a prebuilt image from GitHub Container Registry.
```shell
docker run \
  -p 8080:8080 \
  -p 9090:9090 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -v $HOME/.cache/huggingface:/app/huggingface \
  ghcr.io/freinold/gliner-api:latest
```

- `-v $(pwd)/config.yaml:/app/config.yaml` mounts your config file (edit as needed)
- `-v $HOME/.cache/huggingface:/app/huggingface` mounts your Huggingface cache for faster model loading
```shell
docker build \
  -f cpu.Dockerfile \
  --build-arg IMAGE_CREATED="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --build-arg IMAGE_REVISION="$(git rev-parse HEAD)" \
  --build-arg IMAGE_VERSION="$(git describe --tags --always)" \
  -t gliner-api .
```

```shell
docker run --rm \
  -p 8080:8080 \
  -p 9090:9090 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -v $HOME/.cache/huggingface:/app/huggingface \
  gliner-api
```
Edit `compose.yaml` to select the config you want (see `example_configs/`).
Then start:

```shell
docker compose up --build
```
Be sure to check the installation instructions first.
```shell
uv run main.py [OPTIONS]
```
Or with FastAPI CLI:
```shell
fastapi run main.py --host localhost
```
To list all options:

```shell
uv run main.py --help
```
| Option | Description | Default |
|---|---|---|
| `--use-case` / `--name` | Use case for the GLiNER model (application/domain) | `general` |
| `--model-id` | Huggingface model ID (browse models) | `knowledgator/gliner-x-base` |
| `--onnx-enabled` | Use ONNX for inference | `False` |
| `--onnx-model-path` | Path to ONNX model file | `model.onnx` |
| `--default-entities` | Default entities to detect | `['person', 'organization', 'location', 'date']` |
| `--default-threshold` | Default detection threshold | `0.5` |
| `--api-key` | API key for authentication (if set, required in requests) | `null` |
| `--host` | Host address | `0.0.0.0` |
| `--port` | Port | `8080` |
| `--metrics-enabled` | Enable Prometheus metrics endpoint | `True` |
| `--metrics-port` | Port for Prometheus metrics endpoint | `9090` |
| `--frontend-enabled` | Enable Gradio frontend | `True` |
| Description | Path | Demo Link |
|---|---|---|
| Gradio Frontend (if enabled) | `/` | Frontend |
| API Docs (Swagger) | `/docs` | Swagger UI |
| API Docs (ReDoc) | `/redoc` | ReDoc |
| Prometheus Metrics | `/metrics` | (no public demo link; available on metrics port if enabled) |
```shell
curl -X POST "http://localhost:8080/api/invoke" \
  -H "Content-Type: application/json" \
  -d '{"text": "Steve Jobs founded Apple in Cupertino."}'
```
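If you prefer calling the endpoint from Python, a minimal sketch of building the request body might look like this. Note that only `text` appears in the curl example above; the optional `entities` and `threshold` fields are assumptions mirroring the configurable defaults (`--default-entities`, `--default-threshold`), not confirmed API parameters:

```python
import json

def build_invoke_payload(text, entities=None, threshold=None):
    """Build a JSON body for POST /api/invoke.

    "entities" and "threshold" are hypothetical optional fields;
    check the Swagger docs at /docs for the actual request schema.
    """
    payload = {"text": text}
    if entities is not None:
        payload["entities"] = entities
    if threshold is not None:
        payload["threshold"] = threshold
    return payload

# Serialize the payload exactly as the curl example does.
body = json.dumps(build_invoke_payload("Steve Jobs founded Apple in Cupertino."))
```

Send the resulting `body` with any HTTP client (e.g. `urllib.request` or `requests`) as the POST payload to `/api/invoke`.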
Prerequisites:
- Python 3.12.11
- uv (for dependency management)
Install dependencies:
```shell
# CPU version
uv sync --extra cpu [--extra frontend]

# GPU version
uv sync --extra gpu [--extra frontend]
```
The frontend is optional but recommended for interactive use.
Install from source:
```shell
git clone https://github.com/freinold/gliner-api.git
cd gliner-api
uv sync --extra cpu  # or --extra gpu
```
You can configure the app via:

- `config.yaml` (default; see `example_configs/`)
- CLI options (see above)
- Environment variables (prefix: `GLINER_API_`)
Example configs:

- `example_configs/general.yaml` (default NER)
- `example_configs/pii.yaml` (PII detection)
- `example_configs/medical.yaml` (medical NER)
- `example_configs/general_onnx.yaml` (ONNX inference)
- `example_configs/general_onnx_quantized.yaml` (quantized ONNX)
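As an illustrative sketch, a minimal `config.yaml` could look like the following. The key names here are assumed to mirror the CLI options above; check the files in `example_configs/` for the actual schema:

```yaml
# Hypothetical config sketch; verify key names against example_configs/
use_case: general
model_id: knowledgator/gliner-x-base
default_entities:
  - person
  - organization
  - location
  - date
default_threshold: 0.5
port: 8080
metrics_enabled: true
frontend_enabled: true
```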
- FastAPI (API backend)
- Gradio (optional frontend)
- Uvicorn (ASGI server)
- Prometheus Client (metrics)
- Huggingface Hub (model loading)
- PyTorch (CPU/GPU inference)
- ONNX (optional, for ONNX models)
- uv (dependency management)
See LICENSE.