DataFog 4.2.0 - GLiNER Integration Release
Released: 2025-05-30
🚀 Major Features
GLiNER Integration
- Modern NER Engine: Added GLiNER (Generalist Named Entity Recognition) support
- Smart Cascading: Intelligent progression from regex → GLiNER → spaCy
- 32x Faster NER: GLiNER runs NER 32x faster than the spaCy baseline
- PII-Specialized Models: Support for urchade/gliner_multi_pii-v1 and other models
Engine Selection
from datafog.services.text_service import TextService
# New GLiNER engine
service = TextService(engine="gliner")
# Smart cascading (recommended)
service = TextService(engine="smart") # regex → GLiNER → spaCy
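The cascading idea can be illustrated without DataFog installed. The sketch below is a minimal, standalone illustration and not DataFog's implementation: the names `cascade`, `regex_stage`, and `stub_ner_stage` are hypothetical, and the stopping rule (return the first non-empty result) is an assumption, since these notes do not describe how the smart engine decides when to fall through.

```python
import re
from typing import Callable, Dict, List

# A detector maps text to {label: [matched strings]}.
Detector = Callable[[str], Dict[str, List[str]]]

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_stage(text: str) -> Dict[str, List[str]]:
    """Cheap first stage: structured PII only (here, just US-style SSNs)."""
    hits = SSN_RE.findall(text)
    return {"SSN": hits} if hits else {}


def stub_ner_stage(text: str) -> Dict[str, List[str]]:
    """Hypothetical stand-in for a slower ML stage such as GLiNER or spaCy."""
    known_names = ["Ada Lovelace"]
    hits = [name for name in known_names if name in text]
    return {"PERSON": hits} if hits else {}


def cascade(text: str, stages: List[Detector]) -> Dict[str, List[str]]:
    """Run stages in order and return the first non-empty result."""
    for stage in stages:
        found = stage(text)
        if found:
            return found
    return {}


stages = [regex_stage, stub_ner_stage]
print(cascade("SSN on file: 123-45-6789", stages))
print(cascade("Ada Lovelace signed the form", stages))
```

Ordering the stages from cheapest to most expensive means the slower model only runs when the cheaper stage finds nothing, which is one plausible way a cascade could gain speed over running the ML model on every input.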
Performance Improvements
- 190x faster regex engine for structured PII (emails, phones, SSNs)
- Lightweight core: <2MB package with optional ML extras
- Memory optimization: Enhanced segfault handling and performance validation
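To make "structured PII" concrete, here is a rough sketch of pattern-based detection using only the standard library. The patterns are deliberately simplified assumptions (US-style phone numbers and SSNs, a loose email match) and are not the patterns DataFog ships; `find_structured_pii` is a hypothetical helper name.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns. Real-world formats vary far more widely.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_structured_pii(text: str) -> Dict[str, List[str]]:
    """Return every match for each pattern, keyed by label."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


sample = "Contact jane@example.com or 555-123-4567; SSN 123-45-6789."
for label, matches in find_structured_pii(sample).items():
    print(label, matches)
```

Because each pattern is compiled once and matching needs no model loading or inference, this style of detection is cheap for fixed-format identifiers, though it cannot find free-form entities such as names.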
🐛 Bug Fixes
- Fixed CI segmentation faults in test environments
- Fixed benchmark regression detection
- Improved dependency management for optional ML features
- Enhanced test stability across platforms
🔧 Infrastructure
- Comprehensive CI/CD improvements
- Enhanced GitHub Actions workflows
- Better error handling and diagnostics
- Sample notebooks and examples
📥 Installation
# Core package (lightweight)
pip install datafog
# With GLiNER support
pip install "datafog[nlp-advanced]"
# Everything included
pip install "datafog[all]"
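Since the ML engines are optional extras, application code may want to pick an engine based on what is actually installed. The following is a sketch under stated assumptions: it assumes the extras provide the importable modules `gliner` and `spacy`, and that the smart engine needs both. Neither point is confirmed by these notes, and `pick_engine` is a hypothetical helper, not part of DataFog.

```python
from importlib.util import find_spec
from typing import Callable


def module_available(name: str) -> bool:
    """True if a top-level module can be located without importing it."""
    return find_spec(name) is not None


def pick_engine(is_available: Callable[[str], bool] = module_available) -> str:
    """Choose an engine name based on which optional modules are present.

    Assumption: "smart" requires both gliner and spacy to be installed.
    """
    has_gliner = is_available("gliner")
    has_spacy = is_available("spacy")
    if has_gliner and has_spacy:
        return "smart"
    if has_gliner:
        return "gliner"
    if has_spacy:
        return "spacy"
    return "regex"


print(pick_engine())
```

The returned string could then be passed as the `engine` argument to `TextService`. The availability check is injectable so the selection logic can be tested without installing any extras.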
📊 Performance Comparison
| Engine | Speed vs spaCy | Accuracy | Use Case |
|---|---|---|---|
| regex | 190x faster | High (structured) | Emails, phones, SSNs |
| gliner | 32x faster | Very High | Modern NER |
| spacy | 1x (baseline) | Good | Traditional NLP |
| smart | 60x faster | Highest | Best balance |
🔗 Links
- PyPI: https://pypi.org/project/datafog/4.2.0/
- Documentation: https://docs.datafog.ai
- GitHub: https://github.com/datafog/datafog-python