
release: prepare v4.2.0 with GLiNER integration

@sidmohan0 released this 31 May 03:36
· 46 commits to dev since this release

DataFog 4.2.0 - GLiNER Integration Release

Released: 2025-05-30

🚀 Major Features

GLiNER Integration

  • Modern NER Engine: Added GLiNER (Generalist Named Entity Recognition) support
  • Smart Cascading: Intelligent progression from regex → GLiNER → spaCy
  • 32x Faster NER: GLiNER runs NER 32x faster than the spaCy baseline
  • PII-Specialized Models: Support for urchade/gliner_multi_pii-v1 and other models

Engine Selection

from datafog.services.text_service import TextService

# New GLiNER engine
service = TextService(engine="gliner")

# Smart cascading (recommended)
service = TextService(engine="smart")  # regex → GLiNER → spaCy
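
To make the regex → GLiNER → spaCy progression concrete, here is a minimal sketch of a cascade that tries the cheapest stage first and stops at the first stage that finds something. It does not use DataFog's internals: `cascade`, `regex_stage`, and the two stand-in stages are hypothetical names for illustration, and the stopping rule (first non-empty result wins) is an assumption rather than the documented behavior of engine="smart".

```python
import re
from typing import Callable, Dict, List

Entities = Dict[str, List[str]]
Stage = Callable[[str], Entities]

# Illustrative email pattern; not the pattern DataFog ships.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_stage(text: str) -> Entities:
    """Cheap first pass: structured PII only (here, just emails)."""
    found = EMAIL_RE.findall(text)
    return {"EMAIL": found} if found else {}


def fake_gliner_stage(text: str) -> Entities:
    """Stand-in for a GLiNER call (hypothetical, no model is loaded)."""
    return {"PERSON": ["Ada Lovelace"]} if "Ada Lovelace" in text else {}


def fake_spacy_stage(text: str) -> Entities:
    """Stand-in for a spaCy call (hypothetical, always finds nothing)."""
    return {}


def cascade(text: str, stages: List[Stage]) -> Entities:
    """Run stages in order and return the first non-empty result."""
    for stage in stages:
        entities = stage(text)
        if entities:
            return entities
    return {}


stages = [regex_stage, fake_gliner_stage, fake_spacy_stage]
print(cascade("Contact ada@example.com", stages))
# {'EMAIL': ['ada@example.com']}
print(cascade("Ada Lovelace wrote the first program", stages))
# {'PERSON': ['Ada Lovelace']}
```

Ordering the stages from cheapest to most expensive means the slower models only run on text the regex pass could not handle.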

Performance Improvements

  • 190x faster regex engine for structured PII (emails, phones, SSNs)
  • Lightweight core: <2MB package with optional ML extras
  • Memory optimization: Enhanced segfault handling and performance validation
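
As a rough illustration of why structured PII suits a regex engine, the sketch below matches emails, US-style phone numbers, and SSNs with precompiled patterns. The patterns and the `find_structured_pii` helper are simplified assumptions for this example, not the ones DataFog uses.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns (assumptions, not DataFog's own).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_structured_pii(text: str) -> Dict[str, List[str]]:
    """Return every match per label, omitting labels with no matches."""
    results = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            results[label] = matches
    return results


sample = "Email bob@example.com, call 555-123-4567, SSN 123-45-6789."
print(find_structured_pii(sample))
```

Because each pattern is compiled once and scanning needs no model, this kind of pass stays cheap even on long inputs.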

🐛 Bug Fixes

  • Fixed CI segmentation faults in test environments
  • Fixed benchmark regression detection
  • Improved dependency management for optional ML features
  • Enhanced test stability across platforms

🔧 Infrastructure

  • Comprehensive CI/CD improvements
  • Enhanced GitHub Actions workflows
  • Better error handling and diagnostics
  • Sample notebooks and examples

📥 Installation

# Core package (lightweight)
pip install datafog

# With GLiNER support (quotes keep shells such as zsh from expanding the brackets)
pip install "datafog[nlp-advanced]"

# Everything included
pip install "datafog[all]"
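
Since the ML engines are optional extras, calling code may want to pick an engine based on what is actually installed. The sketch below is one way to do that with the standard library. `choose_engine` and `module_available` are hypothetical helpers, and the rule that "smart" needs both the gliner and spacy modules is an assumption, not something these notes state.

```python
from importlib.util import find_spec
from typing import Callable


def module_available(name: str) -> bool:
    """True if a top-level module can be found in this environment."""
    return find_spec(name) is not None


def choose_engine(is_available: Callable[[str], bool] = module_available) -> str:
    """Pick an engine name from the release notes based on installed modules."""
    has_gliner = is_available("gliner")
    has_spacy = is_available("spacy")
    if has_gliner and has_spacy:
        return "smart"  # assumption: the cascade needs both ML backends
    if has_gliner:
        return "gliner"
    if has_spacy:
        return "spacy"
    return "regex"  # the core package alone


print(choose_engine())
```

The returned string could then be passed as the engine argument to TextService.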

📊 Performance Comparison

Engine | Speed vs spaCy | Accuracy          | Use Case
regex  | 190x faster    | High (structured) | Emails, phones, SSNs
gliner | 32x faster     | Very High         | Modern NER
spacy  | 1x (baseline)  | Good              | Traditional NLP
smart  | 60x faster     | Highest           | Best balance
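
One way to see why a cascade can land between the regex and GLiNER speeds is a simple cost model, sketched below. It assumes every text goes through the regex stage and only some fraction falls through to each slower stage. The function name and the fall-through fractions are hypothetical inputs chosen for illustration. This is not how the figures in the table were measured.

```python
def cascade_speedup(
    regex_x: float,
    gliner_x: float,
    frac_to_gliner: float,
    frac_to_spacy: float,
) -> float:
    """Estimated speedup over spaCy for a regex -> GLiNER -> spaCy cascade.

    Costs are in units of one spaCy pass, so spaCy itself costs 1.0.
    """
    cost = 1.0 / regex_x                # every text hits the regex stage
    cost += frac_to_gliner / gliner_x   # some texts fall through to GLiNER
    cost += frac_to_spacy * 1.0         # a few fall through to spaCy
    return 1.0 / cost


# Hypothetical mix: 30% of texts reach GLiNER and 1% reach spaCy.
print(round(cascade_speedup(190, 32, 0.30, 0.01), 1))
```

Under this model the overall speedup is dominated by how often text reaches the slowest stage, so real numbers depend heavily on the workload.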

🔗 Links