Commit 72743c4

release: prepare v4.2.0 with GLiNER integration
Version Updates:
- Bump version to 4.2.0 in __about__.py and setup.py
- Add comprehensive CHANGELOG.MD entry for v4.2.0

Major Features in v4.2.0:
- GLiNER integration with 32x performance improvement over spaCy
- Smart cascading engine (regex → GLiNER → spaCy)
- Enhanced CLI with --engine flags for model management
- New nlp-advanced extra for GLiNER dependencies
- 19 new test cases with comprehensive coverage

This release significantly expands DataFog's NER capabilities while maintaining the lightweight core architecture and backward compatibility.
1 parent 6bedf40 commit 72743c4

3 files changed (+86, -2 lines)

CHANGELOG.MD

Lines changed: 84 additions & 0 deletions
@@ -1,5 +1,89 @@
 # ChangeLog
 
+## [2025-05-29]
+
+### `datafog-python` [4.2.0]
+
+#### Major Features
+
+- **GLiNER Integration**: Added modern Named Entity Recognition engine with GLiNER (Generalist Model for NER)
+  - New `gliner` engine option in TextService providing 32x performance improvement over spaCy
+  - PII-specialized model support (`urchade/gliner_multi_pii-v1`) for enhanced accuracy
+  - Custom entity type configuration for domain-specific detection
+  - Automatic model downloading and caching functionality
+
+- **Smart Cascading Engine**: Introduced intelligent multi-engine approach
+  - New `smart` engine that progressively tries regex → GLiNER → spaCy
+  - Configurable stopping criteria based on entity count thresholds
+  - Optimized for best accuracy/performance balance (60x average speedup)
+
+- **Enhanced CLI Model Management**: Extended command-line interface
+  - `--engine` flag support for `download-model` and `list-models` commands
+  - GLiNER model discovery and management capabilities
+  - Unified model management across spaCy and GLiNER engines
+
+#### Architecture Improvements
+
+- **Optional Dependencies**: Added new `nlp-advanced` extra for GLiNER dependencies
+  - `pip install datafog[nlp-advanced]` for GLiNER + PyTorch + Transformers
+  - Maintained lightweight core architecture (<2MB)
+  - Graceful degradation when GLiNER dependencies unavailable
+
+- **Engine Ecosystem**: Expanded from 3 to 5 annotation engines
+  - `regex`: 190x faster, structured PII detection (core only)
+  - `gliner`: 32x faster, modern NER with custom entities
+  - `spacy`: Traditional NLP, comprehensive entity recognition
+  - `smart`: Cascading approach for optimal accuracy/speed
+  - `auto`: Legacy regex→spaCy fallback
+
+#### Performance & Quality
+
+- **Validated Performance**: Comprehensive benchmarking across all engines
+  - GLiNER: 32x faster than spaCy with superior NER accuracy
+  - Smart cascading: 60x average speedup with highest accuracy scores
+  - Regex: Maintained 190x performance advantage
+
+- **Comprehensive Testing**: Added 19 new test cases for GLiNER integration
+  - Full coverage of GLiNER annotator functionality
+  - Graceful degradation testing for missing dependencies
+  - Smart cascading logic validation
+  - Cross-engine integration testing
+
+#### Documentation & Developer Experience
+
+- **Updated Documentation**: Comprehensive guides and examples
+  - README performance comparison table with all 5 engines
+  - Engine selection guidance with use case recommendations
+  - GLiNER model management and CLI usage examples
+  - Installation options for different dependency combinations
+
+- **Developer Guide**: Streamlined development documentation
+  - Updated architecture overview with GLiNER integration
+  - Performance requirements and testing strategies
+  - Common development patterns and best practices
+
+#### Breaking Changes
+
+- **Engine Options**: New engine types added to TextService
+  - Existing code using `engine="auto"` continues to work unchanged
+  - New engines `gliner` and `smart` require `[nlp-advanced]` extra
+
+#### Dependencies
+
+- **New Optional Dependencies** (nlp-advanced extra):
+  - `gliner>=0.2.5`
+  - `torch>=2.1.0,<2.7`
+  - `transformers>=4.20.0`
+  - `huggingface-hub>=0.16.0`
+
+#### Migration Guide
+
+For users upgrading from v4.1.1:
+
+- All existing functionality remains unchanged
+- To use GLiNER: `pip install datafog[nlp-advanced]`
+- Smart cascading: `TextService(engine="smart")` for best balance
+- CLI: Use `--engine gliner` flag for GLiNER model management
+
 ## [2025-05-05]
 
 ### `datafog-python` [4.1.1]
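
The changelog entry above describes a `smart` engine that tries regex → GLiNER → spaCy, stops on an entity-count threshold, and degrades gracefully when optional dependencies are missing. The commit does not include that code, so the following is only a minimal sketch of the general idea, not DataFog's implementation. `Stage`, `cascade`, `regex_stage`, and `stand_in_ner` are hypothetical names, the NER stages are stand-ins rather than real GLiNER or spaCy calls, and returning the first stage's result that meets its threshold is an assumption.

```python
import importlib.util
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Entities = Dict[str, List[str]]


@dataclass
class Stage:
    """One step of the cascade (hypothetical structure, not DataFog's)."""
    name: str
    annotate: Callable[[str], Entities]
    min_entities: int = 1           # stop once this many entities are found
    requires: Optional[str] = None  # optional module the stage depends on


def is_usable(stage: Stage) -> bool:
    """A stage is usable if it needs no extra module or the module is importable."""
    return stage.requires is None or importlib.util.find_spec(stage.requires) is not None


def cascade(text: str, stages: Sequence[Stage]) -> Tuple[str, Entities]:
    """Run stages in order; return the last stage that ran and its entities."""
    used: str = "none"
    found: Entities = {}
    for stage in stages:
        if not is_usable(stage):
            continue  # graceful degradation: skip stages with missing dependencies
        used, found = stage.name, stage.annotate(text)
        if sum(len(values) for values in found.values()) >= stage.min_entities:
            break  # threshold met, so the slower stages are never invoked
    return used, found


EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def regex_stage(text: str) -> Entities:
    """Cheapest stage: structured PII via a regular expression (emails only here)."""
    emails = EMAIL_RE.findall(text)
    return {"EMAIL": emails} if emails else {}


def stand_in_ner(text: str) -> Entities:
    """Stand-in for a model-based NER stage: treats capitalized words as names."""
    return {"PERSON": re.findall(r"\b[A-Z][a-z]+\b", text)}


stages = [
    Stage("regex", regex_stage),
    Stage("gliner", stand_in_ner, requires="gliner"),
    Stage("spacy", stand_in_ner, requires="spacy"),
]
print(cascade("Contact bob@example.com", stages))
```

Ordering the stages from cheapest to most expensive is what would make such a cascade fast on inputs with structured PII, since the model-based stages only run when the regex stage finds too little.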

datafog/__about__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "4.1.1"
+__version__ = "4.2.0"

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
     long_description = f.read()
 
 # Use a single source of truth for the version
-version = "4.1.1"
+version = "4.2.0"
 
 project_urls = {
     "Homepage": "https://datafog.ai",
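
The comment in `setup.py` mentions a single source of truth, yet this commit bumps the version string in both `datafog/__about__.py` and `setup.py`. One common way to avoid the double edit is to have the setup script parse `__version__` out of the about file. The following is a minimal sketch of that approach, not what the repository does; `read_version` and `VERSION_RE` are hypothetical names.

```python
import re
from pathlib import Path

# Matches a line such as: __version__ = "4.2.0"
VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(about_path: str) -> str:
    """Return the version string declared in the given __about__.py-style file."""
    text = Path(about_path).read_text(encoding="utf-8")
    match = VERSION_RE.search(text)
    if match is None:
        raise RuntimeError(f"__version__ not found in {about_path}")
    return match.group(1)


# Hypothetical usage inside a setup script:
# version = read_version("datafog/__about__.py")
```

Parsing the file as text, rather than importing the package, keeps the setup script from pulling in the package's runtime dependencies at build time.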
