SCAPHRA

A spaCy component for scattered phrase matching.

You have documents such as "In a hole in the ground there lived a Hobbit."
You want to match patterns like "in holes live hobbits".
Then you need this spaCy component!

Usage

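import spacy
import scaphra  # assumption: importing the package registers the "scaphra" factory

nlp = spacy.load("en_core_web_sm")  # assumption: any English pipeline with a lemmatizer
phrasemap = {'hobbits': ['in', 'holes', 'live', 'hobbits']}
nlp.add_pipe("scaphra", config=dict(phrasemap=phrasemap))
doc = nlp("In a hole in the ground there lived a Hobbit")
# now doc.spans contains a SpanGroup with the matched tokens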

See scaphra/example.py for several complete examples.

The matcher is a single spaCy component that matches scattered phrases using both their lemmas and their stems. Stem matching matters when the text quality is poor and relying on lemmata alone does not suffice. In addition, in some languages (such as German) phrases are often non-contiguous. For example, the pattern "does not start" should match "Does it not always start well?".
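To make the idea of scattered matching concrete, here is a minimal, library-free sketch. It is not scaphra's code: `crude_stem` is a hypothetical toy normalizer standing in for real lemmatization and stemming, `scattered_match` is an illustrative name, and the sketch assumes that pattern words must appear in the text in the same order, with arbitrary tokens allowed in between.

```python
import re


def crude_stem(word):
    """Toy normalizer (hypothetical): lowercase, strip one common suffix, drop a final 'e'."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if len(word) > 3 and word.endswith("e"):
        word = word[:-1]
    return word


def scattered_match(pattern, text):
    """Return the text tokens matching the pattern words in order, or None."""
    tokens = re.findall(r"\w+", text)
    keys = [crude_stem(token) for token in tokens]
    positions = []
    start = 0
    for word in pattern:
        try:
            # Take the leftmost occurrence after the previously matched token.
            index = keys.index(crude_stem(word), start)
        except ValueError:
            return None
        positions.append(index)
        start = index + 1
    return [tokens[i] for i in positions]


print(scattered_match(["does", "not", "start"], "Does it not always start well?"))
print(scattered_match(["in", "holes", "live", "hobbits"],
                      "In a hole in the ground there lived a Hobbit"))
```

The sketch returns only the leftmost match for a single pattern; a real matcher would also need to handle several patterns and overlapping matches.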

This implementation should run reasonably fast: it uses a state machine that memoizes all partial matches, so each text needs to be traversed only once. However, the computational cost rises when many similar patterns are applied to large texts with many matches, since the runtime depends on the number of patterns.
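The single-pass idea can be sketched as follows. This is an illustration under assumptions, not the package's actual state machine: `find_scattered` is a hypothetical name, normalization is reduced to lowercasing to keep the example short, and pattern words are assumed to occur in order. Every partial match is stored as a tuple of token indices and extended whenever the next expected word appears, so the text is read exactly once.

```python
from collections import defaultdict


def find_scattered(phrasemap, tokens):
    """Single pass over tokens; returns {name: [index tuples]} for complete matches."""
    # Each non-empty pattern starts with one empty partial match.
    partial = {name: [()] for name, pattern in phrasemap.items() if pattern}
    complete = defaultdict(list)
    for i, token in enumerate(tokens):
        key = token.lower()
        for name, states in partial.items():
            pattern = phrasemap[name]
            advanced = []
            for state in states:
                # len(state) is the position of the next expected pattern word.
                if pattern[len(state)] == key:
                    extended = state + (i,)
                    if len(extended) == len(pattern):
                        complete[name].append(extended)
                    else:
                        advanced.append(extended)
            # Keep the old states too, so later tokens can start alternative matches.
            states.extend(advanced)
    return dict(complete)


phrasemap = {"neg_start": ["does", "not", "start"], "start": ["start"]}
print(find_scattered(phrasemap, "Does it not always start well".split()))
```

Because all partial matches are kept, the work per token grows with the number of patterns and the number of open partial matches, which is the cost behaviour described above.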
