This is a prototype for named entity recognition (a kind of token classification). In the future, I'd like to add entity linking (also known as relation extraction) to it.
Compared to a previous implementation, it features the following:
- a custom dataset loader (easy to expand to entity linking)
- a custom model (easy to add a head for entity linking)
- the ability to classify not only full words, but tokens within words
- more unit tests for quicker development
- a better class structure
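
To illustrate what classifying tokens within words can look like, here is a minimal sketch. It is not the code from this repository. It assumes entities are annotated as character spans and that each token comes with character offsets, as a sub-word tokenizer would typically provide. The function name `label_tokens`, the BIO tagging scheme, and the example data are all hypothetical.

```python
from typing import List, Tuple

# (start_char, end_char, entity_type), end is exclusive; hypothetical format
Span = Tuple[int, int, str]


def label_tokens(offsets: List[Tuple[int, int]], entities: List[Span]) -> List[str]:
    """Assign a BIO label to every token based on its character offsets."""
    labels = ["O"] * len(offsets)
    for ent_start, ent_end, ent_type in entities:
        first = True
        for i, (tok_start, tok_end) in enumerate(offsets):
            # A token belongs to the entity if it lies fully inside the span.
            if tok_start >= ent_start and tok_end <= ent_end:
                labels[i] = ("B-" if first else "I-") + ent_type
                first = False
    return labels


# Made-up example: the single word "Invoice#4711" split into three tokens.
offsets = [(0, 7), (7, 8), (8, 12)]  # "Invoice", "#", "4711"
entities = [(8, 12, "NUMBER")]       # only "4711" is an entity
print(label_tokens(offsets, entities))  # ['O', 'O', 'B-NUMBER']
```

Labeling by token offsets rather than by whole words is what lets only part of a word carry an entity label.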
See the Dockerfile. When you open the folder in VSCode, it should prompt you to open it within a container.
Simply execute...
python src.train.py
to run a training that overfits on 3 data items
pytest tests
to run all unit tests