
NER + RE #2726

Closed

igormis opened this issue Apr 12, 2022 · 8 comments
Labels: question (Further information is requested)

igormis commented Apr 12, 2022

@alanakbik are there any tutorials on how to train NER together with Relation Extraction on top of it? What I need is the input data format, the training process and the inference.

alanakbik (Collaborator)

@dobbersc could you help @igormis get started with RE in Flair?

dobbersc (Collaborator)

Hey @igormis,

Unfortunately, there is no tutorial for training the relation extractor like there is for the other models, e.g. the sequence tagger. I'm currently working on an implementation of another relation extractor architecture and plan to add a tutorial with it. For now, you can train and use the existing relation extractor as follows:

from flair.data import Sentence
from flair.datasets import RE_ENGLISH_CONLL04
from flair.embeddings import TransformerWordEmbeddings
from flair.models import RelationExtractor
from flair.trainers import ModelTrainer


def train() -> None:
    # Hyperparameters
    transformer: str = 'xlm-roberta-large'
    learning_rate: float = 5e-5
    mini_batch_size: int = 8

    # Step 1: Create the training data

    # The relation extractor is *not* trained end-to-end.
    # A corpus for training the relation extractor requires annotated entities and relations.
    corpus: RE_ENGLISH_CONLL04 = RE_ENGLISH_CONLL04()

    # Print examples
    sentence: Sentence = corpus.test[0]
    print(sentence)
    print(sentence.get_spans('ner'))  # 'ner' is the entity label type
    print(sentence.get_relations('relation'))  # 'relation' is the relation label type

    # Step 2: Make the label dictionary from the corpus
    label_dictionary = corpus.make_label_dictionary('relation')
    label_dictionary.add_item('O')
    print(label_dictionary)

    # Step 3: Initialize fine-tunable transformer embeddings
    embeddings = TransformerWordEmbeddings(
        model=transformer,
        layers='-1',
        subtoken_pooling='first',
        fine_tune=True
    )

    # Step 4: Initialize relation classifier
    model: RelationExtractor = RelationExtractor(
        embeddings=embeddings,
        label_dictionary=label_dictionary,
        label_type='relation',
        entity_label_type='ner',
        entity_pair_filters=[  # Define valid entity pair combinations, used as relation candidates
            ('Loc', 'Loc'),
            ('Peop', 'Loc'),
            ('Peop', 'Org'),
            ('Org', 'Loc'),
            ('Peop', 'Peop')
        ]
    )

    # Step 5: Initialize trainer
    trainer: ModelTrainer = ModelTrainer(model, corpus)

    # Step 6: Run fine-tuning
    trainer.fine_tune(
        'conll04',
        learning_rate=learning_rate,
        mini_batch_size=mini_batch_size,
        main_evaluation_metric=('macro avg', 'f1-score')
    )


def predict_example() -> None:
    # Step 1: Load trained relation extraction model
    model: RelationExtractor = RelationExtractor.load('conll04/final-model.pt')

    # Step 2: Create sentences with entity annotations (as these are required by the relation extraction model)
    # In production, use another sequence tagger model to tag the relevant entities.
    sentence: Sentence = Sentence('On April 14, while attending a play at the Ford Theatre in Washington, '
                                  'Lincoln was shot in the head by actor John Wilkes Booth.')
    sentence[15:16].add_label(typename='ner', value='Peop', score=1.0)  # Lincoln -> Peop
    sentence[23:26].add_label(typename='ner', value='Peop', score=1.0)  # John Wilkes Booth -> Peop

    # Step 3: Predict
    model.predict(sentence)
    print(sentence.get_relations('relation'))


if __name__ == '__main__':
    train()
    predict_example()

In this example I've used an integrated dataset. You can also load your own, e.g. in the form of a ColumnCorpus.
Flair uses the comment section of the CoNLL-U format as the indicator for relations. The format is

# relations = head_start_id;head_end_id;tail_start_id;tail_end_id;relation|...

Example:

# global.columns = id form upos ner misc
# text = Larry Page and Sergey Brin founded Google.
# relations = 7;7;1;2;founded_by|7;7;4;5;founded_by
1	Larry	PROPN	B-PER	_
2	Page	PROPN	I-PER	_
3	and	CCONJ	O	_
4	Sergey	PROPN	B-PER	_
5	Brin	PROPN	I-PER	_
6	founded	VERB	O	_
7	Google	PROPN	B-ORG	SpaceAfter=No
8	.	PUNCT	O	_
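For reference, the relations comment can be decoded with a few lines of plain Python. This is a standalone sketch that only mirrors the format described above, not Flair's actual parser, and parse_relations is a hypothetical helper name:

```python
def parse_relations(comment):
    """Parse a '# relations = ...' comment line into
    (head_start, head_end, tail_start, tail_end, label) tuples."""
    prefix = '# relations = '
    if not comment.startswith(prefix):
        return []
    relations = []
    for chunk in comment[len(prefix):].strip().split('|'):
        head_start, head_end, tail_start, tail_end, label = chunk.split(';')
        relations.append((int(head_start), int(head_end),
                          int(tail_start), int(tail_end), label))
    return relations


print(parse_relations('# relations = 7;7;1;2;founded_by|7;7;4;5;founded_by'))
# → [(7, 7, 1, 2, 'founded_by'), (7, 7, 4, 5, 'founded_by')]
```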

I hope that this helps. :)

igormis (Author) commented Apr 15, 2022

Hi @dobbersc, that looks clear, thanks. I have only one question:

  • "In production, use another sequence tagger model to tag the relevant entities." - Does this mean I should add the inferred tags using some NER model and then infer the relations?
    Please update when you have the new architecture with the tutorial :).
    Thank you very much.

dobbersc (Collaborator)

"In production, use another sequence tagger model to tag the relevant entities." - Does this mean I should add the inferred tags using some NER model and then infer the relations?

That is correct. The Flair RelationExtractor does not handle end-to-end relation extraction, as it requires pre-tagged entities. One easy way is to use another sequence tagger, e.g. some NER model. Afterwards, you can predict with the relation extractor.

When using a sequence tagger to predict the relation entities, be sure to use the same label type and label values as specified in the entity_label_type and entity_pair_filters of the relation extractor model.
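To see why the label schemes must line up, here is a conceptual sketch of how entity_pair_filters gates relation candidates. This is plain Python, not Flair's implementation, and relation_candidates is a hypothetical helper: only ordered span pairs whose label combination appears in the filter become candidates for classification.

```python
from itertools import permutations

# The pair filters from the training example above.
ENTITY_PAIR_FILTERS = {
    ('Loc', 'Loc'), ('Peop', 'Loc'), ('Peop', 'Org'),
    ('Org', 'Loc'), ('Peop', 'Peop'),
}

def relation_candidates(spans):
    """spans: (text, label) pairs as a NER tagger would predict them.
    Returns ordered (head, tail) pairs whose labels pass the filter."""
    return [(head, tail)
            for head, tail in permutations(spans, 2)
            if (head[1], tail[1]) in ENTITY_PAIR_FILTERS]

# With a matching label scheme, candidates are generated:
spans = [('Larry Page', 'Peop'), ('Sergey Brin', 'Peop'), ('Google', 'Org')]
print(len(relation_candidates(spans)))  # → 4

# With a mismatched scheme (e.g. 'PER'/'ORG' from a different NER model),
# nothing survives the filter, so no relations can ever be predicted:
mismatched = [('Larry Page', 'PER'), ('Sergey Brin', 'PER'), ('Google', 'ORG')]
print(len(relation_candidates(mismatched)))  # → 0
```

This is why a tagger trained with the same label values as the relation extractor's filters is required.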

alanakbik (Collaborator)

Closing since the question is answered (thanks @dobbersc), but feel free to reopen if there are more questions!

geheim01 commented Aug 1, 2022

(Quoting @dobbersc's answer above in full.)

Hello,

How do we handle sentences without relations? Is there a convenient way to label the CoNLL format for sentences without relations?

Our approach looked like this:

# relations =

But we received the following error:

    689     # parse relations if they are set
    690     if comment.startswith("# relations = "):
--> 691         relations_string = comment.strip().split("# relations = ")[1]
    692         for relation in relations_string.split("|"):
    693             indices = relation.split(";")

IndexError: list index out of range

dobbersc (Collaborator) commented Aug 3, 2022

Since the check is if comment.startswith("# relations = "):, you can omit the # relations = comment line entirely for sentences without relations. Note that there is a space after the =. To indicate explicitly that a sentence has no relations, you could write # relations = without the trailing space: such a line fails the prefix check and is simply skipped.
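This behaviour is easy to verify in isolation. The sketch below is plain Python mirroring the quoted parsing code, not Flair itself:

```python
PREFIX = '# relations = '  # note the trailing space, as in Flair's check

def has_relations(comment):
    # Mirrors the quoted comment.startswith("# relations = ") check.
    return comment.startswith(PREFIX)

print(has_relations('# relations = 7;7;1;2;founded_by'))  # → True
print(has_relations('# relations ='))   # no trailing space → False, line is skipped
print(has_relations('# relations = '))  # trailing space, empty payload → True ...

# ... and then the split from the traceback fails, because strip() removes
# the trailing space, so the prefix no longer occurs in the stripped string
# and split() returns a single-element list:
try:
    '# relations = '.strip().split(PREFIX)[1]
except IndexError as e:
    print('IndexError:', e)  # → IndexError: list index out of range
```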

geheim01 commented Aug 4, 2022

Thanks a lot!
