Seralization of embeddings #3011

helpmefindaname · 2022-12-05T15:31:08Z

To be safer with regard to pickle, I propose using a dict format to store all properties required to recreate the embeddings (the weights are stored with the model itself anyway).
This allows opening Flair models with incompatible parameters via torch.load(...) and therefore allows debugging version conflicts.
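A minimal sketch of the dict-based idea, not the code from this PR: every name below (ToyEmbeddings, EMBEDDING_REGISTRY, to_params, from_params, the "__cls__" key) is a hypothetical stand-in chosen for illustration. The saved state holds only plain values plus a class name, so it can be inspected without unpickling any custom object.

```python
import json
from typing import Any, Dict, Type

# Hypothetical registry mapping a stored class name back to the class.
EMBEDDING_REGISTRY: Dict[str, Type["ToyEmbeddings"]] = {}


class ToyEmbeddings:
    """Stand-in for an embeddings class; this is not a flair class."""

    def __init__(self, dim: int, lowercase: bool = False) -> None:
        self.dim = dim
        self.lowercase = lowercase

    def to_params(self) -> Dict[str, Any]:
        # Only plain, JSON-compatible values are stored, no pickled objects.
        return {"dim": self.dim, "lowercase": self.lowercase}

    @classmethod
    def from_params(cls, params: Dict[str, Any]) -> "ToyEmbeddings":
        return cls(**params)


EMBEDDING_REGISTRY["ToyEmbeddings"] = ToyEmbeddings


def save_embeddings(embeddings: ToyEmbeddings) -> Dict[str, Any]:
    # Store the class name next to the constructor parameters.
    return {"__cls__": type(embeddings).__name__, **embeddings.to_params()}


def load_embeddings(state: Dict[str, Any]) -> ToyEmbeddings:
    params = dict(state)  # copy so the caller's dict is left untouched
    cls_name = params.pop("__cls__")
    return EMBEDDING_REGISTRY[cls_name].from_params(params)


# Round trip through JSON to show that the state is plain data.
state = save_embeddings(ToyEmbeddings(dim=8, lowercase=True))
restored = load_embeddings(json.loads(json.dumps(state)))
```

Because the state is an ordinary dict, a parameter mismatch between versions shows up as a readable key or value rather than as an unpickling error.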

During development I also found & fixed the following issues:

DocumentLMEmbeddings did not provide the right names for their embeddings, so the correct usage doc_lm_embedding.embed(sentence); sentence.get_embedding(doc_lm_embedding.get_names()) would return an empty tensor.
Frozen FlairEmbeddings always used dropout: since the .train() method didn't call its super method, the .eval() call in __init__ had no effect, so dropout stayed enabled, as that is the default.
Some embeddings were not in .eval() mode after creation.
Added tests for embeddings that had no tests yet.
HashEmbeddings raised an index error unless each sentence had exactly one token (indexing an unflattened array).
ElmoEmbeddings are deprecated, as allennlp will stop support soon.
The TextRegressor model is now properly importable via from flair.models import TextRegressor.
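The frozen FlairEmbeddings dropout issue in the list above can be illustrated with a small sketch. The Module class here is a pure-Python stand-in that mirrors how torch.nn.Module implements eval() as train(False); the two subclasses and the shape of the fix are assumptions for illustration, not the actual flair code.

```python
class Module:
    """Tiny stand-in mirroring torch.nn.Module's train()/eval() behaviour."""

    def __init__(self) -> None:
        self.training = True  # modules start in training mode by default

    def train(self, mode: bool = True) -> "Module":
        self.training = mode
        return self

    def eval(self) -> "Module":
        # eval() is implemented in terms of train(False).
        return self.train(False)


class BuggyFrozenEmbeddings(Module):
    def __init__(self) -> None:
        super().__init__()
        self.eval()  # intended to switch dropout off

    def train(self, mode: bool = True) -> "BuggyFrozenEmbeddings":
        # Bug: never delegates to super().train(), so self.training is
        # never updated and the eval() call in __init__ has no effect.
        return self


class FixedFrozenEmbeddings(Module):
    def __init__(self) -> None:
        super().__init__()
        self.eval()

    def train(self, mode: bool = True) -> "Module":
        # Frozen embeddings stay in eval mode, but the base class still
        # gets to update the training flag.
        return super().train(False)


buggy = BuggyFrozenEmbeddings()
fixed = FixedFrozenEmbeddings()
```

In the buggy variant the training flag stays True after construction, which is the state in which a dropout layer would remain active.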

This also implements two classes, AutoFlairModel and AutoFlairClassifier, which can be used to load any model, given that its type is clear.
Example usage:

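from flair import AutoFlairClassifier
from flair.models import MultitaskModel  # assumed import path, needed for the multitask example below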
tagger = AutoFlairClassifier.load("ner-large")
tars = AutoFlairClassifier.load("tars-tagger")
tars.save("model.pt")
tars2 = AutoFlairClassifier.load("model.pt")
relation = AutoFlairClassifier.load("relation")
offensive = AutoFlairClassifier.load("de-offensive-language")
multi = MultitaskModel([offensive, tagger, relation])
multi.save("multi.pt")
multi2 = AutoFlairClassifier.load("multi.pt")
...

The difference between AutoFlairModel and AutoFlairClassifier is that AutoFlairClassifier is limited to classifiers (no text regressor), while it provides stronger type hints (all the extra methods the Classifier provides, e.g. predict).
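One way to picture the relation between the two loaders is a shared registry with a narrower, more strongly typed front end. The following is a sketch under assumptions: the class names (Model, Classifier, AutoModel, AutoClassifier, ToyTagger, ToyRegressor) and the "__cls__" state key are hypothetical and do not reproduce the implementation in this PR.

```python
from typing import Any, Dict, Type


class Model:
    def __init__(self, **params: Any) -> None:
        self.params = params

    def to_state(self) -> Dict[str, Any]:
        return {"__cls__": type(self).__name__, "params": self.params}


class Classifier(Model):
    def predict(self, text: str) -> str:
        return f"{type(self).__name__} label for {text!r}"


class AutoModel:
    _registry: Dict[str, Type[Model]] = {}  # shared by all subclasses
    _base: Type[Model] = Model  # most general type this loader may return

    @classmethod
    def register(cls, model_cls: Type[Model]) -> Type[Model]:
        # Used as a class decorator; returns the class unchanged.
        AutoModel._registry[model_cls.__name__] = model_cls
        return model_cls

    @classmethod
    def load(cls, state: Dict[str, Any]) -> Model:
        model_cls = AutoModel._registry[state["__cls__"]]
        if not issubclass(model_cls, cls._base):
            raise TypeError(f"{model_cls.__name__} is not a {cls._base.__name__}")
        return model_cls(**state["params"])


class AutoClassifier(AutoModel):
    _base = Classifier

    @classmethod
    def load(cls, state: Dict[str, Any]) -> Classifier:
        model = super().load(state)
        assert isinstance(model, Classifier)  # narrows the type for checkers
        return model


@AutoModel.register
class ToyTagger(Classifier):
    pass


@AutoModel.register
class ToyRegressor(Model):
    pass


tagger = AutoClassifier.load(ToyTagger(tag_type="ner").to_state())
regressor = AutoModel.load(ToyRegressor().to_state())
```

With this layout a type checker knows that the result of AutoClassifier.load has a predict method, while AutoModel.load can also return models that are not classifiers.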

Potential issues are:

Current models do not contain class information. I added a simple method that tries to parse the content into the right class, but that might fail. E.g. before changing the code from model = SequenceTagger.load("my-model.pt") to model = AutoFlairClassifier.load("my-model.pt"), I would recommend loading the model once and saving it again with the newest version.
MultitaskModel got a rework of its internal state, so older models cannot be loaded. I don't see this as a big issue, as those were never released before, but it is something to be aware of.
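A fallback for checkpoints without class information could look roughly like the sketch below. It is only an illustration of why such guessing can fail: the marker keys, the class names and the "__cls__" key are hypothetical and are not taken from real flair checkpoints.

```python
from typing import Any, Dict, Set

# Hypothetical marker keys per class; real checkpoints may look different.
LEGACY_MARKERS: Dict[str, Set[str]] = {
    "ToyTagger": {"tag_dictionary", "rnn_layers"},
    "ToyTextClassifier": {"label_dictionary", "document_embeddings"},
}


def resolve_class_name(state: Dict[str, Any]) -> str:
    """Return the class name for a saved state, guessing for legacy files."""
    explicit = state.get("__cls__")
    if explicit is not None:
        return explicit  # newer checkpoints name their class directly

    # Legacy path: pick the class whose marker keys are all present.
    matches = [
        name
        for name, keys in LEGACY_MARKERS.items()
        if keys.issubset(state.keys())
    ]
    if len(matches) != 1:
        raise ValueError(f"cannot infer model class, candidates: {matches}")
    return matches[0]
```

Loading an old model once with its concrete class and saving it again writes the explicit class name, so the guessing path is no longer needed for that file.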

helpmefindaname · 2022-12-12T17:15:36Z

…h embeddings

…Embeddings

…t the naming better and add loading of jit models

alanakbik · 2023-01-20T19:38:27Z

Hello @helpmefindaname this is really cool, thanks for creating this!

Some initial thoughts for discussion:

I wonder if AutoFlairClassifier and Classifier can/should be merged into a single class: for instance, Classifier could be renamed to FlairClassifier and the auto-load logic added directly there. It would make the logic less distributed and the syntax (slightly) more succinct for end users, i.e. load any Flair model with:

model = FlairClassifier.load("ner")

I also wonder if a convenience method for loading "pipelines" could be added. For instance, if users do

model = FlairClassifier.load("ner", "pos", "relations")

it would load a whole pipeline that, when model.predict() is called, annotates NER, POS and relation information on a sentence.
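The pipeline idea could be sketched as follows. This is a toy illustration only: ToyClassifier, Pipeline and load are hypothetical names, the sentence is a plain dict, and nothing here reflects how flair would actually implement it.

```python
from typing import Any, Dict, List, Union


class ToyClassifier:
    """Stand-in for a loaded model that annotates one task."""

    def __init__(self, task: str) -> None:
        self.task = task

    def predict(self, sentence: Dict[str, Any]) -> None:
        # Annotate in place, keyed by task name.
        sentence.setdefault("labels", {})[self.task] = f"<{self.task} annotations>"


class Pipeline:
    """Runs several classifiers over the same sentence, in the given order."""

    def __init__(self, models: List[ToyClassifier]) -> None:
        self.models = list(models)

    def predict(self, sentence: Dict[str, Any]) -> None:
        for model in self.models:
            model.predict(sentence)


def load(*names: str) -> Union[ToyClassifier, Pipeline]:
    if not names:
        raise ValueError("at least one model name is required")
    models = [ToyClassifier(name) for name in names]
    # A single name yields a plain model, several names yield a pipeline.
    return models[0] if len(models) == 1 else Pipeline(models)


sentence: Dict[str, Any] = {"text": "George was born in Washington."}
load("ner", "pos", "relations").predict(sentence)
```

Running the models in the order given matters if a later task, such as relation extraction, relies on annotations from an earlier one.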

alanakbik · 2023-01-25T12:55:16Z

flair/models/language_model.py

 from flair.data import Dictionary
 from flair.nn.recurrent import create_recurrent_layer


+@AutoFlairModel.register


Why is the LanguageModel registered as AutoFlairModel?

alanakbik · 2023-01-25T13:32:16Z

Thanks again for improving this, @helpmefindaname! Regarding our discussion on whether/how to merge ModelRegisterMixin into the Model abstract base class, I'll check if I can find a good way to do this.

helpmefindaname force-pushed the seralize_embeddings branch from 10d0726 to 3c5eabe on December 12, 2022 14:23

Benedikt Fuchs added 12 commits December 19, 2022 14:21

fix tests names

bc5525a

fix more embedding seralizations

aebf1c7

add embedding tests for Character embedding OneHot embeddings and Hash embeddings

a145a64

add more embedding tests and fix muse embeddings not loading on windows

0898fc6

fix and serialize NILCEmbeddings

e8fb610

finalize token embeddings

1de7dda

implement seralization for Document CNN Embeddings and Document Pool Embeddings

42d14ab

implement serialization for TFIDF Embeddings

57fc8c0

serialize sentence transformer

a51027d

fix onnx tutorial, rename "save_embedding" to "save_embeddings" to fit the naming better and add loading of jit models

573a7d9

fix textregressor loading embeddings

8ee8520

fix flair embedding can load decoder

f973806

helpmefindaname force-pushed the seralize_embeddings branch from d8b067b to f973806 on December 19, 2022 13:25

Benedikt Fuchs added 5 commits December 19, 2022 16:07

save onnx transformer embeddings

0bb8907

add legacy pickle loading for image embeddings

19be47f

fix has_decoder parameter

2017ab7

remove hard coded disabeling of gpu

5b480c4

add automodel for flair

1173264

helpmefindaname changed the title ~~WIP: Seralization of embeddings~~ Seralization of embeddings Dec 19, 2022

Benedikt Fuchs added 4 commits December 19, 2022 20:56

fix text regressor loading

c0e3fcf

fix typing and add missing loading of state dict

3d53912

fix auto model

5ca4527

fix loading jit and onnx embeddings

e6d057c

helpmefindaname mentioned this pull request Jan 9, 2023

Why Flair ner-english-large model is marked as unsafe in Huggingface? #3047

Closed

Merge branch 'master' into seralize_embeddings

95dadf2

alanakbik reviewed Jan 25, 2023

alanakbik approved these changes Jan 25, 2023

alanakbik merged commit dbc1569 into flairNLP:master Jan 25, 2023

alanakbik mentioned this pull request Jan 25, 2023

Refactoring of AutoModel logic #3067

Merged

helpmefindaname deleted the seralize_embeddings branch January 26, 2023 22:09

helpmefindaname mentioned this pull request Apr 28, 2023

Fix loading of (not so) old models #3229

Merged
