Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create packaged models from checkpoint #94

Merged
merged 37 commits into from
May 26, 2020
Merged

Conversation

AdrianKs
Copy link
Collaborator

Allows to create a packaged model, which only contains the model, entity/relation ids and the config.
It is not possible to resume training on a packaged model but it can be evaluated.

example command:
kge package checkpoint_best.pt

normal checkpoints also store entity/relation ids now

kge/job/train.py Outdated Show resolved Hide resolved
kge/job/train.py Outdated Show resolved Hide resolved
kge/job/train.py Outdated Show resolved Hide resolved
@rgemulla
Copy link
Member

rgemulla commented Apr 15, 2020

How are pacakged models used? Maybe we should rename KgeModel.load_from_checkpoint to KgeModel.load_from and support checkpoints and packages (including creation of required "virtual" datasets) there.

@rgemulla
Copy link
Member

TrainingJob.load should offload some functionality to KgeModel.load_from. And I guess KgeModel.load_from should accept both filenames and loaded files (to facilitate this).

@AdrianKs
Copy link
Collaborator Author

TrainingJob.load should offload some functionality to KgeModel.load_from. And I guess KgeModel.load_from should accept both filenames and loaded files (to facilitate this).

This could be done, but I am not sure if we should do it.
KgeModel.load_from creates a new model based on the config file provided in the checkpoint.
At the time we call TrainingJob.load we already have a model created which we just update with the parameters in the checkpoint.

If we want to move this to the load_from function we would also need to provide the new config file and there we unnecessary create a new model.
We would only offload the functionality of the model loading to load_from

This creates an overhead with no big advantages.

@rgemulla
Copy link
Member

I see. We still need load_from to work with packages (that's a key point of having packages). And we should try to minimize code duplication. Ideas welcome!

@AdrianKs
Copy link
Collaborator Author

AdrianKs commented Apr 16, 2020

This does not help much with the code duplication but with the designflaw having the evaluation job use the trainjob resume.
lets move most of the content from trainjob.resume to the new method job.resume.
In job.resume we check which checkpoint we need to load and load it.

trainjob.resume then looks something like this:
def resume(self, checkpoint_file) \ checkpoint = super().resume(checkpoint_file) \ self.load(checkpoint)
evaljob.resume then just looks the same
but the load functions can be different

this does not help much with the load and load_from distinction but in general would be a lot cleaner, even if there is maybe even more duplication.
but the train job types are more independent.

@AdrianKs
Copy link
Collaborator Author

actually also the self.load(checkpoint) call could be done in job.resume
then we only have a job specific load function.

@rgemulla
Copy link
Member

Sounds good. But note that resuming from a checkpoint generally may need to use the config from the checkpoint, not the one from the folder (which may not even exist). When cleaning up the API, this should be considered. We'd need sth. like TrainingJob.load_from(checkpoint) and KgeModel.load_from(checkpoint) as static methods. One advantage is that the training job then does not need to create a model first, it simply calls KgeModel.load_from. Also, the methods should accept both a filename or a loaded checkpoint in their "checkpoint" argument. This way, we'd only need to load the checkpoint once.

@AdrianKs
Copy link
Collaborator Author

AdrianKs commented Apr 16, 2020

in that case the function resume() could maybe completly be replaced by the static method load_from, since currently we create a new model and then call resume and afterwards run() anyways. This could be repaced by job = load_from(checkpoint) and afterwards job.run().

@rgemulla
Copy link
Member

rgemulla commented Apr 16, 2020 via email

@AdrianKs
Copy link
Collaborator Author

I built a first version to replace the method resume with load_from. This actually resulted in more changes than expected. Please have a look, if this is worth the changes.

@AdrianKs
Copy link
Collaborator Author

With these changes we could also make it possible to evaluate checkpoints without the config, I think

kge/cli.py Outdated Show resolved Hide resolved
kge/job/eval.py Outdated Show resolved Hide resolved
kge/job/eval.py Outdated Show resolved Hide resolved
kge/job/eval.py Outdated Show resolved Hide resolved
kge/job/grid_search.py Outdated Show resolved Hide resolved
kge/job/job_factory.py Outdated Show resolved Hide resolved
kge/job/search.py Outdated Show resolved Hide resolved
kge/job/train.py Outdated Show resolved Hide resolved
kge/job/train.py Show resolved Hide resolved
kge/model/kge_model.py Outdated Show resolved Hide resolved
@rgemulla
Copy link
Member

I think it's worth it. Perhaps some of the more generic checkpoint-handling code can be put into a seaparate package (kge.util.io?).

In general, I think we should have KgeModel.load_from_checkpoint(cp), Job.load_from_checkpoint(cp), Dataset.load_from_checkpoint(cp), and Config.load_from_checkpoint(cp). If we also had the reverse methods (save_to_checkpoint) everywhere, we could remove code clutter and code duplication.

@AdrianKs
Copy link
Collaborator Author

I tried to address most of your points.

kge/config.py Outdated Show resolved Hide resolved
kge/cli.py Outdated Show resolved Hide resolved
kge/cli.py Show resolved Hide resolved
kge/dataset.py Outdated Show resolved Hide resolved
kge/dataset.py Outdated Show resolved Hide resolved
kge/model/kge_model.py Outdated Show resolved Hide resolved
kge/model/kge_model.py Outdated Show resolved Hide resolved
kge/util/package.py Outdated Show resolved Hide resolved
kge/util/package.py Show resolved Hide resolved
kge/util/package.py Outdated Show resolved Hide resolved
@rgemulla
Copy link
Member

rgemulla commented Apr 23, 2020 via email

@AdrianKs
Copy link
Collaborator Author

It is now possible to evaluate without a config by calling EntityRankingJob.create_from(checkpoint).

If we later want to change the console api to enable evaluation without a config, we would have to find a way to still support additional commandline options without overwriting the checkpoint config with all the default values

@AdrianKs
Copy link
Collaborator Author

the loading of the dataset should now work as expected. In KgeModel.create_from we just always set preload_false. In that way we can create a model without the dataset. If the datasets are later on needed it will throw an IOError

with that everything should be adressed

Copy link
Member

@rgemulla rgemulla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good. I added a few final points, mostly ,minor.

kge/cli.py Outdated Show resolved Hide resolved
kge/dataset.py Outdated Show resolved Hide resolved
kge/dataset.py Outdated
return checkpoint
meta_checkpoint = {}
for key in meta_keys:
meta_checkpoint[key] = self._map_indexes(None, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map_indexes seems to be incorrect here. Do you mean map_indexes? Also, why not just store meta["key"]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to map_indexes. If we just store meta["key"], we can not make sure, that we actually loaded this data. Could be that we didn't even read the entitiy_ids file, when calling this method.

kge/dataset.py Outdated Show resolved Hide resolved
kge/job/eval.py Outdated Show resolved Hide resolved
kge/job/job.py Outdated Show resolved Hide resolved
kge/job/job.py Outdated Show resolved Hide resolved
kge/job/job.py Outdated Show resolved Hide resolved
kge/model/kge_model.py Outdated Show resolved Hide resolved
kge/model/kge_model.py Outdated Show resolved Hide resolved
@rgemulla
Copy link
Member

What's the current state here? Can you rebase this off the current master?

@AdrianKs
Copy link
Collaborator Author

separated loading of checkpoint from create_from and rebased.
Should be done now.

@rgemulla
Copy link
Member

I made some revisions. Let me know your thoughts. If you are fine, we can merge this PR.

@AdrianKs
Copy link
Collaborator Author

Thanks, the revisions look good. I'd say we are ready to merge

@rgemulla rgemulla merged commit fde4565 into uma-pi1:master May 26, 2020
@rgemulla
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants