create packaged models from checkpoint #94
How are packaged models used? Maybe we should rename
TrainingJob.load should offload some functionality to
This could be done, but I am not sure we should do it. If we moved this into the load_from function, we would also need to provide the new config file, and we would unnecessarily create a new model there. That adds overhead with no big advantage.
I see. We still need load_from to work with packages (that's a key point of having packages). And we should try to minimize code duplication. Ideas welcome!
This does not help much with the code duplication, but it does address the design flaw of having the evaluation job use the training job's resume. trainjob.resume then looks something like this: This does not help much with the load and load_from distinction either, but in general it would be a lot cleaner, even if there is perhaps even more duplication.
Actually, the self.load(checkpoint) call could also be done in job.resume.
Sounds good. But note that resuming from a checkpoint generally may need to use the config from the checkpoint, not the one from the folder (which may not even exist). When cleaning up the API, this should be considered. We'd need something like TrainingJob.load_from(checkpoint) and KgeModel.load_from(checkpoint) as static methods. One advantage is that the training job then does not need to create a model first; it simply calls KgeModel.load_from. Also, the methods should accept either a filename or a loaded checkpoint in their "checkpoint" argument. This way, we'd only need to load the checkpoint once.
In that case, the function resume() could perhaps be replaced completely by the static method load_from, since currently we create a new model, then call resume, and afterwards run() anyway. This could be replaced by job = load_from(checkpoint) followed by job.run().
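A minimal sketch of the load_from idea discussed above, not the project's actual implementation. Model, TrainingJob, and load_checkpoint are hypothetical stand-ins, and pickle is assumed here only as a placeholder for whatever serialization the real checkpoints use. It shows static methods that accept either a filename or an already loaded checkpoint, so the file is read only once and the config is taken from the checkpoint itself.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Union

Checkpoint = Dict[str, Any]


def load_checkpoint(checkpoint: Union[str, Checkpoint]) -> Checkpoint:
    """Return a checkpoint dict; read from disk only when given a filename."""
    if isinstance(checkpoint, dict):
        return checkpoint  # already loaded, nothing to do
    with open(checkpoint, "rb") as f:
        return pickle.load(f)


class Model:
    """Hypothetical stand-in for a model restored from a checkpoint."""

    def __init__(self, config: Dict[str, Any], state: Dict[str, Any]):
        self.config = config
        self.state = state

    @staticmethod
    def load_from(checkpoint: Union[str, Checkpoint]) -> "Model":
        checkpoint = load_checkpoint(checkpoint)
        # The config comes from the checkpoint, not from a folder on disk.
        return Model(checkpoint["config"], checkpoint["model"])


class TrainingJob:
    """Hypothetical stand-in for a resumable training job."""

    def __init__(self, model: Model, epoch: int):
        self.model = model
        self.epoch = epoch

    @staticmethod
    def load_from(checkpoint: Union[str, Checkpoint]) -> "TrainingJob":
        checkpoint = load_checkpoint(checkpoint)  # single read from disk
        model = Model.load_from(checkpoint)  # reuses the loaded dict
        return TrainingJob(model, checkpoint["epoch"])

    def run(self) -> int:
        # Placeholder for training: continue with the next epoch.
        self.epoch += 1
        return self.epoch


def demo() -> int:
    """Write a toy checkpoint, then resume from it by filename."""
    cp = {"config": {"dim": 4}, "model": {"w": [0.0] * 4}, "epoch": 3}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.pkl")
        with open(path, "wb") as f:
            pickle.dump(cp, f)
        job = TrainingJob.load_from(path)  # replaces "create, then resume"
    return job.run()
```

With this shape, the create-then-resume sequence collapses into job = TrainingJob.load_from(checkpoint) followed by job.run().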
Sounds like a good idea to me.
I built a first version that replaces the method resume with load_from. This actually resulted in more changes than expected. Please have a look and see whether it is worth the changes.
With these changes, I think we could also make it possible to evaluate checkpoints without the config.
I think it's worth it. Perhaps some of the more generic checkpoint-handling code can be put into a separate package (kge.util.io?). In general, I think we should have KgeModel.load_from_checkpoint(cp), Job.load_from_checkpoint(cp), Dataset.load_from_checkpoint(cp), and Config.load_from_checkpoint(cp). If we also had the reverse methods (save_to_checkpoint) everywhere, we could remove code clutter and code duplication.
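One way the symmetric load_from_checkpoint / save_to_checkpoint pairs suggested above could look is sketched below. The Config and Dataset classes, the checkpoint keys "config" and "dataset", and the save_all helper are assumptions made for illustration; they are not the project's real classes or checkpoint layout.

```python
from typing import Any, Dict, List

Checkpoint = Dict[str, Any]


class Config:
    """Hypothetical config that knows how to store and restore itself."""

    def __init__(self, options: Dict[str, Any]):
        self.options = dict(options)

    def save_to_checkpoint(self, checkpoint: Checkpoint) -> Checkpoint:
        checkpoint["config"] = dict(self.options)
        return checkpoint

    @staticmethod
    def load_from_checkpoint(checkpoint: Checkpoint) -> "Config":
        return Config(checkpoint["config"])


class Dataset:
    """Hypothetical dataset that stores only its ids in the checkpoint."""

    def __init__(self, config: Config, entity_ids: List[str]):
        self.config = config
        self.entity_ids = list(entity_ids)

    def save_to_checkpoint(self, checkpoint: Checkpoint) -> Checkpoint:
        checkpoint["dataset"] = {"entity_ids": list(self.entity_ids)}
        return checkpoint

    @staticmethod
    def load_from_checkpoint(checkpoint: Checkpoint) -> "Dataset":
        # Each component restores itself from the shared checkpoint dict.
        config = Config.load_from_checkpoint(checkpoint)
        return Dataset(config, checkpoint["dataset"]["entity_ids"])


def save_all(config: Config, dataset: Dataset) -> Checkpoint:
    """Every component adds its own entry; the caller needs no details."""
    checkpoint: Checkpoint = {}
    config.save_to_checkpoint(checkpoint)
    dataset.save_to_checkpoint(checkpoint)
    return checkpoint
```

Because each class owns both directions, the code that writes a checkpoint and the code that reads it stay next to each other, which is where the reduction in duplication would come from.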
I tried to address most of your points.
I think valid is a sensible default choice.
It is now possible to evaluate without a config by calling EntityRankingJob.create_from(checkpoint). If we later want to change the console API to enable evaluation without a config, we would have to find a way to still support additional command-line options without overwriting the checkpoint config with all the default values.
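One possible way to handle the command-line concern above is to let the parser omit every option the user did not pass, so that defaults never reach the checkpoint config. The sketch below uses argparse's argument_default=argparse.SUPPRESS for that; the option names --device and --batch-size and both helper functions are hypothetical and not part of the project's console API.

```python
import argparse
from typing import Any, Dict, List


def parse_overrides(argv: List[str]) -> Dict[str, Any]:
    """Parse options; options that were not given are absent from the result."""
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--device", type=str)
    parser.add_argument("--batch-size", type=int, dest="batch_size")
    return vars(parser.parse_args(argv))


def merge_config(
    checkpoint_config: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Start from the checkpoint config and apply only explicit overrides."""
    merged = dict(checkpoint_config)
    merged.update(overrides)
    return merged
```

For example, passing only --device cpu changes the device and leaves the batch size stored in the checkpoint untouched.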
The loading of the dataset should now work as expected. In KgeModel.create_from we now always set preload_false, so we can create a model without the dataset. If the datasets are needed later on, an IOError is thrown. With that, everything should be addressed.
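The behaviour described above (create the object without touching the data, and fail only when the data is actually requested) can be sketched as follows. LazyDataset, its preload flag, the split names, and the one-file-per-split layout are assumptions for illustration, not the project's Dataset class.

```python
import os
from typing import Dict, List


class LazyDataset:
    """Hypothetical dataset that reads split files only on first access."""

    def __init__(self, folder: str, preload: bool = True):
        self.folder = folder
        self._splits: Dict[str, List[List[str]]] = {}
        if preload:
            for name in ("train", "valid", "test"):
                self.split(name)

    def split(self, name: str) -> List[List[str]]:
        """Return the triples of a split, loading the file if necessary."""
        if name not in self._splits:
            path = os.path.join(self.folder, name + ".txt")
            if not os.path.isfile(path):
                # Surfaces only when the data is actually needed.
                raise IOError(f"Dataset file not found: {path}")
            with open(path) as f:
                self._splits[name] = [line.split() for line in f if line.strip()]
        return self._splits[name]


def missing_data_is_reported(folder: str) -> bool:
    """Construction succeeds without data; access raises IOError."""
    dataset = LazyDataset(folder, preload=False)  # no file access here
    try:
        dataset.split("valid")
    except IOError:
        return True
    return False
```

A model created from a checkpoint can then hold such a dataset object even on a machine where the dataset folder does not exist.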
Looks very good. I added a few final points, mostly minor.
kge/dataset.py
return checkpoint
meta_checkpoint = {}
for key in meta_keys:
    meta_checkpoint[key] = self._map_indexes(None, key)
_map_indexes seems to be incorrect here. Do you mean map_indexes? Also, why not just store meta["key"]?
I switched to map_indexes. If we just store meta["key"], we cannot be sure that we actually loaded this data. It could be that we had not even read the entity_ids file when this method is called.
What's the current state here? Can you rebase this off the current master?
Separated loading of the checkpoint from create_from and rebased.
- Renamed overwrite_config to config
- Update device in config when loading checkpoint
I made some revisions. Let me know your thoughts. If you are fine, we can merge this PR.
Thanks, the revisions look good. I'd say we are ready to merge.
Thanks!
Allows creating a packaged model, which contains only the model, the entity/relation ids, and the config.
It is not possible to resume training on a packaged model, but it can be evaluated.
Example command:
kge package checkpoint_best.pt
Normal checkpoints now also store entity/relation ids.
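As a rough illustration of what packaging amounts to, the sketch below reduces a full checkpoint to the parts named above. The key names ("model", "config", "entity_ids", "relation_ids", "optimizer_state", "epoch", "type") and both functions are hypothetical; the actual checkpoint layout and the implementation behind kge package may differ.

```python
from typing import Any, Dict, Tuple

Checkpoint = Dict[str, Any]

# Assumed keys of the parts a packaged model keeps.
PACKAGE_KEYS: Tuple[str, ...] = ("model", "config", "entity_ids", "relation_ids")


def package(checkpoint: Checkpoint) -> Checkpoint:
    """Keep only what evaluation needs; drop all training state."""
    missing = [key for key in PACKAGE_KEYS if key not in checkpoint]
    if missing:
        raise KeyError(f"Checkpoint lacks required entries: {missing}")
    packaged = {key: checkpoint[key] for key in PACKAGE_KEYS}
    packaged["type"] = "package"  # marks the file as not resumable
    return packaged


def can_resume(checkpoint: Checkpoint) -> bool:
    """Resuming needs training state, which a package no longer has."""
    return "optimizer_state" in checkpoint and "epoch" in checkpoint
```

Since normal checkpoints carry the entity/relation ids themselves, a function like this can build the package from the checkpoint alone, without reading the dataset folder.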