
Bug (?) - evaluation returns high scores when NaN-values are returned. #220

Open
Filco306 opened this issue Sep 2, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@Filco306

Filco306 commented Sep 2, 2021

Hello! First of all, thank you for a great package. I have started using it to benchmark some models, but I think I have found a potential issue. As the title states, if NaN values are produced during evaluation, the model in question will receive high scores, which of course can be very misleading.

A way to reproduce this:

  1. Train a TransE model, e.g., with this configuration file:

```yaml
job.type: train
dataset.name: wnrr

train:
  optimizer: Adagrad
  optimizer_args:
    lr: 0.2

valid:
  every: 5
  metric: mean_reciprocal_rank_filtered

model: transe
lookup_embedder:
  dim: 100
  regularize_weight: 0.8e-7
```
  2. Train it and arrive at a model.
  3. Change the code in transe.py on lines 22-23 from

```python
elif combine == "_po":
    out = -torch.cdist(o_emb - p_emb, s_emb, p=self._norm)
```

to

```python
elif combine == "_po":
    out = -torch.cdist(o_emb / 0, s_emb, p=self._norm)
```

This will give scores > 0.5 for all metrics, which of course is problematic. I know this change is deliberately incorrect; it is not what I did when I discovered the issue, but it is a simple example that shows what can happen.
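The mechanism behind the inflated metrics can be seen in a small sketch (plain Python, with a hypothetical `filtered_rank` helper that mimics how rank-based metrics are typically computed): the rank of the correct triple is 1 plus the number of candidates scoring strictly higher, and any comparison involving NaN evaluates to False, so all-NaN scores silently produce a perfect rank of 1.

```python
def filtered_rank(true_score, candidate_scores):
    # Rank = 1 + number of candidates scoring strictly higher than
    # the correct triple. Comparisons against NaN are always False,
    # so NaN scores never count as "higher".
    return 1 + sum(1 for s in candidate_scores if s > true_score)

# Normal case: two candidates beat the correct triple -> rank 3.
print(filtered_rank(0.3, [0.9, 0.5, 0.1]))

# Buggy case: all scores are NaN -> every comparison is False
# -> rank 1, i.e., an MRR contribution of 1.0.
nan = float("nan")
print(filtered_rank(nan, [nan, nan, nan]))
```

This is only an illustration of the comparison semantics, not LibKGE's actual ranking code, but it shows why a model emitting NaNs can look perfect.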

I think a check during evaluation that no scores are NaN would perhaps be in order?

Thank you!

@rgemulla
Member

rgemulla commented Sep 2, 2021

Thanks & yes, this sounds like a good idea and should probably be integrated directly into the evaluation code. Are you willing to do a PR? It may suffice to throw an error only if the score of the correct triple is NaN (which is, I guess, the cause of this problem).
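A minimal sketch of such a guard (the function name, message, and integration point are hypothetical; the real fix would live inside LibKGE's evaluation job and likely operate on tensors rather than lists):

```python
import math

def check_true_scores(true_scores):
    """Raise if the score of any correct triple is NaN, instead of
    silently producing a misleadingly good rank of 1."""
    for i, s in enumerate(true_scores):
        if math.isnan(s):
            raise ValueError(
                f"NaN score for correct triple at position {i}; "
                "evaluation metrics would be invalid"
            )

# A NaN true score now fails loudly instead of inflating the MRR.
try:
    check_true_scores([0.7, float("nan"), 0.2])
except ValueError as e:
    print(e)
```

With torch tensors, the same idea would be a single `torch.isnan(true_scores).any()` check before ranks are computed.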

@Filco306
Author

Filco306 commented Sep 2, 2021

Yes, I will have a look at it. I think further tests should also be added; I might look at that if I get the time.

@rgemulla
Member

rgemulla commented Sep 3, 2021

Great, thanks!
