
Bug (?) - evaluation returns high scores when NaN-values are returned. #220

Open
Filco306 opened this issue Sep 2, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@Filco306

Filco306 commented Sep 2, 2021

Hello! First of all, thank you for a great package. I have started using it to benchmark some models, but I think I have found a potential issue. As the title states, if NaN values are produced during evaluation, the model in question will receive high scores, which of course can be very misleading.

A way to reproduce this:

  1. Train a TransE model, e.g., with this configuration file:

```yaml
job.type: train
dataset.name: wnrr

train:
  optimizer: Adagrad
  optimizer_args:
    lr: 0.2

valid:
  every: 5
  metric: mean_reciprocal_rank_filtered

model: transe
lookup_embedder:
  dim: 100
  regularize_weight: 0.8e-7
```
  2. Train it and arrive at a model.
  3. Change the code in transe.py on lines 22-23 from

```python
elif combine == "_po":
    out = -torch.cdist(o_emb - p_emb, s_emb, p=self._norm)
```

to

```python
elif combine == "_po":
    out = -torch.cdist(o_emb / 0, s_emb, p=self._norm)
```

This will give scores > 0.5 for all metrics, which of course is problematic. I know this change is deliberately incorrect; it is not what I did when I discovered the issue, but it is a simple example that shows what can happen.
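The mechanism behind the inflated metrics can be seen in a small sketch (plain Python, with a hypothetical `filtered_rank` helper that mimics how rank-based metrics are typically computed): the rank of the correct triple is 1 plus the number of candidates scoring strictly higher, and any comparison involving NaN evaluates to False, so all-NaN scores silently produce a perfect rank of 1.

```python
def filtered_rank(true_score, candidate_scores):
    # Rank = 1 + number of candidates scoring strictly higher than
    # the correct triple. Comparisons against NaN are always False,
    # so NaN scores never count as "higher".
    return 1 + sum(1 for s in candidate_scores if s > true_score)

# Normal case: two candidates beat the correct triple -> rank 3.
print(filtered_rank(0.3, [0.9, 0.5, 0.1]))

# Buggy case: all scores are NaN -> every comparison is False
# -> rank 1, i.e., an MRR contribution of 1.0.
nan = float("nan")
print(filtered_rank(nan, [nan, nan, nan]))
```

This is only an illustration of the comparison semantics, not LibKGE's actual ranking code, but it shows why a model emitting NaNs can look perfect.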

I think a check during evaluation that no scores are NaN would perhaps be in order?

Thank you!

@rgemulla
Member

rgemulla commented Sep 2, 2021

Thanks & yes, this sounds like a good idea and should probably be integrated directly into the evaluation code. Are you willing to do a PR? It may suffice to throw an error only if the score of the correct triple is NaN (which is, I guess, the cause of this problem).
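A minimal sketch of such a guard (the function name, message, and integration point are hypothetical; the real fix would live inside LibKGE's evaluation job and likely operate on tensors rather than lists):

```python
import math

def check_true_scores(true_scores):
    """Raise if the score of any correct triple is NaN, instead of
    silently producing a misleadingly good rank of 1."""
    for i, s in enumerate(true_scores):
        if math.isnan(s):
            raise ValueError(
                f"NaN score for correct triple at position {i}; "
                "evaluation metrics would be invalid"
            )

# A NaN true score now fails loudly instead of inflating the MRR.
try:
    check_true_scores([0.7, float("nan"), 0.2])
except ValueError as e:
    print(e)
```

With torch tensors, the same idea would be a single `torch.isnan(true_scores).any()` check before ranks are computed.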

@Filco306
Author

Filco306 commented Sep 2, 2021

Yes, I will have a look at it. I think further tests should also be added; I might look at that if I get the time.

@rgemulla
Member

rgemulla commented Sep 3, 2021

Great, thanks!
