Added repetition penalty to PPLM example #2436
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master    #2436   +/-  ##
=======================================
  Coverage   73.24%   73.24%
=======================================
  Files          87       87
  Lines       14989    14989
=======================================
  Hits        10979    10979
  Misses       4010     4010

Continue to review the full report at Codecov.
What do you think @w4nderlust @mimosavvy?

[IWillPull here, writing from a personal account] Do not merge yet. I think it's best to explain in the help text that this was not in the original paper, and to change the default value to 1.0 so it doesn't influence anything by default.
We wanted to add the repetition penalty ourselves, but you were faster :) I added two minor considerations. If you were getting awful results before, it's likely because of sub-optimal parameter choices, as we obtained good results without the repetition penalty; still, having it will only make things better.
Also, I would add a comment clarifying that this parameter was not described in the paper and thus it is 1 by default, just to avoid confusion for people looking at the code.
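To make the "off by default" behavior concrete, here is a minimal sketch of how such a CLI flag might be declared, with the clarifying help text suggested above. The flag name and help wording are illustrative, not the exact code in the PR:

```python
import argparse

parser = argparse.ArgumentParser()
# Default of 1.0 means "no penalty", so generation matches the paper's
# behavior unless the user explicitly opts in.
parser.add_argument(
    "--repetition_penalty",
    type=float,
    default=1.0,
    help="Penalize repeated tokens (not described in the original PPLM "
         "paper; 1.0 disables it).",
)
args = parser.parse_args(["--repetition_penalty", "1.2"])
```

With no flag given, `parser.parse_args([])` yields the neutral value 1.0.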
examples/pplm/run_pplm.py
Outdated
@@ -508,6 +518,13 @@ def generate_text_pplm(
    pert_logits, past, pert_all_hidden = model(last, past=pert_past)
    pert_logits = pert_logits[:, -1, :] / temperature  # + SMALL_CONST

    for j in set(output_so_far[0].tolist()):
Would rename j to token_idx for readability.
Fixed
examples/pplm/run_pplm.py
Outdated
    for j in set(output_so_far[0].tolist()):
        if pert_logits[0, j] < 0:
            pert_logits[0, j] *= repetition_penalty
In the definition at the end of section 4.1 in https://einstein.ai/presentations/ctrl.pdf there is no distinction between logits < 0 and > 0. I understand why logits < 0 should be treated separately: dividing them by a penalty would actually increase the value, and thus increase the probability after softmax is computed. Still, I'd like a reference for this if possible, since in the < 0 case values may become really small. It is true that we take an exponential later, so differences between even distant values on the negative side are negligible, but if there's a reference, it would be great to add it as a comment in the code.
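The sign-aware behavior under discussion can be sketched in isolation. This is a standalone illustration (plain Python, hypothetical function and variable names, not the PR's torch code): positive logits are divided by the penalty and negative logits are multiplied, so in both cases the score moves toward lower probability after softmax.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Discourage already-generated tokens, CTRL-style, with the
    sign-aware variant discussed in this thread."""
    for token_id in set(generated_ids):
        if logits[token_id] < 0:
            logits[token_id] *= penalty  # more negative -> less likely
        else:
            logits[token_id] /= penalty  # smaller positive -> less likely
    return logits

# Example: tokens 0 and 1 were already generated.
scores = apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0)
```

Here `scores` becomes `[1.0, -2.0, 0.5]`: both penalized logits moved toward lower probability, while the unseen token 2 is untouched. Naively dividing everywhere would instead map -1.0 to -0.5, *raising* that token's probability.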
Of course. I took this idea from the transformers library itself, as the repetition penalty was broken with the GPT-2 model.
Here is the relevant PR with discussion:
#2303
Got it, thank you for pointing it out, I replied to that thread too: #2303 (comment)
Thank you for your time reviewing this. May I ask why the code quality check fails? What did I miss?
Could you share your optimal parameters?
Can you run
The ones we reported in the paper work in most cases, but for some BOWs other values may be better because of the size of the BOW and the specific words it contains (whether they are really common or less common); in general, though, the reported ones are pretty consistent.
@julien-c Thank you. I missed reading the guidelines before opening this PR; should I open a new one with proper branching?
LGTM, thanks!
@julien-c It didn't look entirely good to me. I explained my argument, which goes beyond the repetition penalty for PPLM and is a general argument about repetition penalty (so it applies to CTRL too), here: #2303 (comment)
Aarg, I misunderstood your comment then @w4nderlust, I'll ask for a more explicit greenlight next time! @IWillPull, can you please open a new PR to fix/improve the remaining points? Thanks!
No problem @julien-c! The repetition penalty as implemented in this PR is fine, in the sense that it works exactly like the CTRL one, and that has worked for people so far.
@julien-c Sure! I will just wait for consensus from you, @w4nderlust, and others, so as not to make a mess of this.
The GPT-2 LM itself and the discriminators are different from what is reported in the paper. I think you need ~1.5 times the step-size/iterations for this version of the GPT-2 LM/attribute models; other parameters should work as is. If you are using the GPT-2 LM from the paper (which corresponds to a previous version of the Hugging Face GPT-2 LM) and the discriminators from the paper, the parameters listed in the Appendix work quite well. Code/models for what's in the paper: https://github.com/uber-research/PPLM/tree/master/paper_code

Also, if repetition is a huge problem, Table S19 from the paper might be relevant; I think it would be an easy fix that helps with the "awful" repetitions. Repetitions also don't seem to be an issue when using the discriminator, so I think a large part of the problem lies with the simple "BoW" loss rather than with the decoding scheme.
It was giving awful results, so I added a repetition penalty, which improved things.