
Does llama only use decoders? Why don't you use a more efficient method? #37

Closed
gyunggyung opened this issue Mar 1, 2023 · 3 comments
Assignees
Labels
miscellaneous: does not fit an existing category, useful to determine whether we need further categorization
model-usage: issues related to how models are used/loaded
question: General questions about using Llama2

Comments

@gyunggyung

Thanks for sharing this really good material. I have a lot of questions.

First, I hope you ignore much of the mockery. Compared to you, the rest of us, myself included, do sloppy work and shout at our keyboards.

  1. The model seems to only use decoders. Why?
# https://github.com/facebookresearch/llama/blob/main/llama/model.py#L223
    def forward(self, tokens: torch.Tensor, start_pos: int):
        _bsz, seqlen = tokens.shape
        h = self.tok_embeddings(tokens)
        self.freqs_cis = self.freqs_cis.to(h.device)
        freqs_cis = self.freqs_cis[start_pos : start_pos + seqlen]

        # Build a causal mask: each position attends only to itself and earlier positions
        mask = None
        if seqlen > 1:
            mask = torch.full((1, 1, seqlen, seqlen), float("-inf"), device=tokens.device)
            mask = torch.triu(mask, diagonal=start_pos + 1).type_as(h)

        for layer in self.layers:
            h = layer(h, start_pos, freqs_cis, mask)
        h = self.norm(h)
        output = self.output(h[:, -1, :])  # only compute last logits
        return output.float()
  2. Is RMSNorm the best choice? I like its simplicity, but I'm curious.
  3. On some tasks, Minerva outperforms your model. Why? Is it only on the tasks in the paper?
  4. Why isn't the structure of your model described in the paper?
  5. By any chance, what structure do you have in mind for your next model?
  6. Amazon, DeepMind, and other great companies are showing that the encoder-decoder structure is much better. Why do you only use decoders?
  7. What model would you apply to Facebook, Instagram, Snapchat, etc.?
  8. What do you think is your advantage over BART or Prometheus? Especially over BART, I don't see one, apart from the full disclosure.
  9. I sent an application to use the model. When will I be able to use it? I don't see a clear advantage yet.
  10. What do you think of the derivative models people have created? They are emerging very quickly.
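On question 1 above: a minimal sketch, in plain Python (no torch, illustrative only), of the causal mask that the quoted snippet builds with `torch.full` and `torch.triu(..., diagonal=start_pos + 1)`:

```python
NEG_INF = float("-inf")

def causal_mask(seqlen, start_pos=0):
    # Entry [q][k] is -inf when key position k lies in the future of query
    # position q (k > start_pos + q); softmax then assigns it zero weight.
    # This masking is what makes the stack "decoder-only" (autoregressive).
    return [
        [NEG_INF if k > start_pos + q else 0.0 for k in range(seqlen)]
        for q in range(seqlen)
    ]
```

With `start_pos=0` this is a strictly upper-triangular wall of `-inf`, matching `torch.triu(mask, diagonal=1)`.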

Thank you so much. The competition between you all is fun to watch. I hope more companies continue to open up their models.

But I don't know why Yann LeCun was left out of the paper.
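On the RMSNorm question above: a minimal sketch in plain Python (illustrative only, not the repo's implementation) of what RMSNorm computes, for comparison with LayerNorm:

```python
import math

def rmsnorm(x, weight, eps=1e-6):
    # Unlike LayerNorm, RMSNorm skips mean-centering: it rescales the vector
    # by its root-mean-square, then applies a learned per-dimension gain.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(weight, x)]
```

The appeal is exactly the simplicity: one reduction and one divide per vector, no mean subtraction or bias term.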

@gyunggyung
Author

@likethesky @Celebio @colesbury @pdollar
Can you answer?

@albertodepaola albertodepaola added question General questions about using Llama2 model-usage issues related to how models are used/loaded miscellaneous does not fit an existing category, useful to determine whether we need further categorization labels Sep 6, 2023
@jspisak jspisak assigned TouvronHugo and AurRod and unassigned TouvronHugo Sep 6, 2023
@jspisak
Contributor

jspisak commented Sep 6, 2023

Assigning to @AurRod for follow-up. These are some good questions, but a bit deep for GitHub.

@jspisak
Contributor

jspisak commented Sep 6, 2023

Closing for now, but feel free to reopen if needed.

@jspisak jspisak closed this as completed Sep 6, 2023