
Does llama only use decoders? Why don't you use a more efficient method? #37

Closed
gyunggyung opened this issue Mar 1, 2023 · 3 comments
Assignees
Labels
miscellaneous: does not fit an existing category, useful to determine whether we need further categorization
model-usage: issues related to how models are used/loaded
question: General questions about using Llama2

Comments

@gyunggyung

Thanks for sharing this really good material. I have a lot of questions.

First, I hope you ignore much of the mockery. Compared to you, the rest of us, myself included, do sloppy work and shout at our keyboards.

  1. The model seems to only use decoders. Why?
# https://github.com/facebookresearch/llama/blob/main/llama/model.py#L223
    def forward(self, tokens: torch.Tensor, start_pos: int):
        _bsz, seqlen = tokens.shape
        h = self.tok_embeddings(tokens)
        self.freqs_cis = self.freqs_cis.to(h.device)
        freqs_cis = self.freqs_cis[start_pos : start_pos + seqlen]

        # Build a causal mask: each position attends only to itself and earlier positions
        mask = None
        if seqlen > 1:
            mask = torch.full((1, 1, seqlen, seqlen), float("-inf"), device=tokens.device)
            mask = torch.triu(mask, diagonal=start_pos + 1).type_as(h)

        for layer in self.layers:
            h = layer(h, start_pos, freqs_cis, mask)
        h = self.norm(h)
        output = self.output(h[:, -1, :])  # only compute last logits
        return output.float()
  2. Is RMSNorm the best choice? I like its simplicity, but I'm curious.
  3. On some tasks, Minerva outperforms your model. Why? Is it only on the tasks in the paper?
  4. Why isn't the structure of your model described in the paper?
  5. By any chance, what structure do you have in mind for your next model?
  6. Amazon, DeepMind, and other great companies are showing that the encoder-decoder structure is much better. Why do you only use decoders?
  7. What model would you apply to Facebook, Instagram, Snapchat, etc.?
  8. What do you think is your advantage over BART or Prometheus? Especially over BART, I don't see one, apart from the full disclosure.
  9. I sent an application to use the model. When will I be able to use it? I don't see a clear advantage yet.
  10. What do you think of the derivative models people have created? They are emerging very quickly.
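On question 1 above: a minimal sketch, in plain Python (no torch, illustrative only), of the causal mask that the quoted snippet builds with `torch.full` and `torch.triu(..., diagonal=start_pos + 1)`:

```python
NEG_INF = float("-inf")

def causal_mask(seqlen, start_pos=0):
    # Entry [q][k] is -inf when key position k lies in the future of query
    # position q (k > start_pos + q); softmax then assigns it zero weight.
    # This masking is what makes the stack "decoder-only" (autoregressive).
    return [
        [NEG_INF if k > start_pos + q else 0.0 for k in range(seqlen)]
        for q in range(seqlen)
    ]
```

With `start_pos=0` this is a strictly upper-triangular wall of `-inf`, matching `torch.triu(mask, diagonal=1)`.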

Thank you so much. The competition between you all is fun to watch. I hope more companies continue to open up their models.

But I don't know why Yann LeCun was left out of the paper.
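On the RMSNorm question above: a minimal sketch in plain Python (illustrative only, not the repo's implementation) of what RMSNorm computes, for comparison with LayerNorm:

```python
import math

def rmsnorm(x, weight, eps=1e-6):
    # Unlike LayerNorm, RMSNorm skips mean-centering: it rescales the vector
    # by its root-mean-square, then applies a learned per-dimension gain.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(weight, x)]
```

The appeal is exactly the simplicity: one reduction and one divide per vector, no mean subtraction or bias term.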

@gyunggyung
Author

@likethesky @Celebio @colesbury @pdollar
Can you answer?

@albertodepaola albertodepaola added question General questions about using Llama2 model-usage issues related to how models are used/loaded miscellaneous does not fit an existing category, useful to determine whether we need further categorization labels Sep 6, 2023
@jspisak jspisak assigned TouvronHugo and AurRod and unassigned TouvronHugo Sep 6, 2023
@jspisak
Contributor

jspisak commented Sep 6, 2023

Assigning to @AurRod for follow-up. These are some good questions, but a bit deep for GitHub.

@jspisak
Contributor

jspisak commented Sep 6, 2023

Closing for now, but feel free to reopen if needed.

@jspisak jspisak closed this as completed Sep 6, 2023