Commit

Fix off-by-one error in byte len computation
liujch1998 committed Sep 15, 2024
1 parent 499ac18 commit df15f3c
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions olmo/eval/downstream.py
@@ -1782,8 +1782,8 @@ def prep_examples(self):
f"Sample doc from ({self.dataset_path}, {ds_name}):"
+ f"\ndoc_text: {doc_text}\ncontinuation: {continuation_str}"
)
- cont_str_len = len(continuation_str) - 1 # continuation contain leading blank
- cont_byte_len = len(continuation_str[1:].encode("utf-8"))
+ cont_str_len = len(continuation_str) # continuation does not contain leading blank
+ cont_byte_len = len(continuation_str.encode("utf-8"))
continuation = self.token_encode(continuation_str)

# query, remove last token from continuation, truncate from left is longer than model ctx length
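
The effect of the change can be illustrated with a minimal standalone sketch. The function names `lengths_before_fix` and `lengths_after_fix` and the sample string are hypothetical and are not part of olmo/eval/downstream.py; only the two length expressions are taken from the diff. The sketch assumes, as the new comment states, that the continuation has no leading blank.

```python
from typing import Tuple


def lengths_before_fix(continuation_str: str) -> Tuple[int, int]:
    # Old expressions from the diff: they assume a leading blank and drop one character.
    cont_str_len = len(continuation_str) - 1
    cont_byte_len = len(continuation_str[1:].encode("utf-8"))
    return cont_str_len, cont_byte_len


def lengths_after_fix(continuation_str: str) -> Tuple[int, int]:
    # New expressions from the diff: the whole continuation is counted.
    cont_str_len = len(continuation_str)
    cont_byte_len = len(continuation_str.encode("utf-8"))
    return cont_str_len, cont_byte_len


# Hypothetical continuation with no leading blank; "é" takes two bytes in UTF-8.
sample = "café"
print(lengths_before_fix(sample))  # (3, 4): the first character "c" is dropped
print(lengths_after_fix(sample))   # (4, 5): all four characters, five bytes
```

When the continuation really has no leading blank, the old expressions undercount by one character and by the byte width of that first character, which for a multi-byte first character is more than one byte.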