Commit

only extra-subsample legal
markus583 committed May 14, 2024
1 parent 2e80a9d commit e6a88f8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion wtpsplit/train/train_adapter.py
@@ -127,7 +127,7 @@ def prepare_dataset(
 processed_chunk[args.text_column] = "\n".join(chunk)
 processed_dataset.append(processed_chunk)
 dataset = datasets.Dataset.from_list(processed_dataset)
-if subsample:
+if subsample and "legal" in dataset_name:
     # 10k sentences -> 1k documents.
     subsample = subsample // 10
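
The effect of the changed condition can be sketched in isolation. This is a minimal illustration, not code from the repository: the helper name `effective_subsample` and the dataset names in the usage lines are hypothetical, and it assumes `subsample` is either `None` or a non-negative integer and that `dataset_name` is a string, which the hunk suggests but does not show.

```python
from typing import Optional


def effective_subsample(subsample: Optional[int], dataset_name: str) -> Optional[int]:
    """Hypothetical helper mirroring the condition introduced by this commit."""
    # Only datasets whose name contains "legal" get the extra reduction;
    # a falsy subsample (None or 0) is passed through unchanged.
    if subsample and "legal" in dataset_name:
        # 10k sentences -> 1k documents.
        subsample = subsample // 10
    return subsample


# Hypothetical dataset names, used only for illustration.
legal_size = effective_subsample(10_000, "legal-example")
other_size = effective_subsample(10_000, "other-example")
```

Under the previous condition (`if subsample:`), both calls would have been reduced by a factor of ten; with the new condition only the name containing "legal" is.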
