Add columns support to JSON loader for selective key filtering #7652

ArjunJagdale · 2025-06-27T16:18:42Z

Fixes #7594
This PR adds support for loading only specific columns from .json or .jsonl files, similar to how the columns=... argument works for Parquet.

As suggested, the columns=... argument (previously available for Parquet) now also works for JSON and JSONL loading via load_dataset(...). You can load only the keys/columns you need and skip the rest, which should help when some fields are unclean, inconsistent, or just unnecessary.

Example:

from datasets import load_dataset

dataset = load_dataset("json", data_files="your_data.jsonl", columns=["id", "title"])
print(dataset["train"].column_names)
# Output: ['id', 'title']

Summary of changes:

Added columns: Optional[List[str]] to JsonConfig
Updated _generate_tables() to filter selected columns
Forwarded columns argument from load_dataset() to the config
Added test case to validate behavior
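
For reviewers who want a quick mental model of the filtering step, here is a minimal sketch using only the standard library. It is not the code in this PR: JsonConfigSketch and filter_jsonl_rows are hypothetical names, the real loader works on Arrow tables inside _generate_tables() rather than on plain dicts, and filling a missing key with None is an assumption made for this illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class JsonConfigSketch:
    """Hypothetical stand-in for JsonConfig; only the new field is modelled."""

    columns: Optional[List[str]] = None


def filter_jsonl_rows(
    lines: Iterable[str], config: JsonConfigSketch
) -> Iterator[Dict[str, object]]:
    """Yield one dict per JSONL line, keeping only the configured columns."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        if config.columns is None:
            # No selection requested: pass the row through unchanged.
            yield row
        else:
            # Assumption: a requested key missing from the row becomes None.
            yield {key: row.get(key) for key in config.columns}


sample_lines = [
    '{"id": 1, "title": "first", "notes": "messy"}',
    '{"id": 2, "title": "second", "extra": [1, 2]}',
]
filtered_rows = list(
    filter_jsonl_rows(sample_lines, JsonConfigSketch(columns=["id", "title"]))
)
```

With columns=["id", "title"], the inconsistent "notes" and "extra" keys never reach the output rows, which is the case the description calls out for unclean fields.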

Let me know if you'd like the same added for CSV or other formats as a follow-up; happy to help.

ArjunJagdale added 3 commits June 27, 2025 21:48

temp1

db75657

temp2

Update load.py

c7872cb

Update test_json.py

a0fedf5

ArjunJagdale changed the title ~~temp1~~ Add columns parameter to JSON loader to filter selected columns during loading Jun 27, 2025

ArjunJagdale changed the title ~~Add columns parameter to JSON loader to filter selected columns during loading~~ Add columns support to JSON loader for selective key filtering Jun 27, 2025
