Weather Captioned - First Time Series - Text Multi-modal Dataset

Raw data and processing scripts for the Weather Captioned dataset used in TGTSF.

Data Source

The time series data comes from the Max-Planck-Institut für Biogeochemie, Jena, specifically the WS Beutenberg site. The raw data is in the folder.

We gather the weather forecast reports from a publicly available weather forecast platform. The raw data is also in the folder.

About Caption

The captions are generated from raw data retrieved from a publicly available weather forecast source. **No time series is provided to the large language model.**

We provide all the captions we generated in . The captions are generated with GPT-4 using the scripts in .

⚠ Caution: Captioning the whole dataset can cost over 400 USD with GPT-4!

Our captions may not be optimal. We encourage you to produce your own version of the captions.

About Pre-embedding

We provide the pre-embeddings of the news text in the dataset. You can download them here with gdown. The embeddings are generated with the scripts in and .

Data Pipeline

We use two hashtables to keep the news data aligned with the time series in temporal order. The embedding of each news text is saved as an npy file with its hash key as the filename.

Timestamp for time series segment -- [Date2Hash hashtable] --> News hash key list for the timestamp -- [Hash2Emb hashtable] --> Read the embedding from the npy file
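
The lookup chain above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual loader: the timestamp format, the hash keys, the embedding size, and the helper names `save_embedding` and `load_embeddings` are all assumptions, and the Hash2Emb step is modeled simply as opening the npy file named after the hash key.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical Date2Hash table: timestamp -> list of news hash keys.
# The real key and timestamp formats are assumptions made for this sketch.
date2hash = {
    "2020-01-01": ["a1b2", "c3d4"],
    "2020-01-02": ["c3d4"],
}


def save_embedding(emb_dir, hash_key, embedding):
    """Store one embedding as <hash_key>.npy, so the hash key is the filename."""
    np.save(Path(emb_dir) / f"{hash_key}.npy", embedding)


def load_embeddings(timestamp, date2hash, emb_dir):
    """Timestamp -> hash key list -> embeddings read from the npy files."""
    hash_keys = date2hash.get(timestamp, [])
    if not hash_keys:
        # No news is registered for this timestamp.
        return np.empty((0, 0), dtype=np.float32)
    return np.stack([np.load(Path(emb_dir) / f"{key}.npy") for key in hash_keys])


# Build a tiny stand-in embedding folder with random 8-dimensional vectors.
emb_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for key in ("a1b2", "c3d4"):
    save_embedding(emb_dir, key, rng.normal(size=8).astype(np.float32))

embeddings = load_embeddings("2020-01-01", date2hash, emb_dir)
print(embeddings.shape)  # (2, 8)
```

Because the time series side only needs the timestamp, the same news embedding file can be shared by every segment whose timestamp maps to that hash key.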

You can use the hash2text hashtable to check the news text for the hash key.

Hash key -- [Hash2Text hashtable] --> News text
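
As a small sketch of that check, the snippet below resolves a timestamp to its news texts by chaining the two tables. The table contents are made-up placeholders, and the function name `texts_for_timestamp` is hypothetical; the real tables may be stored and keyed differently.

```python
# Hypothetical tables; the entries are placeholders, not real dataset content.
date2hash = {"2020-01-01": ["a1b2", "c3d4"]}
hash2text = {
    "a1b2": "placeholder news text A",
    "c3d4": "placeholder news text B",
}


def texts_for_timestamp(timestamp, date2hash, hash2text):
    """Return the news texts behind every hash key listed for a timestamp."""
    return [hash2text[key] for key in date2hash.get(timestamp, [])]


texts = texts_for_timestamp("2020-01-01", date2hash, hash2text)
for text in texts:
    print(text)
```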

VEWOXIC/Weather-Captioned
