How to use processor? #789

Closed
bbbzhai opened this issue Dec 30, 2021 · 9 comments
Labels
question Further information is requested

Comments

bbbzhai commented Dec 30, 2021

Hi,
I'm wondering how to use a processor, and what the difference is between a processor and an operator.
I've been reading about processors and I'm confused about how to use them.
Take qlib.data.dataset.processor.MinMaxNorm, for instance.
I tried it, and it cannot be used like MinMaxNorm($close).
So what is the use case for a processor?

More specifically, I wanted to use a cross-sectional ranking, CSRankNorm($close), but I could not get that to work. Is there any way to convert a processor into an operator?

Thanks
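
As background for the question above, here is a plain-pandas sketch of what a cross-sectional rank computes on an already loaded frame, as opposed to inside an expression string such as MinMaxNorm($close). It assumes a frame indexed by (datetime, instrument); that layout, the function name cs_rank, and the sample prices are all hypothetical, and this is not qlib's CSRankNorm implementation.

```python
import pandas as pd


def cs_rank(df: pd.DataFrame, column: str) -> pd.Series:
    """Percentile rank of `column` across instruments within each datetime."""
    return df.groupby(level="datetime")[column].rank(pct=True)


# Hypothetical sample data: two dates, three instruments.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2021-12-29", "2021-12-30"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"close": [10.0, 30.0, 20.0, 5.0, 4.0, 6.0]}, index=index)

# Each date is ranked on its own, so values are comparable across instruments.
df["close_rank"] = cs_rank(df, "close")
print(df)
```

The ranking needs every instrument's value for a given date at once, which is why it is sketched here as a step over the whole frame.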

bbbzhai added the question label Dec 30, 2021

bbbzhai (Author) commented Dec 30, 2021

I'm happy to write an operator that works row-wise, if that's not currently supported.
I wanted to run this by you first before I start, because I suspect it's already supported and I overlooked it.

you-n-g (Collaborator) commented Jan 3, 2022

I added some docs about data processing in #797.

Stock-wise operators (I take your "row-wise" to mean stock-wise) are not supported at the moment.

Does that answer your question?

bbbzhai (Author) commented Jan 3, 2022

> I added some docs about data processing. #797
>
> Stock-wise operators (I understand your row-wise as stock-wise) are not supported now.
>
> Is your question answered?

Thanks for your comments. I was looking for stock-wise operators. Since they're not currently supported, I will try something else.
Thanks!

bbbzhai closed this as completed Jan 3, 2022

bbbzhai (Author) commented Jan 17, 2022

By the way, what's the difference between infer_processor and learn_processor?

> Is this processor usable for inference
> Some processors are not usable for inference.

I saw this statement in the docs. Could you provide an example to clarify the difference? Thanks.
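
One way to picture the split is a handler configuration that lists the two groups separately. The sketch below is only a plain Python dict. The key names infer_processors and learn_processors and the processor class names follow qlib's published examples as far as I can tell, but the exact kwargs are assumptions and should be checked against the installed qlib version.

```python
# Hypothetical handler kwargs; verify class names and kwargs against your qlib version.
handler_kwargs = {
    # Steps that are safe at prediction time: they only touch features.
    "infer_processors": [
        {"class": "Fillna", "kwargs": {"fields_group": "feature"}},
    ],
    # Steps that only make sense when labels are available, i.e. for training.
    "learn_processors": [
        {"class": "DropnaLabel"},
        {"class": "CSRankNorm", "kwargs": {"fields_group": "label"}},
    ],
}

for group, processors in handler_kwargs.items():
    print(group, [p["class"] for p in processors])
```

The processors in the second group depend on the label, which does not exist yet for the rows being predicted.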

bbbzhai reopened this Jan 17, 2022

bbbzhai (Author) commented Jan 17, 2022

I wrote a processor to test this myself. It adds a new column to the dataframe.
I also tried the DropCol processor.
However, when I print the result of dataset.prepare("train"), the column I added is missing and the columns I dropped are still there.
Can anybody help?

bbbzhai (Author) commented Jan 18, 2022

At the checkpoint

for proc in proc_l:

I can see that the returned df has the dropped columns removed and my new columns added.
But when I check with dataset.prepare("train"), those changes are gone.

bbbzhai (Author) commented Jan 18, 2022

I realized that the data used for training is the processed data, but dataset.prepare("train") shows the original dataframe. Is this a bug?
I still don't understand learn_processor and infer_processor.

bbbzhai (Author) commented Jan 18, 2022

After further digging, I found that changing data_key gives the data after processing:
dataset.prepare('train', col_set=["feature", "label"], data_key='learn')

So far, my understanding is that:

  1. learn_processor processes the data for training.
  2. infer_processor processes the data for prediction.
  3. Some processors are good for both learn and infer, e.g. normalization.
  4. Some variables are good for learn but not for infer. For these variables, we train on the processed values, but we don't use them when making predictions. Can you give me a scenario where processing works like this?
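
One scenario in the spirit of point 4 involves the label. Dropping rows whose label is missing makes sense for training, but at prediction time the label is not known yet, so the same step would discard exactly the rows to be predicted. The toy below uses plain Python rather than qlib; the row layout and all function names are hypothetical, and running the learn steps on top of the infer steps is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Step = Callable[[List[Row]], List[Row]]


def fill_missing_feature(rows: List[Row]) -> List[Row]:
    """Replace a missing feature with 0.0; usable for training and prediction."""
    return [{**r, "feature": 0.0 if r["feature"] is None else r["feature"]} for r in rows]


def drop_missing_label(rows: List[Row]) -> List[Row]:
    """Drop rows without a label; only meaningful when labels are available."""
    return [r for r in rows if r["label"] is not None]


def run(steps: List[Step], rows: List[Row]) -> List[Row]:
    """Apply each step in order and return the processed rows."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [
    {"feature": 1.5, "label": 0.02},
    {"feature": None, "label": -0.01},
    {"feature": 0.7, "label": None},  # most recent day: label not known yet
]

infer_steps = [fill_missing_feature]
learn_steps = [fill_missing_feature, drop_missing_label]

infer_data = run(infer_steps, raw)  # keeps the row that still needs a prediction
learn_data = run(learn_steps, raw)  # drops it, since it cannot be trained on
print(len(infer_data), len(learn_data))
```

In the thread's terms, the data returned with data_key='learn' plays the role of learn_data here, while the prediction path keeps every row.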

you-n-g (Collaborator) commented Jan 21, 2022

@bbbzhai
Sorry for the late response.
Your understanding is correct.
I have just added some docs.

Does https://github.com/microsoft/qlib/pull/879/files answer your question?
