Is XGBClassifier incompatible with CalibratedClassifierCV? #5887

zahs123 · 2020-07-13T16:19:16Z

All,

I do not know why I am getting the following error: ValueError: feature_names mismatch. This is what I am running to get my data:

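from sklearn.model_selection import train_test_split

# df is my full DataFrame; 'status' is the label column
target = df['status']
train = df.drop(columns=['status'])
# first split off a validation set, then split the remainder into train/test
x_train, x_valid, y_train, y_valid = train_test_split(train, target, stratify=target, random_state=42, test_size=0.2)
x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, stratify=y_train, random_state=42, test_size=0.2)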

Then I run a grid search and calibrate the best estimator:

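from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# models and params are my own dicts of estimators and parameter grids
kfolds = StratifiedKFold(3)
clf = GridSearchCV(models['XGBOOST'], params['XGBOOST'], cv=kfolds.split(x_train, y_train),
                   scoring='roc_auc', return_train_score=True)

clf.fit(x_train, y_train)

model = clf.best_estimator_

# calibrate the already fitted model on the held-out validation set
clf_isotonic = CalibratedClassifierCV(model, cv='prefit', method='isotonic')
clf_isotonic.fit(x_valid, y_valid)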

But I get the error above. x_valid and x_train have the same columns, and the error persists even when I fix the column order using:

#f_names = model.get_booster().feature_names
f_names = x_train.columns.tolist()
# reorder the validation columns to match training (the result must be assigned back)
x_valid = x_valid[f_names]

I do not understand why I get this error even after fixing the columns as above. I have also tried passing x_valid.values, but with no luck. Both sets have the same features, so I really do not know what is happening.


trivialfis · 2020-07-15T12:11:38Z

Which XGBoost version are you using?

zahs123 · 2020-07-16T12:43:05Z

It's 1.0.2.

trivialfis · 2020-07-24T03:42:47Z

Will investigate this.

trivialfis · 2020-07-28T11:05:42Z

Reproduced.

trivialfis · 2020-07-29T03:55:47Z

I looked into this briefly. sklearn converts the pandas DataFrame into a NumPy array during its validation (the check_array function), so information such as feature names is lost. I'm not sure whether this is expected behaviour from sklearn.
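
The conversion described above can be observed directly. This is a minimal sketch, assuming pandas and scikit-learn are installed; the DataFrame and its column names are made up for illustration and are not from the original report:

```python
import numpy as np
import pandas as pd
from sklearn.utils import check_array

# Hypothetical two-column frame standing in for x_valid.
x_valid = pd.DataFrame({"age": [31.0, 45.0], "income": [52.0, 48.5]})

# check_array is the validation helper mentioned above; it returns a plain ndarray.
validated = check_array(x_valid)

print(type(validated).__name__)       # the result is a NumPy array, not a DataFrame
print(hasattr(validated, "columns"))  # the column names do not survive the conversion
```

Anything downstream of this call only sees the raw values, which is why reordering or re-selecting the DataFrame columns beforehand has no effect on the error.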

trivialfis · 2020-07-29T04:03:42Z

There's a related issue in scikit-learn/scikit-learn#5523.

zahs123 · 2020-07-29T10:15:15Z

I still can't figure this problem out. My validation and training sets have exactly the same features.

trivialfis · 2020-07-29T11:09:47Z

@zahs123 A pandas.DataFrame has a columns attribute that holds the name of each column. Here is the sequence of events:

1. scikit-learn's grid search passes the DataFrame to XGBoost as is, so XGBoost memorizes the feature names from your DataFrame during the grid search.
2. scikit-learn's CalibratedClassifierCV removes the feature names by converting your DataFrame to a NumPy array before passing it down to XGBoost. This time XGBoost receives an array instead of a DataFrame.
3. The error occurs when XGBoost compares the feature names of the NumPy array (generated automatically, since an array has no feature names) with the names it memorized from the pandas DataFrame.
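
The sequence above can be imitated with a toy model in plain Python. This is only an illustrative sketch: NameCheckingModel and strip_names are hypothetical stand-ins, not XGBoost or scikit-learn APIs, and the generated names f0, f1, ... are an assumption about how names are filled in for arrays.

```python
class NameCheckingModel:
    """Toy stand-in (not XGBoost) that memorizes feature names at fit time."""

    def fit(self, names):
        self.feature_names_ = list(names)
        return self

    def predict(self, names):
        # Refuse input whose feature names differ from the memorized ones.
        if list(names) != self.feature_names_:
            raise ValueError("feature_names mismatch")
        return "ok"


def strip_names(names):
    """Mimic array conversion: real names are replaced by generated f0, f1, ..."""
    return [f"f{i}" for i in range(len(names))]


columns = ["age", "income"]  # made-up column names

# Step 1: the DataFrame is passed through, so the real names are memorized.
model = NameCheckingModel().fit(columns)

# Steps 2 and 3: the model now receives generated names and the comparison fails.
try:
    model.predict(strip_names(columns))
    outcome = "no error"
except ValueError as exc:
    outcome = str(exc)

# Possible workaround (an assumption, not from the thread): use arrays in both phases.
array_model = NameCheckingModel().fit(strip_names(columns))
workaround = array_model.predict(strip_names(columns))
```

In terms of the original snippet, the workaround would correspond to passing x_train.values to the grid search as well as x_valid.values to the calibrator, so that names are never memorized in the first place. That is an untested suggestion; passing x_valid.values alone, as tried in the report, still leaves the model with the DataFrame names from training.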

trivialfis self-assigned this Jul 24, 2020

trivialfis added the Blocking label Jul 28, 2020

trivialfis mentioned this issue Jul 29, 2020

Disable feature validation on sklearn predict prob. #5953

Merged

trivialfis closed this as completed in #5953 Jul 29, 2020

hcho3 mentioned this issue Sep 24, 2020

Re-enable feature validation in sklearn predict proba method by default #6158

Closed
