How To Perform Standardization On The Data In GridSearchCV?
How can I standardize the data when using GridSearchCV? Here is the code. I have no idea how to do it. import dataset import warnings warnings.filterwarnings('ignore') import
Solution 1:
Put StandardScaler and the classifier into a Pipeline and pass that pipeline to GridSearchCV. Demo:
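import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# X, y are the features and labels loaded in the question's code
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.33)

# the scaler is a pipeline step, so it is fit only on the training part of each CV fold
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression())
])

# '<step name>__<parameter>' addresses a parameter of a pipeline step
param_grid = [
    {
        'clf__solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
        'clf__C': np.logspace(-3, 1, 5),
    },
]

grid = GridSearchCV(pipe, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)
grid.fit(X_train, y_train)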
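To check that the standardization really happens inside the search, one option is to inspect the scaler of the refitted pipeline. The sketch below is self-contained and makes a few assumptions that are not in the answer above: it uses scikit-learn's built-in breast cancer dataset in place of the question's data, fixes random_state for repeatability, searches only over C, and raises max_iter so the solver converges.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a built-in dataset stands in for the question's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),  # max_iter is an assumption
])

grid = GridSearchCV(pipe, param_grid={"clf__C": np.logspace(-3, 1, 5)}, cv=3)
grid.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is the whole pipeline
# refit on all of X_train, so its scaler holds the training-set statistics.
scaler = grid.best_estimator_.named_steps["scale"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))

# score() sends X_test through the same fitted scaler before the classifier,
# so the test data never influences the scaling statistics.
print(grid.best_params_)
print(grid.score(X_test, y_test))
```

Because the scaler lives inside the pipeline, there is no need to call fit_transform on the data by hand before the search, which would leak validation-fold statistics into training.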
Solution 2:
If you use refit=True, GridSearchCV refits the best model on the whole training set, so you can predict with the fitted search object directly. You can also use cv_results_ to find the best row by rank_test_score and then read the parameters from that row. If the search space becomes large, use RandomizedSearchCV instead of an exhaustive grid search.
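import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# X, y are the features and labels loaded in the question's code
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression())
])

param_grid = [
    {
        'clf__solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
        'clf__C': np.logspace(-3, 1, 5),
    },
]

grid_class = GridSearchCV(
    estimator=pipe,          # the pipeline defined above
    param_grid=param_grid,   # the grid defined above
    scoring='accuracy',
    n_jobs=4,   # use 4 cores
    cv=10,      # 10 folds
    refit=True,  # refit the best pipeline on all of X_train
    return_train_score=True)
grid_class.fit(X_train, y_train)

# predict() uses the refitted best pipeline (scaler + classifier)
predictions = grid_class.predict(X_test)

# one row per parameter combination; rank 1 is the best mean test score
cv_results_df = pd.DataFrame(grid_class.cv_results_)
best_row = cv_results_df[cv_results_df["rank_test_score"] == 1]
print(best_row)

# the parameter dicts that were tried, in the same row order
params_column = cv_results_df.loc[:, ['params']]
print(params_column)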
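For the larger search spaces mentioned above, a randomized search over the same kind of pipeline could look roughly like this. It is a sketch under stated assumptions, not part of the original answer: the built-in breast cancer dataset, the loguniform range for C, the two solvers, n_iter=8, max_iter and random_state are all illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a built-in dataset stands in for the question's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),  # max_iter is an assumption
])

# Lists are sampled uniformly; distributions are sampled via their rvs() method.
param_distributions = {
    "clf__solver": ["lbfgs", "liblinear"],
    "clf__C": loguniform(1e-3, 1e1),
}

search = RandomizedSearchCV(
    pipe,
    param_distributions=param_distributions,
    n_iter=8,            # number of sampled settings, not a full grid
    scoring="accuracy",
    cv=5,
    refit=True,
    random_state=0,
)
search.fit(X_train, y_train)

# best_params_ and best_score_ give the rank-1 row without going through cv_results_.
print(search.best_params_)
print(search.best_score_)
predictions = search.predict(X_test)
```

The cost of the search is fixed by n_iter rather than by the size of the grid, which is the reason to prefer it when many parameters or values are involved.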