
How To Force Scikit-learn DictVectorizer Not To Discard Features?

I'm trying to use scikit-learn for a classification task. My code extracts features from the data and stores them in a dictionary like so: feature_dict['feature_name_1'] = feature_

Solution 1:

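You should use fit_transform on the training set, and only transform on the test set. fit_transform learns the feature-to-column mapping from the training data; transform reuses that mapping, so the test matrix ends up with the same columns as the training matrix.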
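A minimal sketch of that advice with DictVectorizer, assuming small made-up feature dictionaries (train_dicts, test_dicts, and every feature name other than feature_name_1 are hypothetical and not from the question):

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy data; the feature names and values are illustrative only.
train_dicts = [
    {"feature_name_1": 1.0, "feature_name_2": 2.0},
    {"feature_name_1": 0.5, "feature_name_3": 4.0},
]
test_dicts = [
    {"feature_name_2": 3.0, "unseen_feature": 9.0},
]

vectorizer = DictVectorizer(sparse=False)

# Learn the feature-to-column mapping from the training data only.
X_train = vectorizer.fit_transform(train_dicts)

# Reuse the learned mapping: the columns match X_train, features absent
# from a sample become 0, and features never seen in training are ignored.
X_test = vectorizer.transform(test_dicts)

print(vectorizer.feature_names_)
print(X_train.shape, X_test.shape)
```

Calling fit_transform a second time on the test dictionaries would instead build a new mapping from the test features alone, which is one way the two matrices can end up with different columns.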

Solution 2:

Are you making sure to apply the previously fitted scaler and selector to the test data, rather than fitting new ones?

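from sklearn import preprocessing
from sklearn.feature_selection import SelectPercentile, f_classif

# Fit the scaler and the selector on the training data only.
scaler = preprocessing.StandardScaler().fit(trainingData)
selector = SelectPercentile(f_classif, percentile=90)
selector.fit(scaler.transform(trainingData), labelsTrain)
...
...
# Apply the same fitted transforms to the test data before predicting.
predicted = clf.predict(selector.transform(scaler.transform(testingData)))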
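One way to make this harder to get wrong is to chain the steps in a scikit-learn Pipeline, which fits each step on the training data and only transforms at prediction time. This is a sketch of an alternative, not the answer's own code: the random data is made up, and LogisticRegression is an assumed stand-in because the original clf is not shown.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for trainingData / testingData.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(40, 10))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(5, 10))

# LogisticRegression is an assumption; any scikit-learn classifier fits here.
model = make_pipeline(
    StandardScaler(),
    SelectPercentile(f_classif, percentile=90),
    LogisticRegression(),
)

# fit() fits the scaler and selector on the training data, then the classifier.
model.fit(X_train, y_train)

# predict() applies the already fitted scaler and selector before classifying.
predicted = model.predict(X_test)
print(predicted.shape)
```

With this arrangement the test data never passes through a fit call, so the scaler statistics and the selected columns always come from the training set.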
