How To Force Scikit-learn DictVectorizer Not To Discard Features?
I'm trying to use scikit-learn for a classification task. My code extracts features from the data and stores them in a dictionary like so: feature_dict['feature_name_1'] = feature_
Solution 1:
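You should call fit_transform on the training set and only transform on the test set. fit_transform learns the mapping from feature names to columns from the training data. transform reuses that mapping, so the test matrix has the same columns the classifier was trained on. Features that never appeared in training are dropped, which is expected: the model has no weights for them anyway.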
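A minimal sketch of that pattern. The feature dicts below are hypothetical stand-ins for the question's feature_dict, and feature_name_4 is deliberately absent from the training data. The sketch contrasts reusing the fitted vectorizer with the mistake of refitting a new one on the test set:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical feature dicts, shaped like feature_dict in the question.
train_dicts = [
    {"feature_name_1": 1.0, "feature_name_2": 3.0},
    {"feature_name_1": 0.0, "feature_name_3": 2.0},
]
test_dicts = [
    {"feature_name_1": 5.0, "feature_name_4": 7.0},  # feature_name_4 never seen in training
]

vec = DictVectorizer(sparse=False)

# Learn the feature-name -> column mapping from the training data only.
X_train = vec.fit_transform(train_dicts)

# Reuse that mapping on the test data; keys unseen during fit are ignored.
X_test = vec.transform(test_dicts)

# Mistake: refitting on the test set builds a different, incompatible set of columns.
X_test_wrong = DictVectorizer(sparse=False).fit_transform(test_dicts)

print(vec.feature_names_)  # ['feature_name_1', 'feature_name_2', 'feature_name_3']
print(X_train.shape, X_test.shape, X_test_wrong.shape)  # (2, 3) (1, 3) (1, 2)
print(X_test)  # [[5. 0. 0.]]
```

X_test lines up column for column with X_train. X_test_wrong has a different width and column order, so a classifier trained on X_train cannot use it.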
Solution 2:
Are you making sure to apply the scaler and selector you already fitted on the training data to the test data, rather than fitting new ones?
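from sklearn import preprocessing
from sklearn.feature_selection import SelectPercentile, f_classif

# Fit the scaler and the feature selector on the training data only
scaler = preprocessing.StandardScaler().fit(trainingData)
selector = SelectPercentile(f_classif, percentile=90)
selector.fit(scaler.transform(trainingData), labelsTrain)
...
...
# Reuse the already-fitted transforms on the test data: transform only, never fit
predicted = clf.predict(selector.transform(scaler.transform(testingData)))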
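The same fit-on-train, transform-on-test discipline can be enforced with scikit-learn's Pipeline, which is a swapped-in alternative and not what the answer above uses. This is a sketch under stated assumptions: synthetic data from make_classification stands in for trainingData/testingData, and LogisticRegression is an assumed classifier, since the original clf is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (assumption, not the original dataset).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectPercentile(f_classif, percentile=90)),
    ("clf", LogisticRegression()),  # assumed classifier
])

# fit() fits every step on the training data only.
pipe.fit(X_train, y_train)

# predict() calls transform() on the fitted scaler and selector, then predict() on the classifier.
predicted = pipe.predict(X_test)

print(predicted.shape)  # (50,)
print(pipe.named_steps["select"].get_support().sum())  # 18
```

Because the pipeline never refits its steps during predict, it is hard to accidentally fit the scaler or selector on the test data.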