Loading a Dataset for Linear SVM Classification from a CSV File
Solution 1:
First, I think you have an error in your CSV in the first row:
25.3, 12.4, 2.35, 4.89. 1, 2.35, 5.65, 7, 6.24, 5.52, M
I assumed the period after 4.89 is a typo for a comma, so the values should read 4.89, 1 and not 4.89. 1.
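A quick way to catch this kind of typo is to count the fields in each row. The sketch below uses only the standard library; the second sample row and the helper name find_malformed_rows are made up for illustration, and the expected count of 11 (10 features plus the label) is taken from the corrected first row.

```python
import csv
import io

# Sample data: the first row is the one from the question (period instead of
# a comma after 4.89); the second row is hypothetical, added for contrast.
sample = (
    "25.3, 12.4, 2.35, 4.89. 1, 2.35, 5.65, 7, 6.24, 5.52, M\n"
    "20.1, 11.0, 3.10, 5.02, 1, 2.10, 4.95, 6, 5.80, 5.10, B\n"
)
EXPECTED_FIELDS = 11  # 10 feature columns + 1 label column


def find_malformed_rows(lines, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for rows with the wrong field count."""
    reader = csv.reader(lines, skipinitialspace=True)
    return [(n, len(row)) for n, row in enumerate(reader, start=1)
            if row and len(row) != expected]


bad_rows = find_malformed_rows(io.StringIO(sample))
print(bad_rows)
```

To check a real file, pass an open file object instead of the io.StringIO wrapper, for example open('prueba.csv', newline='').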
Second, I recommend using pandas to read that CSV, like this:
import pandas as pd
# usecols=list(range(11)) selects columns 0 through 10.
# The result is a DataFrame called data that holds the contents of the file.
data = pd.read_csv('prueba.csv', header=None, usecols=list(range(11)))
feature_cols = list(range(10))
X_train = data[feature_cols]  # columns 0-9: the features
y_train = data[10]            # column 10: the class label (e.g. M)
This is the easiest way to get your data ready for any machine learning algorithm in scikit-learn.
Solution 2:
import pandas as pd
df = pd.read_csv('/path/to/csv', header=None, index_col=False)
x = df.iloc[:, :-1].values  # every column except the last: the features
y = df.iloc[:, -1].values   # the last column as a 1-D array of labels
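One detail worth checking when slicing out the label column is its shape. scikit-learn classifiers generally expect y to be one-dimensional, and a column vector typically triggers a DataConversionWarning. The sketch below compares the two slicing styles on a hypothetical two-row CSV held in memory; the data and variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real CSV file.
csv_text = "1.0,2.0,M\n3.0,4.0,B\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

y_column = df.iloc[:, -1:].values  # slice keeps a 2-D array, shape (2, 1)
y_flat = df.iloc[:, -1].values     # single index gives a 1-D array, shape (2,)

print(y_column.shape, y_flat.shape)
```

If you already have the 2-D form, y_column.ravel() flattens it to one dimension.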
Solution 3:
I think you should use pandas, a library that makes reading CSV files easy:
import pandas as pd
dataset = pd.read_csv('train.csv')
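Second, you can use train_test_split to automatically split the data:
from sklearn.model_selection import train_test_split
# Assumes the class label is in the last column, as in the question's CSV.
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2)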
This splits the data so that X_train and y_train contain 80% of the rows and X_test and y_test the remaining 20%. You can change the proportion by adjusting test_size. Passing stratify=y keeps the ratio of class labels (M, B) approximately the same in the training and test sets, which is generally considered good practice in machine learning. The split is random, so it differs on each run. If you want the same split every time, pass random_state=SEED as a keyword argument.
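To see what stratify and random_state do in practice, here is a minimal sketch on toy data. The 20 samples, the even M/B balance, and the seed value 42 are all arbitrary choices for illustration, not values from the question.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, 10 labelled M and 10 labelled B.
X = [[i] for i in range(20)]
y = ["M"] * 10 + ["B"] * 10

SEED = 42  # arbitrary; any fixed integer gives a repeatable split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=SEED)

# 80% of the rows go to training, 20% to testing,
# and both parts keep the 50/50 class balance.
print(len(X_train), len(X_test))
print(Counter(y_train), Counter(y_test))
```

Running the split again with the same random_state returns exactly the same rows in each part.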
After that you can continue on with the machine learning:
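from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report
# Important to scale: fit the scaler on the training data only
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# SVC() defaults to an RBF kernel; use SVC(kernel='linear') for a linear SVM
clf = SVC()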
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))
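Since the question asks for a linear SVM, note that SVC() uses an RBF kernel unless told otherwise. The following is a rough end-to-end sketch with kernel="linear", wrapped in a pipeline so the scaler is fitted on the training data only. The synthetic dataset from make_classification is a stand-in for the real CSV, and the sizes and seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the CSV: 200 rows, 10 numeric features, 2 classes.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

# The pipeline scales the features, then fits a linear-kernel SVM.
# Calling fit() fits the scaler on X_train only; predict() reuses it.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(classification_report(y_test, pred))
```

For larger datasets, sklearn.svm.LinearSVC is another option for a linear SVM and usually trains faster, though its results can differ slightly from SVC(kernel="linear").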