Classification Accuracy Based On Single Feature Set

August 21, 2024

I am trying to classify data based on prespecified labels. I have two columns, as shown below: room_class room_cluster Standard single sea view Standard Del

Solution 1:

import random
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import numpy as np

##Based on your data
initial_room=["Standard single sea view","Deluxe twin Single","Suite Superior room ocean view","Superior Double twin","Deluxe Double room"]


##Create 100 data points from your data (the room names repeat)
room_class=[initial_room[random.randint(0,len(initial_room)-1)] for i in range(100)]

##Based on room_cluster
initial_cluster=["Standard","Deluxe","Suite","Superior"]

##The first word of each room name that is also a cluster name is the Y label
##(a set intersection picks arbitrarily for "Suite Superior room ocean view", which matches two clusters)
room_cluster=[next(word for word in each_room.split() if word in initial_cluster) for each_room in room_class]


##Helps to embed 
embedding={}
index=0
##For each unique word in the total room_class assign a unique number.
for each_room in room_class:
    for each_word in each_room.split():
        if each_word not in embedding:
            embedding[each_word]=index
            index+=1

##Find max_len of the room name
max_len=max([len(i.split()) for i in room_class])

##Needed for embedding the matrix
embedded_rooms=[]


##For each room in room_class
for each_room in room_class:
    embedded_room=[]
    for each_word in each_room.split():
        ##Each word assign that unique number
        embedded_room.append(embedding[each_word])

    #Get the length of the row
    room_len=len(embedded_room)

    ##If it is shorter than max_len, pad it with -1
    ##Since the embedding already uses 0, I can't use it for padding
    while(room_len<max_len):
        embedded_room.append(-1)
        room_len+=1

    ##Append it to embedded rooms
    embedded_rooms.append(embedded_room)

Y=[]

##Embed Y based on same technique
for each_cluster in room_cluster:
    Y.append(embedding[each_cluster])


X=np.array(embedded_rooms)


##Apply KNN
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X,Y)

##Data for testing goes within this list
test=["Single Standard"]
test_label=["Standard"]


embed_tests=[]
##Convert the test to embedding
##Use the same embedding
for each_test in test:
    embed_test=[]
    for each_word in each_test.split():
        embed_test.append(embedding[each_word])
    ##Again Padding the data    
    n=len(embed_test)
    while(n<max_len):
        embed_test.append(-1)
        n+=1
    embed_tests.append(embed_test)  

#Predict the X_test
X_test=np.array(embed_tests)
predictions = classifier.predict(X_test)

##Convert class_labels to encoding
embed_test_label=[]
for each_class in test_label:
    embed_test_label.append(embedding[each_class])

##Print out the accuracy
print(accuracy_score(embed_test_label,predictions))

I coded this roughly, so please bear with me.
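
The word-indexing and padding steps above can be gathered into two small helpers. This is only a minimal sketch and not part of the original answer: the names build_vocab, encode, PAD and UNK are hypothetical, and mapping unseen words to a separate UNK code is an assumption added here so that a test phrase containing a word absent from the training data does not raise a KeyError.

```python
PAD = -1  # padding value, as in the answer above (0 is already a word index)
UNK = -2  # hypothetical code for words never seen in training (an assumption)


def build_vocab(rooms):
    """Assign each distinct word a unique integer, in order of first appearance."""
    vocab = {}
    for room in rooms:
        for word in room.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(room, vocab, max_len):
    """Map words to their integers, then pad (or truncate) to exactly max_len."""
    codes = [vocab.get(word, UNK) for word in room.split()][:max_len]
    return codes + [PAD] * (max_len - len(codes))


rooms = ["Standard single sea view", "Deluxe twin Single", "Superior Double twin"]
vocab = build_vocab(rooms)
max_len = max(len(room.split()) for room in rooms)

encoded_rooms = [encode(room, vocab, max_len) for room in rooms]
# "Balcony" is not in the vocabulary, so it is encoded as UNK
encoded_test = encode("Single Standard Balcony", vocab, max_len)
print(encoded_rooms)
print(encoded_test)
```

The resulting lists can be passed to np.array and used as X and X_test in the same way as embedded_rooms and embed_tests. As in the answer above, the lookup is case-sensitive, so "single" and "Single" receive different numbers.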

