Understanding Gensim Word2vec's Most_similar
Solution 1:
You can view exactly what most_similar() does in its source code:
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py#L485
It's not quite "find points in the vector space that are as close as possible to the positive vectors and as far away as possible from the negative ones". Rather, as described in the original word2vec papers, it performs vector arithmetic: adding the positive vectors, subtracting the negative ones, then, from that resulting position, listing the known vectors closest to it in angle (that is, ranked by cosine similarity).
That is sufficient to solve "man : king :: woman : ?"-style analogies, via a call like:
sims = wordvecs.most_similar(positive=['king', 'woman'],
negative=['man'])
(You can think of this as: "start at the 'king' vector, add the 'woman' vector, subtract the 'man' vector, then, from where you wind up, report the ranked word-vectors closest to that point, leaving out the 3 query vectors themselves.")
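To make that arithmetic concrete, here is a minimal sketch in plain NumPy. It is not gensim's actual code: toy_vectors and toy_most_similar are hypothetical names, the 3-dimensional vectors are invented for illustration rather than taken from a trained model, and unit-normalizing each vector before combining them is an assumption about how the real method behaves.

```python
import numpy as np

# Toy vectors invented for illustration only (not from a trained model).
toy_vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([-0.8, 0.1, 0.1]),
}


def unit(v):
    """Scale a vector to length 1 so that dot products equal cosine similarity."""
    return v / np.linalg.norm(v)


def toy_most_similar(vectors, positive, negative, topn=3):
    """Hypothetical helper: add positives, subtract negatives, rank by cosine."""
    # Assumption: each query vector is unit-normalized before being combined.
    target = sum(unit(vectors[w]) for w in positive)
    target = target - sum(unit(vectors[w]) for w in negative)
    target = unit(target)

    # Rank every known word by cosine similarity, skipping the query words.
    query_words = set(positive) | set(negative)
    scores = [
        (word, float(np.dot(unit(vec), target)))
        for word, vec in vectors.items()
        if word not in query_words
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


result = toy_most_similar(toy_vectors, positive=["king", "woman"], negative=["man"])
print(result)
```

With these toy values, 'queen' ranks first and 'apple' last, and none of 'king', 'woman', or 'man' appears in the output. Because the ranking uses cosine similarity, only the direction of the combined vector matters, not its length.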