Using PCA to help visualize Word-Embeddings — Sklearn, Matplotlib
A quick, simple write-up on using PCA to reduce word-embedding dimensions down to 2D so we can visualize them in a scatter plot.
1. Setup
import matplotlib.pyplot as plt
import gensim.downloader as api
from sklearn.decomposition import PCA

# downloads (and caches) the 100-dimensional GloVe Twitter vectors on first run
transformer = api.load('glove-twitter-100')

# add your own terms here
terms = [
    'great',
    'good',
    'ok',
    'worst',
    'bad',
    'awful',
    'normal',
    'fine',
    'better',
    'best'
]
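GloVe Twitter was trained on tweets, so everyday words like these are almost certainly present, but any out-of-vocabulary term would raise a KeyError in the lookup below. A quick sanity check, assuming gensim 4.x where key_to_index is the vocabulary mapping:

# flag any terms missing from the GloVe vocabulary before looking them up
missing = [term for term in terms if term not in transformer.key_to_index]
assert not missing, f'terms not in vocabulary: {missing}'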
2. Pull Embeddings
embeddings = [transformer[term] for term in terms]  # api.load returns KeyedVectors, so index it directly (no .wv)
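Before running PCA it's worth confirming the shape: ten terms by 100 dimensions. A minimal check, assuming numpy (which sklearn already depends on) is available:

import numpy as np

# expect (10, 100): one 100-dimensional vector per term
print(np.array(embeddings).shape)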
3. Run PCA
pca = PCA(n_components=2)
data = pca.fit_transform(embeddings).transpose()  # transpose so row 0 holds x-coordinates, row 1 y-coordinates
x, y = data[0], data[1]
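Squeezing 100 dimensions into two discards information, so it helps to check how much variance the two components actually retain before reading too much into the plot:

# fraction of the original variance each component captures
print(pca.explained_variance_ratio_)
print(f'total variance retained: {pca.explained_variance_ratio_.sum():.1%}')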
4. Visualize
fig, ax = plt.subplots(figsize=(15, 8))
ax.scatter(x, y, c='g')
for i, term in enumerate(terms):
    ax.annotate(term, (x[i], y[i]))

plt.xlabel('x')
plt.ylabel('y')
plt.show()
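The steps above fold naturally into one reusable helper for trying other term lists. A minimal sketch; plot_terms is a hypothetical name, and it assumes the transformer is already loaded:

def plot_terms(transformer, terms):
    """Project term embeddings to 2D with PCA and scatter-plot them with labels."""
    embeddings = [transformer[term] for term in terms]
    x, y = PCA(n_components=2).fit_transform(embeddings).transpose()
    fig, ax = plt.subplots(figsize=(15, 8))
    ax.scatter(x, y, c='g')
    for i, term in enumerate(terms):
        ax.annotate(term, (x[i], y[i]))
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

plot_terms(transformer, terms)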
