Latent Semantic Analysis (LSA) via Sklearn

A quick write-up on using CountVectorizer and TruncatedSVD from the Sklearn (scikit-learn) library to build a document-term matrix and factor it into Document-Topic and Term-Topic matrices. After fitting the model, we try it out on a few simple, previously unseen documents in order to label them.

Helper Methods

  • These helpers simplify viewing a document-topic matrix.
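
The helpers themselves are not shown here, so the following is only a minimal sketch of what such helpers might look like, assuming pandas is available. The names `topic_frame` and `dominant_topic` are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd


def topic_frame(matrix, row_labels, prefix="topic"):
    """Wrap a (rows x topics) array in a labeled DataFrame for easier reading.

    Hypothetical helper: the name and signature are assumptions.
    """
    matrix = np.asarray(matrix)
    columns = [f"{prefix}_{i}" for i in range(matrix.shape[1])]
    return pd.DataFrame(matrix, index=row_labels, columns=columns).round(3)


def dominant_topic(frame):
    """Return, for each row, the column with the largest absolute weight.

    Absolute values are used because the sign of an SVD component is arbitrary.
    """
    return frame.abs().idxmax(axis=1)


# Tiny made-up matrix, only to show the helpers in use.
demo = topic_frame([[0.9, 0.1], [0.2, -0.8]], ["doc_a", "doc_b"])
print(demo)
print(dominant_topic(demo))
```

The same `topic_frame` call would work for a term-topic matrix by passing the vocabulary as `row_labels`.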


Document-Topic Matrix
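
As a rough sketch of this step: CountVectorizer turns the raw text into a document-term count matrix, and TruncatedSVD's `fit_transform` projects each document onto a small number of topics. The toy corpus and the choice of `n_components=2` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus: two documents about pets, two about finance.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

# Document-term matrix of raw counts, shape (n_docs, n_terms).
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(corpus)

# Document-topic matrix, shape (n_docs, n_topics).
svd = TruncatedSVD(n_components=2, random_state=0)
doc_topic = svd.fit_transform(doc_term)

for text, weights in zip(corpus, np.round(doc_topic, 3)):
    print(weights, text)
```

Because the two groups of documents share no vocabulary, each document should load almost entirely on a single topic. The sign of each column is arbitrary, so compare magnitudes rather than signs.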


Term-Topic Matrix

Hold-Out Documents
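
The hold-out documents must be vectorized with the vocabulary learned from the training corpus, which means calling `transform` rather than `fit_transform`. The documents below are made up for illustration; the third one deliberately contains no in-vocabulary words, so its row is all zeros.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training corpus, used only to learn the vocabulary.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
vectorizer = CountVectorizer(stop_words="english").fit(corpus)

# Hypothetical hold-out documents, never seen during fitting.
hold_outs = [
    "the dog chased the mouse",
    "stocks rallied as markets rose",
    "the parrot sang",  # every word is either a stop word or out of vocabulary
]

# transform() reuses the fitted vocabulary; unknown words are silently dropped.
hold_out_term = vectorizer.transform(hold_outs)
print(hold_out_term.shape)
print(hold_out_term.toarray().sum(axis=1))  # in-vocabulary word count per document
```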

Document-Topic Matrix for Hold Outs
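
A minimal sketch of labeling the hold-outs, assuming the same hypothetical corpus: the fitted vectorizer and SVD are chained with `make_pipeline`, and each hold-out is assigned the topic with the largest absolute weight. Taking the argmax is one simple labeling rule chosen here for illustration, not necessarily the one used originally.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical training corpus: two documents about pets, two about finance.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
# Hypothetical hold-out documents.
hold_outs = [
    "the dog chased the mouse",
    "stocks rallied as markets rose",
]

lsa = make_pipeline(
    CountVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),
)
train_topics = lsa.fit_transform(corpus)

# Project unseen documents into the topic space learned from the corpus.
hold_out_topics = lsa.transform(hold_outs)

# Label each document by its dominant topic (sign of a component is arbitrary).
train_labels = np.abs(train_topics).argmax(axis=1)
hold_out_labels = np.abs(hold_out_topics).argmax(axis=1)

for text, label in zip(hold_outs, hold_out_labels):
    print(f"topic_{label}: {text}")
```

A hold-out with no in-vocabulary words would project to all zeros, so in practice it is worth checking for that case before trusting an argmax label.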
