Merged
Do the dot product on the sparse matrix

With a large corpus, converting the matrix to a dense array up front takes a lot of
memory. Computing the dot product on the sparse matrix and only then expanding the
(much smaller) result is more memory-efficient and gives the same values.
Martijn van Beers committed May 19, 2017
commit e0b091e32335eb24332fe6ad89d7ac4ae945308d
4 changes: 2 additions & 2 deletions Chapter-6/document_similarity.py
@@ -39,11 +39,11 @@
 def compute_cosine_similarity(doc_features, corpus_features,
                               top_n=3):
     # get document vectors
-    doc_features = doc_features.toarray()[0]
-    corpus_features = corpus_features.toarray()
+    doc_features = doc_features[0]
     # compute similarities
     similarity = np.dot(doc_features,
                         corpus_features.T)
+    similarity = similarity.toarray()[0]
     # get docs with highest similarity scores
     top_docs = similarity.argsort()[::-1][:top_n]
     top_docs_with_score = [(index, round(similarity[index], 3))
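The idea behind the patch can be sketched with a small, self-contained example. This is not the book's code: the tiny corpus matrix is made up for illustration, and it uses the `@` operator (equivalent to sparse matrix multiplication in SciPy) to make the sparse path explicit. It shows that multiplying while sparse and densifying only the 1-by-n result matches the dense-first computation, while never materializing the full dense corpus matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical small corpus: 4 documents x 6 features (e.g. tf-idf weights).
corpus_features = csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.2, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.3, 0.0],
    [0.0, 0.5, 0.1, 0.0, 0.0, 0.4],
    [0.2, 0.0, 0.0, 0.0, 0.0, 0.6],
]))
doc_features = corpus_features[0]  # sparse 1 x 6 row for the query document

# Dense-first (the old approach): expands the whole corpus matrix to a
# dense array before multiplying.
dense_sim = np.dot(doc_features.toarray()[0], corpus_features.toarray().T)

# Sparse-first (the patched approach): multiply while still sparse, then
# expand only the small 1 x n_docs result.
sparse_sim = (doc_features @ corpus_features.T).toarray()[0]

# Both paths produce the same similarity scores.
assert np.allclose(dense_sim, sparse_sim)
```

For a real tf-idf matrix with tens of thousands of documents and features, the dense-first path allocates the entire n_docs x n_features array, while the sparse-first path only ever densifies a single row of scores.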