I am trying to apply tf-idf weighting to a matrix. I would like to use gensim, but models.TfidfModel()
only works on a corpus and therefore returns a list of lists of varying lengths (I want a matrix).
The options are to somehow fill in the missing values of the list of lists, or to just convert the corpus to a dense matrix:
numpy_matrix = gensim.matutils.corpus2dense(corpus, num_terms=number_of_corpus_features)
Choosing the latter, I then try to convert this count matrix to a tf-idf weighted matrix:
def TFIDF(m):
    #import numpy
    WordsPerDoc = numpy.sum(m, axis=0)
    DocsPerWord = numpy.sum(numpy.asarray(m > 0, 'i'), axis=1)
    rows, cols = m.shape
    for i in range(rows):
        for j in range(cols):
            amatrix[i,j] = (amatrix[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])
But I get the error AttributeError: 'numpy.ndarray' object has no attribute 'A'
I copied the function above from another script. It was:
def TFIDF(self):
    WordsPerDoc = sum(self.A, axis=0)
    DocsPerWord = sum(asarray(self.A > 0, 'i'), axis=1)
    rows, cols = self.A.shape
    for i in range(rows):
        for j in range(cols):
            self.A[i,j] = (self.A[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])
which I believe is where it's getting the A from. However, I re-imported the function.
Why is this happening?
.A is an np.matrix attribute, turning it into an array. Try omitting it. – hpaulj Jun 22 at 11:27
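To illustrate the comment, here is a minimal sketch of a version that never touches .A. It assumes the dense matrix has terms as rows and documents as columns, which is the layout the loops in the question imply. The name tfidf, the zero-division guards, and the small counts example are my own additions, not part of the original script. If the error still mentions .A after the function has been edited, one possible cause (an assumption, since the session isn't shown) is that a plain re-import reuses the cached module, so the old code keeps running until importlib.reload() is called or the interpreter is restarted.

```python
import numpy as np


def tfidf(counts):
    """Return a tf-idf weighted copy of a dense terms x documents count matrix."""
    # np.asarray accepts both np.ndarray and np.matrix input, so .A is not needed.
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]

    words_per_doc = counts.sum(axis=0)        # total count in each document (column)
    docs_per_word = (counts > 0).sum(axis=1)  # documents containing each term (row)

    # Term frequency: divide each column by its document length.
    # Empty documents are left at zero instead of dividing by zero (my assumption).
    tf = np.divide(counts, words_per_doc,
                   out=np.zeros_like(counts), where=words_per_doc > 0)

    # Inverse document frequency: log(number of documents / document frequency).
    # Terms that occur in no document get an idf of zero (my assumption).
    idf = np.zeros(counts.shape[0])
    seen = docs_per_word > 0
    idf[seen] = np.log(n_docs / docs_per_word[seen])

    # Broadcast the per-term idf down the rows.
    return tf * idf[:, np.newaxis]


# Hypothetical 3 terms x 3 documents count matrix, for illustration only.
counts = np.array([[1, 0, 2],
                   [0, 3, 1],
                   [1, 1, 1]])
weighted = tfidf(counts)

print(hasattr(counts, "A"))             # False: a plain ndarray has no .A
print(hasattr(np.matrix(counts), "A"))  # True: .A belongs to np.matrix
```

The function returns a new array rather than modifying its argument in place, so the count matrix produced by corpus2dense stays available for comparison.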