I am trying to apply tf-idf weighting to a matrix. I would like to use gensim, but models.TfidfModel() works on a corpus, so its output is a list of lists of varying lengths (one sparse list of (term_id, weight) pairs per document), whereas I want a matrix.

The options are to somehow fill in the missing values of the list of lists, or to simply convert the corpus to a matrix:

numpy_matrix = gensim.matutils.corpus2dense(corpus, num_terms=number_of_corpus_features)
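The first option, filling in the missing values, amounts to scattering each document's sparse (term_id, value) pairs into a zero-initialised matrix. Below is a minimal pure-NumPy sketch of that idea, assuming the corpus is in gensim's bag-of-words format (one list of (term_id, count) tuples per document). The helper name fill_dense is hypothetical, not a gensim function; like corpus2dense, it puts terms on the rows and documents on the columns.

```python
import numpy as np

def fill_dense(corpus, num_terms):
    """Hypothetical helper: turn a bag-of-words corpus into a dense
    (num_terms, num_docs) array, filling absent terms with 0."""
    dense = np.zeros((num_terms, len(corpus)), dtype=np.float32)
    for doc_idx, doc in enumerate(corpus):
        for term_id, value in doc:
            dense[term_id, doc_idx] = value  # row = term, column = document
    return dense

# Toy corpus: 2 documents over a 3-term vocabulary.
corpus = [[(0, 2), (2, 1)], [(1, 3)]]
m = fill_dense(corpus, num_terms=3)
print(m)
# [[2. 0.]
#  [0. 3.]
#  [1. 0.]]
```

Since TfidfModel also yields (term_id, weight) pairs per document, the same scattering should turn its transformed corpus into a dense tf-idf matrix directly.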

Choosing the latter, I then try to convert this count matrix to a tf-idf weighted matrix:

import numpy
from math import log

def TFIDF(m):
    WordsPerDoc = numpy.sum(m, axis=0)
    DocsPerWord = numpy.sum(numpy.asarray(m > 0, 'i'), axis=1)
    rows, cols = m.shape
    for i in range(rows):
        for j in range(cols):
            amatrix[i,j] = (amatrix[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])
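For reference, the loop is meant to compute the following weight, where c_{ij} is the count for term i (row) in document j (column) of the matrix from corpus2dense, and N = cols is the number of documents:

```latex
w_{ij} = \frac{c_{ij}}{\sum_{k} c_{kj}} \cdot \log\frac{N}{\mathrm{df}_i},
\qquad
\mathrm{df}_i = \bigl|\{\, j : c_{ij} > 0 \,\}\bigr|
```

Here the denominator of the first factor is WordsPerDoc[j] and df_i is DocsPerWord[i]; assuming log is math.log or numpy.log, it is the natural logarithm.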

But I get the error AttributeError: 'numpy.ndarray' object has no attribute 'A'.

I copied the function above from another script, where it originally read:

def TFIDF(self):
    WordsPerDoc = sum(self.A, axis=0)
    DocsPerWord = sum(asarray(self.A > 0, 'i'), axis=1)
    rows, cols = self.A.shape
    for i in range(rows):
        for j in range(cols):
            self.A[i,j] = (self.A[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])

I believe that is where the A is coming from, but I have since re-imported the modified function.

Why is this happening?

Show the complete traceback (i.e. the entire error message). That should show which line is triggering the error. – Warren Weckesser Jun 22 at 10:20
    
.A is an np.matrix attribute that turns the matrix into an array. Try omitting it. – hpaulj Jun 22 at 11:27

In self.A, self is presumably either an np.matrix or a SciPy sparse matrix. For both, .A returns the data as a regular np.ndarray; in other words, it converts the 2-D matrix to a plain NumPy array. If self is already an ndarray, which has no .A attribute, you get exactly your error.
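A quick way to see the difference: np.matrix has an .A attribute that returns its data as a plain ndarray, while an ndarray has no such attribute, so asking it for .A raises the AttributeError from the question. A minimal NumPy-only sketch (SciPy sparse matrices are left out to keep it dependency-free):

```python
import warnings
import numpy as np

# np.matrix is discouraged in modern NumPy and emits a
# PendingDeprecationWarning on creation; silence it for this demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix([[1, 2], [3, 4]])

arr = mat.A  # .A on np.matrix returns the data as a plain ndarray
print(type(mat).__name__, "->", type(arr).__name__)  # matrix -> ndarray

try:
    arr.A  # a plain ndarray has no .A attribute
except AttributeError as exc:
    print(exc)  # 'numpy.ndarray' object has no attribute 'A'
```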

It looks like you have corrected that in your own version of TFIDF, except that it uses two variables, m and amatrix, where the original used only self.A.
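For comparison, here is a minimal sketch of the asker's function with m used throughout (no stray amatrix), assuming m is a dense terms × documents array such as the one corpus2dense returns. It replaces the double loop with NumPy broadcasting and returns a new float array instead of modifying m in place; the name tfidf_dense is hypothetical.

```python
import numpy as np

def tfidf_dense(m):
    """Weight a dense (num_terms, num_docs) count array by tf-idf.

    Mirrors the loop in the question: term frequency is normalised by
    document length, idf is log(num_docs / number of docs containing
    the term). Like the original loop, it assumes no empty documents
    and no all-zero term rows (those would divide by zero).
    """
    m = np.asarray(m, dtype=float)          # plain float ndarray, even for np.matrix input
    words_per_doc = m.sum(axis=0)           # column sums: length of each document
    docs_per_word = (m > 0).sum(axis=1)     # per row: number of docs containing the term
    num_docs = m.shape[1]
    idf = np.log(num_docs / docs_per_word)  # one idf value per term (row)
    # Divide each column by its document length, then scale each row by its idf.
    return (m / words_per_doc) * idf[:, np.newaxis]

counts = np.array([[2, 0], [0, 3], [1, 0]])
print(tfidf_dense(counts))
```

Returning a new array avoids a subtle problem with the in-place loop: writing float weights back into an integer count array would silently truncate them.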

I think you need to look more closely at the error message and the stack trace to identify where that .A is being accessed. Also make sure you understand where the code expects a matrix, especially a sparse one, and whether your own code differs in that regard.

I recall from other SO questions that one of the machine-learning packages had switched to using sparse matrices, which required adding .todense() to some code that expected dense ones.
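If a function may receive a plain ndarray, an np.matrix, or a SciPy sparse matrix, one defensive option is to normalise the input once at the top instead of reaching for .A or .todense() in the body. A minimal sketch; the helper name as_dense_array is hypothetical, and it detects sparse inputs by duck typing on a toarray() method (which SciPy sparse matrices provide) rather than importing SciPy. Note that on the classic SciPy sparse matrix classes .todense() returns an np.matrix, while .toarray() returns an ndarray.

```python
import numpy as np

def as_dense_array(x):
    """Hypothetical helper: return x as a plain numpy.ndarray.

    - Sparse inputs are detected by duck typing on .toarray(),
      which avoids a hard SciPy dependency.
    - np.matrix, ndarray, and nested lists go through np.asarray,
      which returns a base-class ndarray (unlike np.asanyarray).
    """
    if hasattr(x, "toarray"):
        return x.toarray()
    return np.asarray(x)

dense = as_dense_array([[1, 0], [0, 2]])
print(type(dense).__name__, dense.shape)  # ndarray (2, 2)
```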
