I have a one-dimensional array with a large string in each element. I am trying to use CountVectorizer to convert the text data into numerical vectors, but I get this error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

mealarray holds 5000 such samples, one large string per element. I am trying to vectorize it as shown below:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # (1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)

The full stack trace:

File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
    
Someone (without having the full stack trace, it's hard to tell who, either scikit or Numpy) is trying to treat a Numpy array as a string ("FOO".lower() returns "foo"). Are you sure mealarray's contents are strings, or that CountVectorizer wants an array of strings? – Ahmed Fasih Oct 14 '14 at 18:01
    
@AhmedFasih, I just added the full stack trace to the question! – ashu Oct 14 '14 at 18:31

2 Answers

I found the answer to my question. CountVectorizer worked once I passed it a list of strings rather than the array. That solved my problem.
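For comparison, here is a minimal sketch of passing a plain list versus a one-dimensional array. The sample strings and the names meals, as_list, and as_array are made up for illustration, and it assumes a reasonably recent scikit-learn; with those assumptions, both inputs should produce the same counts, which suggests the shape of the array matters more than its type.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample documents standing in for the real meal descriptions.
meals = ["rice and beans", "chicken soup with rice", "beans on toast"]

as_list = meals                           # plain Python list of strings
as_array = np.array(meals, dtype=object)  # one-dimensional array, shape (3,)

# Fit a fresh vectorizer on each input so the results are independent.
counts_from_list = CountVectorizer().fit_transform(as_list)
counts_from_array = CountVectorizer().fit_transform(as_array)

# Compare the two document-term matrices element by element.
same = np.array_equal(counts_from_list.toarray(), counts_from_array.toarray())
print(as_array.shape, same)
```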


Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1).

You could try something like

data = vectorizer.fit_transform(mealarray.ravel())
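As a rough illustration of the shape issue, the sketch below builds a small stand-in for mealarray with shape (3, 1) (the sample strings are invented) and shows the failure followed by the ravel() fix. It assumes a recent scikit-learn, where the lowercase step still calls .lower() on each document.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for mealarray: a column of strings with shape (3, 1).
mealarray = np.empty((3, 1), dtype=object)
mealarray[:, 0] = ["rice and beans", "chicken soup with rice", "beans on toast"]

vectorizer = CountVectorizer(stop_words='english')

# Iterating over a 2-D array yields one-element arrays rather than strings,
# so the vectorizer's lowercase step cannot call .lower() on them.
try:
    vectorizer.fit_transform(mealarray)
    failed = False
except AttributeError as exc:
    failed = True
    print("2-D input failed:", exc)

# Flattening to shape (3,) makes each item a plain string again.
flat = mealarray.ravel()
data = vectorizer.fit_transform(flat)
print(flat.shape, data.shape)
```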
    
I tried it with ravel and got the following error. AttributeError: 'NoneType' object has no attribute 'lower'. The shape of mealarray is (5000,1) because I created it using "mealarray = np.empty((plen,1), dtype=object)" – ashu Oct 14 '14 at 18:27
    
OK, so you then populate the array afterwards. Then you must have a count of the actual number of words in mealarray, correct? Let's say it is nwords. Then pass mealarray[:nwords].ravel() to fit_transform(). (Although I wonder why you create the array with shape (plen,1) instead of just (plen,).) – Warren Weckesser Oct 14 '14 at 18:35
    
Note: In my previous comment, I assume that you fill mealarray from the beginning, with no indices containing None between indices containing words. – Warren Weckesser Oct 14 '14 at 18:42
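The None error in the comments can be reproduced with a partially filled array. The sketch below assumes a hypothetical plen of 5 with only the first three slots filled (nwords = 3), and shows both the slicing approach suggested above and an alternative that filters out non-string entries, which does not depend on the filled slots being contiguous.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

plen = 5    # hypothetical capacity of the array
nwords = 3  # hypothetical number of slots actually filled

# np.empty with dtype=object starts every slot as None.
mealarray = np.empty((plen, 1), dtype=object)
mealarray[0, 0] = "rice and beans"
mealarray[1, 0] = "chicken soup with rice"
mealarray[2, 0] = "beans on toast"

# Option 1: slice off the unfilled tail, then flatten.
docs = mealarray[:nwords].ravel()

# Option 2: keep only the entries that are actually strings.
docs_filtered = [doc for doc in mealarray.ravel() if isinstance(doc, str)]

data = CountVectorizer().fit_transform(docs)
print(len(docs), len(docs_filtered), data.shape[0])
```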
