Numpy CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Question

I have a one-dimensional array with a large string in each element. I am trying to use CountVectorizer to convert the text data into numerical vectors. However, I get this error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

mealarray holds 5000 such samples. I am trying to vectorize it as shown below:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # (1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)

The full stack trace:

  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Something (without the full stack trace it's hard to tell whether it is scikit-learn or NumPy) is trying to treat a NumPy array as a string ("FOO".lower() returns "foo"). Are you sure mealarray's contents are strings, or that CountVectorizer wants an array of strings? – Ahmed Fasih, Oct 14 '14 at 18:01

ashu · Answer 1 · 2014-10-14 18:57:25Z

Score: 3

I found the answer to my question. Basically, CountVectorizer takes a list of strings as its argument, not the array I was passing. Passing a list solved my problem.
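A minimal sketch of the list-based fix, assuming the data starts out in a (n, 1) object array like the one described in the question. The sample strings are made up for illustration, and dtype is written as np.float64 instead of 'double'. Note that the array is flattened before it is converted, because calling tolist() directly on a (n, 1) array would give a list of one-element lists rather than a list of strings.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for mealarray: shape (3, 1), dtype=object.
mealarray = np.empty((3, 1), dtype=object)
mealarray[:, 0] = [
    "grilled chicken salad",
    "chicken soup with rice",
    "rice and beans",
]

# Flatten to one dimension, then convert to a plain Python list of str.
docs = mealarray.ravel().tolist()

vectorizer = CountVectorizer(stop_words="english", dtype=np.float64)
data = vectorizer.fit_transform(docs)

print(data.shape)                     # (number of documents, vocabulary size)
print(sorted(vectorizer.vocabulary_)) # the learned tokens
```

The important property is that every item the vectorizer iterates over is a single string. A plain list guarantees that here, which is probably why the switch to a list made the error go away.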


Warren Weckesser · Answer 2 · 2014-10-14 18:23:26Z

Score: 2

Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1).

You could try something like

data = vectorizer.fit_transform(mealarray.ravel())
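To see why the shape matters, here is a small NumPy-only sketch (the array contents are made up). The stack trace in the question shows .lower() being called on each document, so the sketch simply looks at what iterating over the array yields in each case: a (n, 1) array yields one-element arrays, while the raveled (n,) array yields the strings themselves.

```python
import numpy as np

# Hypothetical column-shaped array of strings, like a (n, 1) mealarray.
col = np.empty((3, 1), dtype=object)
col[:, 0] = ["First Doc", "Second Doc", "Third Doc"]

first_2d = next(iter(col))          # a row: ndarray of shape (1,)
first_1d = next(iter(col.ravel()))  # the string itself

print(col.shape, type(first_2d).__name__)
print(col.ravel().shape, type(first_1d).__name__)

# Calling .lower() on the row reproduces the error from the question.
error_message = None
try:
    first_2d.lower()
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# The flattened element is a str, so .lower() works.
print(first_1d.lower())
```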

edited Oct 14 '14 at 18:23

answered Oct 14 '14 at 18:09


I tried it with ravel and got the following error: AttributeError: 'NoneType' object has no attribute 'lower'. The shape of mealarray is (5000,1) because I created it using "mealarray = np.empty((plen,1), dtype=object)". – ashu Oct 14 '14 at 18:27

OK, so you then populate the array afterwards. Then you must have a count of the actual number of words in mealarray, correct? Let's say it is nwords. Then pass mealarray[:nwords].ravel() to fit_transform(). (Although I wonder why you create the array with shape (plen,1) instead of just (plen,).) – Warren Weckesser Oct 14 '14 at 18:35

Note: In my previous comment, I assume that you fill mealarray from the beginning, with no indices containing None between indices containing words. – Warren Weckesser Oct 14 '14 at 18:42
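If the unfilled slots are not all at the end, slicing with [:nwords] is not enough. One possible alternative, sketched here with made-up data and a small plen, is to flatten the array and drop every entry that is still None before vectorizing. This is an illustration under those assumptions, not something proposed in the thread.

```python
import numpy as np

# Hypothetical setup mirroring the comment: an object array created with
# np.empty starts out filled with None, and only some slots get a string.
plen = 5
mealarray = np.empty((plen, 1), dtype=object)
mealarray[0, 0] = "oatmeal with berries"
mealarray[1, 0] = "tomato soup"
mealarray[3, 0] = "rice and beans"

flat = mealarray.ravel()

# Keep only the entries that were actually filled in.
docs = [doc for doc in flat if doc is not None]

# Indices of the slots that were never populated, useful for debugging.
missing = [i for i, doc in enumerate(flat) if doc is None]

print(len(docs), "documents kept; empty slots at", missing)
```

The resulting docs list contains only strings, so it can be passed to fit_transform. Keep in mind that the rows of the output matrix then correspond to positions in docs, not to the original indices of mealarray.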


asked	1 year ago
viewed	1873 times
active	1 year ago
