CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Question

I have a one-dimensional array with large strings in each of the elements. I am trying to use a CountVectorizer to convert text data into numerical vectors. However, I am getting an error saying:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

mealarray contains a large string in each element, and there are 5000 such samples. I am trying to vectorize it as follows:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # ngram_range=(1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)

The full stack trace:

  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Someone (without the full stack trace it's hard to tell whether it's scikit-learn or NumPy) is trying to treat a NumPy array as a string ("FOO".lower() returns "foo"). Are you sure mealarray's contents are strings, and that CountVectorizer wants an array of strings?
– Ahmed Fasih, Commented Oct 14, 2014 at 18:01
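One quick way to answer that question is to look at the shape, the dtype and the type of the first element before calling the vectorizer. The helper describe_documents below is a hypothetical debugging sketch (it is not part of NumPy or scikit-learn), and the sample documents are made up:

```python
import numpy as np


def describe_documents(docs):
    """Hypothetical debugging helper: summarize a would-be document collection.

    Returns (shape, dtype, type name of the first element).
    """
    arr = np.asarray(docs, dtype=object)
    first = arr[0] if arr.ndim > 0 and len(arr) > 0 else None
    return arr.shape, arr.dtype, type(first).__name__


# Made-up documents stored as an (n, 1) column, like the asker's array.
column = np.array([["first document"], ["second document"]], dtype=object)

print(describe_documents(column))          # first element is a 1-element ndarray
print(describe_documents(column.ravel()))  # first element is a plain str
print(describe_documents(["a", "b"]))      # a plain list of strings is 1-D too
```

If the last field is anything other than str (for example ndarray, list or NoneType), that value is what ends up receiving the .lower() call.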

Warren Weckesser · Accepted Answer · 2014-10-14 18:23:26Z

24

Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1).
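A NumPy-only sketch of why an (n, 1) shape produces exactly this message, assuming (as the last frame of the traceback suggests) that the vectorizer iterates over its input and calls .lower() on each item. The sample strings are made up:

```python
import numpy as np

# Made-up documents in an (n, 1) column.
column = np.array([["Spicy chicken curry"], ["Tomato soup"]], dtype=object)

# Iterating over a 2-D array yields its rows: 1-element ndarrays, not strings.
rows = list(column)
print(type(rows[0]).__name__, rows[0].shape)

# Calling .lower() on such a row reproduces the message from the traceback.
try:
    rows[0].lower()
except AttributeError as exc:
    print(exc)

# After ravel(), iteration yields the str objects themselves.
docs = list(column.ravel())
print([doc.lower() for doc in docs])
```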

You could try something like

data = vectorizer.fit_transform(mealarray.ravel())
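A sketch of the suggested fix end to end, using a small hypothetical stand-in for mealarray. The as_document_vector helper is an assumption of this sketch, not a scikit-learn function: it ravels only (n, 1) columns and rejects other 2-D shapes, because ravel() on, say, an (n, 2) array would silently interleave two fields into one stream of documents. dtype=np.float64 is used as the explicit NumPy spelling of the question's dtype='double':

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def as_document_vector(arr):
    """Hypothetical helper: return a 1-D object array of documents.

    An (n, 1) column is flattened with ravel(); any other shape that is not
    1-D is rejected instead of being flattened silently.
    """
    arr = np.asarray(arr, dtype=object)
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.ravel()
    if arr.ndim != 1:
        raise ValueError(f"expected shape (n,) or (n, 1), got {arr.shape}")
    return arr


# Hypothetical stand-in for the asker's (n, 1) mealarray.
mealarray = np.array([["Spicy chicken curry"],
                      ["Chicken noodle soup"],
                      ["Tomato soup"]], dtype=object)

vectorizer = CountVectorizer(
    stop_words="english",
    ngram_range=(1, 1),
    dtype=np.float64,  # same type as the question's dtype='double'
)
data = vectorizer.fit_transform(as_document_vector(mealarray))
print(data.shape)                      # one row per document
print(sorted(vectorizer.vocabulary_))  # learned vocabulary
```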

edited Oct 14, 2014 at 18:23

answered Oct 14, 2014 at 18:09

Warren Weckesser

I tried it with ravel() and got the following error: AttributeError: 'NoneType' object has no attribute 'lower'. The shape of mealarray is (5000, 1) because I created it using mealarray = np.empty((plen, 1), dtype=object).

– ashu, Commented Oct 14, 2014 at 18:27
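A sketch of where the None comes from, with a hypothetical capacity of 4 instead of 5000: np.empty with dtype=object fills every slot with None, so any slot that is never assigned reaches the vectorizer as None once the array is flattened.

```python
import numpy as np

plen = 4  # hypothetical capacity (the asker's was 5000)
mealarray = np.empty((plen, 1), dtype=object)  # object arrays start out as all None

# Only the first two slots get filled.
mealarray[0, 0] = "Spicy chicken curry"
mealarray[1, 0] = "Tomato soup"

flat = mealarray.ravel()
print(flat)  # the two unfilled slots are still None

try:
    flat[2].lower()  # the kind of call that fails inside the vectorizer
except AttributeError as exc:
    print(exc)
```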

OK, so you then populate the array afterwards. Then you must have a count of the actual number of words in mealarray, correct? Let's say it is nwords. Then pass mealarray[:nwords].ravel() to fit_transform(). (Although I wonder why you create the array with shape (plen,1) instead of just (plen,).)

– Warren Weckesser, Commented Oct 14, 2014 at 18:35

Note: In my previous comment, I assume that you fill mealarray from the beginning, with no indices containing None between indices containing words.

– Warren Weckesser, Commented Oct 14, 2014 at 18:42
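The two comments above can be sketched as follows, using made-up documents: slicing to the first nwords rows matches the suggestion when the array is filled from the start, while filtering out None also copes with gaps between filled slots.

```python
import numpy as np

texts = ["Spicy chicken curry", "Tomato soup", "Chicken noodle soup"]  # made-up data

mealarray = np.empty((5, 1), dtype=object)  # more slots than documents
nwords = 0
for text in texts:
    mealarray[nwords, 0] = text
    nwords += 1  # count filled slots while populating

# Suggested fix: keep the filled prefix, then flatten to shape (nwords,).
docs_prefix = mealarray[:nwords].ravel()

# Variant that also tolerates gaps: drop every slot that is still None.
docs_filtered = [doc for doc in mealarray.ravel() if doc is not None]

print(docs_prefix.shape, len(docs_filtered))
```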

@WarrenWeckesser I had a similar problem; your ravel() solution worked for me. Thanks!

– data_steve, Commented Feb 14, 2017 at 21:16


ashu · Accepted Answer · 2014-10-14 18:57:25Z

9

I found the answer to my question. Basically, CountVectorizer takes lists (of strings) as its argument rather than arrays. That solved my problem.

answered Oct 14, 2014 at 18:57

ashu

Hi @ashu, can you please share the changes you made to the code, in case you still have them?

– poPYtheSailor, Commented Mar 2, 2019 at 12:11

That's close, but not exactly right: it has to be a one-dimensional array or list.

– ben26941, Commented Feb 26, 2020 at 17:23

Self-accepting your own answer without providing a complete explanation.

– ricoms, Commented Dec 21, 2023 at 3:53
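To illustrate the one-dimensionality point from the comments: converting an (n, 1) array with tolist() still gives a nested list whose items are lists rather than strings, so presumably the same .lower() call would fail on them ('list' object has no attribute 'lower'). Flattening first gives a plain list of strings. The sample values are made up:

```python
import numpy as np

column = np.array([["Spicy chicken curry"], ["Tomato soup"]], dtype=object)

nested = column.tolist()        # still 2-D: a list of one-element lists
flat = column.ravel().tolist()  # 1-D: a list of str

print(nested)
print(flat)
```

If the documents are collected in a loop, appending each string to a plain Python list and passing that list avoids both the (n, 1) shape and the unfilled None slots that np.empty leaves behind.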


Max Kleiner · Accepted Answer · 2018-07-18 16:40:49Z

3

A better solution is to explicitly select the column as a pandas Series and pass that to CountVectorizer():

>>> tex = df4['Text']
>>> type(tex)
<class 'pandas.core.series.Series'>
>>> X_train_counts = count_vect.fit_transform(tex)

The next one won't work, because it's a DataFrame and NOT a Series:

>>> tex2 = df4.iloc[0:, [11]]
>>> type(tex2)
<class 'pandas.core.frame.DataFrame'>
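A sketch of the Series/DataFrame difference with a hypothetical two-column df4 (in the answer the text sits at column position 11; here it is position 1). A single label or position gives a Series, while a list of them gives a one-column DataFrame. Iterating a DataFrame yields its column labels, so a vectorizer that simply iterates over its input would presumably see only 'Text' as one document; converting the DataFrame with .to_numpy() instead gives the (n, 1) shape from the question:

```python
import pandas as pd

# Hypothetical frame standing in for the answer's df4.
df4 = pd.DataFrame({"Id": [1, 2], "Text": ["Spicy chicken curry", "Tomato soup"]})

tex = df4["Text"]         # single label      -> Series
tex2 = df4.iloc[0:, [1]]  # list of positions -> one-column DataFrame
tex3 = df4.iloc[:, 1]     # single position   -> Series

print(type(tex).__name__, type(tex2).__name__, type(tex3).__name__)
print(list(tex))              # a Series iterates over its values
print(list(tex2))             # a DataFrame iterates over its column labels
print(tex2.to_numpy().shape)  # (2, 1): the shape from the original question
```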

answered Jul 18, 2018 at 16:40

Max Kleiner


Yudi Guzmán · Accepted Answer · 2021-10-04 11:58:19Z

2

I got the same error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

To solve this problem, I did the following:

Check the dimensions of the array with name_of_array1.shape.
If the output is (n, 1), use flatten() to convert the two-dimensional array to a one-dimensional one: flat_array = name_of_array1.flatten()
Now I can use CountVectorizer(), because it works with a one-dimensional sequence whose elements are strings.
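flatten() and the ravel() suggested in the first answer produce the same one-dimensional sequence of documents; the difference is memory. flatten() always returns a copy, while ravel() returns a view of the original array when the memory layout allows it. A minimal sketch with made-up documents:

```python
import numpy as np

column = np.array([["Spicy chicken curry"], ["Tomato soup"]], dtype=object)

flat_copy = column.flatten()  # always allocates a new array
flat_view = column.ravel()    # returns a view when the memory layout allows

print(np.shares_memory(flat_copy, column))       # False: independent copy
print(np.shares_memory(flat_view, column))       # True here: column is C-contiguous
print(flat_copy.tolist() == flat_view.tolist())  # True: same documents either way
```

For passing documents to a vectorizer either one works; the copy only matters if the flattened array is modified afterwards.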

answered Oct 4, 2021 at 11:58

Yudi Guzmán