18

I have a one-dimensional array, mealarray, with a large string in each element (5000 samples in total). I am trying to use CountVectorizer to convert the text into numerical vectors, but I get this error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

This is how I vectorize the array:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # (1, 1) is the default: unigrams only
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)

The full stack trace:

  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
2
  • 2
    Someone (without having the full stack trace, it's hard to tell who, either scikit or Numpy) is trying to treat a Numpy array as a string ("FOO".lower() returns "foo"). Are you sure mealarray's contents are strings, or that CountVectorizer wants an array of strings? Commented Oct 14, 2014 at 18:01
  • @AhmedFasih, I just added the full stack trace to the question! Commented Oct 14, 2014 at 18:31

4 Answers

24

Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1).

You could try something like

data = vectorizer.fit_transform(mealarray.ravel())
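
As a rough sketch of why the shape matters (the three sample strings are made up and only stand in for mealarray): iterating over an array of shape (n, 1) yields length-1 arrays rather than strings, so the call to .lower() fails, while the raveled array yields plain strings.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for mealarray: three documents stored as a column, shape (3, 1).
mealarray = np.array(
    [["rice and beans"], ["chicken soup"], ["rice pudding"]], dtype=object
)

vectorizer = CountVectorizer(stop_words="english")

try:
    # Each item produced by iterating a (3, 1) array is itself an ndarray, not a str.
    vectorizer.fit_transform(mealarray)
    column_error = None
except AttributeError as exc:
    column_error = str(exc)

# ravel() returns shape (3,), so each item handed to the vectorizer is a str.
data = vectorizer.fit_transform(mealarray.ravel())

print(column_error)
print(mealarray.shape, "->", mealarray.ravel().shape)
```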
4
  • I tried it with ravel and got the following error. AttributeError: 'NoneType' object has no attribute 'lower'. The shape of mealarray is (5000,1) because I created it using "mealarray = np.empty((plen,1), dtype=object)" Commented Oct 14, 2014 at 18:27
  • 1
    OK, so you then populate the array afterwards. Then you must have a count of the actual number of words in mealarray, correct? Let's say it is nwords. Then pass mealarray[:nwords].ravel() to fit_transform(). (Although I wonder why you create the array with shape (plen,1) instead of just (plen,).) Commented Oct 14, 2014 at 18:35
  • Note: In my previous comment, I assume that you fill mealarray from the beginning, with no indices containing None between indices containing words. Commented Oct 14, 2014 at 18:42
  • 1
    @WarrenWeckesser had similar problem, your ravel() solution worked for me. Thanks! Commented Feb 14, 2017 at 21:16
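
For the 'NoneType' error mentioned in the comments above, here is a small sketch, assuming the array was preallocated with np.empty(..., dtype=object) and only partly filled. The sizes and strings are invented for illustration. Unfilled object slots hold None, so one option is to drop anything that is not a string before vectorizing.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

plen = 5  # hypothetical capacity; the question mentions 5000 samples
mealarray = np.empty((plen, 1), dtype=object)  # every slot starts out as None

# Only the first three slots are filled in this illustration.
mealarray[0, 0] = "rice and beans"
mealarray[1, 0] = "chicken soup"
mealarray[2, 0] = "rice pudding"

# Flatten to one dimension, then keep only real strings.
documents = [doc for doc in mealarray.ravel() if isinstance(doc, str)]

data = CountVectorizer(stop_words="english").fit_transform(documents)
print(len(documents), data.shape[0])
```

Filtering with isinstance also works when the empty slots are scattered, whereas slicing with mealarray[:nwords] assumes the filled entries are contiguous from the start.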
9

I found the answer to my question. CountVectorizer worked once I passed it a list of strings as the argument rather than the array. That solved my problem.
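
A quick way to check whether the container type matters is the sketch below, using invented sample strings: a Python list and a one-dimensional NumPy array of the same strings should produce the same counts, which suggests the dimensionality, not the list/array distinction, is what counts.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs_list = ["rice and beans", "chicken soup", "rice pudding"]  # hypothetical data
docs_array = np.array(docs_list, dtype=object)  # one-dimensional, shape (3,)

from_list = CountVectorizer(stop_words="english").fit_transform(docs_list)
from_array = CountVectorizer(stop_words="english").fit_transform(docs_array)

# Compare the dense count matrices produced from the two containers.
same = np.array_equal(from_list.toarray(), from_array.toarray())
print(same)
```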

3
  • Hi @ashu, can you please share the changes that you made in the code, in case you still have them? Commented Mar 2, 2019 at 12:11
  • That's close, but not exactly: it has to be a one-dimensional array/list Commented Feb 26, 2020 at 17:23
  • self accepting your own answer without providing a complete explanation Commented Dec 21, 2023 at 3:53
3

A better solution is to explicitly select a pandas Series and pass it to CountVectorizer():

>>> tex = df4['Text']
>>> type(tex)
<class 'pandas.core.series.Series'>
>>> X_train_counts = count_vect.fit_transform(tex)

The next one won't work, because it is a DataFrame and NOT a Series:

>>> tex2 = df4.iloc[:, [11]]  # .ix was removed in pandas 1.0; assumes column 11 is meant by position
>>> type(tex2)
<class 'pandas.core.frame.DataFrame'>
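
To illustrate the difference, here is a minimal sketch with a made-up frame standing in for df4. CountVectorizer iterates over whatever it is given: iterating a Series yields the strings stored in the column, while iterating a DataFrame yields only the column labels.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical frame standing in for df4.
df4 = pd.DataFrame({"Text": ["rice and beans", "chicken soup", "rice pudding"]})

series = df4["Text"]    # Series: iteration yields the documents
frame = df4[["Text"]]   # DataFrame: iteration yields the column labels

print(list(series))
print(list(frame))

# Passing the Series gives one row per document.
X_train_counts = CountVectorizer().fit_transform(series)
print(X_train_counts.shape[0])
```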
2

I got the same error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

To solve this problem, I did the following:

  1. Check the shape of the array with: name_of_array1.shape
  2. If the output is (n, 1), use flatten() to convert the two-dimensional array to a one-dimensional one: flat_array = name_of_array1.flatten()
  3. Now CountVectorizer() works, because it expects a one-dimensional sequence in which each element is a string.
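
The steps above can be sketched as a small helper. The name as_documents is hypothetical and not part of NumPy or scikit-learn, and the sample strings are invented.

```python
import numpy as np


def as_documents(arr):
    """Return a one-dimensional object array of documents (hypothetical helper)."""
    arr = np.asarray(arr, dtype=object)
    if arr.ndim == 1:
        return arr  # already shape (n,)
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.flatten()  # flatten() always returns a copy
    raise ValueError(f"expected shape (n,) or (n, 1), got {arr.shape}")


name_of_array1 = np.array([["rice and beans"], ["chicken soup"]], dtype=object)
flat_array = as_documents(name_of_array1)
print(name_of_array1.shape, "->", flat_array.shape)
```

Note that flatten() always copies the data, whereas ravel() returns a view when it can; for this purpose either is fine.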
