
I have a NumPy array in which each cell of a row holds the value of one feature. I store all of them in a 100*4 matrix.

A     B   C
1000  10  0.5
765   5   0.35
800   7   0.09  

Any idea how I can normalize the rows of this numpy.array so that each value is between 0 and 1?

My desired output is:

A     B    C
1     1    1
0.765 0.5  0.7
0.8   0.7  0.18 (which is 0.09/0.5)

Thanks in advance :)

Just to be clear: is it a NumPy array or a Pandas DataFrame? –  ajcr Apr 15 at 21:52
When programming it's important to be specific: a set is a particular object in Python, and you can't have a set of numpy arrays. Python doesn't have a matrix, but numpy does, and that matrix type isn't the same as a numpy array/ndarray (which is itself different from Python's array type, which is not the same as a list). And none of these are pandas DataFrames. –  DSM Apr 15 at 21:58
    
@ajcr sorry for the typos. I edited my question. Thanks –  nimafl Apr 15 at 21:59

1 Answer

Accepted answer (score 6):

If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.

Starting with your example array:

import numpy as np

x = np.array([[1000,  10,   0.5],
              [ 765,   5,  0.35],
              [ 800,   7,  0.09]])

x_normed = x / x.max(axis=0)

print(x_normed)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]

x.max(axis=0), which can also be written x.max(0), takes the maximum over the 0th dimension (i.e. over the rows). This gives you a vector of shape (ncols,) containing the maximum value in each column. Dividing x by this vector broadcasts it across every row, so the maximum value in each column is scaled to 1.
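To make the broadcasting step more concrete, here is a small sketch that inspects the shapes involved. It reuses the example array from the question; the name `col_max` is introduced only for illustration, and the sketch assumes every column has a positive maximum and no negative values (otherwise the results would not land between 0 and 1).

```python
import numpy as np

x = np.array([[1000, 10, 0.5],
              [ 765,  5, 0.35],
              [ 800,  7, 0.09]])

# Reduce over axis 0 (the rows), leaving one maximum per column.
# `col_max` is an illustrative name, not part of the original answer.
col_max = x.max(axis=0)
print(x.shape)        # (3, 3)
print(col_max.shape)  # (3,)

# Broadcasting stretches the (3,) vector across every row of the (3, 3)
# array, so each element is divided by the maximum of its own column.
# Assumes non-negative data with a positive maximum in each column.
x_normed = x / col_max
```

Because the trailing dimension of `x` matches the length of `col_max`, NumPy can line the two up without any explicit reshaping.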

I really appreciate your answer. I always have issues dealing with "axis"! –  nimafl Apr 16 at 5:39
    
For reductions (i.e. .max(), .min(), .sum(), .mean() etc.), you just need to remember that axis specifies the dimension that you want to "collapse" during the reduction. If you want the maximum for each column, then you need to collapse the row dimension. –  ali_m Apr 16 at 9:41
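As a rough illustration of the "collapse" rule, the sketch below compares the two axes on the example array. The variable names are made up for this example, and the row-wise variant is shown only for contrast; it is not what the question asked for. Note that dividing by a per-row result needs `keepdims=True` (a standard argument of NumPy reductions) so that the shapes still broadcast.

```python
import numpy as np

x = np.array([[1000, 10, 0.5],
              [ 765,  5, 0.35],
              [ 800,  7, 0.09]])

# axis=0 collapses the row dimension: one result per column.
col_max = x.max(axis=0)   # shape (3,)

# axis=1 collapses the column dimension: one result per row.
row_max = x.max(axis=1)   # shape (3,)

# To divide each row by its own maximum, keep the collapsed axis as
# size 1, giving shape (3, 1), which broadcasts across the columns.
row_max_kept = x.max(axis=1, keepdims=True)
x_row_normed = x / row_max_kept
```

Without `keepdims=True`, a `(3,)` result would be matched against the columns rather than the rows. On a square array like this one that does not raise an error; it silently produces the wrong scaling.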
