Background

I have tons of very large pandas DataFrames that need to be normalized with the following operation: log2(data) - mean(log2(data)).
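
For concreteness, here is a minimal sketch of that operation on a single made-up row of values, assuming the mean is taken across each row (which matches the code further down):

import numpy as np

row = np.array([0.5, 0.25, 0.125])     # made-up example values
log_row = np.log2(row)                 # [-1., -2., -3.]
normalized = log_row - log_row.mean()  # subtract the row's mean log2 value
print(normalized)                      # [ 1.  0. -1.]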

Example Data

The example DataFrame my_df looks like this:

     iovrrx    nfinsu    mvdfjc    idjges    fubmrg    lvuhfv
0  0.987654  0.206104  0.802920  0.011157  0.860618  0.575871
1  0.706397  0.860083  0.939230  0.436194  0.557081  0.706964
2  0.043139  0.729435  0.597488  0.700998  0.974193  0.917758
3  0.316080  0.461547  0.844540  0.510143  0.908475  0.877330
4  0.828839  0.177670  0.610833  0.328238  0.327697  0.689756

Question

I have tried to perform the normalization operation noted above many different ways, but the following code snippet is the only one I have gotten to work:

import numpy as np
import pandas as pd

# log2-transform, transpose so that subtracting the vector of row means broadcasts correctly
log_div_ave = my_df.apply(np.log2).values.T - my_df.apply(np.log2).mean(axis=1).values

# transpose back and rebuild a DataFrame with the original column names
log_div_ave = pd.DataFrame(log_div_ave.T, columns=my_df.columns)

print(log_div_ave)

     iovrrx    nfinsu    mvdfjc    idjges    fubmrg    lvuhfv
0  1.667378 -0.593258  1.368628 -4.800610  1.468744  0.889117
1  0.056992  0.340988  0.467991 -0.638518 -0.285601  0.058149
2 -3.467018  0.612699  0.324830  0.555330  1.030127  0.944032
3 -0.941776 -0.395590  0.476099 -0.251165  0.581380  0.531053
4  0.933714 -1.288174  0.493400 -0.402633 -0.405015  0.668708

As you can see, I'm converting the DataFrame to a NumPy array and transposing it just so I can subtract the mean of the data; I then have to transpose the resulting array and reconstitute it as a DataFrame. Is there a simpler way to do all of this?

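For reference, one candidate simplification is sketched below. It assumes the same row-wise mean as the snippet above and that my_df is the DataFrame from the example; it relies on DataFrame.sub with axis=0 to align the Series of row means with the index, so no transposing or rebuilding is needed:

import numpy as np

# np.log2 applied to a DataFrame returns a DataFrame with the same index/columns
log_data = np.log2(my_df)

# subtract each row's mean log2 value; axis=0 aligns the row means by index
log_div_ave = log_data.sub(log_data.mean(axis=1), axis=0)

print(log_div_ave)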