Reading in a text file as a numpy array with float values

Question

I do not know python at all thus I have been unsuccessful in interpreting similar previous answers and using them.

I have a python script that I wish to execute in unix. The script uses an input file but I do not understand how to ensure that the input file is read as numpy float array.

My input file is called chk.bed and it has one column of numeric values

-bash-4.1$ # head chk.bed
7.25236
0.197037
0.189464
2.60056
0
32.721
11.3978
3.85692
0
0

The original script is -

from scipy.stats import gaussian_kde
import numpy as np

#assume "fpkm" is a NumPy array of log2(fpkm) values

kernel = gaussian_kde(fpkm)
xi = np.linspace(fpkm.min(), fpkm.max(), 100)
yi = kernel.evaluate(xi)
mu = xi[np.argmax(yi)]
U = fpkm[fpkm > mu].mean()
sigma = (U - mu) * np.sqrt(np.pi / 2)
zFPKM = (fpkm - mu) / sigma

What I could understand up until now is to make sure the script is reading the file so I included fpkm = open("chk.bed", 'r') in the code.

However on executing the code - I get the following error -

Traceback (most recent call last):

  File "./calc_zfpkm.py", line 10, in <module>
    kernel = gaussian_kde(fpkm)

  File "/usr/lib64/python2.6/site-packages/scipy/stats/kde.py", line 88, in __init__
    self._compute_covariance()

  File "/usr/lib64/python2.6/site-packages/scipy/stats/kde.py", line 340, in _compute_covariance
    self.factor * self.factor)

  File "/usr/lib64/python2.6/site-packages/numpy/lib/function_base.py", line 1971, in cov
    X = array(m, ndmin=2, dtype=float)

TypeError: float() argument must be a string or a number

This seems to suggest that I am not reading in the file correctly and so the function gaussian_kde() cannot read in the values as float.

Can you please help ?

Thanks !

Ashwini Chaudhary · Accepted Answer · 2014-07-25 12:13:43Z

up vote 1 down vote accepted

You're passing a file object to gaussian_kde but it expects a NumPy array, you need to use numpy.loadtxt first to load the data in an array:

>>> import numpy as np
>>> arr = np.loadtxt('chk.bed')
>>> arr
array([  7.25236 ,   0.197037,   0.189464,   2.60056 ,   0.      ,
        32.721   ,  11.3978  ,   3.85692 ,   0.      ,   0.      ])
>>> gaussian_kde(arr)
<scipy.stats.kde.gaussian_kde object at 0x7f7350390190>

answered Jul 25 '14 at 12:13

Ashwini Chaudhary

144k23234310

Hey - thanks for replying. However i now get the error - Traceback (most recent call last): File "./calc_zfpkm.py", line 7, in <module> fpkm = np.loadtxt('chk.bed') File "/usr/lib64/python2.6/site-packages/numpy/lib/npyio.py", line 796, in loadtxt items = [conv(val) for (conv, val) in zip(converters, vals)] ValueError: invalid literal for float(): NA – user2695213 Jul 25 '14 at 12:20

I also forgot to ask.. I need to output the file zfpkm as well which I understand it probably doesnt. Can you help with that as well ? – user2695213 Jul 25 '14 at 12:22

@undefined I just learnt something new. – Kasisnu Jul 25 '14 at 12:23

@user2695213 Have a look at numpy.savetxt. – Ashwini Chaudhary Jul 25 '14 at 12:39

But that would output the result. Do you have any suggestion for the above mentioned error ? – user2695213 Jul 25 '14 at 15:14

| show 1 more comment

Cankut çubuk · Answer 2 · 2015-02-06 18:00:28Z

Here you can find the

R script for zFPKM normalization.

I inspired from the python code which has given above and also at this link:https://www.biostars.org/p/94680/

install.packages("ks","pracma")

library(ks)

library(pracma)

/* fpkm is an example data */

fpkm <- c(1,2,3,4,5,6,7,8,4,5,6,5,6,5,6,5,5,5,5,6,6,78,8,89,8,8,8,2,2,2,1,1,4,4,4,4,4,4,4,4,4,4,4,3,2,2,3,23,2,3,23,4,2,2,4,23,2,2,24,4,4,2,2,4,4,4,2,2,4,4,2,2,4,2,45,5,5,5,3,2,2,4,4,4,4,4,4,4,4,4,3,2,2,3,23,2,3,23,4,2,2,4,23,2,2,24,4,4,2,2,4,4,4,2,2,4,4,2,2,4,2,45,5,5,5,3,2,2)

 xi=linspace(min(fpkm),max(fpkm),100)

fhat = kde(x=fpkm,gridsize=100,eval.points=xi)

/* here I put digits=0. if I you do not round the numbers(yi) the results are a little bit changing.*/

yi=round(fhat$estimate,digits=0)

mu=xi[which.max(yi)]

U=mean(fpkm[fpkm>mu])

sigma=(U-mu)* (sqrt(pi/2))

zFPKM = (fpkm - mu) / sigma

Btw, I have a question.

Can I apply the same approach to RPKM?

Cankut CUBUK

Computational Genomics Program - Systems Genomics Lab

Centro de Investigación Príncipe Felipe (CIPF)

C/ Eduardo Primo Yúfera nº3

46012 Valencia, Spain

http://bioinfo.cipf.es

asked	2 years, 10 months ago
viewed	478 times
active	2 years, 4 months ago

Reading in a text file as a numpy array with float values

2 Answers 2

R script for zFPKM normalization.

Btw, I have a question.

Can I apply the same approach to RPKM?

Your Answer

Not the answer you're looking for? Browse other questions tagged python arrays numpy or ask your own question.

Hot Network Questions

Reading in a text file as a numpy array with float values

2 Answers 2

R script for zFPKM normalization.

Btw, I have a question.

Can I apply the same approach to RPKM?

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python arrays numpy or ask your own question.

Related

Hot Network Questions