
So I'm trying to extract some data from a text file. Currently I'm able to get the correct lines that contain the data, which in turn gives me an output looking like this:

[   0.2      0.148  100.   ]
[   0.3      0.222  100.   ]
[   0.4      0.296  100.   ]
[   0.5     0.37  100.  ]
[   0.6      0.444  100.   ]

So basically I have 5 lists with one string in each. However, as you can imagine, I would like to get all of this into a numpy array, with each string split into its 3 values. Like this:

[[0.2, 0.148, 100],
[0.3, 0.222, 100],
[0.4, 0.296, 100],
[0.5, 0.37, 100],
[0.6, 0.444, 100]]

But since the separator in the output is not fixed, i.e. I don't know whether it will be 3 spaces, 5 spaces or a tab, I'm kind of lost as to how to do this.

UPDATE:

So the data looks a bit like this:

data_file = 

Equiv. Sphere Diam. [cm]: 6.9
Conformity Index: N/A
Gradient Measure [cm]: N/A

Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]
                0                   0                       100
              0.1               0.074                       100
              0.2               0.148                       100
              0.3               0.222                       100
              0.4               0.296                       100
              0.5                0.37                       100
              0.6               0.444                       100
              0.7               0.518                       100
              0.8               0.592                       100

Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)
Dose Cover.[%]: 100.0
Sampling Cover.[%]: 100.0

Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]
                0                   0                       100
              0.1               0.074                       100
              0.2               0.148                       100
              0.3               0.222                       100
              0.4               0.296                       100
              0.5                0.37                       100
              0.6               0.444                       100

And the code to get the lines is:

import numpy as np

with open(data_file) as input_data:
    # Skip text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]':  # or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # this keeps reading the same file object
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)':
            break
        text_line = np.fromstring(line, sep='\t')
        print(text_line)

The text before the data itself varies, so I can't just say "skip the first 5 lines", but the header line is always the same, and the block always ends the same way as well (right before the next block of data begins). So I just need a way to get the raw data out, put it into a numpy array, and then I can play with it from there.

Hopefully it makes more sense now.

Use a regex to split on \s+ – BlackBear yesterday
    
The input lacks quotes in case it should be strings? – languitar yesterday
    
It doesn't have quotes, that's for sure. What is the correct term then if it is not a string ? – Denver Dang yesterday
    
@DenverDang The string also contains the brackets [, ]? – Szabolcs yesterday
    
The first code is how I get the output when I print the lines. So by using append my idea was to put every list into a numpy array. But as stated, each list only contains "one" item, which is actually what I want to split up into 3 values for each list, and in turn end up with the array structure as seen in the second snippet of code. – Denver Dang yesterday
Accepted answer

With print(text_line) you are seeing each array formatted as a string. They are formatted individually, so the columns don't line up:

[   0.2      0.148  100.   ]
[   0.3      0.222  100.   ]
[   0.4      0.296  100.   ]
[   0.5     0.37  100.  ]
[   0.6      0.444  100.   ]

Instead of printing you could collect the values in a list, and concatenate that at the end.

Without actually testing, I think this would work:

import numpy as np

data = []
with open(data_file) as input_data:
    # Skip text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]':  # or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # this keeps reading the same file object
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)':
            break
        arr_line = np.fromstring(line, sep='\t')
        # may need a test on len(arr_line) to weed out blank lines
        data.append(arr_line)
data = np.vstack(data)
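As a quick sanity check (using made-up sample lines rather than the real file), the collect-and-stack step behaves like this:

```python
import numpy as np

# Made-up sample lines standing in for the block between the two marker
# lines; the real file is assumed to look similar.
lines = ["0.2\t0.148\t100", "0.3\t0.222\t100", "0.4\t0.296\t100"]

data = []
for line in lines:
    arr_line = np.fromstring(line, sep='\t')
    if len(arr_line):  # weed out blank lines
        data.append(arr_line)
data = np.vstack(data)
print(data.shape)  # (3, 3)
```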

Another option is to collect the lines without parsing them and pass them to np.genfromtxt. In other words, use your code as a filter that feeds the numpy function just the right lines. np.genfromtxt takes its input from anything that yields lines: a file, a list, a generator.

import numpy as np

def filter_lines(input_data):  # renamed so it doesn't shadow the built-in filter
    # Skip text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]':  # or whatever test is needed
            break
    # Yield lines until the end of the block:
    for line in input_data:  # this keeps reading the same file object
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)':
            break
        yield line

with open(data_file) as f:
    data = np.genfromtxt(filter_lines(f))  # default delimiter: any whitespace
print(data)
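To confirm that np.genfromtxt really does accept any iterable of lines, here is a small in-memory example (made-up values, using the default whitespace delimiter):

```python
import numpy as np

# A plain list of strings stands in for the filtered file object.
sample = ["0.2 0.148 100",
          "0.3 0.222 100",
          "0.5 0.37  100"]
data = np.genfromtxt(sample)
print(data.shape)  # (3, 3)
```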

Given a text file called tmp.txt like this:

   0.2      0.148  100.   
   0.3      0.222  100.   
   0.4      0.296  100.   
   0.5     0.37  100.  
   0.6      0.444  100.   

The snippet:

with open('tmp.txt') as in_file:
    print([list(map(float, line.split())) for line in in_file])

Will output:

[[0.2, 0.148, 100.0], [0.3, 0.222, 100.0], [0.4, 0.296, 100.0], [0.5, 0.37, 100.0], [0.6, 0.444, 100.0]]

Which is hopefully your desired output.

    
The problem (I think) is, that I parse through an entire .txt file with a lot of content that are not just the values as seen. So I'm not quite sure if that procedure will work? (I've updated my question so it might make more sense) – Denver Dang yesterday

1) Add before with open:

import re
d_input = []

2) replace

        text_line = np.fromstring(line, sep='\t')
        print(text_line)

to

        d_input.append([float(x) for x in re.sub(r'\s+', ',', line.strip()).split(',')])

3) Add at the end:

d_array = np.array(d_input)
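For what it's worth, substituting commas for runs of whitespace and then splitting on commas gives the same numbers as line.split() with no arguments, since str.split() already treats any run of whitespace as one separator; a quick check on one made-up line:

```python
import re

# One made-up data line with uneven spacing, like the rows in the file.
line = "              0.5                0.37                       100\n"

via_regex = [float(x) for x in re.sub(r'\s+', ',', line.strip()).split(',')]
via_split = [float(x) for x in line.split()]
print(via_regex == via_split)  # True; both are [0.5, 0.37, 100.0]
```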
