I am new to Python. I have two data files in CSV format, and I loaded the data from each file into a NumPy array:
matrix1 = numpy.genfromtxt(fileName1)
matrix2 = numpy.genfromtxt(fileName2)
The two matrices have different numbers of rows and columns.
>>> print(matrix1.shape)
(971, 4413)
>>> print(matrix2.shape)
(5504, 4431)
I want to combine matrix1 and matrix2 in such a way:
mergedMatrix = [ matrix1, matrix2 ]
where I can access matrix1 from mergedMatrix using index 0 and matrix2 using index 1.
I tried to use numpy.concatenate, but it does not work on these two matrices. So I tried the pandas merge function after converting matrix1 and matrix2 into pandas DataFrames. However, that took a long time, and all the matrices were merged into a single linear array like [1, 2, 3, 4, 5, ...], so I didn't have any way to distinguish between matrix1 and matrix2 in mergedMatrix.
So I am using:
#mergedMatrix as a list
mergedMatrix = [matrix1, matrix2]
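To illustrate the difference, here is a minimal sketch using small stand-in arrays (the shapes are made up; the real ones are much larger). It assumes the failure comes from the mismatched shapes: numpy.concatenate requires every dimension except the concatenation axis to match, whereas a plain list places no constraint on the shapes.

```python
import numpy as np

# Small stand-ins for the real arrays; both dimensions differ.
matrix1 = np.zeros((2, 3))
matrix2 = np.ones((4, 5))

try:
    # axis=0 (the default) needs equal column counts, so this raises.
    np.concatenate([matrix1, matrix2])
    concatenated = True
except ValueError:
    concatenated = False

# A plain list keeps both arrays intact and addressable by index.
mergedMatrix = [matrix1, matrix2]
first_shape = mergedMatrix[0].shape   # shape of matrix1
second_shape = mergedMatrix[1].shape  # shape of matrix2
```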
My data contains values like Inf. If a column in matrix1 contains the value Inf, then I want to delete that column as well as the corresponding column, i.e. the column with the same column number, in matrix2.
Questions
- Is there a better way than to use a list mergedMatrix?
- How can I quickly find whether a matrix1 column contains such values, and get its column number, without checking each element one by one?
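For the second question, one possible approach is to let NumPy test the whole array at once with numpy.isinf and then reduce along the rows. This is only a sketch; the small array below is made up for illustration, and the names has_inf and inf_cols are hypothetical.

```python
import numpy as np

# Made-up stand-in for matrix1.
matrix1 = np.array([[1.0, 2.0, 3.0],
                    [3.0, np.inf, 0.0],
                    [2.0, -np.inf, np.inf]])

# np.isinf flags +Inf and -Inf element-wise; any(axis=0) collapses each
# column to a single True/False.
has_inf = np.isinf(matrix1).any(axis=0)

# Column numbers of the affected columns.
inf_cols = np.flatnonzero(has_inf)
print(inf_cols)  # [1 2]
```

If NaN values should be treated the same way, ~np.isfinite(matrix1) could replace np.isinf(matrix1).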
Example:
matrix1 = [[1, 2, 3],
[3, inf,0],
[2 , inf, inf]]
matrix2 = [[0, 4, 2, 7],
[0, 1, 0.5, 3],
[1, 2, 3, 9]]
mergedMatrix = [[1, 2, 3],
[3, inf,0],
[2 , inf, inf],
[0, 4, 2, 7],
[0, 1, 0.5, 3],
[1, 2, 3, 9]]
The result should be:
mergedMatrix = [[1],
[3],
[2],
[0,7],
[0,3],
[1,9]]
removedMatrixCols = [[2, 3],
[inf,0],
[inf, inf],
[4, 2],
[1, 0.5],
[2, 3]]
Then I want to split the matrices:
newMatrix1 = [[1],
[3],
[2]]
newMatrix2 = [[0,7],
[0,3],
[1,9]]
removedCols1 = [[2, 3],
[inf,0],
[inf, inf]]
removedCols2 = [[4, 2],
[1, 0.5],
[2, 3]]
so that I can store them into CSV files separately.
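The desired result above could be produced roughly as follows. This is a sketch under a few assumptions: the two arrays stay separate instead of being stacked, matrix2 has at least as many columns as the highest column number removed from matrix1, and the output file names and the temporary output directory are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np

matrix1 = np.array([[1, 2, 3],
                    [3, np.inf, 0],
                    [2, np.inf, np.inf]], dtype=float)
matrix2 = np.array([[0, 4, 2, 7],
                    [0, 1, 0.5, 3],
                    [1, 2, 3, 9]], dtype=float)

# Column numbers of matrix1 that contain at least one Inf.
inf_cols = np.flatnonzero(np.isinf(matrix1).any(axis=0))

# Drop the same column numbers from both arrays.
newMatrix1 = np.delete(matrix1, inf_cols, axis=1)
newMatrix2 = np.delete(matrix2, inf_cols, axis=1)

# Keep the removed columns as separate arrays.
removedCols1 = matrix1[:, inf_cols]
removedCols2 = matrix2[:, inf_cols]

# Hypothetical file names; written to a temporary directory so the sketch
# does not overwrite anything.
out_dir = Path(tempfile.mkdtemp())
outputs = {
    "newMatrix1.csv": newMatrix1,
    "newMatrix2.csv": newMatrix2,
    "removedCols1.csv": removedCols1,
    "removedCols2.csv": removedCols2,
}
for name, arr in outputs.items():
    np.savetxt(out_dir / name, arr, delimiter=",")
```

Because each array is saved on its own, the differing shapes never have to be reconciled.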
... np.random.rand()). You could store your arrays in a list and access them by list[0] and list[1]. – Moritz Jul 5 '15 at 12:37
... numpy.dstack([matrix1, matrix2]) and have a neat 3D matrix. – Evert Jul 5 '15 at 12:39