I have input data which I want to parse and extract values using awk/grep/sed:
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 28 29 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-5 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-2 29 30 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-5 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-2 10 11 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-3 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-2 11 12 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-3 2 3 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 12 13 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-1 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-2 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-1 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-2 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
group-1 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
Basically,I want to take the distinct values in "from=" and its "fromid" and "to=" and its "toid=" which can be seen below as to how the output should be:
The desired output.has to be the values in "from=" and "to=" joined row wise.Since from=ABCB11 is present many times but I want only it once,so as the value in "to=" has to be once in the output.
Whatever is present as fromid or toid ,I want all rows to have fromid,after taking distinct values from both. The format of the output can be interpreted from below output:
ABCB11 = fromid=4,from=ABCB11
ABCC8 = fromid=5,from=ABCC8
ACE = fromid=11,from=ACE
CHRM1 = fromid=114,from=CHRM1
CHRM2 = fromid=115,from=CHRM2
DRD2 = fromid=158,from=DRD2
EGF = fromid=164,from=EGF
ADRA1A = fromid=21,from=ADRA1A
ADRA1B = fromid=22,from=ADRA1B
ADRA1D = fromid=23,from=ADRA1D
I want to have exactly the same output as above,but I have a new input file,which is below:
ABCB11 4 ACE 11
ABCB11 4 CHRM1 114
ABCB11 4 CHRM2 115
ABCB11 4 DRD2 158
ABCB11 4 EGF 164
ABCC8 5 ACE 11
ABCC8 5 ADRA1A 21
ABCC8 5 ADRA1B 22
ABCC8 5 ADRA1D 23
ABCC8 5 CHRM1 114
Taking all the unique genes and creating the output.
to=
in output is different withto=
from input? – cuonglm Apr 23 '14 at 15:58