I have one array SPLNO with approximately 10k numbers. I want to look up each subscriber number from the MDN.TXT file (containing approximately 1.5 lakh, i.e. 150,000, records) in that array, and if the subscriber number is found in the array, perform the operation below. My issue is that it takes a long time, because for each number it searches the whole array of 10k records, so for 1.5 lakh records it loops roughly 1.5 lakh × 10k times. Please suggest a more efficient way.

Sample SPLNO.TXT:

918542054921|30|1|2
918542144944|12|1|2
918542155955|12|1|2
918542166966|12|1|2
918542255955|12|1|2
918542355955|12|1|2
918542455955|12|1|2
918542555955|12|1|2
918542955955|12|1|2

Sample MDN.TXT:

8542166966
8542355955
8542555955

awk -F"|"  'FNR==1 { ++counter}
counter==1 {SPLNOPULSE[$1]=$4;SPLNOAMT[$1]=$3;SPLNOMAXLEN[$1]=$2;next}
{
for ( mdn in SPLNOMAXLEN)
        {
         if ( ($1 ~ "^"mdn && length($1) <=SPLNOMAXLEN[mdn]) || ("91"$1 ~ "^"mdn && length("91"$1) <=SPLNOMAXLEN[mdn]) )
              {                              
                print found
               }
         else
                print not found
        }                             
 } ' SPLNO.TXT MDN.TXT
The usual approach to these questions is to post a sample input file and a sample of the expected output file. This can be within your question if short enough, or as a link to a site such as pastebin if particularly big. – steve Oct 19 '15 at 6:44
Hi steve, the sample data has now been added to the question itself. – user3548033 Oct 20 '15 at 9:43

Here's one approach, using perl.

#!/usr/bin/perl
# read the subscribers, keyed on the subscriber number
open(A,"<","SPLNO.TXT") or die "SPLNO.TXT: $!";
while(<A>) {
 chomp;
 @a=split(/\|/,$_);
 $splnopulse{$a[0]}=$a[3];   # 4th field: pulse
 $splnoamt{$a[0]}=$a[2];     # 3rd field: amount
 $splnomaxlen{$a[0]}=$a[1];  # 2nd field: max length
}
close A;

# read the mdn, looking for matches against each subscriber number
open(B,"<","MDN.TXT") or die "MDN.TXT: $!";
while(<B>) {
 chomp;
 @b=split(/\|/,$_);
 foreach $mdn (keys %splnomaxlen) {
  if($mdn eq $b[0] || $mdn eq "91" . $b[0]) {
   print "found\n";
  } else {
   print "not found\n";
  }
 }
}
close B;
Thanks steve. Could you please confirm how I can handle the following conditions, which are written in awk, in the perl script: $1 ~ "^"mdn and "91"$1 ~ "^"mdn – user3548033 Oct 21 '15 at 8:16

The algorithm of searching the whole of file 2 for each line of file 1 has a running time of m * n, where m is the number of lines in file 2 and n is the number of lines in file 1. That becomes very slow rather quickly.
The solution is to first sort each file (which takes n*log(n) time) and then compare lines between the two files like this:

  1. Set i=1 (file 1 line number) and j=1 (file 2 line number).
  2. Compare a=(file 1)[line i] with b=(file 2)[line j].
  3. If a<b, increment i and return to step 2 (checking for the end of file 1).
  4. If a>b, increment j and return to step 2 (checking for the end of file 2).
  5. If a=b, this is a match: print it and increment i.

That comparison has an execution time of just n + m (the time to read all the lines).

The whole process, then, has an execution time of n*log(n) + m*log(m) + n + m, which is O(n*log(n)) for n > m.

Sorting is easy to do: just use the sort command on each file:

sort -t '|' -k 1 file01.csv > file01-sorted.csv

Then perform the procedure above in awk; a sketch follows.
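
For illustration, here is a minimal sketch of that merge step in awk. It assumes both inputs have already been reduced to one sorted value per line; file01-sorted.csv comes from the sort example above, and file02-sorted.csv is a placeholder name for the second sorted file. The question's prefix-match and length conditions would still need extra handling.

awk 'BEGIN {
    f1 = "file01-sorted.csv"; f2 = "file02-sorted.csv"
    more1 = (getline a < f1) > 0          # read line i of file 1
    more2 = (getline b < f2) > 0          # read line j of file 2
    while (more1 && more2) {
        if ((a "") < (b ""))              # a < b: advance file 1
            more1 = (getline a < f1) > 0
        else if ((a "") > (b ""))         # a > b: advance file 2
            more2 = (getline b < f2) > 0
        else {                            # a = b: match
            print a
            more1 = (getline a < f1) > 0
        }
    }
    close(f1); close(f2)
}'

The (a "") concatenation forces string comparison, so the ordering agrees with the lexical order produced by plain sort. The standard join utility performs the same merge on a common field of two sorted files.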

Edit: It just hit me that if all the 10k numbers of SPLNO are unique (no repeats), and MDN.TXT also has unique records, then concatenating both files and searching for repeated values will give you a solution as well. That works only for simple equality; regex matches will break this idea in most cases.
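
As a rough sketch of that idea with standard tools (the sed step that adds the "91" prefix is my own assumption so that plain equality can match the sample data; as noted, this covers exact matches only):

# keep only the number column of SPLNO.TXT, prefix MDN.TXT lines with "91",
# then report any value that appears more than once, i.e. in both files
{ cut -d'|' -f1 SPLNO.TXT; sed 's/^/91/' MDN.TXT; } | sort | uniq -d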
