I have one array SPLNO with approximately 10k numbers. I want to look up each subscriber number from the MDN.TXT file (containing approximately 1.5 lakh, i.e. 150,000, records) in that array, and if the subscriber number is found in the array, perform the operation below. My issue is that it takes a long time, because for each number it searches the whole array of 10k records, so for 1.5 lakh records it loops roughly 1.5 lakh × 10k times. Please suggest a more efficient way.

Sample SPLNO.TXT:

918542054921|30|1|2
918542144944|12|1|2
918542155955|12|1|2
918542166966|12|1|2
918542255955|12|1|2
918542355955|12|1|2
918542455955|12|1|2
918542555955|12|1|2
918542955955|12|1|2

Sample MDN.TXT:

8542166966
8542355955
8542555955

awk -F"|"  'FNR==1 { ++counter}
counter==1 {SPLNOPULSE[$1]=$4;SPLNOAMT[$1]=$3;SPLNOMAXLEN[$1]=$2;next}
{
for ( mdn in SPLNOMAXLEN)
        {
         if ( ($1 ~ "^"mdn && length($1) <=SPLNOMAXLEN[mdn]) || ("91"$1 ~ "^"mdn && length("91"$1) <=SPLNOMAXLEN[mdn]) )
              {                              
                print found
               }
         else
                print not found
        }                             
 } ' SPLNO.TXT MDN.TXT
The usual approach to these questions is to post a sample input file and a sample of the expected output file. This can be within your question if short enough, or as a link to a site such as pastebin if particularly big. – steve Oct 19 '15 at 6:44
Hi steve, the sample data has now been added to the question itself. – user3548033 Oct 20 '15 at 9:43

Here's one approach, using perl.

#!/usr/bin/perl
# read the subscribers, keyed on the subscriber number
open(A,"<","SPLNO.TXT") or die "SPLNO.TXT: $!";
while(<A>) {
 chomp;
 @a=split(/\|/,$_);
 $splnopulse{$a[0]}=$a[3];   # 4th field: pulse
 $splnoamt{$a[0]}=$a[2];     # 3rd field: amount
 $splnomaxlen{$a[0]}=$a[1];  # 2nd field: max length
}
close A;

# read the mdn, looking for matches against each subscriber number
open(B,"<","MDN.TXT") or die "MDN.TXT: $!";
while(<B>) {
 chomp;
 @b=split(/\|/,$_);
 foreach $mdn (keys %splnomaxlen) {
  if($mdn eq $b[0] || $mdn eq "91" . $b[0]) {
   print "found\n";
  } else {
   print "not found\n";
  }
 }
}
close B;
Thanks steve. Could you please confirm how I can handle the following conditions, which are written in awk, in the perl script: $1 ~ "^"mdn and "91"$1 ~ "^"mdn – user3548033 Oct 21 '15 at 8:16

The algorithm of searching the whole of file 2 for each line of file 1 has a running time of m * n, where m is the number of lines in file 2 and n is the number of lines in file 1. That becomes very slow rather quickly.
The solution is to first sort each file (which takes n*log(n) time) and then compare lines between the two files like this:

  1. Set i=1 (file 1 line number) and j=1 (file 2 line number).
  2. Compare a=(file 1)[line i] with b=(file 2)[line j].
  3. If a<b, increment i and return to step 2 (checking for the end of file 1).
  4. If a>b, increment j and return to step 2 (checking for the end of file 2).
  5. If a=b, this is a match: print it and increment i.

That comparison has an execution time of just n + m (the time to read all the lines).

The whole process, then, has an execution time of n*log(n) + m*log(m) + n + m, which is O(n*log(n)) for n > m.

Sorting is easy to do: just use the sort command on each file:

sort -t '|' -k 1 file01.csv > file01-sorted.csv

Then perform the procedure above in awk; a sketch follows.
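
For illustration, here is a minimal sketch of that merge step in awk. It assumes both inputs have already been reduced to one sorted value per line; file01-sorted.csv comes from the sort example above, and file02-sorted.csv is a placeholder name for the second sorted file. The question's prefix-match and length conditions would still need extra handling.

awk 'BEGIN {
    f1 = "file01-sorted.csv"; f2 = "file02-sorted.csv"
    more1 = (getline a < f1) > 0          # read line i of file 1
    more2 = (getline b < f2) > 0          # read line j of file 2
    while (more1 && more2) {
        if ((a "") < (b ""))              # a < b: advance file 1
            more1 = (getline a < f1) > 0
        else if ((a "") > (b ""))         # a > b: advance file 2
            more2 = (getline b < f2) > 0
        else {                            # a = b: match
            print a
            more1 = (getline a < f1) > 0
        }
    }
    close(f1); close(f2)
}'

The (a "") concatenation forces string comparison, so the ordering agrees with the lexical order produced by plain sort. The standard join utility performs the same merge on a common field of two sorted files.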

Edit: It just hit me that if all the 10k numbers of SPLNO are unique (no repeats), and MDN.TXT also has unique records, then concatenating both files and searching for repeated values will give you a solution as well. That works only for simple equality; regex matches will break this idea in most cases.
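
As a rough sketch of that idea with standard tools (the sed step that adds the "91" prefix is my own assumption so that plain equality can match the sample data; as noted, this covers exact matches only):

# keep only the number column of SPLNO.TXT, prefix MDN.TXT lines with "91",
# then report any value that appears more than once, i.e. in both files
{ cut -d'|' -f1 SPLNO.TXT; sed 's/^/91/' MDN.TXT; } | sort | uniq -d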
