I have the following array:

$masterlist=[$companies][$fieldsofcompany][0][$number]

The third dimension only exists when the field selected from $fieldsofcompany is position 2, which holds the numbers array; the other positions hold regular variables. The third index is always 0 (the numbers array) or null. The fourth dimension contains the numbers themselves.

I want to cycle through all companies and remove from the $masterlist all companies which contain duplicate numbers.

My current implementation is this code:

for ($i = 0; $i < count($masterlist); $i++)
{
    // Skip companies that have no numbers array.
    if (!isset($masterlist[$i][2][0][0]))
        continue;

    $id = $masterlist[$i][0];

    for ($j = 0; $j < count($masterlist[$i][2][0]); $j++)
    {
        $number = $masterlist[$i][2][0][$j];

        $query = "INSERT INTO numbers VALUES('$id','$number')";
        mysql_query($query);
    }
}

This inserts the numbers and their associated IDs into a table. I then select the unique numbers like so:

SELECT ID,number
FROM numbers
GROUP BY number
HAVING (COUNT(number)=1)

This strikes me as incredibly brain-dead. My question is what is the best way to do this? I'm not looking for code per se, but approaches to the problem. For those of you who have read this far, thank you.

2 Answers

Accepted answer (score 2)

For starters, you should prune the data before sticking it into the database.

Keep a lookup table that tracks which numbers you have already seen.

If a number is not in the lookup table, use it and mark it as seen; if it is already in the lookup table, ignore it.

Use an array as the lookup table, with the numbers as keys; you can then use the isset function to test whether a number has appeared before.

Example pseudo code:

if(!isset($lookupTable[$number])){
    $lookupTable[$number]=1;
    //...Insert into database...
}
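
To make the idea concrete, here is a minimal sketch of the lookup-table approach as a self-contained function. The function name firstSeenPairs and the sample data are made up for illustration, and the array layout ([0] = id, [2][0] = numbers array) is assumed from the question. It only collects the (id, number) pairs you would insert, so you can plug in whatever database call you use.

```php
<?php
// Hypothetical helper (not from the question): returns [id, number] pairs,
// keeping only the first occurrence of each number across all companies.
function firstSeenPairs(array $masterlist): array
{
    $lookupTable = [];  // number => true once the number has been seen
    $pairs = [];        // pairs that would be inserted into the database

    foreach ($masterlist as $company) {
        // Assumed layout: [0] = id, [2][0] = numbers array (or missing/null).
        if (!isset($company[2][0]) || !is_array($company[2][0])) {
            continue;
        }
        foreach ($company[2][0] as $number) {
            if (!isset($lookupTable[$number])) {
                $lookupTable[$number] = true;
                $pairs[] = [$company[0], $number];
            }
        }
    }
    return $pairs;
}

// Made-up sample data: the number 222 appears in two companies.
$sample = [
    ['A', 'Acme', [[111, 222]]],
    ['B', 'Bolt', [[222, 333]]],
    ['C', 'Core', null],
];
$pairs = firstSeenPairs($sample);
// $pairs is [['A', 111], ['A', 222], ['B', 333]]
```

Note that this keeps the first company that used a number and silently skips later repeats; it does not remove the companies involved.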
    
This is probably what I am looking for, but I'll wait to see if someone else comes up with something that doesn't require an additional array write. Thanks. –  Edgar Velasquez Lim Jul 20 '11 at 6:41
    
@Edgar Velasquez Lim Well, if you have fewer than 1,000,000 unique numbers then you should be alright with this technique. And if you don't run this code frequently then it doesn't really matter at all. Using arrays and lookups by key is very cheap in terms of resource usage. –  zaf Jul 20 '11 at 6:46
    
Agree, my interest is more academic than anything at this point. :) –  Edgar Velasquez Lim Jul 20 '11 at 6:49
    
@Edgar Velasquez Lim Academically this is the best technique ;) –  zaf Jul 20 '11 at 6:52
    
Converting numbers to strings for array keys to make hashing a little more expensive without any good reason is certainly not the best technique! :P (You could just use $lookupTable[$number] instead of $lookupTable["$number"]) –  Ferdinand Beyer Jul 20 '11 at 7:49

Now that I think I understand what you really want, you might want to stick with your two-pass approach but skip the MySQL detour.

In the first pass, gather numbers and duplicate companies:

$duplicate_companies = array();
$number_map = array();

foreach ($masterlist as $index => $company)
{
    if (!isset($company[2][0][0]))
        continue;

    foreach ($company[2][0] as $number)
    {
        if (!isset($number_map[$number]))
        {
            // We have not seen this number before, associate it
            // with the first company index.
            $number_map[$number] = $index;
        }
        else
        {
            // Both the current company and the one with the index stored
            // in $number_map[$number] are duplicates.
            $duplicate_companies[] = $index;
            $duplicate_companies[] = $number_map[$number];
        }
    }
}

In the second pass, remove the duplicates we have found from the master list:

foreach (array_unique($duplicate_companies) as $index)
{
    unset($masterlist[$index]);
}
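
Putting both passes together, a runnable sketch might look like the following. The function name and sample data are hypothetical, and two details are my own assumptions rather than part of the code above: a number repeated inside the same company is not treated as a duplicate, and the result is re-indexed with array_values because unset leaves gaps in the keys.

```php
<?php
// Hypothetical wrapper (name and sample data are illustrative only).
function removeCompaniesWithSharedNumbers(array $masterlist): array
{
    $numberMap = [];   // number => index of the first company that had it
    $duplicates = [];  // company index => true, used as a set

    // First pass: find every company that shares a number with another one.
    foreach ($masterlist as $index => $company) {
        if (!isset($company[2][0]) || !is_array($company[2][0])) {
            continue;
        }
        foreach ($company[2][0] as $number) {
            if (!isset($numberMap[$number])) {
                $numberMap[$number] = $index;
            } elseif ($numberMap[$number] !== $index) {
                // Assumption: a repeat within one company does not count.
                $duplicates[$index] = true;
                $duplicates[$numberMap[$number]] = true;
            }
        }
    }

    // Second pass: drop the flagged companies.
    foreach (array_keys($duplicates) as $index) {
        unset($masterlist[$index]);
    }

    // Optional: re-index so the keys are 0..n-1 again.
    return array_values($masterlist);
}

$sample = [
    ['A', 'Acme', [[111, 222]]],
    ['B', 'Bolt', [[222, 333]]],  // shares 222 with A
    ['C', 'Core', [[444, 444]]],  // repeat within the same company only
    ['D', 'Dune', null],          // no numbers array
];
$result = removeCompaniesWithSharedNumbers($sample);
// Companies C and D remain, re-indexed as 0 and 1.
```

Using the company index as an array key for the duplicate set also makes the array_unique call unnecessary, since a key can only exist once.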