Array Merge without Duplicates

Question

I recently saw this Javascript code on StackOverflow for merging two arrays, and removing duplicates:

Array.prototype.unique = function() {
    var a = this.concat();
    for(var i=0; i<a.length; ++i) {
        for(var j=i+1; j<a.length; ++j) {
            if(a[i] === a[j])
                a.splice(j--, 1);
        }
    }
    return a;
};

var array1 = ["Vijendra","Singh"];
var array2 = ["Singh", "Shakya"];
var array3 = array1.concat(array2).unique();

While this code works, it is horribly inefficient (O(n^2)). Your challenge is to make an algorithm with less complexity.

The winning criteria is the solution with the least complexity, but ties will be broken by shortest length in characters.

Requirements:

Package all your code together in a function that meets the following requirements for "correctness:"

Input: Two arrays
Output: One array
Merges elements of both arrays together- Any element in either input array must be in the outputted array.
The outputted array should have no duplicates.
Order doesn't matter (unlike the original)
Any language counts
Don't use the standard library's array functions for detecting uniqueness or merging sets/arrays (although other things from the standard library is okay). Let me make the distinction that array concatenation is fine, but functions that already do all of the above are not.

How are we supposed to create or append to an array without using array functions? — Emil Vikström, Jan 2 at 1:11
@EmilVikström See my edit. I meant that you can't use array uniqueness functions. Sorry for being unclear. — troglodite, Jan 2 at 1:12
If one of the arrays has duplicates in it, do we remove them as well? For example, should merging [1, 2, 2, 3] and [2, 3, 4] return [1, 2, 2, 3, 4] or [1, 2, 3, 4]? — O-I, Jan 2 at 1:18
May I ask: Arrays of what? Can we assume simply integers or strings, or do we also have to allow more complex things like multilevel objects? — jawns317, Jan 2 at 1:30

Zach Leighton · Accepted Answer · 2014-01-02 12:22:45Z

up vote 3 down vote accepted

Perl

27 Characters

Simple Perl Hack

my @vals = ();
push @vals, @arr1, @arr2;
my %out;
map { $out{$_}++ } @vals;
my @unique = keys %out;

I'm sure someone could one-liner this.. and thus (Thanks Dom Hastings)

sub x{$_{$_}++for@_;keys%_}

edited Jan 2 at 12:22

answered Jan 2 at 4:59

Zach Leighton
1443

1

"Don't use the standard library's array functions for detecting uniqueness (although other things form the standard library is okay)" – Jan Dvorak Jan 2 at 7:17

How I am violating that rule? I'm not using unique functions? – Zach Leighton Jan 2 at 11:51

How does it work, then? Sorry, I can't read perl. If it reads the keys of a hash map - does that count as OK with that rule? I won't vote until convinced that it is. – Jan Dvorak Jan 2 at 11:54

It combines the arrays, loops over both and adds to a hash incrementing the value who's key is the current value in the array loop. Then it takes the keys of that hash, I have used this in some of my work.. So [1,1,2,3,4,4] becomes { 1 => 2, 2 => 1, 3 => 1, 4 => 2 } – Zach Leighton Jan 2 at 11:59

@ZachLeighton you can shorten the code to 27 chars with sub x{$_{$_}++for@_;keys%_} (in case it comes down to a tie!) and use as: z((1,2,3,4),(2,3,4,5,6)) – Dom Hastings Jan 2 at 12:13

show 3 more comments

George Reith · Answer 2 · 2014-01-03 16:05:25Z

up vote 7 down vote

JavaScript O(N) 131 124 116 92 (86?)

Golfed version:

function m(i,x){h={};n=[];for(a=2;a--;i=x)i.map(function(b){h[b]=h[b]||n.push(b)});return n}

Human readable golfed version:

function m(i,x) {
   h = {}
   n = []
   for (a = 2; a--; i=x)
      i.map(function(b){
        h[b] = h[b] || n.push(b)
      })
   return n
}

I could use concat like so and do it in 86 characters:

function m(i,x){h={};n=[];i.concat(x).map(function(b){h[b]=h[b]||n.push(b)});return n}

But I am not sure if it is still O(N) based on this JsPerf: http://jsperf.com/unique-array-merging-concat-vs-looping as the concat version is marginally faster with smaller arrays but slower with larger arrays (Chrome 31 OSX).

In practice do this (golf is full of bad practices):

function merge(a1, a2) {
   var hash = {};
   var arr = [];
   for (var i = 0; i < a1.length; i++) {
      if (hash[a1[i]] !== true) {
        hash[a1[i]] = true;
        arr[arr.length] = a1[i];
      }
   }
   for (var i = 0; i < a2.length; i++) {
      if (hash[a2[i]] !== true) {
        hash[a2[i]] = true;
        arr[arr.length] = a2[i];
      }
   }
   return arr;
}
console.log(merge([1,2,3,4,5],[1,2,3,4,5,6]));

I'm not great at computing complexity but I believe this is O(N). Would love if someone could clarify.

Edit: Here is a version that takes any number of arrays and merges them.

function merge() {
   var args = arguments;
   var hash = {};
   var arr = [];
   for (var i = 0; i < args.length; i++) {
      for (var j = 0; j < args[i].length; j++) {
        if (hash[args[i][j]] !== true) {
          arr[arr.length] = args[i][j];
          hash[args[i][j]] = true;
        }
      }
    }
   return arr;
}
console.log(merge([1,2,3,4,5],[1,2,3,4,5,6],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7,8]));

edited Jan 3 at 16:05

answered Jan 2 at 1:12

George Reith
4795

This is almost exactly what I was gonna post in a couple of seconds :-( Yes, it is amortized linear time if hash tables are implemented with amortized constant time for insertion and searching (which is common in many languages, don't know specifically about JS). – Emil Vikström Jan 2 at 1:14

@EmilVikström Thanks for that I believe JavaScript does but don't have proof on it. Apologies for having fast fingers, slowed yourself down with comments :P – George Reith Jan 2 at 1:17

This is a great approach. However, could you also provide a "code-golf" style solution in addition to your nicely formatted version? Seeing that multiple people have thought of this as the right approach, there's probably going to be a tie at O(N). – troglodite Jan 2 at 1:21

@cloudcoder2000 Ok, I wanted to print a full version as the code-golf version is likely to be less efficient in practice. – George Reith Jan 2 at 1:24

1

@cloudcoder2000 They are not fully independent so the worst case is not O(A*B) (Not using N because it's confusing). It would be that if every input array (every A) had the same amount of elements (B) as it is actually O(SUM(B) FOR ALL A), which can be rewritten as O(N) when defining N as the count of elements of all array inputs. – meiamsome Jan 2 at 4:17

show 18 more comments

Keith Randall · Answer 3 · 2014-01-02 06:14:08Z

up vote 4 down vote

Python 2.7, 38 chars

F=lambda x,y:{c:1 for c in x+y}.keys()

Should be O(N) assuming a good hash function.

Wasi's 8 character set implementation is better, if you don't think it violates the rules.

answered Jan 2 at 6:14

Keith Randall
12k1656

Nice! Comprehensions in Python can be so elegant and powerful. – O-I Jan 2 at 17:23

add comment

tomdemuyt · Answer 4 · 2014-01-02 02:35:49Z

Array.prototype.unique = function()
{
  var o = {},i = this.length
  while(i--)o[this[i]]=true
  return Object.keys(o)
}

A function that would take n arrays could be the following:

function m()
{
  var o={},a=arguments,c=a.length,i;
  while(c--){i=a[c].length;while(i--)o[a[c][i]] = true} 
  return Object.keys(o);
}

Golfed, I think this should work ( 117 chars )

function m(){var o={},a=arguments,c=a.length,i;while(c--){i=a[c].length;while(i--)o[a[c][i]]=1}return Object.keys(o)}

Update If you want to keep the original type, you could

function m()
{
  var o={},a=arguments,c=a.length,f=[],g=[];
  while(c--)g.concat(a[c])
  c = g.length      
  while(c--){if(!o[g[c]]){o[g[c]]=1;f.push(g[c])}}
  return f
}

or golfed 149:

function m(){var o={},a=arguments,c=a.length,f=[],g=[];while(c--)g.concat(a[c]);c= g.length;while(c--){if(!o[g[c]]){o[g[c]]=1;f.push(g[c])}}return f}

This still can cast some doubts, if you want to distinguish 123 and '123', then this would not work..

Thanks for the answer. It is impressively short, however this only does half the problem. You also need to include in the solution the actual merging part (even if its the same as in the original example) and put it all together in one function. Also, could you provide "golfed" version in addition to this (as it is O(N))? — troglodite, Jan 2 at 2:16
This casts all members into strings. e.g. m([1,2,3,4,5],[2,3,4,5,6],[2,3,4,5,6,7]) becomes ["1", "2", "3", "4", "5", "6", "7"] — George Reith, Jan 2 at 2:22

Wasi · Answer 5 · 2014-01-02 05:18:02Z

up vote 2 down vote

python,46

def A(a,b):print[i for i in b if i not in a]+a

Or, using set operation simply

python, 8

set(a+b)

edited Jan 2 at 5:18

answered Jan 2 at 5:02

Wasi
1,0141420

Sorry it wasn't clear, using set operations is also cheating. – troglodite Jan 2 at 19:09

add comment

xfix · Answer 6 · 2014-01-02 14:34:52Z

Perl

23 bytes, if we only count the code block inside subroutine. Could be 21, if overwriting global values is allowed (it would remove my from the code). It returns elements in random order, because order doesn't matter. As for complexity, on average it's O(N) (depends on number of hash collisions, but they are rather rare - in worst case it can be O(N²) (but this shouldn't happen, because Perl can detect pathological hashes, and changes the hash function seed when it detects such behavior)).

use 5.010;
sub unique{
    my%a=map{$_,1}@_;keys%a
}
my @a1 = (1, 2, 3, 4);
my @a2 = (3, 4, 5, 6);
say join " ", unique @a1, @a2;

Output (also showing randomness):

/tmp $ perl unique.pl 
2 3 4 6 1 5
/tmp $ perl unique.pl 
5 4 6 2 1 3

Kyle Kanos · Answer 7 · 2014-01-02 15:50:07Z

Fortran: 282 252 233 213

Golfed version:

function f(a,b,m,n) result(d);integer::m,n,a(m),b(n),c(m+n);integer,allocatable::d(:);j=m+1;c(1:m)=a(1:m);do i=1,n;if(.not.any(b(i)==c(1:m)))then;c(j)=b(i);j=j+1;endif;enddo;allocate(d(j-1));d=c(1:j-1);endfunction

Which not only looks infinitely better but will actually compile (too long a line in its golfed form) with the human-readable form:

function f(a,b,m,n) result(d)
  integer::m,n,a(m),b(n),c(m+n)
  integer,allocatable::d(:)
  j=m+1;c(1:m)=a(1:m)
  do i=1,n
     if(.not.any(b(i)==c(1:m)))then
        c(j)=b(i);j=j+1
     endif
  enddo
  allocate(d(j-1))
  d=c(1:j-1)
end function

This should be O(n) as I copy a into c and then check each b against all of c. The last step is to eliminate the garbage that c will contain since it is uninitialized.

O-I · Answer 8 · 2014-01-02 18:47:47Z

One way in Ruby

To keep within the rules outlined above, I would use a similar strategy as the JavaScript solution and use a hash as an intermediary.

merged_arr = {}.tap { |hash| (arr1 + arr2).each { |el| hash[el] ||= el } }.keys

Essentially, these are the steps I'm going through in the line above.

Define a variable merged_arr that will contain the result
Initialize an empty, unnamed hash as an intermediary to put unique elements in
Use Object#tap to populate the hash (referenced as hash in the tap block) and return it for subsequent method chaining
Concatenate arr1 and arr2 into a single, unprocessed array
For each element el in the concatenated array, put the value el in hash[el] if no value of hash[el] currently exists. The memoization here (hash[el] ||= el) is what ensures the uniqueness of elements.
Fetch the keys (or values, as they are the same) for the now populated hash

This should run in O(n) time. Please let me know if I've made any inaccurate statements or if I can improve the above answer either for efficiency or readability.

Possible improvements

Using memoization is probably unnecessary given that the keys to the hash are going to be unique and the values are irrelevant, so this is sufficient:

merged_arr = {}.tap { |hash| (arr1 + arr2).each { |el| hash[el] = 1 } }.keys

I really love Object#tap, but we can accomplish the same result using Enumerable#reduce:

merged_arr = (arr1 + arr2).reduce({}) { |arr, val| arr[val] = 1; arr }.keys

You could even use Enumberable#map:

merged_arr = Hash[(arr1 + arr2).map { |val| [val, 1] }].keys

How I would do it in practice

Having said all that, if I were asked to merge two arrays arr1 and arr2 such that the result merged_arr has unique elements and could use any Ruby method at my disposal, I would simply use the set union operator which is intended for solving this exact problem:

merged_arr = arr1 | arr2

A quick peek at the source of Array#|, though, seems to confirm that using a hash as an intermediary seems to be the acceptable solution to performing a unique merge between 2 arrays.

"Don't use the standard library's array functions for detecting uniqueness (although other things form the standard library is okay)" — Jan Dvorak, Jan 2 at 11:55
How am I violating that rule in the second example? Memoization is being performed on a hash. Is that not allowed either? — O-I, Jan 2 at 13:54

Murta · Answer 9 · 2014-01-02 19:59:14Z

up vote 2 down vote

Mathematica 10 Chars

Union[a,b]

Example:

a={1,2,3,4,5};
b={1,2,3,4,5,6};
Union[a,b]

{1, 2, 3, 4, 5, 6}

Mathematica2 43 Chars

Sort@Join[a, b] //. {a___, b_, b_, c___} :> {a, b, c}

edited Jan 2 at 19:59

answered Jan 2 at 3:21

Murta
2514

4

I think this would go in the category of using standard library array methods. – troglodite Jan 2 at 4:02

Hi @cloudcoder2000. No need to call some specific library to use Union in Mathematica. – Murta Jan 2 at 10:44

3

In my opinion, using a builtin function to do exactly what the question is asking to do is cheating. – xfix Jan 2 at 14:37

ok ok .. the second code do not use Union. – Murta Jan 2 at 16:57

1

I guess Tally[Join[a, b]][[;; , 1]] would also be cheating ;-) BTW you could save chars by using single-letter variables. – Yves Klett Jan 2 at 19:22

show 1 more comment

zamnuts · Answer 10 · 2014-01-02 08:23:55Z

PHP, ~~69/42~~ 68/41 chars

Including the function declaration is 68 characters:

function m($a,$b){return array_keys(array_flip($a)+array_flip($b));}

Not including the function declaration is 41 characters:

array_keys(array_flip($a)+array_flip($b))

Bertrand · Answer 11 · 2014-01-03 16:03:08Z

up vote 1 down vote

Javascript 86

Golfed version:

function m(a,b){var h={};return a.concat(b).filter(function(v){return h[v]?0:h[v]=1})}

Readable version:

function merge(a, b) {
  var hash = {};
  return a.concat(b).filter(function (val) {
    return hash[val] ? 0 : hash[val] = 1;
  });
}

edited Jan 3 at 16:03

answered Jan 2 at 19:45

Bertrand
1113

1

This ignores falsey values... m([1,0,0,0,0],[0,1,0]) returns [1]. – George Reith Jan 3 at 12:30

1

Change h[v]=v to h[v]=1. – George Reith Jan 3 at 12:36

Well spotted @GeorgeReith! We went from 86 to 84 :) – Bertrand Jan 3 at 12:42

It's still 86, I think you got confused because you removed 2 characters from the readable version not the golfed one. – George Reith Jan 3 at 15:37

add comment

Florent · Answer 12 · 2014-01-03 17:11:19Z

up vote 0 down vote

JavaScript 60

I'm using ES6 generator.
The following is testable using Google's Traceur REPL.

m=(i,j)=>{h={};return[for(x of i.concat(j))if(!h[x])h[x]=x]}

answered Jan 3 at 17:11

Florent
1,754211

add comment

asked	6 months ago
viewed	1083 times
active	6 months ago

current community

your communities

more stack exchange communities

Array Merge without Duplicates

12 Answers

Perl

27 Characters

JavaScript O(N) 131 124 116 92 (86?)

Python 2.7, 38 chars

python,46

python, 8

Perl

Fortran: 282 252 233 213

One way in Ruby

Possible improvements

How I would do it in practice

Mathematica 10 Chars

Mathematica2 43 Chars

JavaScript 60

Your Answer

Not the answer you're looking for? Browse other questions tagged code-challenge algorithm or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Array Merge without Duplicates

12 Answers

Perl

27 Characters

JavaScript O(N) 131 124 116 92 (86?)

Python 2.7, 38 chars

python,46

python, 8

Perl

Fortran: 282 252 233 213

One way in Ruby

Possible improvements

How I would do it in practice

Mathematica 10 Chars

Mathematica2 43 Chars

JavaScript 60

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged code-challenge algorithm or ask your own question.

Related

Hot Network Questions