Elementary Algorithms :: Array Difference
UPDATED
In an interview I was posed a problem: given 2 arrays a and b write a program that will print the elements of a that are not in b.
The easiest (incorrect solution) in PHP:
1 2 3 4 5 6 7 8 9 10 11 12 13 | #!/usr/bin/php <? $a = array(1, 2, 3, 4, 9, 20, 30, 82, 57); $b = array(1, 2, 3, 82, 57, 100, 32); $c = array(); foreach($a as $aval) { if(!in_array($aval, $b)) $c[] = $aval; } print_r($c); ?> |
Now as I had stated in the interview this is NOT the best way to accomplish this. Of course that would all depend on how in_array is implemented in PHP. If in_array is simply a short cut to a true false foreach loop then we have a problem. If n and m represent the number of elements of a and b respectively iterations = n * m. Or O(mn). Not efficient. Since this is PHP I am guessing this is the case (a simple sequential search).
So what is a better solution (best?) to allow for better performance as n and m increase? This is the best I could come up with after a quick look at a couple of algorithms.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | #!/usr/bin/php <? $a = array(3, 1, 2, 4, 9, 20, 30, 82, 57); $b = array(1, 2, 3, 82, 57, 100, 32); //binary search function function binSearch($ar, $n, $value) { $l = 0; $r = $n; while($r >= $l) { $mid = ($l + $r)/2; if($ar[$mid] == $value) { return true; } else if ($ar[$mid]< $value) { $l = $mid + 1; } else { $r = $mid - 1; } } return false; } //sort the first array used in the bin search sort($b); $n = count($b); $m = count($a); $c = array(); for($i=0;$i<$m;$i++) { if(!binSearch($b, $n, $a[$i])) { $c[] = $a[$i]; } } print_r($c); ?> |
Removing the negation operator prior to the function call will change the program to array intersection vs array diff. The math for this binary search solution is O(m log
So consider the following:
m = 8
n = 8
Solution 1:
(mn) = 64
Solution 2:
16*log2(8) = 48
Basically due to the first array being sorted and chunking we are able to do fewer comparisons instead of a comparison for each element. As the size of the chunk decreases the worst case is that the value doesn’t exist.
It also occurs to me that you will want to count each array and sort the shortest and use that as the first argument and the longer array[val] as the second to further reduce the number of iterations. Of course the order of the arguments matters for this problem (in a not in b). I suppose you could flip flop logic/operator in/on the binary search call/function though to account for that as well.
Anyway this is basic CS, everyone probably knows this but I couldn’t come up with it off the top of my head, and the problem was driving me nuts cycling in the back of my mind so I had to write it out.
UPDATE
I still kept thinking about this so here is another based on array merge and comparing next array elements. Since this is a diff and not an intersection we need to merge the first array twice. Any value that shows up twice in the merged array is in a and not in b.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | #!/usr/bin/php <? $a = array(3, 1, 2, 4, 9, 20, 30, 82, 57); $b = array(1, 2, 3, 82, 57, 100, 32); $c = array_merge($a, $b); $c = array_merge($a, $c); $d = array(); sort($c); $i = 0; while($i <= count($c)) { if($c[$i+1] == $c[$i] && $c[$i+2] == $c[$i]) { $i+=3; } else if ($c[$i+1] == $c[$i]) { $d[] = $c[$i]; $i+=2; } else { $i++; } } print_r($d); ?> |
The math ends up being something like ~O(m + m + n).