Code Review Stack Exchange is a question and answer site for peer programmer code reviews.

I'm practicing algorithms (for future interviews) while learning about Big O notation and was wondering if there's a faster or more optimal way of writing an algorithm to see if a string has all unique characters (assuming you can't use additional data structures).

- (BOOL)containsUniqueCharacters:(NSString *)string {

    //Make sure that it's case-insensitive
    string = [string uppercaseString];

    //Create a buffer for our string
    NSUInteger length = [string length];
    unichar buffer[length];
    [string getCharacters:buffer range:NSMakeRange(0, length)];

    //Iterate through each of our characters
    for (int i = 0; i < length; i++) {

        //Iterate through every other character and compare it with our current character
        for (int n = 0; n < length; n++) {

            //Compare current character with every other character
            if (buffer[i] == buffer[n] && i != n) {
                return NO;
            }
        }
    }

    return YES;
}

If I'm correct, is the time complexity for this \$O(n^2)\$, and the space complexity \$O(1)\$?


3 Answers

I would suggest an algorithmically better O(n) solution: use a buffer, hash table, or other such structure to record whether or not you have "seen" a character.

For example, if you are dealing with a basic string containing char, then you can get away with a bitset of size 256. Each character you see marks a bit. If the bit was already marked, the string contains a duplicate, so return NO.

Furthermore, if the string you're testing is longer than 256 characters, then by the pigeonhole principle it must contain a duplicate, so that is an automatic NO as well.

Since you are dealing with Unicode, where a character can take one of a very large number of possible values, you will want to use a hash table. It will still be dramatically faster than any O(n log n) sorting method.
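As a rough illustration, here is the bitset idea sketched in plain C for 8-bit char strings (the function name `contains_unique_characters` is mine, and this is a translation of the idea rather than the Objective-C method from the question):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch: one "seen" flag per possible unsigned char value.
 * Returns true if every character in s is distinct. */
bool contains_unique_characters(const char *s)
{
    size_t len = strlen(s);

    /* Pigeonhole: more than 256 characters guarantees a duplicate. */
    if (len > 256)
        return false;

    bool seen[256] = { false };

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (seen[c])
            return false;   /* this character was already marked */
        seen[c] = true;
    }
    return true;
}
```

Each character is visited once, so the scan is O(n), and the 256-entry flag array is a constant factor of space regardless of the input length.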

+1 for the suggestion of the automatic longer-than-256 string, but the assumption for this program was "(assuming you can't use additional data structures)." which makes the hash table out-of-scope. Without that restriction it would be a great suggestion. –  rolfl 6 hours ago
That's an absurd restriction. Well, a bit set that is as large as the set of unicode codepoints is impractical as it would likely be much larger than even the entire string being processed. But technically speaking, it is still a constant factor. Which means O(1) space. I'm all for big-O notation, don't get me wrong... –  Steven Lu 4 hours ago

You are right on the time complexity, but wrong on the space complexity.

Because you copy the data from the string into the buffer, you incur an \$O(n)\$ space complexity too.

vnp has suggested an alternate algorithm that scales better than \$O(n^2)\$, but you can also significantly improve your current algorithm. Your current code compares each pair of characters twice. Consider the word "and": you compare a with n and d, then compare n with a and d, but you have already done the n-and-a comparison.

What you need to do is alter your inner loop to restrict the range of the comparison. Consider the following:

//Iterate through each of our characters
// Note: start from 1
for (int i = 1; i < string.length; i++) {

    unichar letter = [string characterAtIndex:i];

    //Iterate through all previous other characters and compare it with our current character
    for (int n = 0; n < i; n++) {

        //Compare current character with every other character
        if (letter == [string characterAtIndex:n]) {
            return NO;
        }
    }
}

The above code still checks each letter, but does each comparison only once. It is smart about the indices: you only compare previous letters with the current one, so it does half the comparisons. Also, by starting i at 1 and using a < comparison in the n loop, you never compare a character against itself.

For smallish strings (less than 100 characters or so), this \$O(n^2)\$ algorithm, which uses no additional space (other than the letter variable), will still be very fast.

Wow this is awesome. Thank you. I love the fact that you explain how using the < comparison prevents the loop from comparing the character against itself, that's something I had to overcome originally using a second condition in my if statement. Your entire answer is extremely helpful, thank you. –  KingPolygon 8 hours ago
Plural of index is indices. Sorry it just gives me shivers. –  vnp 7 hours ago
@vnp - absolutely right, except when talking about database indexes, it is common to use that plural... and I come from a DB background, and it shows... ;p –  rolfl 7 hours ago

Your analysis is correct. Sorting the original string, followed by a linear scan to find duplicates, reduces the time complexity to \$O(n \log n)\$, at a price of \$O(\log n)\$ space. If modifying the string is not allowed, an additional \$O(n)\$ space penalty is incurred.
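A minimal C sketch of this sort-then-scan approach, working on a copy so the original string is not modified (the function name `is_unique_sorted` and comparator are my own):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for single unsigned char values */
static int cmp_char(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Sort a copy of the string, then scan adjacent characters:
 * any duplicate ends up next to its twin after sorting. */
bool is_unique_sorted(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return false;            /* allocation failure */
    memcpy(copy, s, len + 1);

    qsort(copy, len, 1, cmp_char);

    bool unique = true;
    for (size_t i = 1; i < len; i++) {
        if (copy[i] == copy[i - 1]) {
            unique = false;      /* equal neighbours after sorting */
            break;
        }
    }
    free(copy);
    return unique;
}
```

The sort dominates at \$O(n \log n)\$ time; the copy is the \$O(n)\$ space penalty mentioned above, on top of the sort's own stack usage.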

Space is still O(n) –  Steven Lu 7 hours ago
