I'm working on a TopCoder problem which seeks to find the letters preventing an input sentence from being a pangram in all caps, alphabetical order.

MissingLetters.h

#ifndef _MISSING_LETTERS_H_
#define _MISSING_LETTERS_H_

#include <string>
#include <set>
#include <utility>

class MissingLetters {
  public:
    MissingLetters() {}
    std::string getMissingLetters(std::string sentence);    
};

// Find the difference between two sets, set(lowercase(sentence)) & set([a-z])

std::string MissingLetters::getMissingLetters(std::string sentence) {
   std::set<char> sentence_set;
   std::string missing_letters = "";


   // Put each letter in the sentence into a set, O(n log n)
   for(size_t i = 0; i < sentence.size(); ++i) {
      sentence_set.insert(toupper(sentence[i]));      
   }

   // For each letter [A-Z], add letters that aren't in the set, O(1)
   for(char letter='A'; letter <= 'Z'; ++letter) {

      if(sentence_set.find(letter) == sentence_set.end()) {
         missing_letters+=letter;      
      }
   }

   return missing_letters;
}

#endif // _MISSING_LETTERS_H_

I'd like this to be as fast as possible. My method, getMissingLetters is currently \$O(n \log{n})\$ running-time, where \$n\$ is the length of the the sentence.

Any other implementations, including different data structures, would be great if they're faster.

Any tips?

share|improve this question
1  
note that the set inserts and removes in log n, so its actually O(n log n). If the sentences are large, you can stop inserting if the length of the set is 26, as you already have all letters – juvian Sep 17 '15 at 23:16
    
@juvian Edited. Thanks for the tip. – erip Sep 17 '15 at 23:17
up vote 2 down vote accepted

Here is the implementation I would have done for this problem. It uses a boolean array instead of a map to access in O(1) instead of O(log n):

#include <iostream>

using namespace std;

string missingLetters(string s){
    string ms = "";

    int dif = 65; // int('A') == 65
    bool arr[26] = {0}; // we haven't found any letters yet
    int count=0; // 0 distinc letters found


    for(int i=0;i<s.length();i++){
        int ascii = toupper(s[i]);
            if(ascii >= 65 && ascii <= 90){ // if the upperCase is a letter between A and Z
                if(arr[ascii - dif] == 0){ // if we haven't found it yet
                    arr[ascii - dif] = 1; // we found it
                    count++; // add 1 to unique letters found
                    if(count == 26) { // we found all, no need to keep looking
                        cout << "yay stopped processing at pos " << i << "/" <<s.length()<<endl;
                        break;
                    }
                }
            }
    }

    for(int i=0;i<26;i++){//for each letter
        if(arr[i] == 0){//if its missing
            ms+=char(i+dif);//add to missing
        }
    }

    return ms;
}

int main(){
    string s = "The quick brown fox jumps over a lazy dog. The quick brown fox jumps over a lazy dog"; 
    cout << missingLetters(s)<<endl; // gives yay stopped processing at pos 40/84
    s = "mising a lot";
    cout << missingLetters(s); // gives BCDEFHJKPQRUVWXYZ
}
share|improve this answer

Since the size of the alphabet is fixed, you don't need a set, a boolean array will be enough:

  • Initialize a boolean array, with size of the alphabet, all values false
  • For each letter in the input string, set the flag to true for index letter - 'A'
  • After the entire input string is processed, iterate over the boolean array, if the value is false for an index, then append the character index + 'A' to the list of missing letters

The time complexity of this alternative algorithm is \$O(n)\$ where \$n\$ is the size of the input string, space complexity is \$O(1)\$, for the size of the alphabet.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.