-3

I need to find all sub-strings from the given array of strings and group them.

Additional condition:

If string S1 contains string S2, S1 contains S3, S2 contains S4 - all them should be in one group.

Example:

Given array: Hello, Hello John, Hi, Hi Bob, Hell, Hi all

Result output:

Group 1: Hello, Hello John, Hell

Group 2: Hi, Hi Bob, Hi All

2
  • And where do you have problems? Commented Apr 15, 2016 at 9:13
  • 1
    In my current implementation (Brute-force) I'm facing with N*N complexity (which is expected) and this doesn't work with huge arrays. Commented Apr 15, 2016 at 9:28

1 Answer 1

1
  • Build a trie on the array of strings
  • For each array entry, walk the trie and if the current node marks a word, print it (under the same group as the current string). Do some bookkeeping to avoid printing the same word many times.

Time complexity to build the tire is O(|w1| + ... + |wn|) where |wi| is the length of string wi; so it's linear in the sum of the lengths of the strings. Space complexity is bounded by the same expression but is much lower when there's lots of common prefixes (which is what happens in practice).

The query step has time complexity linear in the length of the string---just traverse the branch that corresponds to the string. (Maybe you can mark off strings that you've visited along the way---and are thus prefixed of the current string---so that you won't even traverse them later. Visiting longer strings first helps you bring the time complexity further down.)

Here's a struct to get you started:

typedef struct node_t_ node_t;
struct node_t_ {
    node_t c *children[ALPHABET_SIZE];
    char kIsLeaf; // set to 1 if represents a word
    char ch; // character stored in the leaf (redundant)
}

Inserting is easy. You start with non-NULL root which stores zero character (represents the empty string).

Insertion:

 void insert(const char* str) {
    node_t* current = root;
    while (*str != '\0') {
        if (current->children[*str] == NULL) {
            create new node;
        }
        current = current->children[*str++];
    }
    current->kIsLeaf = 1;
}

The other procedures are very similar. Trie is a very elegant, simple to implement and easy to use data structure.

Sign up to request clarification or add additional context in comments.

2 Comments

But what should we do for instance for keys like "Hello world", "world"? They also should be in one group as 1st string contains second one, but in Trie they will be in different pathes.
And if it's in the middle, such as "hello" and "Well, hello there!" ?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.