Given a set T of characters and a string S, find the minimum window in S which will contain all the characters in T in complexity O(n). eg, Minimum window is Note: If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S. |
This is an interesting problem and interesting problems have multiple approaches and the best approach is usually simple and beautiful. In this post, I illustrate my approach when I first attempt this problem. My first approach is more complicated and is not the best solution (runs in O(N lg M) time). Later in this post, I explain the best approach which runs in O(N) time with the help of images. Hint: To solve this problem efficiently, below are the two key points we need to consider:
We definitely need the help from a hash table here. Hash table is able to tell us if a letter exist in T in O(1) time. An O(N lg M) solution: In this case, the remedy is to maintain queues (instead of table) that correspond to each unique character in T. For example, assume that T = "AABC", when you first see 'A', push its position into the 'A' queue (which originally is empty). When you see 'A' again, push its position to the back of 'A' queue. The third time you see 'A', you would pop the front element, and push its position to the back of 'A' queue. By popping the element, we do not include window that wrap around a sub-window. This approach works but the difficulty is two-fold:
The way I solve the above problem is to maintain a sorted map that maps indices to character so that we can grab the minimum and maximum position in O(1) time. But there is an extra cost for doing this. Each time you pop an element from the queue, you would also need to update the map by deleting the corresponding element and inserting a new element. To check if the window satisfies the constraint, we check the map's size; a valid window is found if the map's size is equal to T's length. The complexity of the method is O(N lg M), where N is the length of S, and M is the length of T. The extra lg M term is due to the extra cost of deleting and inserting an element in the map, where each costs O(lg M) time in the worst case (Note that M is the map's maximum size. Why?)
Best solution: Notice how complicated the above solution is. It uses a hash table, a queue, and a sorted map. During an interview, the problems tend to be short and the solution usually can be coded in about 50 lines of code. So be sure that you say out loud what you are thinking and keep communication opened with the interviewer. Check if your approach is unnecessary complex, he/she might be able to give you guidance. The last thing you want to do is to get stuck in a corner and keep silent. To help illustrate this approach, I use a different example: S = "acbbaca" and T = "aba". The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing S. needToFind stores the total count of a character in T and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in T that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals T's length, we know a valid window is found. Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to T's size), we immediately advance begin pointer as far right as possible while maintaining the constraint. How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint. Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found. Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout. ![]() i) S = "acbbaca" and T = "aba". ![]() ii) The first minimum window is found. Notice that we cannot advance begin pointer as hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint. ![]() iii) The second window is found. begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance begin pointer to the right. ![]() iv) We skip 'c' since it is not found in T. Begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance begin pointer to the right. ![]() v) Begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately and this is our newly found minimum window. Both the begin and end pointers can advance at most N steps (where N is S's size) in the worst case, adding to a total of 2N times. Therefore, the run time complexity must be in O(N).
Test Cases: Is this taking care of duplicate char in T? Since you are using needToFind[256], I assume not. Good solution, 2 pointers, sliding window plus assisting table solve the issue in a simple way. @RisingWill, doesn't the example indicate the T has duplicate characters? Of course it has taken care of duplicate characters. |
A small improvement: add an array to keep track of visited chars in S that are also in T so that left pointer can jump directly to next spot. This will not change the big-o time though.
|