Problem

Suppose we abstract our file system by a string in the following manners:

The string dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext represents:

dir
    subdir1
    subdir2
        file.ext

a directory dir contains an empty sub-directories subdir1 and a sub-directory subdir2 containing a file file.ext.

The string dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext represents:

dir
    subdir1
        file1.ext
        subsubdir1
    subdir2
        subsubdir2
            file2.ext

a directory dir contains two sub-directories subdir1 and subdir2. subdir1 contains a file file1.extand an empty second-level sub-directory subsubdir1. subdir2 contains a second-level sub-directory subsubdir2 containing a file file2.ext.

We are interested in finding the longest (number of characters) absolute path to a file within our file system. That is, for example, in the second example above, the longest absolute path is dir/subdr2/subsubdir2/file.ext, and its length is 30.

Given a string representing the file system in the above format, return the length of the longest absolute path to file in the abstracted file system. Simply put, that is the length of a valid path that ends with a file, and a file contains . and an extension.

Time complexity required: O(n) where n is the size of the input string.

Notice that a/aa/aaa/file1.txt is not the path you want, if there is aaaaaaaaaaaaaaaaaaaaa/sth.png.

Any advice on code bugs, performance improvements in terms of algorithm time complexity, or code style advice is highly appreciated.

def find_longest_path(source):
    items = source.split('\n')
    prefix = []
    prefix_length = []
    max_length_so_far = -1
    max_path_so_far = ''
    for path in items:
        i = 0
        while path[i] == '\t':
            i += 1
        if len(prefix) > i:
            prefix = prefix[:i]
            prefix_length = prefix_length[:i]
        prefix.append(path[i:])
        prefix_length.append(len(path[i:]))
        if sum(prefix_length) > max_length_so_far:
            max_length_so_far = sum(prefix_length)
            max_path_so_far = '/'.join(prefix)
    return max_path_so_far

if __name__ == "__main__":
    path_name = "dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext"
    print find_longest_path(path_name)
    path_name = "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext"
    print find_longest_path(path_name)
share|improve this question

The problem statement indicates that path separators should be taken into account when computing the length. You might add a test case to test for this case either by crafting a particular input or by returning the longest path along with longest count that you can verify.

Some of the code could be made more concise.

filename = path.split('\t')[-1]    # last of list, assumes no tabs in filenames
depth = len(path) - len(filename)  # yields number of tabs

Rather than repeatedly computing the sum of the paths, you could create a cumulative_sum.

cumulative_len = cumulative_len[:depth]
if depth > 0:
    # last element contains sum of entire path to depth
    # +1 to account for the preceding path separator
    cumulative_len.append(cumulative_len[-1] + len(filename) + 1)
else:
    # the problem statement does not include an initial path separator
    cumulative_len.append(len(filename))

Rather than joining the path for intermediate maximums that will be discarded, just store the list of path elements and join on final delivery.

You might consider asserting on the desired results so you get clearer feedback of failure.

def check(input, expected):
    longest_path = find_longest_path(input)
    assert input == expected, "Expected {} but got {}".format(expected, longest_path)
    print(longest_path)

if __name__ == "__main__":
    check("dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext", 
          "dir/subdir2/subsubdir2/file2.ext")

    check("dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext",
          "dir/subdir2/file.ext")
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.