Take the 2-minute tour ×
Programming Puzzles & Code Golf Stack Exchange is a question and answer site for programming puzzle enthusiasts and code golfers. It's 100% free, no registration required.

A common optimization to save space in binaries is to merge string literals where one literal is the suffix of another. For instance, a binary with the string literals

a: foobar
b: bar
c: barbaz
d: foobarbaz
e: baz

might contain the following string literal pool (# representing the \0-terminator):

foobar#foobarbaz#

with the symbols a, b, c, and d having the following values relative to the beginning of the string pool:

a:  0
b:  3
c: 10
d:  7
e: 13

In this task, you have to compute the minimal size of a string pool for a given set of input strings.

Input

The input is a series of up to 999 strings each comprising up to 80 ASCII characters (not including the newline) in the range from 32 to 127 inclusive and then a single newline character.

Output

Find the shortest string such that each of the input strings (including the terminating newlines) are substrings of that string. The output shall be the length of that shortest string. Do not output the string, only its length.

Scoring

This challenge is code golf, standard loopholes apply. The solution with the least length in octets wins.

Examples

  1. Input:

    foobar
    bar
    barbaz
    foobarbaz
    baz
    

    shortest string, # representing newline:

    foobar#foobarbaz#
    

    length: 17

  2. Input:

    foobar
    foobaz
    foobarbaz
    barbaz
    

    shortest string, # representing newline:

    foobar#foobaz#foobarbaz#
    

    length: 24

share|improve this question
1  
And 80-character test case would be good. Also, is there any difference between "octet" and "byte"? Otherwise I'm not sure what the benefit of using the obscurer term is. –  Martin Büttner 14 hours ago
1  
@MartinBüttner On some machines, a byte has more or less than 8 bits (cf. Knuth's MIX). Octet is the standard word to refer to an 8 bit quantity, byte refers to the least addressable unit of the particular machine you're working on. The 80 character limit is just there so people can work with fixed arrays and so I can't say “this is invalid because it breaks with very long input.” –  FUZxxl 14 hours ago
1  
Are all input strings pairwise distinct? –  Alexey Burdin 10 hours ago
    
@AlexeyBurdin No. –  FUZxxl 4 hours ago

3 Answers 3

Pyth, 20 18 bytes

hljb-{.zsmteM./d.z

Demonstration.

{ can be removed if duplicates are not allowed.

Explanation:

hljb-{.zsmteM./d.z
                .z     The input, as a list of strings.
         m             Map each strings to
             ./d       all possible partitions of the string into separate strings.
           eM          take the last element of each, giving all suffixes.
          t            Remove the first suffix, giving all suffixes other than
                       the string itself.
        s              Sum, combining the list of lists into a single list.
    -{.z               From the set of input strings, remove all suffixes.
                       This is the list of strings in the minimal segment.
  jb                   Join the strings together on newlines.
 l                     Take the length of the resulting string.
h                      Add one and print.
share|improve this answer

CJam, 22 bytes

qN%Nf+_&:G{Gs\/,3<},s,

_& can be removed if no string occurs multiple times.

Try it online.

share|improve this answer

python 2, 132

Just to start a race:

def f(s):
    l=set(s.split('\n'))
    for x in l:
        for y in l:
            if x!=y and x.endswith(y):l.remove(y)
    return sum(len(x)+1 for x in l)

It works:

>>> f(r'''foobar
foobaz
foobarbaz
barbaz''')
24
>>> f(r'''foobar
bar
barbaz
foobarbaz
baz
''')
17
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.