Is it a prefix code?

Question

In information theory, a "prefix code" is a dictionary where none of the keys are a prefix of another. In other words, this means that none of the strings starts with any of the other.

For example, {"9", "55"} is a prefix code, but {"5", "9", "55"} is not.

The biggest advantage of this, is that the encoded text can be written down with no separator between them, and it will still be uniquely decipherable. This shows up in compression algorithms such as Huffman coding, which always generates the optimal prefix code.

Your task is simple: Given a list of strings, determine whether or not it is a valid prefix code.

Your input:

Will be a list of strings in any reasonable format.
Will only contain printable ASCII strings.
Will not contain any empty strings.

Your output will be a truthy/falsey value: Truthy if it's a valid prefix code, and falsey if it isn't.

Here are some true test cases:

["Hello", "World"]                      
["Code", "Golf", "Is", "Cool"]
["1", "2", "3", "4", "5"]
["This", "test", "case", "is", "true"]          

["111", "010", "000", "1101", "1010", "1000", "0111", "0010", "1011", 
 "0110", "11001", "00110", "10011", "11000", "00111", "10010"]

Here are some false test cases:

["4", "42"]                             
["1", "2", "3", "34"]                   
["This", "test", "case", "is", "false", "t"]
["He", "said", "Hello"]
["0", "00", "00001"]
["Duplicate", "Duplicate", "Keys", "Keys"]

This is code-golf, so standard loopholes apply, and shortest answer in bytes wins.

Do you want a consistent truthy value or could it be e.g. "some positive integer" (which might vary between different inputs). — Martin Ender♦, May 2 '16 at 6:50
Evil corner case: do you guarantee that none of the keys is the empty string? — Peter Taylor, May 2 '16 at 11:22
@Joba It depends on what your keys are. If you have 0, 00, 1, 11 all as keys, this is not a prefix-code because 0 is a prefix of 00, and 1 is a prefix of 11. A prefix code is where none of the keys starts with another key. So for example, if your keys are 0, 10, 11 this is a prefix code and uniquely decipherable. 001 is not a valid message, but 0011 or 0010 are uniquely decipherable. — DJMcMayhem, May 2 '16 at 16:10

isaacg · Accepted Answer · 2016-05-02 01:46:55Z

up vote 10 down vote accepted

Pyth, 8 bytes

.AxM.PQ2

Test suite

Take all 2 element permutations of the input, map each to the index of one string in the other (0 for a prefix) and return whether all results are truthy (non-zero).

answered May 2 '16 at 1:46

isaacg

31k441155

add a comment |

xnor · Answer 2 · 2016-05-02 09:00:08Z

up vote 10 down vote

Haskell, 37 bytes

f l=[x|x<-l,y<-l,zip x x==zip x y]==l

Each element x of l is repeated once for every element that it's a prefix of, which is exactly once for a prefix-free list, giving the original list. The prefix property is checked by zipping both lists with x, which cuts off elements beyond the length of x.

answered May 2 '16 at 9:00

xnor

52.9k11113280

This is an elegant solution (+1) – Michael Klein May 2 '16 at 17:39

add a comment |

Marv · Answer 3 · 2016-05-02 13:16:25Z

up vote 9 down vote

Java, 128 127 126 125 124 121 bytes

(Thanks @Kenny Lau, @Maltysen, @Patrick Roberts, @Joba)

Object a(String[]a){for(int i=0,j,l=a.length;i<l;i++)for(j=0;j<l;)if(i!=j&a[j++].startsWith(a[i]))return 1<0;return 1>0;}

Ungolfed

Object a(String[] a) {
    for (int i = 0, j, l = a.length; i < l; i++) 
        for (j = 0; j < l;) 
            if (i != j & a[j++].startsWith(a[i])) return 1<0;
    return 1>0;
}

Output

[Hello, World]
true

[Code, Golf, Is, Cool]
true

[1, 2, 3, 4, 5]
true

[This, test, case, is, true]
true

[111, 010, 000, 1101, 1010, 1000, 0111, 0010, 1011, 0110, 11001, 00110, 10011, 11000, 00111, 10010]
true

[4, 42]
false

[1, 2, 3, 34]
false

[This, test, case, is, false, t]
false

[He, said, Hello]
false

[0, 00, 00001]
false

[Duplicate, Duplicate, Keys, Keys]
false

edited May 2 '16 at 13:16

answered May 2 '16 at 1:56

Marv

71927

1

This is so verbose. I love it! – DJMcMayhem May 2 '16 at 1:58

1

idk bout java, but would & work instead of &&? – Maltysen May 2 '16 at 2:18

1

Right, saves another byte. In Java, using bitwise operators with boolean operands behave just like normal logical operators, except they don't short circuit which isn't needed in this case. – Marv May 2 '16 at 2:21

Couldn't you just change the return type of the function to int and return 0 and 1? That would save several bytes. Also I forget if this is valid in Java but if you declare i, j, and l inside the outer for loop that would save one byte from one less semicolon. – Patrick Roberts May 2 '16 at 7:15

3

@Joba Pretty sure that isn't valid since indexOf returns -1 when the string isn't found; it would need to be indexOf(a[i])==0 in which case there are no savings. – Pokechu22 May 2 '16 at 14:49

| show 5 more comments

xnor · Answer 4 · 2016-05-02 09:14:29Z

Python 2, 48 51 bytes

lambda l:all(1/map(a.find,l).count(0)for a in l)

For each element a of l, the function a.find finds the index of the first occurrence of a in the input string, giving -1 for an absence. So, 0 indicates a prefix. In a prefix-free list, mapping this function returns only a single 0 for a itself. The function checks that this is the case for every a.

51 bytes:

lambda l:[a for a in l for b in l if b<=a<b+'~']==l

Replace ~ with a character with ASCII code 128 or higher.

For each element a in l, a copy is included for each element that's a prefix of it. For a prefix-free list, the only such element is a itself, so this gives the original list.

I don't think the second one works unless you include the utf-8 shebang or use python 3, since python 2 is dumb about encoding. Or you could replace '~' with ord(128) — DJMcMayhem, May 2 '16 at 16:59

Martin Ender · Answer 5 · 2016-05-02 08:53:04Z

CJam, 14 bytes

q~$W%2ew::#0&!

Test suite.

Explanation

q~   e# Read and evaluate input.
$    e# Sort strings. If a prefix exists it will end up directly in front 
     e# of a string which contains it.
W%   e# Reverse list.
2ew  e# Get all consecutive pairs of strings.
::#  e# For each pair, find the first occurrence of the second string in the first.
     e# If a prefix exists that will result in a 0, otherwise in something non-zero.
0&   e# Set intersection with 0, yielding [0] for falsy cases and [] for truthy ones.
!    e# Logical NOT.

Patrick Roberts · Answer 6 · 2016-05-02 15:55:33Z

JavaScript ES6, 65 43 40 bytes

a=>!/(.*)\1/.test(''+a.sort().join``)
      ^            ^               ^ embedded NUL characters

My previous solution, which handled string arrays of all UTF-8 characters:

a=>!/[^\\]("([^"]*\\")*[^\\])",\1/.test(JSON.stringify(a.sort()))

I was able to avoid JSON.stringify since the challenge specifies printable ASCII characters only.

Test

f=a=>!/(\0.*)\1/.test('\0'+a.sort().join`\0`) // since stackexchange removes embedded NUL characters

O.textContent += 'OK: '+
[["Hello", "World"]                      
,["Code", "Golf", "Is", "Cool"]
,["1", "2", "3", "4", "5"]
,["This", "test", "case", "is", "true"]          
,["111", "010", "000", "1101", "1010", "1000", "0111", "0010", "1011", 
 "0110", "11001", "00110", "10011", "11000", "00111", "10010"]
].map(a=>f(a)) 

O.textContent += '\nKO: '+
[["4", "42"]                             
,["1", "2", "3", "34"]                   
,["This", "test", "case", "is", "false", "t"]
,["He", "said", "Hello"]
,["0", "00", "00001"]
,["Duplicate", "Duplicate", "Keys", "Keys"]
].map(a=>f(a))

<pre id=O></pre>

Michael Klein · Answer 7 · 2016-05-02 06:11:23Z

Haskell, 49 bytes

g x=[1|z<-map((and.).zipWith(==))x<*>x,z]==(1<$x)

This has a couple parts:

-- Are two lists (or strings) equal for their first min(length_of_1,length_of_2) elements, i.e. is one the prefix of the other?
(and.).zipWith(==)

-- Check whether one element is the prefix of the other, for all pairs of elements (including equal pairs)
map((and.).zipWith(==))x<*>x

-- This is a list of 1's of length (number of elements that are the prefix of the other)
[1|z<-map((and.).zipWith(==))x<*>x,z]

-- This is the input list, with all the elements replaced with 1's
(1<$x)

If the two lists are equal, then an element is only the prefix of itself, and it's valid.

Pokechu22 · Answer 8 · 2016-05-02 15:01:12Z

Java, 97 bytes

Object a(String[]a){for(String t:a)for(String e:a)if(t!=e&t.startsWith(e))return 1<0;return 1>0;}

Uses most of the tricks found in @Marv's answer, but also makes use of the foreach loop and string reference equality.

Unminified:

Object a(String[]a){
    for (String t : a)
        for (String e : a)
            if (t != e & t.startsWith(e))
                return 1<0;
    return 1>0;
}

Note that, as I said, this uses string reference equality. That does mean that the code can behave oddly due to String interning. The code does work when using arguments passed from the command line, and also when using something read from the command line. If you want to hardcode the values to test, though, you'd need to manually call the String constructor to force interning to not occur:

System.out.println(a(new String[] {new String("Hello"), new String("World")}));
System.out.println(a(new String[] {new String("Code"), new String("Golf"), new String("Is"), new String("Cool")}));
System.out.println(a(new String[] {new String("1"), new String("2"), new String("3"), new String("4"), new String("5")}));
System.out.println(a(new String[] {new String("This"), new String("test"), new String("case"), new String("is"), new String("true")}));
System.out.println(a(new String[] {new String("111"), new String("010"), new String("000"), new String("1101"), new String("1010"), new String("1000"), new String("0111"), new String("0010"), new String("1011"), new String("0110"), new String("11001"), new String("00110"), new String("10011"), new String("11000"), new String("00111"), new String("10010")}));
System.out.println(a(new String[] {new String("4"), new String("42")}));
System.out.println(a(new String[] {new String("1"), new String("2"), new String("3"), new String("34")}));
System.out.println(a(new String[] {new String("This"), new String("test"), new String("case"), new String("is"), new String("false"), new String("t")}));
System.out.println(a(new String[] {new String("He"), new String("said"), new String("Hello")}));
System.out.println(a(new String[] {new String("0"), new String("00"), new String("00001")}));
System.out.println(a(new String[] {new String("Duplicate"), new String("Duplicate"), new String("Keys"), new String("Keys")}));

Martin Ender · Answer 9 · 2016-05-02 08:53:25Z

Retina, 19 bytes

Byte count assumes ISO 8859-1 encoding.

O`.+
Mm1`^(.+)¶\1
0

The input should be linefeed-separated. Output is 0 for falsy and 1 for truthy.

Try it online! (Slightly modified to support multiple space-separated test cases instead.)

Explanation

O`.+

Sort the lines in the input. If a prefix exists it will end up directly in front of a string which contains it.

Mm1`^(.+)¶\1

Try to match (M) a complete line which is also found at the beginning of the next line. The m activates multiline mode such that ^ matches line beginnings and the 1 ensures that we only count at most one match such that the output is 0 or 1.

To swap 0 and 1 in the result, we count the number of 0s.

Winny · Answer 10 · 2016-05-02 02:25:38Z

up vote 1 down vote

Racket, 70 bytes

(λ(l)(andmap(λ(e)(not(ormap(curryr string-prefix? e)(remv e l))))l))

answered May 2 '16 at 2:25

Winny

900213

add a comment |

Cyoce · Answer 11 · 2016-05-02 08:28:15Z

up vote 1 down vote

Python, 58 55 bytes

lambda l:sum(0==a.find(b)for a in l for b in l)==len(l)

edited May 2 '16 at 8:28

Cyoce

1,632922

answered May 2 '16 at 7:23

DJMcMayhem

21.8k477204

a.index(b)==0 is a bit shorter. Alternatively, you could do 0**sum(a.index(b)for a in l for b in l). – Mego May 2 '16 at 7:46

@Mego That doesn't work because index throws an exception when b isn't found. And because it should be ==, not >=. However, find works. (And it's shorter too!) – DJMcMayhem May 2 '16 at 7:54

Whoops, I meant to type find. Sleepy brain is sleepy. The second version should also work with find. – Mego May 2 '16 at 8:13

@Mego I'm not sure if I get the second version. Wouldn't that always return 0? – DJMcMayhem May 2 '16 at 8:14

@Mego That only works if every string is the same. The reason we compare it to len(l) is since we're iterating through all bs on each a, there will always be at least one match per a. So we check if the number of matches is the same as the number of elements. – DJMcMayhem May 2 '16 at 8:32

| show 1 more comment

edc65 · Answer 12 · 2016-05-02 12:26:37Z

JavaScript (ES6), 52 54

Edit 2 bytes saved thx @Neil

a=>!a.some((w,i)=>a.some((v,j)=>i-j&&!w.indexOf(v)))

Test

f=a=>!a.some((w,i)=>a.some((v,j)=>i-j&&!w.indexOf(v)))

O.textContent += 'OK: '+
[["Hello", "World"]                      
,["Code", "Golf", "Is", "Cool"]
,["1", "2", "3", "4", "5"]
,["This", "test", "case", "is", "true"]          
,["111", "010", "000", "1101", "1010", "1000", "0111", "0010", "1011", 
 "0110", "11001", "00110", "10011", "11000", "00111", "10010"]
].map(a=>f(a)) 

O.textContent += '\nKO: '+
[["4", "42"]                             
,["1", "2", "3", "34"]                   
,["This", "test", "case", "is", "false", "t"]
,["He", "said", "Hello"]
,["0", "00", "00001"]
,["Duplicate", "Duplicate", "Keys", "Keys"]
].map(a=>f(a))

<pre id=O></pre>

DavidC · Answer 13 · 2016-05-02 15:39:24Z

up vote 1 down vote

Mathematica 75 69 68 bytes

Loquacious as usual. But Martin B was able to reduce the code by 7 bytes.

Method 1: Storing output in an `Array`

(68 bytes)

f@a_:=!Or@@(Join@@Array[a~Drop~{#}~StringStartsQ~a[[#]]&,Length@a])

f@{"111", "010", "000", "1101", "1010", "1000", "0111", "0010", "1011", "0110", "11001", "00110", "10011", "11000", "00111", "10010"}

True

f@{"He", "said", "Hello"}

False

Method 2: Storing output in a `List`

(69 bytes)

f@a_:=!Or@@Flatten[a~Drop~{#}~StringStartsQ~a[[#]]&/@Range@Length@a]

edited May 2 '16 at 15:39

answered May 2 '16 at 14:19

DavidC

21.8k23792

The precedence rules should make a~Drop~{#}~StringStartsQ~a[[#]] work. Also Array should save some bytes over Length, especially because it will allow you to use Join@@ instead of Flatten@ (provided, you're using Flatten only for a single level). – Martin Ender♦ May 2 '16 at 14:28

Thanks for the suggestion. I'll look into Array later. – DavidC May 2 '16 at 14:47

add a comment |

lad2025 · Answer 14 · 2016-05-06 20:35:28Z

PostgreSQL, 186, 173 bytes

WITH y AS(SELECT * FROM t,LATERAL unnest(c)WITH ORDINALITY s(z,r))
SELECT y.c,EVERY(u.z IS NULL)
FROM y LEFT JOIN y u ON y.i=u.i AND y.r<>u.r AND y.z LIKE u.z||'%' GROUP BY y.c

Output:

No live demo this time. http://sqlfiddle.com supports only 9.3 and to run this demo 9.4 is required.

How it works:

Split string array with number and name it y
Get all y
LEFT OUTER JOIN to the same derived table based on the same i(id), but with different oridinal that start with prefix y.z LIKE u.z||'%'
Group result based on c (initial array) and use EVERY grouping function. If every row from second table IS NULL it means there is no prefixes.

Input if someone is interested:

CREATE TABLE t(i SERIAL,c text[]);

INSERT INTO t(c)
SELECT '{"Hello", "World"}'::text[]
UNION ALL SELECT  '{"Code", "Golf", "Is", "Cool"}'
UNION ALL SELECT  '{"1", "2", "3", "4", "5"}'
UNION ALL SELECT  '{"This", "test", "case", "is", "true"}'         
UNION ALL SELECT  '{"111", "010", "000", "1101", "1010", "1000", "0111", "0010", "1011","0110", "11001", "00110", "10011", "11000", "00111", "10010"}'
UNION ALL SELECT  '{"4", "42"}'
UNION ALL SELECT  '{"1", "2", "3", "34"}'                   
UNION ALL SELECT  '{"This", "test", "case", "is", "false", "t"}'
UNION ALL SELECT  '{"He", "said", "Hello"}'
UNION ALL SELECT  '{"0", "00", "00001"}'
UNION ALL SELECT  '{"Duplicate", "Duplicate", "Keys", "Keys"}';

EDIT:

SQL Server 2016+ implementation:

WITH y AS (SELECT *,z=value,r=ROW_NUMBER()OVER(ORDER BY 1/0) FROM #t CROSS APPLY STRING_SPLIT(c,','))
SELECT y.c, IIF(COUNT(u.z)>0,'F','T')
FROM y LEFT JOIN y u ON y.i=u.i AND y.r<>u.r AND y.z LIKE u.z+'%' 
GROUP BY y.c;

LiveDemo

Note: It is comma separated list, not real array. But the main idea is the same as in PostgreSQL.

EDIT 2:

Actually WITH ORDINALITY could be replaced:

WITH y AS(SELECT *,ROW_NUMBER()OVER()r FROM t,LATERAL unnest(c)z)
SELECT y.c,EVERY(u.z IS NULL)
FROM y LEFT JOIN y u ON y.i=u.i AND y.r<>u.r AND y.z LIKE u.z||'%' GROUP BY y.c

SqlFiddleDemo

A Simmons · Answer 15 · 2016-11-16 16:35:34Z

up vote 1 down vote

Mathematica, 41 bytes

!Or@@StringStartsQ@@@Reverse@Sort@#~Subsets~{2}&

answered Nov 16 '16 at 16:35

A Simmons

2,7151417

add a comment |

Maltysen · Answer 16 · 2016-05-02 01:33:41Z

up vote 0 down vote

Pyth - 13 12 bytes

!ssmm}d._k-Q

Try it online here.

edited May 2 '16 at 1:33

answered May 2 '16 at 1:28

Maltysen

18.5k340101

add a comment |

ezrast · Answer 17 · 2016-05-02 04:10:35Z

up vote 0 down vote

Ruby, 48 bytes

Uses arguments as input and stdout as output.

p !$*.map{a,*b=$*.rotate!
a.start_with? *b}.any?

answered May 2 '16 at 4:10

ezrast

46113

add a comment |

Jacob · Answer 18 · 2016-05-02 09:38:20Z

up vote 0 down vote

Scala, 71 bytes

(s:Seq[String])=>(for{x<-s;y<-s}yield x!=y&&x.startsWith(y)).forall(!_)

answered May 2 '16 at 9:38

Jacob

930621

add a comment |

rnso · Answer 19 · 2016-11-16 18:08:51Z

Racket 130 bytes

(define g #t)(for((n(length l)))(for((i(length l))#:unless(= i n))(when(string-prefix?(list-ref l i)(list-ref l n))(set! g #f))))g

Ungolfed:

(define(f l)
  (define g #t)
  (for ((n (length l)))
    (for ((i (length l)) #:unless (= i n))
      (when (string-prefix? (list-ref l i) (list-ref l n))
        (set! g #f))))g)

Testing:

(f [list "Hello" "World"])             
(f [list "Code" "Golf" "Is" "Cool"])
(f [list "1" "2" "3" "4" "5"])
(f [list "This" "test" "case" "is" "true"])          
(f [list "111" "010" "000" "1101" "1010" "1000" "0111" "0010" "1011" 
         "0110" "11001" "00110" "10011" "11000" "00111" "10010"])

(f [list "4" "42"])                             
(f [list "1" "2" "3" "34"])                   
(f [list "This" "test" "case" "is" "false" "t"])
(f [list "He" "said" "Hello"])
(f [list "0" "00" "00001"])
(f [list "Duplicate" "Duplicate" "Keys" "Keys"])

Output:

#t
#t
#t
#t
#t
#f
#f
#f
#f
#f
#f

asked	8 months ago
viewed	1786 times
active	1 month ago

current community

your communities

more stack exchange communities

Is it a prefix code?

19 Answers 19

Pyth, 8 bytes

Haskell, 37 bytes

Java, 128 127 126 125 124 121 bytes

(Thanks @Kenny Lau, @Maltysen, @Patrick Roberts, @Joba)

Ungolfed

Output

Python 2, 48 51 bytes

CJam, 14 bytes

Explanation

JavaScript ES6, 65 43 40 bytes

Test

Haskell, 49 bytes

Java, 97 bytes

Retina, 19 bytes

Explanation

Racket, 70 bytes

Python, 58 55 bytes

JavaScript (ES6), 52 54

Mathematica 75 69 68 bytes

Method 1: Storing output in an `Array`

Method 2: Storing output in a `List`

PostgreSQL, 186, 173 bytes

Mathematica, 41 bytes

Pyth - 13 12 bytes

Ruby, 48 bytes

Scala, 71 bytes

Racket 130 bytes

Your Answer

Not the answer you're looking for? Browse other questions tagged code-golf string compression decision-problem or ask your own question.

Visit Chat

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Is it a prefix code?

19 Answers 19

Pyth, 8 bytes

Haskell, 37 bytes

Java, 128 127 126 125 124 121 bytes

(Thanks @Kenny Lau, @Maltysen, @Patrick Roberts, @Joba)

Ungolfed

Output

Python 2, 48 51 bytes

CJam, 14 bytes

Explanation

JavaScript ES6, 65 43 40 bytes

Test

Haskell, 49 bytes

Java, 97 bytes

Retina, 19 bytes

Explanation

Racket, 70 bytes

Python, 58 55 bytes

JavaScript (ES6), 52 54

Mathematica 75 69 68 bytes

Method 1: Storing output in an Array

Method 2: Storing output in a List

PostgreSQL, 186, 173 bytes

Mathematica, 41 bytes

Pyth - 13 12 bytes

Ruby, 48 bytes

Scala, 71 bytes

Racket 130 bytes

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged code-golf string compression decision-problem or ask your own question.

Visit Chat

Linked

Related

Hot Network Questions

Method 1: Storing output in an `Array`

Method 2: Storing output in a `List`