Compact coding of integer into bitstring

Question

I want to compactly code positive integers x into bits, in a manner allowing decoding back into the original integers for a stateless decoder knowing the maximum value m of each x; it shall be possible to uniquely decode the concatenation of encodings, as is the case in Huffman coding.
_{[The above introduction motivates the rest, but is not part of the formal definition]}

Notation: for any non-negative integer i, let n(i) be the number of bits necessary to represent i in binary; that is the smallest non-negative integer k such that i>>k == 0

  i :   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  ..
n(i):   0   1   2   2   3   3   3   3   4   4   4   4   4   4   4   4   5   5   5   ..

I want a function F(x,m) defined for 0<x and x<=m, with output a string of '0' or '1', having these properties:

F(x,m) has length less than 2*n(x) or 2*n(m)-1, whichever is smaller.
If x<y then:
- F(x,m) is no longer than F(y,m);
- F(x,m) and F(y,m) differ at some position over the length of F(x,m);
- there's a 0 in F(x,m) at the first such position.
When for a certain m properties 1 and 2 do not uniquely define F(x,m) for all positive x at most m, we reject any encoding giving a longer F(x,m) than some otherwise acceptable encoding, for the smallest x for which the length do not match.

Note: In the above, implicitly, 0<x, 0<y, 0<m, x<=m, and y<=m, so that F(x,m) and F(y,m) are defined.

It is asked the shortest program that, given any x and m meeting the above constraints and up to 9 decimal digits, outputs an F(x,m) consistent with the above rules. The following C framework (or its equivalent in other languages) is not counted:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define C(c) putchar((c)?'1':'0') // output '0' or '1'
#define S(s) fputs((s),stdout)    // output a string
typedef unsigned long t;          // at least 32 bits
void F(t x,t m);                  // prototype for F
int main(int argc,char *argv[]){
 if(argc==3)F((t)atol(argv[1]),(t)atol(argv[2]));
 return 0;}
void F(t x,t m) {   // counting starts on next line
}                   // counting ends before this line

Comment: Property 1 aggressively bounds the encoded length; property 2 formalizes that unambiguous decoding is possible, and canonicalizes the encoding; I assert (without proof) this is enough to uniquely define the output of F when m+1 is a power of two, and that property 3 is enough to uniquely define F for other m.

Here is a partial table (handmade; the first version posted was riddled with errors, sorry):

  x     :   1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
F(x,1)  : empty
F(x,2)  :   0       1
F(x,3)  :   0       10      11
F(x,4)  :   0       10      110     111
F(x,5)  :   0       10      110     1110    1111
F(x,6)  :   0       100     101     110     1110    1111
F(x,7)  :   0       100     101     1100    1101    1110    1111
F(x,8)  :   0       100     101     110     11100   11101   11110   11111

F(x,15) :   0       100     101     11000   11001   11010   11011   111000  111001  111010  111011  111100  111101  111110  111111

"Compact coding?" How is this more compact than the canonical base-2 representation of the integer? :) — Martin Büttner♦, Jul 9 '14 at 15:42
@m.buettner: Problem with the canonical representation in base 2 is that 3 then 7, 7 then 3, 1 then 15, and a number of other things, all encode as 11111, making decoding impossble. With the proposed coding, the concatenation of several outputs can be uniquely decoded (including with the maximum m changing dynamically). I tried to clarify that. — fgrieu, Jul 9 '14 at 15:50
I disagree with your table. For F(x,5) you have 0,100,101,110,111. But, the encoding 0,10,110,1110,1111 satisfies (1) and (2), and because 10 is shorter than 100, it causes your encoding for F(x,5) to be rejected. — nneonneo, Jul 11 '14 at 16:01
Oh but for F(x,8) your table is still wrong. 0,100,101,110,11100,11101,11110,11111 is better for F(4,8) (110 vs 1100). — nneonneo, Jul 14 '14 at 17:05
Nope. That's illegal because F(7,8) must be less than 6 digits. — nneonneo, Jul 15 '14 at 13:47

nneonneo · Accepted Answer · 2014-07-18 02:19:56Z

Python 3, 163 bytes

n=int.bit_length
R=range
def F(x,m,c=0):
 for i in R(x):L=2*n(m)-2;D=1<<L;r=n(D-c-sum(D>>min(2*n(x+1)-1,L)for x in R(i+1,m)))-1;v=c;c+=1<<r
 return bin(v)[2:L+2-r]

Ignore the c=0 parameter, that's a golf trick.

This greedily chooses the shortest possible representation for each number, subject to the condition that the remaining numbers are still representable. Therefore, by construction, it exactly satisfies the desired properties. It is actually not that hard to modify this code to support a different set of coding rules, as a result.

For example, here are the outputs up to m=15:

m\x |   1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
----+----------------------------------------------------------------------------------------------------------------------------
1   |   
2   |   0       1
3   |   0       10      11
4   |   0       10      110     111
5   |   0       10      110     1110    1111
6   |   0       100     101     110     1110    1111
7   |   0       100     101     1100    1101    1110    1111
8   |   0       100     101     110     11100   11101   11110   11111
9   |   0       100     101     110     11100   11101   11110   111110  111111
10  |   0       100     101     1100    1101    11100   11101   11110   111110  111111
11  |   0       100     101     1100    1101    11100   11101   111100  111101  111110  111111
12  |   0       100     101     1100    11010   11011   11100   11101   111100  111101  111110  111111
13  |   0       100     101     1100    11010   11011   11100   111010  111011  111100  111101  111110  111111
14  |   0       100     101     11000   11001   11010   11011   11100   111010  111011  111100  111101  111110  111111
15  |   0       100     101     11000   11001   11010   11011   111000  111001  111010  111011  111100  111101  111110  111111

isaacg · Answer 2 · 2014-07-17 07:09:15Z

Python, 171

def F(x,m):
 a=0;b=[9]
 while len(b)<m:
    if 4**len(bin(len(b)))*2>64*a<4**len(bin(m)):
     if(a&a+1)**a:b+=[a];a+=1
     else:a*=2
    else:a=b.pop()*2
 return bin((b+[a])[x])[2:]

Note that lines that appear to begin with 4 spaces actually begin with a tab.

Testing, with bitpwner's test script:

         1        2        3        4        5        6        7        8        9        10       11       12       13       14       15      
F(x,1)   0       
F(x,2)   0        1       
F(x,3)   0        10       11      
F(x,4)   0        10       110      111     
F(x,5)   0        10       110      1110     1111    
F(x,6)   0        100      101      110      1110     1111    
F(x,7)   0        100      101      1100     1101     1110     1111    
F(x,8)   0        100      101      110      11100    11101    11110    11111   
F(x,9)   0        100      101      110      11100    11101    11110    111110   111111  
F(x,10)  0        100      101      1100     1101     11100    11101    11110    111110   111111  
F(x,11)  0        100      101      1100     1101     11100    11101    111100   111101   111110   111111  
F(x,12)  0        100      101      1100     11010    11011    11100    11101    111100   111101   111110   111111  
F(x,13)  0        100      101      1100     11010    11011    11100    111010   111011   111100   111101   111110   111111  
F(x,14)  0        100      101      11000    11001    11010    11011    11100    111010   111011   111100   111101   111110   111111  
F(x,15)  0        100      101      11000    11001    11010    11011    111000   111001   111010   111011   111100   111101   111110   111111

Explanation:

This is all based on the observation that between two consecutive elements of the code, F(x,m) and F(x+1,m), we always add one to the binary number, then multiply by two some number of times. If these steps are followed, then it is a valid code. The rest simply tests to make sure it is short enough.

Golfs: 175 -> 171: changed 2**(2*... to 4**

+1. Amazing. You figured out the pattern. – bitpwner Jul 17 '14 at 0:28 — bitpwner, Jul 17 '14 at 0:28
Nit: F(1,1) should be the empty string. – nneonneo Jul 18 '14 at 2:16 — nneonneo, Jul 18 '14 at 2:16

bitpwner · Answer 3 · 2014-07-17 00:47:40Z

Python - 370

Creates a huffman tree, balances it to fit the rules, then does a walk through the tree to get the final values.

def w(n,m,c,p=""):
    try:[w(n[y],m,c,p+`y`)for y in 1,0]
    except:c[n[0]]=p
d=lambda x:len(x)>1and 1+d(x[1])
v=lambda x,y:v(x[1],y-1)if y else x
def F(x,m):
    r=[m];i=j=0
    for y in range(1,m):r=[[m-y],r]
    while d(r)>len(bin(m))*2-6-(m==8):g=v(r,i);g[1],g[1][0]=g[1][1],[g[1][0],g[1][1][0]];i,j=[[i+1+(d(g)%2<1&(1<i<5)&(m%7<1)),j],[j+1]*2][d(g)<5]
    c={};w(r,m,c);return c[x]

For a more succinct solution that is based on the emergent pattern, look at isaacg's answer.
It's a good example of how a completely different approach can solve the problem.

Test:

chars = 8
maxM = 15
print " "*chars,
for m in range(1,maxM+1):
    p = `m`
    print p+" "*(chars-len(p)),
print
for m in range(1,maxM+1):
    p = "F(x,"+`m`+")"
    print p+" "*(chars-len(p)),
    for x in range(1,maxM+1):
        try:
            q = `F(x,m)`[1:-1]
            print q+" "*(chars-len(q)),
        except:
            print
            break

Results:

         1        2        3        4        5        6        7        8        9        10       11       12       13       14       15      
F(x,1)           
F(x,2)   0        1       
F(x,3)   0        10       11      
F(x,4)   0        10       110      111     
F(x,5)   0        10       110      1110     1111    
F(x,6)   0        100      101      110      1110     1111    
F(x,7)   0        100      101      1100     1101     1110     1111    
F(x,8)   0        100      101      1100     1101     1110     11110    11111   
F(x,9)   0        100      101      1100     1101     1110     11110    111110   111111  
F(x,10)  0        100      101      1100     1101     11100    11101    11110    111110   111111  
F(x,11)  0        100      101      1100     1101     11100    11101    111100   111101   111110   111111  
F(x,12)  0        100      101      11000    11001    11010    11011    11100    11101    11110    111110   111111  
F(x,13)  0        100      101      11000    11001    11010    11011    11100    11101    111100   111101   111110   111111  
F(x,14)  0        100      101      11000    11001    11010    11011    11100    111010   111011   111100   111101   111110   111111  
F(x,15)  0        100      101      11000    11001    11010    11011    111000   111001   111010   111011   111100   111101   111110   111111

As pointed by nneonneo, F(7,8) of 111110 is wrong, for that has length 6, and "less than 2*n(x)" implies less than 6. — fgrieu, Jul 15 '14 at 14:51

asked	1 year ago
viewed	853 times
active	1 year ago

current community

your communities

more stack exchange communities

Compact coding of integer into bitstring

3 Answers 3

Python 3, 163 bytes

Python, 171

Python - 370

Your Answer

Not the answer you're looking for? Browse other questions tagged code-golf or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Compact coding of integer into bitstring

3 Answers 3

Python 3, 163 bytes

Python, 171

Python - 370

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged code-golf or ask your own question.

Related

Hot Network Questions