I want to compactly code positive integers x
into bits, in a manner allowing decoding back into the original integers for a stateless decoder knowing the maximum value m
of each x
; it shall be possible to uniquely decode the concatenation of encodings, as is the case in Huffman coding.
[The above introduction motivates the rest, but is not part of the formal definition]
Notation: for any non-negative integer i
, let n(i)
be the number of bits necessary to represent i
in binary; that is the smallest non-negative integer k
such that i>>k == 0
i : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ..
n(i): 0 1 2 2 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 ..
I want a function F(x,m)
defined for 0<x
and x<=m
, with output a string of '0
' or '1
', having these properties:
F(x,m)
has length less than2*n(x)
or2*n(m)-1
, whichever is smaller.- If
x<y
then:F(x,m)
is no longer thanF(y,m)
;F(x,m)
andF(y,m)
differ at some position over the length ofF(x,m)
;- there's a
0
inF(x,m)
at the first such position.
- When for a certain
m
properties 1 and 2 do not uniquely defineF(x,m)
for all positivex
at mostm
, we reject any encoding giving a longerF(x,m)
than some otherwise acceptable encoding, for the smallestx
for which the length do not match.
Note: In the above, implicitly, 0<x
, 0<y
, 0<m
, x<=m
, and y<=m
, so that F(x,m)
and F(y,m)
are defined.
It is asked the shortest program that, given any x
and m
meeting the above constraints and up to 9 decimal digits, outputs an F(x,m)
consistent with the above rules. The following C framework (or its equivalent in other languages) is not counted:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define C(c) putchar((c)?'1':'0') // output '0' or '1'
#define S(s) fputs((s),stdout) // output a string
typedef unsigned long t; // at least 32 bits
void F(t x,t m); // prototype for F
int main(int argc,char *argv[]){
if(argc==3)F((t)atol(argv[1]),(t)atol(argv[2]));
return 0;}
void F(t x,t m) { // counting starts on next line
} // counting ends before this line
Comment: Property 1 aggressively bounds the encoded length; property 2 formalizes that unambiguous decoding is possible, and canonicalizes the encoding; I assert (without proof) this is enough to uniquely define the output of F
when m+1
is a power of two, and that property 3 is enough to uniquely define F
for other m
.
Here is a partial table (handmade; the first version posted was riddled with errors, sorry):
x : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
F(x,1) : empty
F(x,2) : 0 1
F(x,3) : 0 10 11
F(x,4) : 0 10 110 111
F(x,5) : 0 10 110 1110 1111
F(x,6) : 0 100 101 110 1110 1111
F(x,7) : 0 100 101 1100 1101 1110 1111
F(x,8) : 0 100 101 110 11100 11101 11110 11111
F(x,15) : 0 100 101 11000 11001 11010 11011 111000 111001 111010 111011 111100 111101 111110 111111
11111
, making decoding impossble. With the proposed coding, the concatenation of several outputs can be uniquely decoded (including with the maximumm
changing dynamically). I tried to clarify that. – fgrieu Jul 9 '14 at 15:50