Was trying to work out a function to map the size, in base-2 bits of a number, to the minimum number of base-10 digits needed to represent the number (and to useful: if the function has an error the error *must* be upward). I couldn’t work out anything, so I wrote out the numbers to see if there was any obvious pattern. And hey presto:

bits | Min base-10 | |||
---|---|---|---|---|

0 | 1 | 2 | 3 | 1 |

4 | 5 | 6 | 2 | |

7 | 8 | 9 | 3 | |

10 | 11 | 12 | 13 | 4 |

14 | 15 | 16 | 5 | |

17 | 18 | 19 | 6 | |

20 | 21 | 22 | 23 | 7 |

24 | 25 | 26 | 8 | |

27 | 28 | 29 | 9 | |

30 | 31 | 32 | 33 | 10 |

34 | 35 | 36 | 11 | |

37 | 38 | 39 | 12 | |

40 | 41 | 42 | 43 | 13 |

44 | 45 | 46 | 14 | |

47 | 48 | 49 | 15 | |

50 | 51 | 52 | 53 | 16 |

etc.. |

A sufficient function must then be, using C-ish notation, and assuming ‘/’ is an integer divide:

f(x) -> (x%10 <= 3 ? (3x / 10 + 1)

: (x % 10 <= 6 ? (3x / 10 + 2)

: (3x / 10 + 3))

Update: f(x) -> (x % 10 <= 6 ? (3x / 10 + 2 : (3x / 10 + 3))

What I’m wondering though is:

- Was there a better way to work out the function?
- Is there a neater way of expressing it?

*Update*: As cjb points out:

2 = 10^n

=> n = log(2)/log(10)

=> 2^p = (10^n)^p

=> 2^p = 10^(np)

ergo, for any given p, ceiling(np) digits are required

As I’m a bit dense, cjb had to re-explain it to me in terms of information content. To paraphrase (hopefully not too badly):

1 digit of base-10 has log2(10) bits of information, and similarly 1 digit of base-2 has log10(2) (ie 1/log2(10)) decimal units (aka ‘hartley‘) of information. Therefore, n binary bits have nlog10(2) hartley’s of information. Round this up to get the minimum required decimal digits.

cjb also pointed out a transcription error in the function (left out the multiply by 3). He has some other interesting points I’m still trying to get my head around :).

*Update2*: CJB points out there’s a problem with my table, and hence with my function. The below again paraphrases his explanation. My table assumes that for every additional 10 bits, that this corresponds perfectly with an additional 3 decimal digits (hence the recurring pattern). I.e. that the ratio is 10:3. However, this isn’t the case – as we can see from above, the ratio is actually:

1 : log10(2) = 3.322

This is obviously quite close to, but not the same as the 10:3 ratio in the table . Therefore the table diverges very slowly from the actual answer. To find where it diverges sufficiently to throw off the table:

the divergence ratio is 1024:1000, for every 3 rows.

so the divergence reaches 1 at: 3/log(1.024) ~= 291.2

so the columns of the table become out of sync around 291.

And lo and behold:

table:

292*3/10+1 = 88

293*3/10+1 = 88

by nlog2(p):

292*l(2)/l(10) = 87.90

293*l(2)/l(10) = 88.20 (oops, too big for 88)

Discrepencies actually occur between the 2 methods prior to this, at 103, 113, 123, … I don’t have an explanation for those as yet. Just removing the <= 3 band, and having those values use the <=6/+2 band allows the integer-only method to work up to about 1073 bits.

Cjb has a better method, more precise and robust and integer only:

f(x) = 30103x/100000 + 1;

which is good up to about 13300 bits!