On the simple task of generating random numbers, part 1

Random number / random bytes

I'm currently implementing the next generation RNG (random number generator) for Replay Poker. This entails a log of complicated and interesting stuff but in the end it comes down to this, in our case:

Seed the OpenSSL::Random pRNG (pseudo random number generator) from a true random source every once in a while and use it to generate all the random numbers you need.

Truly random

The main point here, for those of you not that accurate on RNGs etc, is that true random sources are often slow and we need a lot of random numbers. The pRNG is good enough to count as a random source all on its own and it's just the detail of security that really means you need a true random number to start from.

The thing is that a pRNG is just a formula. If you know what values it started with and how many values it has calculated since you can reverse engineer all the previous numbers as well as predict every future ones.

A RNG on the other hand is something that gets its data from a physically random source. Such as radioactive decay, atmospheric noise, quantum uncertainty or some other provably random source.

So if you start your randomness formula, pRNG, off with a value from a true random source you are pretty much set for random numbers for as long as you like ...

A naïve assumption

So that is what we did. The one remaining thing to do is the make this a random number generator. OpenSSL::Random returns random bytes, as many as you like, but it doesn't turn them into a number for you.

Let us look at my testcase:

rng = Replay::Poker::Random.new
hash = {}

# To get some orderly keys in the hash
91.times do |index|
  hash[ index ] = []
end

100000.times.with_object do
  number = rng.get( 10..100 )
  hash[ number ] += 1
end

One would imagine that a decent random implementation in this case would indeed provide you with a fairly even distribution of numbers between 10 and 100, at least that's what I assumed of my implementation. Alas, what I got was not quite that:

{
  10=>1253,
  11=>1167,
  12=>1192,
  13=>1202,
  14=>1171,
  15=>745,
  16=>1228,
  17=>1175,
  18=>1158,
  19=>1174,
  20=>763,
  21=>1089,
  22=>1159,
  23=>1162,
  24=>1148,
  25=>1126,
  26=>769,
  27=>1147,
  28=>1165,
  29=>1172,
  30=>1159,
  31=>800,
  32=>1152,
  33=>1135,
  34=>1167,
  35=>1185,
  36=>825,
  37=>1185,
  38=>1158,
  39=>1136,
  40=>1207,
  41=>1139,
  42=>794,
  43=>1182,
  44=>1114,
  45=>1175,
  46=>1236,
  47=>735,
  48=>1186,
  49=>1150,
  50=>1140,
  51=>1190,
  52=>829,
  53=>1109,
  54=>1172,
  55=>1155,
  56=>1177,
  57=>1211,
  58=>787,
  59=>1160,
  60=>1160,
  61=>1151,
  62=>1130,
  63=>756,
  64=>1177,
  65=>1186,
  66=>1168,
  67=>1119,
  68=>802,
  69=>1218,
  70=>1211,
  71=>1201,
  72=>1138,
  73=>1207,
  74=>818,
  75=>1235,
  76=>1139,
  77=>1091,
  78=>1210,
  79=>785,
  80=>1192,
  81=>1183,
  82=>1142,
  83=>1196,
  84=>758,
  85=>1192,
  86=>1147,
  87=>1106,
  88=>1165,
  89=>1217,
  90=>775,
  91=>1232,
  92=>1183,
  93=>1232,
  94=>1161,
  95=>794,
  96=>1134,
  97=>1158,
  98=>1189,
  99=>1215,
  100=>812
}

There are these odd deficits every 5th or 6th number. This is definitely not what you want in your fair and balanced random number generator. Investigations ensued.

False leads

So my first assumption, that will prove to be correct, was that I f**ked something up. My second assumption was that ... I had no idea where the problem where. Look, here is the code:

# In class Replay::Poker::Random
def get( range )
  normalised_max = range.end - range.begin
  number = get_number( normalised_max )

  bytes = bytes_to_represent_number( normalised_max )
  max_number_for_bytes = 2**(8*bytes) - 1
  factor = (normalised_max + 1) / (max_number_for_bytes + 1).to_f

  number * factor + range.begin
end

Two things needed to understand this code: get_number( number ) gives you a number between 0 and the upper max of the byte range required to cover number. So say in our case I want a number between 10 and 100 in the end. To make it simpler we normalise that to a number between 0 and 90, so 90 is what gets sent to get_number.

Now 90 is not at a byte limit, 255,511 and 1023 are, and get_numberactually only randomises in whole bytes, it's the intention that get should be used if you require a specific number range instead of, basically, raw bytes cast as a number. So what we will get back from get_numer is a number between 0 and 255.

Moving on to the second thing: bytes_to_represent_number( number ) is the method that tells you how many whole bytes you need, minimum, to represent number. For 91 its 1, you need at least one whole byte if you want to represent 91. For 1000 it would be 2 bytes and so on.

With that in the bag have another look at the code. Let's go line by line real quick:

Line 1: normalised_max = range.end - range.begin gives us a "range" with a lower limit of zero.

Line 2: number = get_number( normalised_max ) gets the random number, in this case between 0 and 255.

Line 3: bytes = bytes_to_represent_number( normalised_max ) gives 1 for our example.

Line 4: max_number_for_bytes = 2**(8*bytes) - 1 calculates the highest number that is possible on that many bytes, so 255 for 1 byte.

Line 5: factor = (normalised_max + 1) / (max_number_for_bytes + 1).to_f since our random number is too large we need to scale it down to fit the range we have, this calculates the scaling factor we need.

Line 6: number * factor + range.begin and finally we scale the random number we got and push it up to match the requested range, instead of a zero based range, and return it.

Start the giggles

I'm sure some of you have already spotted my mistake. But I spent some time happily chasing floating point errors (it's always a floating point error) while ignoring the nagging voice in the back of my head telling me that a floating point error wouldn't look like this.

So after I finally got over my floating point error phase I had at least done the following: 1. Thoroughly disproved to myself that it was, infact, a floating point error. 2. Proven that the distribution of the raw bytes (the return from line 2 above) had a perfectly random distribution. 3. Talked to two of my colleagues about the issue without coming up with something new. 4. Officially run out of ideas.

I resigned to just leave it alone and hope that inspiration struck late at night as it often does. But just when I was about to move on I had another idea.

Proof your math

What happens if, instead of random values, I consecutively step through the values between 0 and 255 and see where they end up if sorted by the same logic I'm using on the random values now?

Well in short this happens:

{
  0=>[  0,   1,   2],
  1=>[  3,   4,   5],
  2=>[  6,   7,   8],
  3=>[  9,  10,  11],
  4=>[ 12,  13,  14],
  5=>[ 15,  16],
  6=>[ 17,  18,  19],
  7=>[ 20,  21,  22],
  8=>[ 23,  24,  25],
  9=>[ 26,  27,  28],
 10=>[ 29,  30],
 11=>[ 31,  32,  33],
 12=>[ 34,  35,  36],
 13=>[ 37,  38,  39],
 14=>[ 40,  41,  42],
 15=>[ 43,  44,  45],
 16=>[ 46,  47],
 17=>[ 48,  49,  50],
 18=>[ 51,  52,  53],
 19=>[ 54,  55,  56],
 20=>[ 57,  58,  59],
 21=>[ 60,  61],
 22=>[ 62,  63,  64],
 23=>[ 65,  66,  67],
 24=>[ 68,  69,  70],
 25=>[ 71,  72,  73],
 26=>[ 74,  75],
 27=>[ 76,  77,  78],
 28=>[ 79,  80,  81],
 29=>[ 82,  83,  84],
 30=>[ 85,  86,  87],
 31=>[ 88,  89,  90],
 32=>[ 91,  92],
 33=>[ 93,  94,  95],
 34=>[ 96,  97,  98],
 35=>[ 99, 100, 101],
 36=>[102, 103, 104],
 37=>[105, 106],
 38=>[107, 108, 109],
 39=>[110, 111, 112],
 40=>[113, 114, 115],
 41=>[116, 117, 118],
 42=>[119, 120],
 43=>[121, 122, 123],
 44=>[124, 125, 126],
 45=>[127, 128, 129],
 46=>[130, 131, 132],
 47=>[133, 134, 135],
 48=>[136, 137],
 49=>[138, 139, 140],
 50=>[141, 142, 143],
 51=>[144, 145, 146],
 52=>[147, 148, 149],
 53=>[150, 151],
 54=>[152, 153, 154],
 55=>[155, 156, 157],
 56=>[158, 159, 160],
 57=>[161, 162, 163],
 58=>[164, 165],
 59=>[166, 167, 168],
 60=>[169, 170, 171],
 61=>[172, 173, 174],
 62=>[175, 176, 177],
 63=>[178, 179, 180],
 64=>[181, 182],
 65=>[183, 184, 185],
 66=>[186, 187, 188],
 67=>[189, 190, 191],
 68=>[192, 193, 194],
 69=>[195, 196],
 70=>[197, 198, 199],
 71=>[200, 201, 202],
 72=>[203, 204, 205],
 73=>[206, 207, 208],
 74=>[209, 210],
 75=>[211, 212, 213],
 76=>[214, 215, 216],
 77=>[217, 218, 219],
 78=>[220, 221, 222],
 79=>[223, 224, 225],
 80=>[226, 227],
 81=>[228, 229, 230],
 82=>[231, 232, 233],
 83=>[234, 235, 236],
 84=>[237, 238, 239],
 85=>[240, 241],
 86=>[242, 243, 244],
 87=>[245, 246, 247],
 88=>[248, 249, 250],
 89=>[251, 252, 253],
 90=>[254, 255]
}

Oh dear ... At least that explained it all to me. This is what happens: As we get random numbers from get_number we try and scale them from a range of 0..255to one of 0..90. In theory this is fine, you have a scaling factor and you just multiply any number with it to go from one range to the other.

The first problem here is that the factor is not a whole number, it in fact 0.3568627450980392, or close to it due to limitations in floating point representation ...

Any incoming number multiplied by the scaling factor maps neatly onto the new range between 0 and 90, but then we throw the precision away! The scaled number will be a fraction, but we don't care if you are 16.1 or 16.9, it'll just be 16 either way. This is the cause of the under represented numbers in the distribution, it's a floating point error! Well, it's not. It's a programmer error, and a rather embarrassing one at that, but intuitively it made so much sense ...

Beating the proverbial dead horse

It was so good, intuitively, that I spent a good hour trying to create another way of mathematically scaling from one to the other before the mounting evidence of my failures alone made me stop and give up on the idea.

I googled a bit for inspiration (secretly hoping to find the right formula) and came across a SO post that explaind it nicely. Let me quote a bit from the beginning of the accepted answer:

All the answers so far are mathematically wrong. Returning rand() % N does not uniformly give a number in the range [0, N) unless N divides the length of the interval into which rand() returns ... [snip]

Let me translate that into normal speak: Every other answer to this question made the same assumption I did and they were all wrong for the same reason (ie. I'm in good (or bad, depending on your view) company). It will only work if the range of the random numbers returned is the same as the required range, scaled up or down.

Even clearer, the current code works fine as long as the requested range is in 0 to a power of two. In these cases the factor will be a nice, simple fraction and the values will map evenly.

The solution

With that in mind I quickly rewrote the original version to this:

def get( range )
  normalised_max = range.end - range.begin + 1
  bytes = bytes_to_represent_number( normalised_max )
  max_number_for_bytes = 2**(8*bytes) - 1

  factor = max_number_for_bytes / normalised_max

  number = max_number_for_bytes
  while  number >= normalised_max * factor do
    number = get_number( normalised_max )
  end

  bucket = number / factor

  bucket + range.begin
end

What it does, except for what it did before, is to get the largest multiple of the normalised max value that will fit in the byte limited max value. So how many whole times can you fit 91 in 256? Two times, so factor in our case is 2.

It then draws random numbers until it finds one that fits in the limited interval, 0 to 91 * 2, or 0..181, discarding any higher numbers.

It was these higher numbers that, in essence, got mapped into the third slots in the distribution before. With this approach we guarantee that we only get two random numbers for every number in the sought range and that the scaling will be a perfect hash function of one into the other.

Finally

Running the tests again confirmed that this approach works. There is the statistical possibility that you will always just get numbers above your desired range and end up in an infinite loop. But so far I have been lucky with that :)

Cover image copyright zhekos / 123RF Stock Photo.