## How To Generate Your Own Benford’s Law Numbers

An interesting phenomenon of naturally occurring numbers is that the leading digit ‘1’ occurs with surprising frequency: about 30% of the time. This is known as Benford’s Law and is discussed in a number of places (Wikipedia, Wolfram, Cut-the-Knot, NY Times). Statisticians can use Benford’s Law to try to detect fake data, which people tend to generate with a simple Uniform(0,1) function such as the `rand()` found in so many programming languages.

What I wanted to do was generate random numbers that complied with Benford’s Law. Impatient? Generate some random Benford numbers now.

## Why

You mean, “Why am I trying to cheat more effectively?” No, but if I am trying to generate sample datasets for pedagogical purposes, I would like to use the most realistic fake numbers that I can.

## How

My script generates one digit at a time, and the likelihood of a particular digit 0..9 occurring depends on its place in the number. For example, in generating a four-digit integer, the first digit will be a ‘1’ about 30% of the time, but the second digit will be a ‘1’ only about 11% of the time. After the second digit, the script treats all ten digits as equally likely.

I use the following table as the basis for my calculations:

| Digit | First Place | Second Place |
|---|---|---|
| 0 | 0 | 0.1197 |
| 1 | 0.3010 | 0.1139 |
| 2 | 0.1761 | 0.1088 |
| 3 | 0.1249 | 0.1043 |
| 4 | 0.0969 | 0.1003 |
| 5 | 0.0792 | 0.0967 |
| 6 | 0.0669 | 0.0934 |
| 7 | 0.0580 | 0.0904 |
| 8 | 0.0512 | 0.0876 |
| 9 | 0.0458 | 0.0850 |
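The table values can be reproduced from the standard Benford formulas. I do not claim this is how the table was originally built; it is just a quick way to check it. The first digit d has probability log10(1 + 1/d), and the second digit d has the sum of log10(1 + 1/(10k + d)) over every possible first digit k = 1..9. The function names here are my own, chosen for illustration.

```python
import math

def first_digit_prob(d: int) -> float:
    """Benford probability that the leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)

def second_digit_prob(d: int) -> float:
    """Benford probability that the second digit is d (0..9)."""
    # Sum over every possible leading digit k.
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

# Print rows in the same layout as the table; a leading 0 never occurs.
for d in range(10):
    first = first_digit_prob(d) if d > 0 else 0.0
    print(f"{d} | {first:.4f} | {second_digit_prob(d):.4f}")
```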

The simplest case is when I am generating a fixed number of digits. I know that the first digit is never a zero, so I can use the tables exclusively.
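As a rough illustration of the fixed-length case, here is a minimal Python sketch. It is not the PHP script itself: the names `benford_digit` and `benford_format` are made up for this example, and the only things taken from the post are the table values, the uniform treatment of later digits, and the idea of a format string in which `X` stands for a digit.

```python
import random

# Probabilities copied from the table above, indexed by digit 0..9.
FIRST_PLACE = [0.0, 0.3010, 0.1761, 0.1249, 0.0969,
               0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND_PLACE = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003,
                0.0967, 0.0934, 0.0904, 0.0876, 0.0850]

def benford_digit(place: int, rng: random.Random) -> int:
    """Draw one digit; place 0 is the leading digit, place 1 the second."""
    if place == 0:
        weights = FIRST_PLACE
    elif place == 1:
        weights = SECOND_PLACE
    else:
        return rng.randrange(10)  # later places: uniform over 0..9
    return rng.choices(range(10), weights=weights)[0]

def benford_format(fmt: str, rng: random.Random) -> str:
    """Replace each X in fmt with a digit and echo every other character."""
    out = []
    place = 0
    for ch in fmt:
        if ch == "X":
            out.append(str(benford_digit(place, rng)))
            place += 1
        else:
            out.append(ch)
    return "".join(out)

rng = random.Random(42)  # seeded only so the sketch is repeatable
print([benford_format("X.XXX", rng) for _ in range(5)])
```

Passing the random generator in explicitly is a choice made for this sketch so that runs can be reproduced; it says nothing about how the hosted script seeds itself.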

In the case where I want to generate all integers *up to* a certain point, I have to be a bit more sneaky. Suppose I want to generate integers in [1..35]. I begin by generating a digit, say, 4. I check whether 4 is the largest number ≤ 35 that I can generate that starts with a 4, and sure enough 4×10=40 is greater than 35, so I stop there. Voilà: a single-digit number.

Suppose instead that, in generating integers in [1..35], I first generate a 2. It is possible that I could generate a second digit and end up with, say, 27, so the above test will not suffice. Next I work out the probability that a uniformly distributed integer in [1..35] has a single digit (9 out of 35). I then draw a random number, and if it falls within that probability, I simply return the value 2 and leave it at that; otherwise I go on to generate a second digit.
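The two tests described above can be sketched in Python as follows. This is a sketch under assumptions, not a port of the script. The post does not say what happens when the finished number overshoots the limit (a 3 followed by a 7 gives 37), so this version simply discards it and starts over. It also extends the 9-out-of-35 rule to longer numbers by comparing the count of integers with exactly the current length against the count with at least that length. The name `benford_upto` and both of those choices are mine.

```python
import random

# Probabilities copied from the table above, indexed by digit 0..9.
FIRST_PLACE = [0.0, 0.3010, 0.1761, 0.1249, 0.0969,
               0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND_PLACE = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003,
                0.0967, 0.0934, 0.0904, 0.0876, 0.0850]

def draw_digit(place: int, rng: random.Random) -> int:
    """Place 0 and 1 use the table; later places are uniform."""
    if place == 0:
        return rng.choices(range(10), weights=FIRST_PLACE)[0]
    if place == 1:
        return rng.choices(range(10), weights=SECOND_PLACE)[0]
    return rng.randrange(10)

def benford_upto(upto: int, rng: random.Random) -> int:
    """Draw an integer in [1..upto], one digit at a time."""
    if upto < 1:
        raise ValueError("upto must be at least 1")
    while True:
        value = draw_digit(0, rng)
        digits = 1  # number of digits generated so far
        # First test: keep going only while a longer number could still fit.
        while value * 10 <= upto:
            # Second test: stop with the probability that a uniform integer
            # in range has exactly this many digits (9/35 for upto=35).
            exactly = 9 * 10 ** (digits - 1)
            at_least = upto - 10 ** (digits - 1) + 1  # assumption for digits > 1
            if rng.random() < exactly / at_least:
                break
            value = value * 10 + draw_digit(digits, rng)
            digits += 1
        if value <= upto:
            return value
        # Assumption: an overshoot such as 37 is discarded and redrawn.

rng = random.Random(2024)  # seeded only so the sketch is repeatable
print([benford_upto(35, rng) for _ in range(10)])
```

Redrawing on overshoot keeps every result inside the range, at the cost of shifting the digit frequencies slightly away from the table for limits like 35.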

## The Script

I am hosting the script on my SourceForge pages here: http://iharder.sourceforge.net/benford.php. I had started with a JavaScript version, but I thought a PHP-based script would be more useful.

    PURPOSE:
      Generates random numbers that comply with Benford's Law.

    PARAMETERS:
      help         Display this help message (default behavior).
      source       Echoes the source code for this script.
      count        The number of numbers to generate (default is 100)
                   ex: .../benford.php?count=200

    FIXED LENGTH:
      format       Instead of upto generate numbers with the given format,
                   where X signifies a digit and any other character is
                   simply echoed back.
                   ex: .../benford.php?format=X.XXX

    VARIABLE LENGTH:
      upto         Generate numbers from 1 to this value [1..upto] instead
                   of fixed length numbers, as with 'format'.
                   ex: .../benford.php?upto=150
      includeZero  When used with upto the number zero will be included
                   in the random numbers [0..upto].

    LICENSE:
      This code is released as Public Domain.

    AUTHOR:
      Robert Harder, rob _ iharder.net

## Examples

To generate random house numbers for fake addresses, try http://iharder.sourceforge.net/benford.php?upto=9999 to generate numbers from 1 to 9999 (1-, 2-, 3-, and 4-digit house numbers).

To generate random car prices, try http://iharder.sourceforge.net/benford.php?format=XXXXX.

Enjoy!