How To Generate Your Own Benford’s Law Numbers
November 10, 2010. Posted by Robert Harder in: Utility
An interesting phenomenon of naturally occurring numbers is that the leading digit ‘1’ occurs with surprising frequency: about 30% of the time. This is known as Benford’s Law and is discussed in a number of places (Wikipedia, Wolfram, Cut-the-Knot, NY Times). Statisticians can use Benford’s Law to try to detect fake data, which people often generate with a simple Uniform(0,1) function such as the rand() found in so many programming languages.
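To see why rand()-style data stands out, here is a minimal Python sketch (not the script discussed in this post) that compares the leading digits of Uniform(0,1) draws with the textbook Benford probability log10(1 + 1/d). The helper names `leading_digit` and `first_digit_shares`, the seed, and the sample size are my own choices for illustration.

```python
import math
import random

def leading_digit(x):
    """Return the first significant digit of a positive float, or None for 0.0."""
    s = repr(x).lstrip("0.")  # drop leading zeros and the decimal point
    return int(s[0]) if s else None

def first_digit_shares(samples):
    """Fraction of samples whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    total = 0
    for x in samples:
        d = leading_digit(x)
        if d is not None:
            counts[d] += 1
            total += 1
    return [counts[d] / total for d in range(1, 10)]

rng = random.Random(2010)  # fixed seed so the run is repeatable
uniform_shares = first_digit_shares(rng.random() for _ in range(10_000))
benford_shares = [math.log10(1 + 1 / d) for d in range(1, 10)]

for d in range(1, 10):
    print(f"{d}: uniform {uniform_shares[d - 1]:.3f}   Benford {benford_shares[d - 1]:.3f}")
```

The uniform column should hover around one ninth for every digit, while the Benford column starts near 30% for a leading 1 and falls off from there.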
What I wanted to do was generate random numbers that complied with Benford’s Law. Impatient? Generate some random Benford numbers now.
Why? You mean, “Why am I trying to cheat more effectively?” I am not, but when I generate sample datasets for pedagogical purposes, I would like to use the most realistic fake numbers that I can.
My script generates one digit at a time, and the likelihood of a particular digit 0..9 occurring depends on its place in the number. For example, in generating a four-digit integer, the first digit will be a ‘1’ about 30% of the time, but the second digit will be a ‘1’ only about 11% of the time. After the second digit, the digits occur (in the script) with equal probability.
I use the following table as the basis for my calculations:
| Digit | First Place | Second Place |
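The first- and second-place probabilities can be computed directly from the logarithmic form of Benford’s Law. The sketch below assumes the table holds the textbook values; the function names `first_place` and `second_place` are hypothetical, and the script’s own table may round differently.

```python
import math

def first_place(d):
    """P(first digit = d) for d in 1..9."""
    return math.log10(1 + 1 / d)

def second_place(d):
    """P(second digit = d) for d in 0..9, summed over all first digits."""
    return sum(math.log10(1 + 1 / (10 * lead + d)) for lead in range(1, 10))

print("Digit  First Place  Second Place")
for d in range(10):
    # A leading zero never occurs, so the first-place column is blank for 0.
    first = f"{first_place(d):.3f}" if d > 0 else "-"
    print(f"{d:>5}  {first:>11}  {second_place(d):>12.3f}")
```

The second-place column is much flatter than the first, which is why treating every later digit as uniform is a reasonable shortcut.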
The simplest case is when I am generating a fixed number of digits. I know that the first digit is never a zero, so I can use the tables exclusively.
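The fixed-length case can be sketched in Python roughly as follows. This is not the PHP script itself: `benford_format` is a hypothetical name, the weights are the textbook Benford values, and I am assuming that the first X in the format gets the first-place weights, the second X the second-place weights, and every later X a uniform digit.

```python
import math
import random

# Textbook Benford weights: FIRST covers digits 1..9, SECOND covers digits 0..9.
FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]
SECOND = [sum(math.log10(1 + 1 / (10 * lead + d)) for lead in range(1, 10))
          for d in range(10)]

def benford_format(fmt, rng=random):
    """Fill each X in fmt with a digit; echo every other character unchanged."""
    out = []
    place = 0  # how many digits have been generated so far
    for ch in fmt:
        if ch != "X":
            out.append(ch)
            continue
        place += 1
        if place == 1:
            digit = rng.choices(range(1, 10), weights=FIRST)[0]  # never zero
        elif place == 2:
            digit = rng.choices(range(10), weights=SECOND)[0]
        else:
            digit = rng.randrange(10)  # later places: equal probability
        out.append(str(digit))
    return "".join(out)

print(benford_format("X.XXX"))
print(benford_format("XXXXX"))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when the numbers feed a classroom exercise.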
In the case where I want to generate all integers up to a certain point, I have to be a bit sneakier. Suppose I want to generate integers in [1..35]. I begin by generating a first digit, say, 4. I then check whether 4 is the largest number ≤ 35 that starts with a 4, and sure enough 4×10=40 is greater than 35, so I stop there. Voila: a single-digit number.
Suppose instead that, in generating integers in [1..35], I first generate a 2. I could go on to generate a second digit and end up with, say, 27, so the above test will not suffice. Instead, I compute the probability that a uniformly distributed integer in [1..35] has a single digit (9 out of 35), and if a uniform random draw falls below that probability, I simply return the value 2 and leave it at that.
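The two checks above can be sketched in Python like this. It is a rough reading of the steps rather than the script’s actual code, and `benford_upto` is a hypothetical name. Three parts are my own assumptions because the description does not cover them: a first digit larger than the limit is redrawn, an appended digit that would push the value past the limit is redrawn, and for limits with more than two digits the stopping probability is taken among the integers that have at least as many digits as the value built so far.

```python
import math
import random

# Textbook Benford weights: FIRST covers digits 1..9, SECOND covers digits 0..9.
FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]
SECOND = [sum(math.log10(1 + 1 / (10 * lead + d)) for lead in range(1, 10))
          for d in range(10)]

def benford_upto(upto, rng=random):
    """Draw an integer in [1..upto], one digit at a time."""
    if upto < 1:
        raise ValueError("upto must be at least 1")
    # Assumption: a first digit larger than upto is simply redrawn.
    while True:
        value = rng.choices(range(1, 10), weights=FIRST)[0]
        if value <= upto:
            break
    length = 1
    while True:
        # Check 1: no longer number starting with these digits fits, so stop.
        if value * 10 > upto:
            return value
        # Check 2: stop with the chance that a uniform integer in [1..upto]
        # with at least `length` digits has exactly `length` digits
        # (9 out of 35 after the first digit when upto is 35).
        exactly = 9 * 10 ** (length - 1)
        at_least = upto - 10 ** (length - 1) + 1
        if rng.random() < exactly / at_least:
            return value
        # Otherwise append a digit: second-place weights for the second digit,
        # uniform afterwards. Assumption: redraw if the result exceeds upto.
        weights = SECOND if length == 1 else None
        while True:
            digit = rng.choices(range(10), weights=weights)[0]
            if value * 10 + digit <= upto:
                break
        value = value * 10 + digit
        length += 1

print([benford_upto(35) for _ in range(10)])
```

With a limit of 35, any first digit from 4 to 9 is returned immediately by the first check, and a first digit of 1, 2, or 3 is returned as-is 9 times out of 35.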
PURPOSE:
    Generates random numbers that comply with Benford's Law.

PARAMETERS:
    help          Display this help message (default behavior).
    source        Echoes the source code for this script.
    count         The number of numbers to generate (default is 100)
                  ex: .../benford.php?count=200

  FIXED LENGTH:
    format        Instead of upto generate numbers with the given format,
                  where X signifies a digit and any other character is
                  simply echoed back.
                  ex: .../benford.php?format=X.XXX

  VARIABLE LENGTH:
    upto          Generate numbers from 1 to this value [1..upto] instead
                  of fixed length numbers, as with 'format'.
                  ex: .../benford.php?upto=150
    includeZero   When used with upto the number zero will be included in
                  the random numbers [0..upto].

LICENSE:
    This code is released as Public Domain.

AUTHOR:
    Robert Harder, rob _ iharder.net
To generate random house numbers for fake addresses, try http://iharder.sourceforge.net/benford.php?upto=9999 to generate numbers from 1 to 9999 (1-, 2-, 3-, and 4-digit house numbers).
To generate random car prices, try http://iharder.sourceforge.net/benford.php?format=XXXXX.