How To Generate Your Own Benford’s Law Numbers
November 10, 2010
Posted by Robert Harder in: Utility

An interesting phenomenon of naturally occurring numbers is that the leading digit ‘1’ occurs with surprising frequency: about 30% of the time. This is known as Benford’s Law and is discussed in a number of places (Wikipedia, Wolfram, Cut-the-Knot, NY Times). Statisticians can use Benford’s Law to try to detect fabricated data, which people often generate with a simple Uniform(0,1) function such as the rand() found in many programming languages.
What I wanted to do was generate random numbers that complied with Benford’s Law. Impatient? Generate some random Benford numbers now.
Why
You mean, “Why am I trying to cheat more effectively?” No, but if I am trying to generate sample datasets for pedagogical purposes, I would like to use the most realistic fake numbers that I can.
How
My script generates one digit at a time, and the likelihood of a particular digit 0..9 occurring depends on its place in the number. For example, in generating a four-digit integer, the first digit will be a ‘1’ about 30% of the time, but the second digit will be a ‘1’ only about 11% of the time. After the second digit, the script treats all ten digits as equally likely.
I use the following table of probabilities as the basis for my calculations:
| Digit | First Place | Second Place |
|---|---|---|
| 0 | 0 | 0.1197 |
| 1 | 0.3010 | 0.1139 |
| 2 | 0.1761 | 0.1088 |
| 3 | 0.1249 | 0.1043 |
| 4 | 0.0969 | 0.1003 |
| 5 | 0.0792 | 0.0967 |
| 6 | 0.0669 | 0.0934 |
| 7 | 0.0580 | 0.0904 |
| 8 | 0.0512 | 0.0876 |
| 9 | 0.0458 | 0.0850 |
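The post does not say how the table was produced, but its values agree with the standard Benford formulas: the first digit d occurs with probability log10(1 + 1/d), and the second digit d with the sum of log10(1 + 1/(10k + d)) over first digits k = 1..9. A minimal Python sketch that recomputes both columns (the function names are mine, not from the PHP script):

```python
import math

def first_digit_prob(d: int) -> float:
    """Benford probability that the leading digit equals d (d in 1..9)."""
    return math.log10(1 + 1 / d)

def second_digit_prob(d: int) -> float:
    """Benford probability that the second digit equals d (d in 0..9).

    Sums over every possible leading digit k = 1..9.
    """
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

# Print one row per digit: digit, first-place and second-place probability.
for d in range(10):
    first = first_digit_prob(d) if d > 0 else 0.0  # a leading zero never occurs
    print(f"{d}  {first:.4f}  {second_digit_prob(d):.4f}")
```

Rounded to four decimal places, the printed rows should line up with the table.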
The simplest case is when I am generating a fixed number of digits. I know that the first digit is never a zero, so I can use the tables exclusively.
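To make the fixed-length case concrete, here is a rough Python sketch of the idea, not the PHP script itself. It assumes a format string in which each X is a digit and every other character is echoed back; the name `benford_format` and the use of `random.Random.choices` are my own choices for illustration.

```python
import random

# Probabilities copied from the table, indexed by digit 0..9.
FIRST = [0, 0.3010, 0.1761, 0.1249, 0.0969, 0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003, 0.0967, 0.0934, 0.0904, 0.0876, 0.0850]
UNIFORM = [0.1] * 10  # third place and beyond: all digits equally likely

def benford_format(fmt: str, rng: random.Random) -> str:
    """Fill each 'X' in fmt with a digit; echo every other character."""
    out = []
    place = 0  # how many digits have been generated so far
    for ch in fmt:
        if ch == "X":
            weights = FIRST if place == 0 else SECOND if place == 1 else UNIFORM
            out.append(str(rng.choices(range(10), weights=weights)[0]))
            place += 1
        else:
            out.append(ch)
    return "".join(out)

rng = random.Random()
print(benford_format("X.XXX", rng))
print(benford_format("XXXXX", rng))
```

Because the first-place weight for 0 is zero, the leading digit is never a zero, which is why the tables alone are enough here.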
In the case where I want to generate all integers up to a certain point, I have to be a bit more sneaky. Suppose I want to generate integers from [1..35]. I will begin by generating a digit, say, 4. I check whether any longer number starting with 4 could still fit in the range: the smallest such number is 4×10=40, which is greater than 35, so I stop there. Voilà: a single-digit number.
Suppose that in generating integers from [1..35], I first generate a 2. It is possible that I could generate a second digit and end up with, say, 27, so the above test will not suffice. Instead, I compute the probability that a uniformly distributed integer from [1..35] has a single digit (9 out of 35). I then draw a uniform random number, and if it falls within that probability, I simply return the value 2 and leave it at that. Otherwise I go on to generate a second digit.
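The variable-length procedure can be sketched in Python as follows. This is an interpretation under stated assumptions, not a port of the PHP script: the post only describes the stopping rule for the first digit, so the generalization to longer prefixes (stop with probability "count of numbers with exactly this many digits" over "count of numbers with at least this many digits"), the redraw of a digit that would overshoot `upto`, and the name `benford_upto` are all my own guesses.

```python
import random

# Probabilities copied from the table, indexed by digit 0..9.
FIRST = [0, 0.3010, 0.1761, 0.1249, 0.0969, 0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003, 0.0967, 0.0934, 0.0904, 0.0876, 0.0850]
UNIFORM = [0.1] * 10

def draw(weights, rng: random.Random) -> int:
    """Pick one digit 0..9 according to the given weights."""
    return rng.choices(range(10), weights=weights)[0]

def benford_upto(upto: int, rng: random.Random) -> int:
    """Draw an integer in [1..upto], building it one digit at a time."""
    if upto < 1:
        raise ValueError("upto must be at least 1")
    n = draw(FIRST, rng)
    while n > upto:  # assumption: redraw if upto itself is below 9
        n = draw(FIRST, rng)
    length = 1
    while True:
        if n * 10 > upto:
            return n  # no longer number with this prefix fits in the range
        # Integers in [1..upto] with exactly / at least `length` digits.
        exact = 9 * 10 ** (length - 1)
        at_least = upto - (10 ** (length - 1) - 1)
        if rng.random() < exact / at_least:  # e.g. 9/35 for upto=35
            return n
        weights = SECOND if length == 1 else UNIFORM
        d = draw(weights, rng)
        while n * 10 + d > upto:  # assumption: redraw a digit that overshoots
            d = draw(weights, rng)
        n = n * 10 + d
        length += 1

rng = random.Random()
print([benford_upto(35, rng) for _ in range(10)])
```

Redrawing only the offending digit, rather than restarting the whole number, keeps the first-digit distribution exactly as in the table; the real script may handle that case differently.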
The Script
I am hosting the script on my SourceForge pages at http://iharder.sourceforge.net/benford.php. I had started with a JavaScript version, but I thought a PHP-based script would be more useful. Its help message reads:
    PURPOSE: Generates random numbers that comply with Benford's Law.

    PARAMETERS:
      help         Display this help message (default behavior).
      source       Echoes the source code for this script.
      count        The number of numbers to generate (default is 100)
                   ex: .../benford.php?count=200

    FIXED LENGTH:
      format       Instead of upto generate numbers with the given
                   format, where X signifies a digit and any other
                   character is simply echoed back.
                   ex: .../benford.php?format=X.XXX

    VARIABLE LENGTH:
      upto         Generate numbers from 1 to this value [1..upto]
                   instead of fixed length numbers, as with 'format'.
                   ex: .../benford.php?upto=150
      includeZero  When used with upto the number zero will be
                   included in the random numbers [0..upto].

    LICENSE: This code is released as Public Domain.
    AUTHOR: Robert Harder, rob _ iharder.net
Examples
To generate random house numbers for fake addresses, try http://iharder.sourceforge.net/benford.php?upto=9999 to generate numbers from 1 to 9999 (1-, 2-, 3-, and 4-digit house numbers).
To generate random car prices, try http://iharder.sourceforge.net/benford.php?format=XXXXX.
Enjoy!