How To Generate Your Own Benford’s Law Numbers

An interesting phenomenon of naturally occurring numbers is that the leading digit ‘1’ occurs with surprising frequency: about 30% of the time. This is known as Benford’s Law and is discussed in a number of places (Wikipedia, Wolfram, Cut-the-Knot, NY Times). Statisticians can use Benford’s Law to try to detect fake data, which people often generate with a simple Uniform(0,1) function such as the rand() found in many programming languages.
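To make the contrast concrete, here is a small Python sketch (not part of my script) that tallies leading digits. The function name `leading_digit_shares` and the two sample datasets are my own choices for illustration: every integer from 1 to 9999 stands in for evenly spread "fake" data, and the first 1000 powers of two stand in for a dataset known to follow Benford’s Law.

```python
from collections import Counter

def leading_digit_shares(values):
    """Return the share of each leading digit 1..9 among the nonzero values."""
    counts = Counter(int(str(abs(v))[0]) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Evenly spread integers: every leading digit is about equally common.
even_shares = leading_digit_shares(range(1, 10000))

# Powers of two: a classic dataset that follows Benford's Law.
power_shares = leading_digit_shares([2 ** n for n in range(1000)])

for d in range(1, 10):
    print(f"{d}  evenly spread: {even_shares[d]:.3f}  powers of two: {power_shares[d]:.3f}")
```

In the evenly spread data a leading ‘1’ turns up about 11% of the time, while in the powers of two it turns up about 30% of the time.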

What I wanted to do was generate random numbers that complied with Benford’s Law. Impatient? Generate some random Benford numbers now.

Why

You mean, “Why am I trying to cheat more effectively?” No, but if I am trying to generate sample datasets for pedagogical purposes, I would like to use the most realistic fake numbers that I can.

How

My script generates one digit at a time, and the likelihood of a particular digit 0..9 occurring depends on its place in the number. For example, in generating a four-digit integer, the first digit will be a ‘1’ about 30% of the time, but the second digit will be a ‘1’ only about 11% of the time. After the second digit, the digits occur (in the script) with equal probability.

I use the following table as the basis for my calculations:

Digit   First Place   Second Place
0       0             0.1197
1       0.3010        0.1139
2       0.1761        0.1088
3       0.1249        0.1043
4       0.0969        0.1003
5       0.0792        0.0967
6       0.0669        0.0934
7       0.0580        0.0904
8       0.0512        0.0876
9       0.0458        0.0850

Benford’s Law Probabilities (source: Simon Newcomb)
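The table values agree with the standard closed-form expressions for Benford’s Law: log10(1 + 1/d) for a first digit d, and the sum over first digits k = 1..9 of log10(1 + 1/(10k + d)) for a second digit d. The Python sketch below reproduces the table that way. The function names are mine, and my script simply uses the table, so treat this as a cross-check rather than the script’s own code.

```python
import math

def first_digit_probability(d):
    """Probability that the leading digit is d (zero never leads)."""
    return math.log10(1 + 1 / d) if d >= 1 else 0.0

def second_digit_probability(d):
    """Probability that the second digit is d, summed over all leading digits."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

for d in range(10):
    print(f"{d}  {first_digit_probability(d):.4f}  {second_digit_probability(d):.4f}")
```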

The simplest case is when I am generating a fixed number of digits. I know that the first digit is never a zero, so I can use the tables exclusively.
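As a rough illustration of the fixed-length case, here is a Python sketch (the real script is PHP, and this is not a port of it). It assumes a mask in which X stands for a digit and any other character is copied through unchanged; the name `benford_formatted` and the `rng` parameter are hypothetical. The weights are computed from the standard Benford formulas instead of being typed in from the table.

```python
import math
import random

# Weights for digits 0..9 in the first place, second place, and every later place.
FIRST = [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]
SECOND = [sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)]
UNIFORM = [1.0] * 10

def benford_formatted(mask, rng=random):
    """Fill each X in the mask with a digit; copy every other character as-is."""
    out = []
    position = 0  # counts digits only, not literal characters
    for ch in mask:
        if ch == "X":
            weights = FIRST if position == 0 else SECOND if position == 1 else UNIFORM
            out.append(str(rng.choices(range(10), weights=weights)[0]))
            position += 1
        else:
            out.append(ch)
    return "".join(out)

print(benford_formatted("X.XXX"))
print(benford_formatted("XXXXX"))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when the sample dataset has to be the same every time.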

In the case where I want to generate all integers up to a certain point, I have to be a bit more sneaky. Suppose I want to generate integers in [1..35]. I begin by generating a digit, say, 4. I check whether any longer number starting with a 4 could still be ≤ 35, and sure enough 4×10=40 is greater than 35, so I stop there. Voilà: a single-digit number.

Suppose that in generating integers in [1..35], I first generate a 2. It is possible that I could generate a second digit and end up with, say, 27, so the above test will not suffice. So next I compute the probability that a uniformly distributed integer in [1..35] has a single digit (9 out of 35), and if a random draw falls within that probability, I simply return the value 2 and leave it at that. Otherwise I go on to generate a second digit.
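The two checks described above can be sketched in Python roughly as follows. This is not the PHP script, and it fills in details the text leaves open, so those parts are assumptions: for longer numbers the stopping probability is generalized to "numbers with exactly this many digits, out of numbers with at least this many digits", and any result that overshoots the limit (37 when the limit is 35, say) is thrown away and redrawn. The names `digit_weights` and `benford_upto` are hypothetical.

```python
import math
import random

def digit_weights(position):
    """Weights for digits 0..9: Benford for the first two places, uniform afterwards."""
    if position == 0:
        return [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]
    if position == 1:
        return [sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
                for d in range(10)]
    return [1.0] * 10

def benford_upto(upto, rng=random):
    """Draw an integer in [1..upto], building it one digit at a time."""
    while True:
        value, length = 0, 0
        while True:
            digit = rng.choices(range(10), weights=digit_weights(length))[0]
            value = value * 10 + digit
            length += 1
            # Check 1: no longer number with this prefix can fit under the limit.
            if value * 10 > upto:
                break
            # Check 2: stop with the probability that a uniform integer of at
            # least this length has exactly this length (9/35 in the example).
            low = 10 ** (length - 1)
            exact = min(upto, 10 ** length - 1) - low + 1
            at_least = upto - low + 1
            if rng.random() < exact / at_least:
                break
        # Assumption: results above the limit are rejected and redrawn.
        if 1 <= value <= upto:
            return value

print([benford_upto(35) for _ in range(10)])
```

Rejecting and redrawing nudges the digit frequencies away from the table somewhat, so this version is only meant to show the control flow.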

The Script

I am hosting the script on my SourceForge pages here: http://iharder.sourceforge.net/benford.php. I had started with a JavaScript version, but I thought a PHP-based script would be more useful.

PURPOSE: Generates random numbers that comply with Benford's Law.

PARAMETERS:
 help         Display this help message (default behavior).

 source       Echoes the source code for this script.

 count        The number of numbers to generate (default is 100)
              ex: .../benford.php?count=200

 FIXED LENGTH:
 format       Instead of upto generate numbers with the given
              format, where X signifies a digit and any other
              character is simply echoed back.
              ex: .../benford.php?format=X.XXX

 VARIABLE LENGTH:
 upto         Generate numbers from 1 to this value [1..upto]
              instead of fixed length numbers, as with 'format'.
              ex: .../benford.php?upto=150

 includeZero  When used with upto the number zero will be
              included in the random numbers [0..upto].

LICENSE: This code is released as Public Domain.
AUTHOR: Robert Harder, rob _ iharder.net

Examples

To generate random house numbers for fake addresses, try http://iharder.sourceforge.net/benford.php?upto=9999 to generate numbers from 1 to 9999 (1-, 2-, 3-, and 4-digit house numbers).

To generate random car prices, try http://iharder.sourceforge.net/benford.php?format=XXXXX.
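If you would rather build these URLs from code than type them by hand, a small Python helper along these lines would do it. The helper name `benford_url` is hypothetical; the parameter names are the ones listed in the help text, and the sketch only builds the URL without fetching it.

```python
from urllib.parse import urlencode

BASE_URL = "http://iharder.sourceforge.net/benford.php"

def benford_url(**params):
    """Build a request URL from the script's query parameters (count, format, upto, ...)."""
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL

print(benford_url(upto=9999))                 # house numbers
print(benford_url(format="XXXXX", count=200))  # 200 car prices
```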

Enjoy!

Categories: Utility
  1. May 18th, 2014 at 19:52 | #1

    I was about to start a similar project to help me learn python. (I like mathy programs, what can I say.) I wasn’t going to go to the depth you have, HOWEVER, I’d like to port your php code to python. That is, if you care to share your code. :-)

    Being the manager of this site I’m guessing you get to see my email address. You can find my email address at my website.

  2. May 18th, 2014 at 20:35 | #2

    If you add the “source” parameter, it will give the source code. http://iharder.sourceforge.net/benford.php?source

  3. Dax MIckelson
    May 19th, 2014 at 06:44 | #3

    In other words, RTFM. Doh! Now I see it. Thanks!
