## Fun (and Fraud Detection) with Benford’s Law

Benford’s law is one of those things your high school math teacher would break out on a slow, rainy day when the students’ attention span was even lower than usual.

He’d start out by asking the class to look at a list of numbers and predict how often each digit from 1 to 9 would show up as the leading digit. The students would make some guesses and eventually come to the consensus that the nine digits would be about equally likely, roughly 11% each.

Then, the teacher would just sit back, smile, and gently shake his head at his simple-minded pupils. He would then go on to explain Benford’s law, which would blow everyone’s mind — at least through lunchtime.

Per Wikipedia:

Benford’s law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is distributed in a specific, non-uniform way.

Specifically, in this way:

| Leading Digit | Probability |
|---------------|-------------|
| 1             | 30.1%       |
| 2             | 17.6%       |
| 3             | 12.5%       |
| 4             | 9.7%        |
| 5             | 7.9%        |
| 6             | 6.7%        |
| 7             | 5.8%        |
| 8             | 5.1%        |
| 9             | 4.6%        |
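If you'd like to see where those percentages come from, here is a minimal Python sketch. It assumes the usual statement of Benford’s law, P(d) = log10(1 + 1/d), which isn't spelled out in this post; the function name `benford_probability` is just made up for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1 through 9).

    Assumes the standard first-digit formula P(d) = log10(1 + 1/d).
    """
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print one row per digit, formatted as a percentage with one decimal place.
for d in range(1, 10):
    print(f"{d}  {benford_probability(d):.1%}")
```

The nine probabilities add up to 1, and rounding each to one decimal place gives the figures in the table above.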

Again, from Wikipedia:

This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).

Boiling it down, this means that for almost any naturally occurring data set, the digit 1 will be the leading digit about 30% of the time. Here, “naturally occurring” can mean check amounts, stock prices, or website statistics. Non-naturally occurring data would be pre-assigned numbers like postal codes or UPC numbers.

Besides being fun to play with, Benford’s law is used in the accounting profession to detect fraud. Because data like tax returns and check registers tend to follow Benford’s law, auditors can use it as a high-level check of a data set. If there are anomalies, the data may be worth investigating more closely for potential fraud.
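To give a feel for what such a high-level check might look like, here is a rough Python sketch. It is not how auditors actually do it (a real audit would add a proper statistical test); it simply tallies leading digits and sets the observed share next to the Benford share. The helper names `leading_digit` and `first_digit_report` are hypothetical, and the powers of 2 stand in for real data only because they are known to follow the law closely.

```python
import math
from collections import Counter


def leading_digit(value) -> int:
    """Return the first non-zero digit of a number (int or float)."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no non-zero digit")


def first_digit_report(values):
    """Map each digit 1-9 to (observed share, expected Benford share)."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {
        d: (counts[d] / total, math.log10(1 + 1 / d))
        for d in range(1, 10)
    }


# Stand-in data: the first 1000 powers of 2 (an assumption for the demo,
# not real accounting records).
sample = [2 ** n for n in range(1, 1001)]
for digit, (observed, expected) in first_digit_report(sample).items():
    print(f"{digit}  observed {observed:.1%}  expected {expected:.1%}")
```

A large gap between the two columns is only a prompt to look closer, not evidence of fraud on its own.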

If you’re interested in further information about fraud detection using Benford’s, definitely give these two articles by Malcolm W. Browne and Mark J. Nigrini a read.

## Wednesday, July 9, 2008

### awesome non-transitive dice

Here are four interesting and famous dice. They are known as non-transitive dice, or Efron's dice. Here is the game we play with them. You can choose any die (singular of dice) that you want, and then I will choose my die; we roll our dice and the higher number wins. Which die would you choose?

It turns out that it doesn't matter which one you choose: I will win 2/3 of the time and you will win only 1/3 of the time (in other words, I have a 2:1 advantage). Does that make sense to you? Picture the dice laid out left to right as 4-4-4-4-0-0, 3-3-3-3-3-3, 6-6-2-2-2-2, and 5-5-5-1-1-1. Whichever die you choose, I will choose the one immediately to its left. If you choose the leftmost one, I will choose the rightmost one.

The logical way to see if one die beats another is to list the 36 possible outcomes, like this:

 4 4 4 4 0 0 3 * * * * 3 * * * * 3 * * * * 3 * * * * 3 * * * * 3 * * * *

These are the first two dice, and we see that 4-4-4-4-0-0 beats 3-3-3-3-3-3 2/3 of the time. The asterisks show which rolls are won by the 4-4-4-4-0-0 die. Similar tables show that 3-3-3-3-3-3 beats 6-6-2-2-2-2 2/3 of the time, 6-6-2-2-2-2 beats 5-5-5-1-1-1 2/3 of the time, and 5-5-5-1-1-1 beats 4-4-4-4-0-0 2/3 of the time. You might want to verify all of that.
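If you'd rather not draw four tables by hand, a short Python sketch can count the 36 outcomes for each pairing. The dice faces come straight from the text above; the names `DICE` and `win_probability` are just illustrative.

```python
from fractions import Fraction
from itertools import product

# The four dice, in the order each one beats the next (the last beats the first).
DICE = [
    (4, 4, 4, 4, 0, 0),
    (3, 3, 3, 3, 3, 3),
    (6, 6, 2, 2, 2, 2),
    (5, 5, 5, 1, 1, 1),
]


def win_probability(a, b) -> Fraction:
    """Exact probability that a roll of die `a` is higher than a roll of die `b`."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return Fraction(wins, len(a) * len(b))


# Compare each die with the next one in the cycle, wrapping around at the end.
for i, die in enumerate(DICE):
    rival = DICE[(i + 1) % len(DICE)]
    print(die, "beats", rival, "with probability", win_probability(die, rival))
```

Using `Fraction` keeps the answer exact, so each pairing comes out as 2/3 rather than a rounded decimal.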

We are used to transitive situations. If a=b and b=c then a=c, the famous transitive law. Likewise, if a>b and b>c then a>c, another transitive law. In games, we like to think that if player a beats player b and b beats c then a will beat c, but that doesn't always work. Above we see that it is not only people who are sometimes non-transitive: dice and other randomizing tools can be non-transitive too.

There are other possible non-transitive dice.