
February 16, 2020

Explaining Benford’s Law

Filed under: Uncategorized — admin @ 10:57 am

When we look at collections of numbers produced by various processes, statistical patterns emerge. Often, these patterns can be used to check that the original data was really sampled from a “naturally occurring” process, and not “faked”.

Many people have seen the Bell Curve of the Normal Distribution, which shows up when data from many different random variables is averaged together. This distribution shows up even if the random variables come from totally different processes with different statistical distributions, such as grades in a class, or the students’ heights. The Central Limit Theorem explains why this happens, under mild conditions such as the variables being independent with finite variance.

Other interesting patterns include the Golden Ratio, which the ratios of consecutive Fibonacci numbers approach. It has been claimed to appear in many places in nature, although various examples have been disputed. The ratio might show up in spirals due to rotation and scale invariance.

But perhaps more intriguing are two statistical laws that show up in empirical data and might at first be unexpected. One is Zipf’s Law, which states that the frequency of a symbol is roughly inversely proportional to its rank in the frequency table. Thus, the most common word in a language appears roughly 2x as often as the second most common word, 3x as often as the third most common one, and so on. There are some interesting analyses of this from the point of view of Shannon’s and Kolmogorov’s information theories.

Plot of the Zipf CDF for N=10
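The following minimal Python sketch makes the Zipf description concrete. It assumes the idealized form of the law, in which the symbol of rank r has frequency proportional to 1/r. Real word counts only follow this roughly. The sketch normalizes over N = 10 ranks and accumulates the CDF. The function names are hypothetical. Exact fractions are used so that the 2x and 3x ratios come out exactly.

```python
from fractions import Fraction
from itertools import accumulate

def zipf_pmf(n: int) -> list[Fraction]:
    """Idealized Zipf probabilities for ranks 1..n, each proportional to 1/rank."""
    weights = [Fraction(1, rank) for rank in range(1, n + 1)]
    total = sum(weights)  # the harmonic number H_n, used to normalize
    return [w / total for w in weights]

def zipf_cdf(n: int) -> list[Fraction]:
    """Cumulative Zipf probabilities; the last entry is exactly 1."""
    return list(accumulate(zipf_pmf(n)))

pmf = zipf_pmf(10)
print(pmf[0] / pmf[1], pmf[0] / pmf[2])  # prints: 2 3
for rank, p in enumerate(zipf_cdf(10), start=1):
    print(rank, f"{float(p):.3f}")
```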

Benford’s Law

This is the other intriguing law, showing up in all kinds of numbers, from stock market prices to baseball statistics. When the numbers are written in base 10, the first digit is not uniformly distributed, and an analogous law holds in other bases. Rather, the digit 1 comes first about 30% of the time and 2 about 18% of the time, while 9 comes first only about 5% of the time. Many people have wondered why this holds true across so many sets of numbers.
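Those percentages come from Benford’s formula, P(d) = log10(1 + 1/d) for the leading digit d in base 10. In base B the formula is log_B(1 + 1/d). The minimal Python sketch below tabulates it. The function name is only for illustration.

```python
import math

def benford_probability(d: int, base: int = 10) -> float:
    """Benford's law: probability that the leading digit (in the given base) is d."""
    if not 1 <= d < base:
        raise ValueError("leading digit must be between 1 and base - 1")
    return math.log(1 + 1 / d, base)

for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

For base 10 this prints 30.1% for the digit 1, 17.6% for 2, and 4.6% for 9. The nine probabilities add up to exactly 1 because the logarithms telescope: the product of (d + 1)/d for d = 1, …, 9 is 10.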

For numbers generated by scale-invariant processes, such as stock market prices, the law is relatively easy to explain. When a price is at 1000, it takes 100% growth to get to 2000, then 50% growth to get to 3000, and so on, until it takes only about 11% to get from 9000 to 10,000. Then it once again takes 100% growth, from 10,000 to 20,000, to get off the first digit being 1. At a steady growth rate, the price therefore spends the longest stretch of time with a leading 1.
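The following minimal Python sketch checks the scale-invariance argument numerically. It grows a hypothetical price by a constant 0.1% per step, over exactly three decades (from 1000 up to about 1,000,000). It then records how often each leading digit occurs. The start value, growth rate, number of decades and helper names are arbitrary illustrative choices.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read from its scientific notation."""
    return int(f"{x:.12e}"[0])

def growth_digit_shares(start: float = 1000.0, rate: float = 0.001, decades: int = 3) -> dict[int, float]:
    """Share of steps spent on each leading digit while growing by `rate` per step."""
    # Number of steps needed to multiply the start value by 10**decades.
    steps = round(decades * math.log(10) / math.log1p(rate))
    counts = Counter(leading_digit(start * (1 + rate) ** n) for n in range(steps))
    return {d: counts[d] / steps for d in range(1, 10)}

shares = growth_digit_shares()
for d in range(1, 10):
    # Observed share next to Benford's log10(1 + 1/d).
    print(d, f"{shares[d]:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```

Each digit’s share comes out close to log10(1 + 1/d). At a constant growth rate, the time spent between d·10^k and (d + 1)·10^k is proportional to log((d + 1)/d).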

But what’s more interesting is that Benford’s law also often applies, at least approximately, when the numbers come from various uniform distributions! That is to say, the real-life process is not scale invariant, but rather generates values evenly distributed between some a and b. Why does the law apply then?

I wanted to write down an easy explanation that occurred to me today. A uniform distribution cannot span the entire number line. A constant density over an infinite range gives a total area under the curve that is either infinite or zero, so it could never satisfy P(X) = 1 over the whole sample space. Thus, a uniform distribution must lie between some two numbers a and b.
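In symbols, the argument reads as follows. A uniform density is some constant c on its support. No constant works on the whole real line, while 1/(b − a) on [a, b] does:

```latex
\begin{aligned}
&\int_{-\infty}^{\infty} c \,dx = \infty \ \ (c > 0), \qquad \int_{-\infty}^{\infty} 0 \,dx = 0 \ne 1, \\
&f(x) = \frac{1}{b-a} \ \text{ for } a \le x \le b, \qquad \int_a^b \frac{1}{b-a} \,dx = 1.
\end{aligned}
```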

PDF of the uniform probability distribution using the maximum convention at the transition points.

To keep things simple, let’s assume that a = 0, so we have a process that generates some non-negative numbers. It can be either a discrete process (generating whole numbers) or a continuous one (generating arbitrary real numbers in the range). There is some maximum number b that the process can generate, and the sampled results, represented by the random variable X, are evenly distributed between 0 and b.

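Using basic probability (specifically, the law of total probability), and letting the upper bound b itself vary from one process to another, we can calculate the chance of X starting with the digit 5. We do this by summing the following over all integers N: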

P(first digit of X is 5) = Σ_N P(N ≤ b < N+1) · P(first digit of X is 5 | N ≤ b < N+1)
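One quick way to check this formula is to simulate it directly. Pick an upper bound b, draw X uniformly from [0, b], and repeat. The sketch below assumes, purely for illustration, that b is itself uniform between 1 and 1000. The seed, sample count and helper names are hypothetical.

```python
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read from its scientific notation."""
    return int(f"{x:.12e}"[0])

def sample_mixture_digits(samples: int = 100_000, b_max: float = 1000.0, seed: int = 1) -> dict[int, float]:
    """Empirical leading-digit shares when b ~ Uniform(1, b_max) and X ~ Uniform(0, b)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        b = rng.uniform(1.0, b_max)  # the process's upper bound varies
        x = rng.uniform(0.0, b)      # X is uniform given that bound
        if x > 0:                    # 0 has no leading digit; skip the (unlikely) exact zero
            counts[leading_digit(x)] += 1
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

mixture_shares = sample_mixture_digits()
for d in range(1, 10):
    print(d, f"{mixture_shares[d]:.3f}")
```

Under these assumptions, the digit 1 comes first roughly a quarter of the time and the digit 9 only about 3% of the time. That is strongly skewed toward small digits, though not exactly Benford’s 30.1% and 4.6%.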

Now, breaking up the probabilities in this way, we can see why Benford’s law applies even in the case of uniformly distributed results. When b = 499, for instance, it’s true we have an equal chance of getting 10 ≤ X < 20 as we do for 80 ≤ X < 90, so if X consists of two digits (before the decimal point), it’s just as likely to start with a 1 as with an 8. However, for three-digit numbers X, we see that none of the sampled results can start with 5, 6, 7, 8 or 9. There are, in fact, hundreds of three-digit values (before the decimal point) that can result from the process X, ranging from 100 to 499. The range 100…499 is over four times larger than the range 10…100, so given the uniform distribution, X is far more likely to yield values there, with the first digit being between 1 and 4.

Similarly, if N is 399 or 350, you’re far more likely to get a 1, 2, or 3 as the leading digit, because the larger range 100…N is included. In fact, given any N you can calculate exactly how much more often the small leading digits will come up, but in true math teacher fashion, I will leave this as an exercise to the dear reader. The main thing I wanted to convey was the intuition.
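For anyone who wants to check their answer to the exercise, here is one possible sketch. The helper name is hypothetical. Take X uniform on [0, b] with 10^K ≤ b < 10^(K+1). The digit d leads on every full interval [d·10^k, (d + 1)·10^k) with k < K, and those lengths sum to 10^K / 9. It also leads on whatever part of the top interval [d·10^K, (d + 1)·10^K) lies below b. Dividing the total length by b gives the exact probability.

```python
import math
from fractions import Fraction

def digit_probs_given_bound(b) -> dict[int, Fraction]:
    """Exact P(first digit of X = d), d = 1..9, for X ~ Uniform(0, b), b > 0 (int or Fraction)."""
    b = Fraction(b)
    # Find K with 10**K <= b < 10**(K + 1); the loops correct any float rounding in log10.
    K = math.floor(math.log10(float(b)))
    while Fraction(10) ** K > b:
        K -= 1
    while Fraction(10) ** (K + 1) <= b:
        K += 1
    top = Fraction(10) ** K
    lower = top / 9  # full intervals [d*10**k, (d+1)*10**k) for every k < K
    probs = {}
    for d in range(1, 10):
        # Part of the top interval [d*10**K, (d+1)*10**K) that lies below b.
        partial = min(max(b - d * top, Fraction(0)), top)
        probs[d] = (lower + partial) / b
    return probs

probs = digit_probs_given_bound(499)
print(probs[1], f"{float(probs[1]):.3f}")                   # prints: 1000/4491 0.223
print(f"{float(sum(probs[d] for d in range(1, 5))):.3f}")  # prints: 0.889
```

For b = 499, the leading digit is 1, 2, 3 or 4 with probability 3991/4491, about 89%.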

Finally, remember that we are summing the probability over all N. Unless N is a power of 10, the first digit of X will simply not be uniformly distributed, because the values between N+1 and the next power of 10 never come up in the sampling. Thus, for N = 200, slightly over half of our numbers will start with 1: all the numbers in 100-200, 10-20 and 1-2, plus the fractions 0.1-0.2 and so on. As N increases toward 300, the proportion of numbers starting with 2 grows. At N = 300, a number is equally likely to start with 1 or 2, and far less likely to start with 3 or higher, which can now only happen for values below 100.
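The shift described here can be traced exactly. The sketch below uses a hypothetical helper name and handles integer bounds only. It computes P(first digit of X is d) for X uniform on [0, N]. To do so, it adds the full intervals below the top power of ten, which total 10^K / 9, to the part of the top interval that lies below N. It then prints the shares of 1 and 2 as N steps from 100 to 300.

```python
from fractions import Fraction

def leading_digit_share(d: int, bound: int) -> Fraction:
    """Exact P(first digit of X = d) for X ~ Uniform(0, bound), bound a positive integer."""
    top = 10 ** (len(str(bound)) - 1)            # largest power of 10 not above bound
    partial = min(max(bound - d * top, 0), top)  # part of [d*top, (d+1)*top) below bound
    return (Fraction(top, 9) + partial) / bound  # top/9 = all full intervals below top

for n in range(100, 301, 25):
    print(n, f"{float(leading_digit_share(1, n)):.3f}", f"{float(leading_digit_share(2, n)):.3f}")
```

At N = 200 the digit 1 gets exactly 5/9, slightly over half. At N = 300 the digits 1 and 2 each get 10/27 (about 37%), while 3 gets just 1/27.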

When you sum all of this up, you see that the digit 1 gets a big boost as N goes from 100 to 200. It retains that boost as 2 starts to get its own, and so on. By the time you get to N = 1000, the digit 1 has received nine of these boosts (one for each of the stretches 100-200, 200-300, up to 900-1000), while the digit 9 has received just one. Moreover, each new boost had to be shared with the smaller digits. While N runs from 100 to 200, the digit 1 alone gets roughly 1/2 of the probability. While N runs from 200 to 300, digits 1 and 2 split it, getting roughly 1/3 each, and so on.

Summing all this up, we see that by the time N reaches 1000, the digit 1 has come first with roughly the following relative frequency:

1/10 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + 1/9 ≈ 1.93

while the digit 9 has come first with roughly the following relative frequency:

1/10 + 1/9 ≈ 0.21

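So by this rough count, the digit 1 comes first about nine times as often as the digit 9. Benford’s law itself puts the split at about 30% versus 5%, a factor of roughly 6.6. This back-of-the-envelope tally therefore overstates the effect somewhat, but it captures why small leading digits dominate.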
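The arithmetic above can be checked, and compared with an exact calculation, in a few lines. The sketch assumes, hypothetically, that the bound N is equally likely to be any integer from 1 to 1000. It averages the exact leading-digit probabilities over those bounds and sets the result beside the rough harmonic-sum tally and Benford’s formula. The helper name is hypothetical.

```python
import math
from fractions import Fraction

def leading_digit_share(d: int, bound: int) -> Fraction:
    """Exact P(first digit of X = d) for X ~ Uniform(0, bound), bound a positive integer."""
    top = 10 ** (len(str(bound)) - 1)            # largest power of 10 not above bound
    partial = min(max(bound - d * top, 0), top)  # part of [d*top, (d+1)*top) below bound
    return (Fraction(top, 9) + partial) / bound

# The rough tallies from the text.
tally_1 = Fraction(1, 10) + sum(Fraction(1, k) for k in range(2, 10))
tally_9 = Fraction(1, 10) + Fraction(1, 9)

# Average over bounds N = 1, ..., 1000, each assumed equally likely.
bounds = range(1, 1001)
mixture = {d: math.fsum(float(leading_digit_share(d, n)) for n in bounds) / len(bounds) for d in (1, 9)}

print("rough tally ratio:", round(float(tally_1 / tally_9), 2))
print("exact mixture ratio:", round(mixture[1] / mixture[9], 2))
print("Benford ratio:", round(math.log10(2) / math.log10(10 / 9), 2))
```

Under this uniform-over-N assumption, the exact ratio comes out near 7. That lies between Benford’s 6.6 and the rough tally’s 9.1.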
