
February 16, 2020

Explaining Benford’s Law

Filed under: Uncategorized — admin @ 10:57 am

When we look at collections of numbers produced by various processes, statistical patterns emerge. Often, these patterns can be used to check that the original data really was sampled from a “naturally occurring” process and was not “faked”.

Many people have seen the Bell Curve of the Normal Distribution, which shows up when many independent random variables are added or averaged together. Under fairly mild conditions, this happens even when the individual variables come from quite different processes with different statistical distributions, as with grades in a class or students’ heights. The Central Limit Theorem explains why this happens.

Other interesting patterns include the Golden Ratio, which the ratios of consecutive Fibonacci numbers approach, and which has been claimed to appear in many places in nature, although various examples have been disputed. The ratio might show up in spirals due to rotation and scale invariance.
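To illustrate how the Fibonacci sequence approximates the Golden Ratio, here is a minimal sketch that prints the ratios of consecutive Fibonacci numbers next to (1 + √5) / 2. The function name `fibonacci_ratios` is hypothetical, chosen for this example.

```python
import math

# The Golden Ratio, (1 + sqrt(5)) / 2, which is about 1.618.
GOLDEN_RATIO = (1 + math.sqrt(5)) / 2


def fibonacci_ratios(count: int) -> list[float]:
    """Ratios F(n+1)/F(n) of consecutive Fibonacci numbers, starting from 1, 1."""
    a, b = 1, 1
    ratios = []
    for _ in range(count):
        ratios.append(b / a)
        a, b = b, a + b
    return ratios


if __name__ == "__main__":
    # The ratios alternate above and below the Golden Ratio, closing in on it.
    for r in fibonacci_ratios(10):
        print(f"{r:.6f}  (off by {abs(r - GOLDEN_RATIO):.2e})")
```

The first few ratios are 1, 2, 1.5, 1.667, 1.6, and the error shrinks by roughly a factor of 2.6 at each step.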

But perhaps more intriguing are two statistical laws that show up in empirical data and might at first be unexpected. One is Zipf’s Law, which states that the frequency of a symbol is inversely proportional to its rank in the frequency table. Thus, the most common word in a language appears roughly 2x as often as the second most common word, 3x as often as the third most common one, and so on. There are some interesting analyses of this from the point of view of Shannon’s and Kolmogorov’s information theories.

Plot of the Zipf CDF for N=10
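The distribution in the plot can be computed directly. This sketch assumes the simplest form of Zipf’s law (exponent s = 1), where the symbol of rank k gets probability proportional to 1/k; the helper names `zipf_pmf` and `zipf_cdf` are hypothetical.

```python
def zipf_pmf(n: int) -> list[float]:
    """Probabilities for ranks 1..n under Zipf's law with exponent s = 1."""
    weights = [1.0 / k for k in range(1, n + 1)]
    total = sum(weights)  # the n-th harmonic number H_n normalizes the weights
    return [w / total for w in weights]


def zipf_cdf(n: int) -> list[float]:
    """Cumulative probabilities P(rank <= k) for k = 1..n."""
    cdf, running = [], 0.0
    for p in zipf_pmf(n):
        running += p
        cdf.append(running)
    return cdf


if __name__ == "__main__":
    pmf = zipf_pmf(10)
    print(f"rank 1 vs rank 2: {pmf[0] / pmf[1]:.1f}x")  # 2.0x
    print(f"rank 1 vs rank 3: {pmf[0] / pmf[2]:.1f}x")  # 3.0x
    for rank, c in enumerate(zipf_cdf(10), start=1):
        print(f"P(rank <= {rank:2d}) = {c:.3f}")
```

With N = 10, the most common symbol alone accounts for about 34% of all occurrences.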

Benford’s Law

This is the other intriguing law, showing up in all kinds of numbers from stock market prices to baseball statistics. The leading (first) digit of such numbers is not uniformly distributed. Rather, in base 10 the digit 1 leads about 30% of the time, 2 about 18% of the time, and 9 only about 5% of the time; a similar skew toward small leading digits holds in any other base. Many people have wondered why this holds true across so many sets of numbers.
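Those percentages come from the standard closed form of Benford’s law: in base B, the leading digit d appears with probability log_B(1 + 1/d). A minimal sketch (the function name `benford_pmf` is hypothetical):

```python
import math


def benford_pmf(base: int = 10) -> dict[int, float]:
    """Benford probabilities log_base(1 + 1/d) for leading digits d = 1..base-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}


if __name__ == "__main__":
    # Prints 1: 30.1%, 2: 17.6%, ..., 9: 4.6% for base 10.
    for d, p in benford_pmf().items():
        print(f"{d}: {p:6.1%}")
```

The probabilities telescope to exactly 1, because the sum of log(d + 1) - log(d) over d = 1..B-1 is log(B).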

For numbers that are generated by scale-invariant processes, such as stock market prices, the law is relatively easy to explain. When a price is at 1000, it takes 100% growth to reach 2000, then 50% growth to reach 3000, and so on, until it takes only about 11% growth to get from 9000 to 10,000. Then it once again takes 100% growth for the leading digit to stop being 1. So a steadily growing price spends far more time with a leading 1 than with a leading 9.
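A quick simulation makes this argument concrete. It is a sketch under assumed parameters (a starting value of 1000, growth of 0.01% per step, run for four full powers of ten), not a model of any real market; the function name `leading_digit_time_shares` is hypothetical.

```python
import math
from collections import Counter


def leading_digit_time_shares(start: float, growth: float, decades: int) -> dict[int, float]:
    """Fraction of steps a steadily compounding value spends at each leading digit.

    The value is multiplied by `growth` each step until it has risen by
    `decades` whole powers of ten, so every leading digit is visited fairly.
    """
    value, stop = start, start * 10 ** decades
    counts = Counter()
    while value < stop:
        # Scientific notation puts the first significant digit first.
        counts[int(f"{value:.15e}"[0])] += 1
        value *= growth
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


if __name__ == "__main__":
    shares = leading_digit_time_shares(start=1000.0, growth=1.0001, decades=4)
    for d in range(1, 10):
        print(f"{d}: simulated {shares[d]:.3f}, Benford {math.log10(1 + 1 / d):.3f}")
```

Because each step multiplies rather than adds, the time spent at leading digit d is proportional to log(1 + 1/d), which is exactly Benford’s formula.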

But what’s more interesting is that Benford’s law also often applies when the numbers come from various uniform distributions! That is to say, the real-life process is not scale invariant; instead, it generates values evenly distributed between some a and b. Why does the law apply then?

I wanted to write down an easy explanation that occurred to me today. A uniform distribution cannot span the entire number line: a constant density over an infinite range would have infinite total area (or zero area, if the constant is zero), violating the requirement that the total probability be 1. Thus, a uniform distribution must lie between some two numbers a and b.

PDF of the uniform probability distribution using the maximum convention at the transition points.

To keep things simple, let’s assume that a = 0, so we have a process that generates some non-negative numbers. It can be either a discrete process (generating whole numbers) or a continuous one (generating arbitrary real numbers in the range). There is some maximum number b that the process can generate, and the sampled results, represented by the random variable X, are evenly distributed between 0 and b.

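Using basic probability (the law of total probability), and treating the upper bound b itself as varying from one process to another, we can calculate the chance of X starting with the digit 5 by summing the following over all integers N: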

P(first digit of X is 5) = Σ_N P(N ≤ b < N+1) • P(first digit of X is 5 | N ≤ b < N+1)
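The two-stage process behind this sum (first pick the bound b, then pick X uniformly below it) can be simulated directly. This Monte Carlo sketch assumes, purely for illustration, that b is uniform on [1, 1000); the function name `sample_leading_digits` and its parameters are hypothetical.

```python
import random
from collections import Counter


def sample_leading_digits(trials: int, max_bound: float = 1000.0, seed: int = 0) -> dict[int, float]:
    """Monte Carlo estimate of P(first significant digit of X = d), d = 1..9.

    Stage 1 draws the bound b uniformly from [1, max_bound) (an assumption);
    stage 2 draws X uniformly from [0, b]. Averaging over many draws of b is
    what the law-of-total-probability sum does exactly.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        b = rng.uniform(1.0, max_bound)
        x = rng.uniform(0.0, b)
        if x > 0:  # 0 has no nonzero leading digit
            counts[int(f"{x:.15e}"[0])] += 1
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


if __name__ == "__main__":
    for d, share in sample_leading_digits(200_000).items():
        print(f"P(first digit = {d}) ~ {share:.3f}")
```

Under this particular assumption the result is clearly skewed toward small leading digits (each digit is more common than the next), though it does not match Benford’s percentages exactly.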

Now, breaking up the probabilities in this way, we can see why Benford’s law applies even in the case of uniformly distributed results. When b = 499, for instance, it’s true we have an equal chance of getting 10 ≤ X < 20 as we do for 80 ≤ X < 90, so if X consists of two digits (before the decimal point), it’s just as likely to start with a 1 as with an 8. However, for three-digit numbers X, we see that none of the sampled results can start with 5, 6, 7, 8 or 9. There are, in fact, hundreds of three-digit values (before the decimal point) that can result from the process X, ranging from 100 to 499. The range 100…499 is over four times larger than the range 10…100, so given the uniform distribution, X is far more likely to yield values there, with the first digit being between 1 and 4.

Similarly, if N were 399 or 350, you’d be far more likely to get a 1, 2, or 3 as the leading digit, because the large range from 100 up to N is included. In fact, given any N you can calculate exactly how much more often the small leading digits will appear, but in true math teacher fashion, I will leave this as an exercise to the dear reader. The main thing I wanted to convey was the intuition.
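For readers who want to check their answer to the exercise, here is one possible sketch for the continuous case: it adds up the length of every interval [d·10^k, (d+1)·10^k) that falls inside [0, b] and divides by b. The function name `leading_digit_share` and the cutoff `lowest_exp` are hypothetical choices; decades below 10^-12 are ignored as negligible.

```python
def leading_digit_share(digits: range, b: float, lowest_exp: int = -12) -> float:
    """P(first significant digit of X is in `digits`) for X uniform on [0, b].

    For each digit d, adds the length of every interval [d*10^k, (d+1)*10^k)
    that lies inside [0, b]; decades below 10**lowest_exp are skipped.
    """
    length = 0.0
    for d in digits:
        k = lowest_exp
        while d * 10.0 ** k < b:
            length += min(b, (d + 1) * 10.0 ** k) - d * 10.0 ** k
            k += 1
    return length / b


if __name__ == "__main__":
    print(f"b = 499, digits 1-4: {leading_digit_share(range(1, 5), 499):.3f}")  # about 0.889
    for b in (399, 350):
        print(f"b = {b}, digits 1-3: {leading_digit_share(range(1, 4), b):.3f}")
```

For b = 499, the lengths are 399 + 40 + 4 + 0.444... out of 499, so a leading digit of 1 to 4 turns up almost 89% of the time, versus 4/9 (about 44%) if digits were uniform.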

Finally, remember that we are summing the probability over all N. Unless N is at (or just below) a power of 10, the first digit of X will not be uniformly distributed, since the values between N+1 and the next power of 10 never come up in the sampling. Thus, for N = 200, slightly over half of our numbers will start with 1 (namely 1, 10 to 19, and 100 to 199: 111 of the 200). As N grows toward 300, the proportion of numbers starting with 2 increases, until at N = 299 a number is equally likely to start with 1 or with 2 (and no three-digit value starts with 3 or higher).

When you sum all of this up, you see that the digit 1 gets a big boost as N goes from 100 to 200, and retains that boost while the digit 2 gets its own boost as N goes from 200 to 300, and so on. By the time N reaches 1000, the digit 1 has shared in every one of these boosts, while the digit 9 has shared in only the last one. Moreover, each new boost is shared with all the smaller digits: at N = 200 the digit 1 accounts for about 1/2 of the values, at N = 300 the digits 1 and 2 each account for about 1/3, and so on.
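Counting whole numbers directly shows how each boost gets shared. This sketch uses the discrete version of the process (X is a whole number from 1 to N) and checkpoints N = 199, 299, ..., 999, chosen so that the counts come out exact; the function name `leading_digit_counts` is hypothetical.

```python
from collections import Counter


def leading_digit_counts(n: int) -> Counter:
    """Count the leading digits of the whole numbers 1..n."""
    return Counter(int(str(i)[0]) for i in range(1, n + 1))


if __name__ == "__main__":
    # At N = 100*(k+1) - 1, digits 1..k each lead exactly 111 of the N values
    # (one 1-digit, ten 2-digit and a hundred 3-digit numbers), i.e. about 1/(k+1).
    for k in range(1, 10):
        n = 100 * (k + 1) - 1
        counts = leading_digit_counts(n)
        print(f"N = {n}: digit 1 leads {counts[1] / n:.3f}, 1/(k+1) = {1 / (k + 1):.3f}")
```

The same function confirms the N = 200 example: 111 of the 200 numbers start with 1, while only 12 start with 2.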

Summing all this up, we see that by the time N reached 1000, the digit 1 was first with roughly the following frequency:

1/10 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + 1/9 ≈ 1.93

while the digit 9 was first with roughly the following frequency:

1/10 + 1/9 ≈ 0.2

So the digit 1 leads roughly nine times as often as the digit 9: if 1 appeared about 50% of the time, 9 would appear only about 5% of the time.
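The arithmetic in these two sums can be checked exactly with rational numbers. The variable names `weight_1` and `weight_9` are hypothetical labels for the two sums.

```python
from fractions import Fraction

# The two sums from the text, kept as exact fractions.
weight_1 = Fraction(1, 10) + sum(Fraction(1, k) for k in range(2, 10))
weight_9 = Fraction(1, 10) + Fraction(1, 9)

print(f"digit 1: {float(weight_1):.3f}")             # 1.929
print(f"digit 9: {float(weight_9):.3f}")             # 0.211
print(f"ratio:   {float(weight_1 / weight_9):.2f}")  # 9.14
```

Exactly, the two weights are 4861/2520 and 19/90, so their ratio is a little over 9.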
