# Testing random

## Testing random

On Mon, Jun 8, 2015 at 4:28 AM, Steven D'Aprano wrote:

> If my previous post didn't convince you, consider an even simpler random
> distribution: tossing a fair coin. The probability of getting a head is
> exactly 1/2 whether you toss the coin once or a thousand times. But if you
> toss the coin once, your chances of getting "no heads" is 1/2. If you toss
> it a thousand times, your chances of getting "no heads" is 1/2**1000.

Although... purity and reality do sometimes disagree. The concept of "independent events" says that if I toss a coin 99 times and it comes up heads every time, the probability that it'll come up heads again the hundredth time is still 50-50. But the reality of cheated coins says that the chance of the coin being a fair one, having been tossed 99 times, and coming up heads every time is one in 1<<99, and given that the mass of this planet is roughly 6E27 grams, that would mean that you'd need about a thousand Earths' worth of coins to get that many... so the chances are much better that you're looking at either an imbalanced coin or a crafty operator, and the probability is much higher that it comes up heads :)

ChrisA
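The "no heads" arithmetic quoted above can be checked in a few lines (a quick sketch, not part of the original post; `p_no_heads` is a made-up helper name):

```python
from fractions import Fraction

# Chance of "no heads" in n tosses of a fair coin: (1/2)**n.
def p_no_heads(n):
    return Fraction(1, 2) ** n

assert p_no_heads(1) == Fraction(1, 2)
# For a thousand tosses the exact value is 1/2**1000:
assert p_no_heads(1000) == Fraction(1, 2 ** 1000)
print(float(p_no_heads(1000)))  # tiny, on the order of 1e-301
```

Using `Fraction` keeps the value exact; converting to float at the end shows just how small 1/2**1000 is while still being representable in IEEE doubles.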

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn wrote:

> If the set to choose from is integer numbers from 1 to 9, then *each* of
> those sequences has *the same* probability (1/9)^11 ≈ 3.1866355 × 10^-11.
>
> AISB, those are *independent* events; the number of occurrences of an
> outcome before *does not matter* for the probability of the next one.  And
> so the probability of getting a certain number does _not_ change depending
> on the number of times you repeat the experiment.  It is always the same;
> in this example, it is always 1/9.  And the probability of _not_ getting a
> certain number is always the same; in this example, it is always
> 1 - 1/9 = 8/9.

Yes, the probability of not getting a certain number is always 8/9... but the probability of not getting that number for an entire sequence is 8/9 raised to the power of the length of the sequence, because they are independent events that must *all* happen. In your case, with a length of 11, that means there's about a 27% chance of any particular number not happening. Looking at it the other way, that means there's a 73% chance of seeing at least one N, for any N. So the probability of seeing at least one of every digit is actually quite low - about 5.6% - because you're looking for nine digits out of eleven.

(Note that the numbers aren't quite exact here. The probabilities aren't entirely independent. Calculating off the exact figures is left as an exercise for the reader, but I don't expect they'll be more than a percentage point different one way or the other.)

The chances of not seeing a 5 in any of 100 tosses would be virtually nil. The chances of not seeing a 5 in any of 1000 tosses is so close to zero that its inverse is indistinguishable from certainty with Python floats (IEEE double precision):

```python
>>> 1.0 - (9/10)**1000 == 1.0
True
```

That's a pretty low number.

ChrisA
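The percentages above are easy to reproduce (a quick sketch, not part of the original post; the ~5.6% figure uses the same not-quite-independent shortcut the post describes):

```python
p_miss = (8/9) ** 11   # a given digit absent from all 11 draws
p_hit = 1 - p_miss     # a given digit appears at least once
p_all = p_hit ** 9     # rough estimate: all nine digits appear

print(round(p_miss * 100, 1))  # about 27.4
print(round(p_hit * 100, 1))   # about 72.6
print(round(p_all * 100, 1))   # about 5.6
```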

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

Thomas 'PointedEars' Lahn writes:

> Chris Angelico wrote:
>
>> Huh. Do you want to explain how, mathematically, I am wrong, or do
>> you want to join the RUE in my ignore list?
>
> I already did; you have overlooked it.  In a nutshell, the probability
> of
>
>   1 1 1 1 1 1 1 1 1 1 1

There is one way to get those numbers in some order. (11! / 11! = 1)

> is *the same* as that of
>
>   1 2 3 4 5 6 7 8 9 1 2

There are almost ten million ways to get those numbers in some order. (11! / 2! / 2! = 9979200)

> and the same as that of
>
>   8 3 6 3 1 2 6 8 2 1 6.

There are more than four hundred thousand ways to get those numbers in some order. (11! / 2! / 2! / 2! / 3! / 2! = 415800)
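The counts above are multinomial coefficients, and can be verified mechanically (a quick sketch, not part of the original post; `arrangements` is a made-up helper name):

```python
from math import factorial
from collections import Counter

def arrangements(seq):
    """Distinct orderings of seq: len(seq)! divided by k! for each repeated value."""
    n = factorial(len(seq))
    for k in Counter(seq).values():
        n //= factorial(k)
    return n

assert arrangements([1] * 11) == 1
assert arrangements([1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]) == 9979200
assert arrangements([8, 3, 6, 3, 1, 2, 6, 8, 2, 1, 6]) == 415800
```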

## Testing random

Jussi Piitulainen wrote:

> Thomas 'PointedEars' Lahn writes:
>>   8 3 6 3 1 2 6 8 2 1 6.
>
> There are more than four hundred thousand ways to get those numbers in
> some order.
>
> (11! / 2! / 2! / 2! / 3! / 2! = 415800)

Fallacy.  Order is irrelevant here.

-- 
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

Chris Angelico wrote:

> On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn wrote:
>> If the set to choose from is integer numbers from 1 to 9, then *each* of
>> those sequences has *the same* probability (1/9)^11 ≈ 3.1866355 × 10^-11.
>>
>> AISB, those are *independent* events; the number of occurrences of an
>> outcome before *does not matter* for the probability of the next one.
>> And so the probability of getting a certain number does _not_ change
>> depending on the number of times you repeat the experiment.  It is always
>> the same; in this example, it is always 1/9.  And the probability of
>> _not_ getting a certain number is always the same; in this example, it is
>> always 1 - 1/9 = 8/9.
>
> Yes, the probability of not getting a certain number is always 8/9...
> but the probability of not getting that number for an entire sequence
> is 8/9 raised to the power of the length of the sequence

As it is for getting every number at least once, or only two, three, four, five, six, seven, or eight of them.  Each number has the probability of 1/9 of being drawn, and for a sequence of n numbers drawn, the probability is (1/9)^n, *no matter* which numbers are in it.

> because they are independent events that must *all* happen. […]

As I already pointed out, your reasoning is purely *intuitive*, _not_ based on (correct) math, and (therefore) flawed.  And intuition lets you down here.

The correct, mathematical reasoning is: *Because* the events are *independent*, the probability of the sequence does not change depending on the number of occurrences of an event (outcome of a draw).  That all numbers occur is _not_ more likely than that only one number occurs, or two, three, and so on.  And that does _not_ depend on how many draws you take, as there is an infinite reservoir for each number.

I do not think I can explain it to you better, and I am not going to waste my precious free time to repeat myself.  Follow the reference I gave.

-- 
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

On Sun, Jun 7, 2015, at 15:29, Thomas 'PointedEars' Lahn wrote:

> Jussi Piitulainen wrote:
>
>> Thomas 'PointedEars' Lahn writes:
>>>   8 3 6 3 1 2 6 8 2 1 6.
>>
>> There are more than four hundred thousand ways to get those numbers in
>> some order.
>>
>> (11! / 2! / 2! / 2! / 3! / 2! = 415800)
>
> Fallacy.  Order is irrelevant here.

Huh? The fact that order is irrelevant is exactly _why_ what he said _is_ relevant.

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

random832 at fastmail.us wrote:

> On Sun, Jun 7, 2015, at 15:29, Thomas 'PointedEars' Lahn wrote:
>> Jussi Piitulainen wrote:
>>> Thomas 'PointedEars' Lahn writes:
>>>>   8 3 6 3 1 2 6 8 2 1 6.
>>>
>>> There are more than four hundred thousand ways to get those numbers in
>>> some order.
>>>
>>> (11! / 2! / 2! / 2! / 3! / 2! = 415800)
>> Fallacy.  Order is irrelevant here.
>
> Huh? The fact that order is irrelevant is exactly _why_ what he said
> _is_ relevant.

No.  AISB, those sequences all have the same probability:

-- 
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

Peter Otten wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Peter Otten wrote:
>>> Steven D'Aprano wrote:
>>>>> I wrote a very simple function to test random:
>>>>>     def test_random(length, multiplier = 10000):
>>>>>         number_list = length * [0]
>>>>>         for i in range(length * multiplier):
>>>>>             number_list[random.randint(0, length - 1)] += 1
>>>>>         minimum = min(number_list)
>>>>>         maximum = max(number_list)
>>>>>         return (minimum, maximum, minimum / maximum)
>>>>
>>>> Putting aside the timing aspects, your frequency calculations are not
>>>> done in a very Pythonic manner.
>>>
>>> I would agree if the frequency table were sparse, i. e. many indices
>>> with
>>>
>>> number_list[index] == 0
>>>
>>> but that's not the case with on average 10000 hits per index.
>>
>> A common misconception.
>
> A pointless remark. Not counting the ears ;)

The details are in my other follow-ups.  Go read them.

-- 
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:

> No.  AISB, those sequences all have the same probability:

Yes, and the probability of getting _any_ of the sequences is the sum of the probabilities for each one of the sequences individually.

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

random832 at fastmail.us wrote:

> On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
>> No.  AISB, those sequences all have the same probability:
>
> Yes and the probability of getting _any_ of the sequences, is the sum of
> the probabilities for each one of the sequences individually.

The probability of getting any of the sequences is precisely 1, and that is irrelevant to the fact that the probability of getting a particular sequence is independent of the number of occurrences of a specific event, as the probability variable is evenly distributed and the events are independent (of each other).

You really should follow the references and brush up your math.

-- 
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

On Sunday, June 7, 2015 at 2:26:02 PM UTC-4, Thomas 'PointedEars' Lahn wrote:

> Chris Angelico wrote:
>
>> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn wrote:
>>>> The greater the multiplier, the lower the chance that any element will
>>>> have no hits.
>>> Wrong.
>>>
>>>> [ex falso quodlibet]
>>
>> Huh. Do you want to explain how, mathematically, I am wrong, or do you
>> want to join the RUE in my ignore list?
>
> I already did; you have overlooked it.  In a nutshell, the probability of
>
>   1 1 1 1 1 1 1 1 1 1 1
>
> is *the same* as that of
>
>   1 2 3 4 5 6 7 8 9 1 2
>
> and the same as that of
>
>   8 3 6 3 1 2 6 8 2 1 6.

You aren't agreeing because you are arguing about different things. Thomas is talking about the relative probability of sequences of digits. Chris is talking about the probability of a single digit never appearing in the output.

Thomas: let's say I generate streams of N digits drawn randomly from 0-9.  I then consider the probability of a zero *never appearing once* in my stream.  Let's call that P(N).  Do you agree that as N increases, P(N) decreases?

--Ned.
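Ned's P(N) has a one-line closed form, and its monotone decrease is easy to demonstrate (a quick sketch, not part of the original post; `p_no_zero` is a made-up helper name):

```python
# P(N): chance that a zero never appears in N independent draws from 0-9.
def p_no_zero(n):
    return (9/10) ** n

probs = [p_no_zero(n) for n in (1, 10, 100, 1000)]
print(probs)
# Strictly decreasing toward (but never reaching) zero:
assert all(a > b for a, b in zip(probs, probs[1:]))
```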

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

On Mon, 8 Jun 2015 06:56 am, Thomas 'PointedEars' Lahn wrote:

> You really should follow the references and brush up your math.

You really should stop being patronising to everyone and pay attention to what is actually being said. If you won't believe us, and you won't believe mathematics, perhaps you will believe a simulation.

```python
import random

def trial(size):
    return [random.randint(1, 9) for i in range(size)]

def test(trial_size, test_size):
    every_element_seen = 0
    for i in range(test_size):
        s = set(trial(trial_size))
        if len(s) == 9:
            every_element_seen += 1
    return every_element_seen/test_size

msg = "Pr(every element seen at least once) in %d trials"
for size in range(10, 100, 5):
    print(msg % size, end=' ')
    pr = test(size, 1000)
    print(pr)
```

When I run this simulation, I get:

```
Pr(every element seen at least once) in 10 trials 0.007
Pr(every element seen at least once) in 15 trials 0.109
Pr(every element seen at least once) in 20 trials 0.349
Pr(every element seen at least once) in 25 trials 0.6
Pr(every element seen at least once) in 30 trials 0.765
Pr(every element seen at least once) in 35 trials 0.868
Pr(every element seen at least once) in 40 trials 0.917
Pr(every element seen at least once) in 45 trials 0.953
Pr(every element seen at least once) in 50 trials 0.976
Pr(every element seen at least once) in 55 trials 0.981
Pr(every element seen at least once) in 60 trials 0.996
Pr(every element seen at least once) in 65 trials 0.999
Pr(every element seen at least once) in 70 trials 0.997
Pr(every element seen at least once) in 75 trials 1.0
Pr(every element seen at least once) in 80 trials 0.998
Pr(every element seen at least once) in 85 trials 1.0
Pr(every element seen at least once) in 90 trials 1.0
Pr(every element seen at least once) in 95 trials 1.0
```

-- Steven

## Testing random

In reply to this post by Thomas 'PointedEars' Lahn

On Sun, Jun 7, 2015, at 16:56, Thomas 'PointedEars' Lahn wrote:

> random832 at fastmail.us wrote:
>
>> On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
>>> No.  AISB, those sequences all have the same probability:
>>
>> Yes and the probability of getting _any_ of the sequences, is the sum of
>> the probabilities for each one of the sequences individually.
>
> The probability of getting any of the sequences is precisely 1

Any of the sequences *satisfying "contains at least one of each result"* (or, in general, any assertion about the degree of coverage; for example, covering at least two values is more likely than covering only one value).

Flipping one coin has two possible sequences: H and T. The probability of getting at least one H and at least one T is 0.

Flipping two coins has four possible sequences: HH, HT, TH, TT. The probability of getting at least one H and at least one T is 50%, since two of these sequences satisfy the condition.

Flipping three coins has eight possible sequences. The probability of getting at least one H and at least one T is 75%, since six of the sequences - THH, HTH, HHT, HTT, THT, TTH - satisfy the condition.

Flipping four coins has sixteen possible sequences. The probability of getting at least one H and at least one T is 87.5%.

Rolling 2d3 has nine possible sequences. The probability of getting at least one of each value is 0. The probability of getting at least one each of two values is 66.7%. The probability of getting only one value is 33.3%.

Rolling 3d3 has 27 possible sequences. Six of them have one of each value; the probability is 22.2%. 24 have at least two different values; that probability is 88.9%. Only 3 have all of the same value: 11.1%.

Rolling 4d3 has 81 possible sequences. 36 of them have at least one of each value; the probability is 44.4%. 78 of them have at least two different values; the probability is 96.3%. Still only 3 have all of the same value: 3.7%.

Rolling 5d3 has 243 possible sequences. 150 of them have at least one of each value: 61.7%. 240 of them have at least two different values: 98.8%. Still only 3 have all of the same value: 1.2%.

In general, as the number of trials increases, the probability of having e.g. at least one of each value never _reaches_ 1, but it gets arbitrarily close.
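All of the coin and dice counts above can be confirmed by brute-force enumeration (a quick sketch using `itertools.product`, not part of the original post; `full_coverage` is a made-up helper name):

```python
from itertools import product

def full_coverage(symbols, length):
    """Return (# sequences containing every symbol at least once, total # sequences)."""
    seqs = list(product(symbols, repeat=length))
    full = sum(1 for s in seqs if set(s) == set(symbols))
    return full, len(seqs)

assert full_coverage('HT', 2) == (2, 4)       # 50%
assert full_coverage('HT', 3) == (6, 8)       # 75%
assert full_coverage('HT', 4) == (14, 16)     # 87.5%
assert full_coverage('123', 3) == (6, 27)     # 22.2%
assert full_coverage('123', 4) == (36, 81)    # 44.4%
assert full_coverage('123', 5) == (150, 243)  # 61.7%
```

Enumeration is exact but exponential in the sequence length, so it only works for small cases like these; the closed-form arguments elsewhere in the thread scale to 100 or 1000 draws.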

## Testing random

In reply to this post by Steven D'Aprano-8

On Sun, 7 Jun 2015 11:35 pm, Peter Otten wrote:

> Steven D'Aprano wrote:
>
>>> I wrote a very simple function to test random:
>>>     def test_random(length, multiplier = 10000):
>>>         number_list = length * [0]
>>>         for i in range(length * multiplier):
>>>             number_list[random.randint(0, length - 1)] += 1
>>>         minimum = min(number_list)
>>>         maximum = max(number_list)
>>>         return (minimum, maximum, minimum / maximum)
>>
>> Putting aside the timing aspects, your frequency calculations are not
>> done in a very Pythonic manner.
>
> I would agree if the frequency table were sparse, i. e. many indices with
>
> number_list[index] == 0
>
> but that's not the case with on average 10000 hits per index.
>
>> A better way might be:
>
> I'm too lazy to measure, but this will likely be a tad slower. Did you
> mean to convey this by "Putting aside the timing aspects"?

No, I meant that using a Counter to count stuff is more Pythonic than using a list.

>> from collections import Counter
>> from random import randint
>>
>> def test_random(length, multiplier = 10000):
>>     freq = Counter(
>>         randint(0, length - 1) for i in range(length * multiplier)
>>         )
>>     minimum = min(freq[i] for i in range(length))
>
> How about
>
>       if len(freq) < length:
>           minimum = 0
>       else:
>           minimum = min(freq.values())

That's quite nice, and it's extensible to the case where I return the counter:

    if len(freq) != length:
        # At least one element never showed up at all.
        freq.subtract(dict.fromkeys(range(length), 0))

> Not as an optimisation (replacing randint() with randrange() is probably
> more effective in that department), but because it's closer to being self-
> explanatory.

I thought about replacing randint with randrange, but they are different functions with different implementations, and could exhibit different biases. So one might wish to test them separately for statistical properties.

-- Steven

## Testing random

In reply to this post by random832@fastmail.us

On Mon, Jun 8, 2015 at 11:34 AM,   wrote:

> In general, as the number of trials increases, the probability of having
> e.g. at least one of each value never _reaches_ 1, but it gets
> arbitrarily close.

And by "arbitrarily close", you mean any of:

* So close to 1.0 that IEEE double precision is unable to represent it
* So unlikely that you could perform one trial every nanosecond until the heat death of the universe and still not expect to see it
* Less likely than that two files could accidentally collide on MD5, SHA1, SHA256, and file size, simultaneously

I think all of the above are true of this case, though I have to guess about the heat death of the universe.

ChrisA

## Testing random

On 2015-06-08 02:42, Chris Angelico wrote:

> On Mon, Jun 8, 2015 at 11:34 AM,   wrote:
>> In general, as the number of trials increases, the probability of having
>> e.g. at least one of each value never _reaches_ 1, but it gets
>> arbitrarily close.
>
> And by "arbitrarily close", you mean any of:

I believe the word is "asymptotic".

> * So close to 1.0 that IEEE double precision is unable to represent it
> * So unlikely that you could perform one trial every nanosecond until
>   the heat death of the universe and still not expect to see it
> * Less likely than that two files could accidentally collide on MD5,
>   SHA1, SHA256, and file size, simultaneously
>
> I think all of the above are true of this case, though I have to guess
> about the heat death of the universe.

## Testing random

In reply to this post by Chris Angelico

On Sun, Jun 7, 2015, at 21:42, Chris Angelico wrote:

> On Mon, Jun 8, 2015 at 11:34 AM,   wrote:
>> In general, as the number of trials increases, the probability of having
>> e.g. at least one of each value never _reaches_ 1, but it gets
>> arbitrarily close.
>
> And by "arbitrarily close", you mean any of:
>
> * So close to 1.0 that IEEE double precision is unable to represent it
> * So unlikely that you could perform one trial every nanosecond until
>   the heat death of the universe and still not expect to see it
> * Less likely than that two files could accidentally collide on MD5,
>   SHA1, SHA256, and file size, simultaneously
>
> I think all of the above are true of this case, though I have to guess
> about the heat death of the universe.

Well, yes (assuming, as you say, that the heat death of the universe is a finite time). More precisely, "arbitrarily close" is a mathematical term which means that for any desired distance from the limit, there is an input that will cause the result to get closer. It's entirely possible to select a number of symbols and a number of trials that won't get you there.

I also think there may have been some confusion with some of us using "trial" to refer to each individual random number, and some of us at some times using it to refer to each attempt to generate a sequence of a given length.