

On Mon, Jun 8, 2015 at 4:28 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> If my previous post didn't convince you, consider an even simpler random
> distribution: tossing a fair coin. The probability of getting a head is
> exactly 1/2 whether you toss the coin once or a thousand times. But if you
> toss the coin once, your chances of getting "no heads" is 1/2. If you toss
> it a thousand times, your chances of getting "no heads" is 1/2**1000.
Although... purity and reality do sometimes disagree. The concept of
"independent events" says that if I toss a coin 99 times and it comes
up heads every time, the probability that it'll come up heads again
the hundredth time is still 50:50. But the reality of cheat coins
says that the chances of the coin being a fair one and having been
tossed 99 times and come up heads every time is one in 1<<99, and
given that the mass of this planet is roughly 6E27 grams, that would
mean that you'd need about a thousand earths' worth of coins to get
that many... so the chances are much better that you're looking at
either an imbalanced coin or a crafty operator, and the probability is
much higher that it comes up heads :)
ChrisA
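Chris's back-of-the-envelope figure can be sanity-checked in a couple of lines; the per-coin mass is an assumption (roughly 10 g, in the right ballpark for a large coin), not something stated above:

```python
EARTH_MASS_G = 6e27   # rough mass of the Earth in grams, as stated above
COIN_MASS_G = 10.0    # assumed mass of one coin in grams (my assumption)

# One coin per equally likely 99-toss outcome: 2**99 of them.
coins = 2 ** 99
earths = coins * COIN_MASS_G / EARTH_MASS_G
print(coins)    # about 6.3e29 coins
print(earths)   # about a thousand Earths' worth
```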


In reply to this post by Thomas 'PointedEars' Lahn
On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn
<PointedEars at web.de> wrote:
> If the set to choose from is integer numbers from 1 to 9, then *each* of
> those sequences has *the same* probability (1/9)^11 ≈ 3.1866355 × 10^-11.
>
> AISB, those are *independent* events; the number of occurrences of an
> outcome before *does not matter* for the probability of the next one. And
> so the probability of getting a certain number does _not_ change depending
> on the number of times you repeat the experiment. It is always the same;
> in this example, it is always 1/9. And the probability of _not_ getting a
> certain number is always the same; in this example, it is always 1 - 1/9 =
> 8/9.
Yes, the probability of not getting a certain number is always 8/9...
but the probability of not getting that number for an entire sequence
is 8/9 raised to the power of the length of the sequence, because they
are independent events that must *all* happen. In your case, with a
length of 11, that means there's about a 27% chance of any particular
number not happening. Looking at it the other way, that means there's
a 73% chance of seeing at least one N, for any N. So the probability
of seeing at least one of every digit is actually quite low, about
5.6%, because you're looking for nine digits out of eleven. (Note
that the numbers aren't quite exact here. The probabilities aren't
entirely independent. Calculating off the exact figures is left as an
exercise for the reader, but I don't expect they'll be more than a
percentage point different one way or the other.)
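The exact figure left as an exercise here can be computed by inclusion-exclusion over which digits are missed; this is a sketch (the function name is invented), and for 11 draws from 9 digits it comes out at about 1.3%:

```python
from math import comb

def p_all_seen(values=9, draws=11):
    """Exact probability that every one of `values` equally likely
    outcomes appears at least once in `draws` independent draws,
    by inclusion-exclusion over the set of outcomes that never occur."""
    hits = sum((-1) ** k * comb(values, k) * (values - k) ** draws
               for k in range(values + 1))
    return hits / values ** draws

print(p_all_seen())   # about 0.013, i.e. roughly 1.3%
```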
The chances of not seeing a 5 in any of 100 tosses would be virtually
nil. The chances of not seeing a 5 in any of 1000 tosses is so close
to zero that its inverse is indistinguishable from certainty with
Python floats (IEEE double precision):
>>> 1.0 - (9/10)**1000 == 1.0
True
That's a pretty low number.
ChrisA


In reply to this post by Thomas 'PointedEars' Lahn
Thomas 'PointedEars' Lahn writes:
> Chris Angelico wrote:
>
>> Huh. Do you want to explain how, mathematically, I am wrong, or do
>> you want to join the RUE in my ignore list?
>
> I already did; you have overlooked it. In a nutshell, the probability
> of
>
> 1 1 1 1 1 1 1 1 1 1 1
There is one way to get those numbers in some order.
(11! / 11! = 1)
> is *the same* as that of
>
> 1 2 3 4 5 6 7 8 9 1 2
There are almost ten million ways to get those numbers in some order.
(11! / 2! / 2! = 9979200)
> and the same as that of
>
> 8 3 6 3 1 2 6 8 2 1 6.
There are more than four hundred thousand ways to get those numbers in
some order.
(11! / 2! / 2! / 2! / 3! / 2! = 415800)
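Those counts are multinomial coefficients; a small sketch (names are invented) reproduces them:

```python
from collections import Counter
from math import factorial

def orderings(seq):
    """Distinct orderings of a multiset: n! divided by the factorial
    of each repeat count."""
    n = factorial(len(seq))
    for count in Counter(seq).values():
        n //= factorial(count)
    return n

print(orderings([1] * 11))                           # 1
print(orderings([1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]))  # 9979200
print(orderings([8, 3, 6, 3, 1, 2, 6, 8, 2, 1, 6]))  # 415800
```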


Jussi Piitulainen wrote:
> Thomas 'PointedEars' Lahn writes:
>> 8 3 6 3 1 2 6 8 2 1 6.
>
> There are more than four hundred thousand ways to get those numbers in
> some order.
>
> (11! / 2! / 2! / 2! / 3! / 2! = 415800)
Fallacy. Order is irrelevant here.

PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


In reply to this post by Thomas 'PointedEars' Lahn
Chris Angelico wrote:
> On Mon, Jun 8, 2015 at 4:23 AM, Thomas 'PointedEars' Lahn
> <PointedEars at web.de> wrote:
>> If the set to choose from is integer numbers from 1 to 9, then *each* of
>> those sequences has *the same* probability (1/9)^11 ≈ 3.1866355 × 10^-11.
>>
>> AISB, those are *independent* events; the number of occurrences of an
>> outcome before *does not matter* for the probability of the next one.
>> And so the probability of getting a certain number does _not_ change
>> depending on the number of times you repeat the experiment. It is always
>> the same; in this example, it is always 1/9. And the probability of
>> _not_ getting a certain number is always the same; in this example, it is
>> always 1 - 1/9 = 8/9.
>
> Yes, the probability of not getting a certain number is always 8/9...
> but the probability of not getting that number for an entire sequence
> is 8/9 raised to the power of the length of the sequence
As it is for getting every number at least once, or only two, three, four,
five, six, seven, or eight of them. Each number has the probability of 1?9
of being drawn, and for a sequence of n numbers drawn, the probability is
(1/9)^n, *no matter* which numbers are in it.
> because they are independent events that must *all* happen. [...]
As I already pointed out, your reasoning is purely *intuitive*, _not_ based
on (correct) math, and (therefore) flawed. And intuition lets you down
here.
The correct, mathematical reasoning is: *Because* the events are
*independent*, the probability of the sequence does not change depending on
the number of occurrences of an event (outcome of a draw). That all numbers
occur is _not_ more likely than that only one number occurs, or two, three,
and so on. And that does _not_ depend on how many draws you take as there
is an infinite reservoir for each number.
I do not think I can explain it to you better, and I am not going to waste
my precious free time to repeat myself. Follow the reference I gave.

PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


In reply to this post by Thomas 'PointedEars' Lahn
On Sun, Jun 7, 2015, at 15:29, Thomas 'PointedEars' Lahn wrote:
> Jussi Piitulainen wrote:
>
> > Thomas 'PointedEars' Lahn writes:
> >> 8 3 6 3 1 2 6 8 2 1 6.
> >
> > There are more than four hundred thousand ways to get those numbers in
> > some order.
> >
> > (11! / 2! / 2! / 2! / 3! / 2! = 415800)
>
> Fallacy. Order is irrelevant here.
Huh? The fact that order is irrelevant is exactly _why_ what he said
_is_ relevant.


In reply to this post by Thomas 'PointedEars' Lahn
Peter Otten wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Peter Otten wrote:
>>> Steven D'Aprano wrote:
>>>>> I wrote a very simple function to test random:
>>>>> def test_random(length, multiplier = 10000):
>>>>>     number_list = length * [0]
>>>>>     for i in range(length * multiplier):
>>>>>         number_list[random.randint(0, length - 1)] += 1
>>>>>     minimum = min(number_list)
>>>>>     maximum = max(number_list)
>>>>>     return (minimum, maximum, minimum / maximum)
>>>>
>>>> Putting aside the timing aspects, your frequency calculations are not
>>>> done in a very Pythonic manner.
>>>
>>> I would agree if the frequency table were sparse, i. e. many indices
>>> with
>>>
>>> number_list[index] == 0
>>>
>>> but that's not the case with on average 10000 hits per index.
>>
>> A common misconception.
>
> A pointless remark. Not counting the ears ;)
The details are in my other followups. Go read them.

PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


In reply to this post by Thomas 'PointedEars' Lahn
On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
> No. AISB, those sequences all have the same probability:
Yes and the probability of getting _any_ of the sequences, is the sum of
the probabilities for each one of the sequences individually.


In reply to this post by Thomas 'PointedEars' Lahn
random832 at fastmail.us wrote:
> On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
>> No. AISB, those sequences all have the same probability:
>
> Yes and the probability of getting _any_ of the sequences, is the sum of
> the probabilities for each one of the sequences individually.
The probability of getting any of the sequences is precisely 1, and that is
irrelevant to the fact that the probability of getting a particular sequence
is independent of the number of occurrences of a specific event, as the
probability variable is evenly distributed and the events are independent
(of each other).
You really should follow the references and brush up your math.

PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.



In reply to this post by Thomas 'PointedEars' Lahn
On Sunday, June 7, 2015 at 2:26:02 PM UTC4, Thomas 'PointedEars' Lahn wrote:
> Chris Angelico wrote:
>
> > On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
> > <PointedEars at web.de> wrote:
> >>> The greater the multiplier, the lower the chance that any element will
> >>> have no hits.
> >> Wrong.
> >>
> >>> [ex falso quodlibet]
> >
> > Huh. Do you want to explain how, mathematically, I am wrong, or do you
> > want to join the RUE in my ignore list?
>
> I already did; you have overlooked it. In a nutshell, the probability of
>
> 1 1 1 1 1 1 1 1 1 1 1
>
> is *the same* as that of
>
> 1 2 3 4 5 6 7 8 9 1 2
>
> and the same as that of
>
> 8 3 6 3 1 2 6 8 2 1 6.
>
You aren't agreeing because you are arguing about different things.
Thomas is talking about the relative probability of sequences of digits.
Chris is talking about the probability of a single digit never appearing
in the output.
Thomas: let's say I generate streams of N digits drawn randomly from 0-9.
I then consider the probability of a zero *never appearing once* in my
stream. Let's call that P(N). Do you agree that as N increases, P(N)
decreases?
Ned.


In reply to this post by Thomas 'PointedEars' Lahn
On Mon, 8 Jun 2015 04:23 am, Thomas 'PointedEars' Lahn wrote:
> Chris Angelico wrote:
>
>> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
>> <PointedEars at web.de> wrote:
>>>> The greater the multiplier, the lower the chance that any element will
>>>> have no hits.
>>> Wrong.
>>>
>>>> [ex falso quodlibet]
>>
>> Huh. Do you want to explain how, mathematically, I am wrong, or do you
>> want to join the RUE in my ignore list?
>
> I already did; you have overlooked it. In a nutshell, the probability of
>
> 1 1 1 1 1 1 1 1 1 1 1
>
> is *the same* as that of
>
> 1 2 3 4 5 6 7 8 9 1 2
>
> and the same as that of
>
> 8 3 6 3 1 2 6 8 2 1 6.
True, but irrelevant. What *is* relevant is that the probability of:
1 1 1 1 1 1 1 1 1 1 1
is MUCH LESS than the probability of:
1 2 3 4 5 6 7 8 9 1 2 or
8 3 6 3 1 2 6 8 2 1 6 or
9 3 1 5 5 4 7 1 6 8 1 or
2 3 3 1 8 9 7 2 5 1 6 or
7 8 9 4 2 7 7 1 3 8 2 or
...
[enumerate millions of other possibilities]
...
2 1 5 5 8 1 8 7 6 5 1 or
3 4 9 5 2 8 4 2 3 2 6.
> So, I am sorry to tell you this, but you do _not_ understand probability.
It's certainly true that you don't.
> And you *cannot* understand it intuitively, like you tried to.
Rubbish. Chris has got it right. Whether he did so "intuitively" or whether
he learned this after years of study is irrelevant, and you have no
evidence for which it is. You are jumping to wrong conclusions: you are
so sure that you're so much smarter than everyone else that you've ASSUMED
that Chris must be wrong and *completely failed to pay attention* to what
he has actually said, responding to things he hasn't said.
> It is why humans play the lottery using their 'lucky numbers' whereas
> other numbers have exactly the same probability of being drawn;
[snip more irrelevant examples]
None of these things are relevant in the slightest.
> An extension of that misconception is emphasized by an anecdote (which may
> be apocryphal) told about Richard Feynman (I heard it from Lawrence Krauss
> in 'A Universe from Nothing'; he can tell that in a much more funny way
> than I am able to reproduce it here [1]):
>
> Richard Feynman used to go up to people all the time and he'd say:
>
> 'You won't believe what happened to me today!
> You won't believe what happened to me today!'
>
> And people would say:
>
> 'What?'
>
> And he would reply:
>
> 'Absolutely nothing!'
And I wouldn't believe it either. *Something happened*. He got up. He put
his shoes on. He washed his face. These are all events, but they are not
memorable events. They are equivalent to a sample like:
6 8 1 5 9 2 7 6 3 3 2
There are millions of ways to get a nonmemorable sample like that, but only
nine ways to get a memorable sample like:
2 2 2 2 2 2 2 2 2 2 2
Your mistake is that you are comparing that one memorable sample to any
*one* of the nonmemorable ones, when you should be comparing the
probability of *any* memorable sample against *any* nonmemorable ones.
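Steven's comparison is easy to put in numbers; in this sketch "memorable" is taken to mean all eleven digits identical:

```python
total = 9 ** 11        # equally likely sequences of 11 digits drawn from 1-9
memorable = 9          # the nine all-identical sequences
print(memorable / total)             # about 2.9e-10
print((total - memorable) / total)   # indistinguishable from 1.0
```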
Actually, your real mistake is hubris. *Nothing* in Chris' comments in this
thread should lead you to the conclusion he doesn't know what he is talking
about, but you're so full of the preconceived notion that he must be wrong
because he is not Thomas Lahn that you've actually made an astonishing
blunder and compounded it with laughable arrogance.

Steven


In reply to this post by Thomas 'PointedEars' Lahn
On Mon, 8 Jun 2015 06:56 am, Thomas 'PointedEars' Lahn wrote:
> You really should follow the references and brush up your math.
You really should stop being patronising to everyone and pay attention to
what is actually being said.
If you won't believe us, and you won't believe mathematics, perhaps you will
believe a simulation.
import random

def trial(size):
    return [random.randint(1, 9) for i in range(size)]

def test(trial_size, test_size):
    every_element_seen = 0
    for i in range(test_size):
        s = set(trial(trial_size))
        if len(s) == 9:
            every_element_seen += 1
    return every_element_seen/test_size

msg = "Pr(every element seen at least once) in %d trials"
for size in range(10, 100, 5):
    print(msg % size, end=' ')
    pr = test(size, 1000)
    print(pr)
When I run this simulation, I get:
Pr(every element seen at least once) in 10 trials 0.007
Pr(every element seen at least once) in 15 trials 0.109
Pr(every element seen at least once) in 20 trials 0.349
Pr(every element seen at least once) in 25 trials 0.6
Pr(every element seen at least once) in 30 trials 0.765
Pr(every element seen at least once) in 35 trials 0.868
Pr(every element seen at least once) in 40 trials 0.917
Pr(every element seen at least once) in 45 trials 0.953
Pr(every element seen at least once) in 50 trials 0.976
Pr(every element seen at least once) in 55 trials 0.981
Pr(every element seen at least once) in 60 trials 0.996
Pr(every element seen at least once) in 65 trials 0.999
Pr(every element seen at least once) in 70 trials 0.997
Pr(every element seen at least once) in 75 trials 1.0
Pr(every element seen at least once) in 80 trials 0.998
Pr(every element seen at least once) in 85 trials 1.0
Pr(every element seen at least once) in 90 trials 1.0
Pr(every element seen at least once) in 95 trials 1.0

Steven


In reply to this post by Thomas 'PointedEars' Lahn
On Sun, Jun 7, 2015, at 16:56, Thomas 'PointedEars' Lahn wrote:
> random832 at fastmail.us wrote:
>
> > On Sun, Jun 7, 2015, at 16:09, Thomas 'PointedEars' Lahn wrote:
> >> No. AISB, those sequences all have the same probability:
> >
> > Yes and the probability of getting _any_ of the sequences, is the sum of
> > the probabilities for each one of the sequences individually.
>
> The probability of getting any of the sequences is precisely 1
Any of the sequences *satisfying "contains at least one of each result"*
(or, in general, any assertion about the degree of coverage, for
example, covering at least two values being more likely than covering
only one value)
Flipping one coin has two possible sequences: H and T. The probability
of getting at least one H and at least one T is 0.
Flipping two coins has four possible sequences: HH, HT, TH, TT. The
probability of getting at least one H and at least one T is 50%, since
two of these sequences satisfy the condition.
Flipping three coins has eight possible sequences. The probability of
getting at least one H and at least one T is 75%, since six of the
sequences: THH, HTH, HHT, HTT, THT, TTH, all satisfy the condition.
Flipping four coins has sixteen possible sequences. The probability of
getting at least one H and at least one T is 87.5%.
Rolling 2d3 has nine possible sequences.
The probability of getting at least one of each value is 0.
The probability of getting at least one each of two values is 66.7%.
The probability of getting only one value is 33.3%.
Rolling 3d3 has 27 possible sequences.
Six of them have one of each value. The probability is 22.2%.
24 have at least two different values. That probability is 88.9%.
Only 3 have all of the same value. 11.1%.
Rolling 4d3 has 81 possible sequences.
36 of them have at least one of each value. The probability is 44.4%.
78 of them have at least two different values. The probability is 96.3%.
Still only 3 have all of the same value. 3.7%.
Rolling 5d3 has 243 possible sequences.
150 of them have at least one of each value. The probability is 61.7%
240 of them have at least two different values. 98.8%.
Still only 3 have all of the same value. 1.2%.
In general, as the number of trials increases, the probability of having
e.g. at least one of each value never _reaches_ 1, but it gets
arbitrarily close.
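The d3 counts above can be verified by brute force; a quick sketch:

```python
from itertools import product

# Enumerate every sequence of n rolls of a 3-sided die and count the
# sequences that cover all three values at least once.
for n in range(2, 6):
    seqs = list(product(range(3), repeat=n))
    full = sum(1 for s in seqs if len(set(s)) == 3)
    print(n, full, len(seqs))   # matches 0/9, 6/27, 36/81, 150/243 above
```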


On Sun, 7 Jun 2015 11:35 pm, Peter Otten wrote:
> Steven D'Aprano wrote:
>
>
>>> I wrote a very simple function to test random:
>>> def test_random(length, multiplier = 10000):
>>>     number_list = length * [0]
>>>     for i in range(length * multiplier):
>>>         number_list[random.randint(0, length - 1)] += 1
>>>     minimum = min(number_list)
>>>     maximum = max(number_list)
>>>     return (minimum, maximum, minimum / maximum)
>>
>> Putting aside the timing aspects, your frequency calculations are not
>> done in a very Pythonic manner.
>
> I would agree if the frequency table were sparse, i. e. many indices with
>
> number_list[index] == 0
>
> but that's not the case with on average 10000 hits per index.
>
>> A better way might be:
>
> I'm too lazy to measure, but this will likely be a tad slower. Did you
> mean to convey this by "Putting aside the timing aspects"?
No, I meant that using a Counter to count stuff is more Pythonic than using
a list.
>> from collections import Counter
>> from random import randint
>>
>> def test_random(length, multiplier = 10000):
>>     freq = Counter(
>>         randint(0, length - 1) for i in range(length * multiplier)
>>     )
>>     minimum = min(freq[i] for i in range(length))
>
> How about
>
> if len(freq) < length:
>     minimum = 0
> else:
>     minimum = min(freq.values())
That's quite nice, and it's extensible to the case where I return the
counter:
if len(freq) != length:
    # At least one element never showed up at all.
    freq.subtract(dict.fromkeys(range(length), 0))
> Not as an optimisation (replacing randint() with randrange() is probably
> more effective in that department), but because it's closer to being self
> explanatory.
I thought about replacing randint with randrange, but they are different
functions with different implementations, and could exhibit different
biases. So one might wish to test them separately for statistical
properties.
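Peter's aside that randrange() is faster can be checked with a rough micro-benchmark; timings vary by machine, so this is only a sketch:

```python
import timeit

# In CPython, random.randint(a, b) is implemented on top of
# random.randrange(a, b + 1), so calling randrange(n) directly
# skips one layer of call overhead.
t_randint = timeit.timeit('randint(0, 8)',
                          'from random import randint', number=100_000)
t_randrange = timeit.timeit('randrange(9)',
                            'from random import randrange', number=100_000)
print(t_randint, t_randrange)
```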

Steven


In reply to this post by random832@fastmail.us
On Mon, Jun 8, 2015 at 11:34 AM, <random832 at fastmail.us> wrote:
> In general, as the number of trials increases, the probability of having
> e.g. at least one of each value never _reaches_ 1, but it gets
> arbitrarily close.
And by "arbitrarily close", you mean any of:
* So close to 1.0 that IEEE double precision is unable to represent it
* So unlikely that you could perform one trial every nanosecond until
the heat death of the universe and still not expect to see it
* Less likely than that two files could accidentally collide on MD5,
SHA1, SHA256, and file size, simultaneously
I think all of the above are true of this case, though I have to guess
about the heat death of the universe.
ChrisA


On 2015-06-08 02:42, Chris Angelico wrote:
> On Mon, Jun 8, 2015 at 11:34 AM, <random832 at fastmail.us> wrote:
>> In general, as the number of trials increases, the probability of having
>> e.g. at least one of each value never _reaches_ 1, but it gets
>> arbitrarily close.
>
> And by "arbitrarily close", you mean any of:
>
I believe the word is "asymptotic".
> * So close to 1.0 that IEEE double precision is unable to represent it
> * So unlikely that you could perform one trial every nanosecond until
> the heat death of the universe and still not expect to see it
> * Less likely than that two files could accidentally collide on MD5,
> SHA1, SHA256, and file size, simultaneously
>
> I think all of the above are true of this case, though I have to guess
> about the heat death of the universe.
>


On Sun, Jun 7, 2015, at 21:42, Chris Angelico wrote:
> On Mon, Jun 8, 2015 at 11:34 AM, <random832 at fastmail.us> wrote:
> > In general, as the number of trials increases, the probability of having
> > e.g. at least one of each value never _reaches_ 1, but it gets
> > arbitrarily close.
>
> And by "arbitrarily close", you mean any of:
>
> * So close to 1.0 that IEEE double precision is unable to represent it
> * So unlikely that you could perform one trial every nanosecond until
> the heat death of the universe and still not expect to see it
> * Less likely than that two files could accidentally collide on MD5,
> SHA1, SHA256, and file size, simultaneously
>
> I think all of the above are true of this case, though I have to guess
> about the heat death of the universe.
Well, yes (assuming, as you say, that the heat death of the universe is
a finite time). More precisely, "arbitrarily close" is a mathematical
term which means that for any desired distance from the limit, there is
an input that will cause the result to get closer. It's entirely
possible to select a number of symbols and a number of trials that won't
get you there.
I also think there may have been some confusion with some of us using
"trial" to refer to each individual random number, and some of us at
some times using it to refer to each attempt to generate a sequence of a
given length.


In reply to this post by Thomas 'PointedEars' Lahn
Thomas 'PointedEars' Lahn writes:
> Jussi Piitulainen wrote:
>> Thomas 'PointedEars' Lahn writes:
>>
>>> 8 3 6 3 1 2 6 8 2 1 6.
>>
>> There are more than four hundred thousand ways to get those numbers
>> in some order.
>>
>> (11! / 2! / 2! / 2! / 3! / 2! = 415800)
>
> Fallacy. Order is irrelevant here.
You need to consider every sequence that leads to the observed counts.
One of those sequences occurred. You don't know which.
When tossing herrings into a fixed number of boxes (each time picking
the number of one of the boxes), the proportion of the ways to hit every
box at least once increases with the number of herrings as follows.
Proportions of occupying sequences of tosses into 3 boxes:
2 herrings: 0% == 0 out of 9
3 herrings: 22% == 6 out of 27
4 herrings: 44% == 36 out of 81
5 herrings: 62% == 150 out of 243
6 herrings: 74% == 540 out of 729
7 herrings: 83% == 1806 out of 2187
8 herrings: 88% == 5796 out of 6561
9 herrings: 92% == 18150 out of 19683
10 herrings: 95% == 55980 out of 59049
11 herrings: 97% == 171006 out of 177147
12 herrings: 98% == 519156 out of 531441
That's counting the sequences of box numbers, so those proportions can
be interpreted as probabilities (of occupying every box) under the
standard assumption that the sequences are equiprobable. The final
distributions of counts aren't. (How is this even counterintuitive?)
Code follows. Incidentally, I'm not feeling smart here. I made several
quite stupid mistakes during this little experiment before I was happy
with it. Note that this actually walks through a combinatorial
explosion, so do not substitute much bigger parameters.
from itertools import product
from collections import Counter

boxes = 3
print('Proportions of occupying sequences of tosses into {} boxes:'
      .format(boxes))
for herrings in range(2, 13):
    winnings = sum(len(Counter(p)) == boxes
                   for p in product(range(boxes),
                                    repeat = herrings))
    print('{:>2} herrings:'.format(herrings),
          '{:>3.0%}'.format(winnings / (boxes ** herrings)),
          '== {} out of {}'.format(winnings, boxes ** herrings))

