I wrote a very simple function to test random:
def test_random(length, multiplier = 10000):
    number_list = length * [0]
    for i in range(length * multiplier):
        number_list[random.randint(0, length - 1)] += 1
    minimum = min(number_list)
    maximum = max(number_list)
    return (minimum, maximum, minimum / maximum)

When running:

for i in range(1, 7):
    print(test_random(100, 10 ** i))

I get:

(3, 17, 0.17647058823529413)
(73, 127, 0.5748031496062992)
(915, 1086, 0.8425414364640884)
(9723, 10195, 0.9537027954879843)
(99348, 100620, 0.9873583780560524)
(997198, 1002496, 0.9947151908835546)

and when running:

for i in range(1, 7):
    print(test_random(1000, 10 ** i))

I get:

(2, 20, 0.1)
(69, 138, 0.5)
(908, 1098, 0.8269581056466302)
(9684, 10268, 0.9431242695753799)
(99046, 100979, 0.9808574059953059)
(996923, 1003598, 0.9933489305478886)

It shows that when the first parameter increases, the deviation increases,
and when the second parameter increases, the deviation decreases. Exactly
what you would expect. But what are the ranges you would expect with a
good random function? Then it could be used to test a random function.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
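A rough theoretical baseline for this question, as a minimal sketch
(assuming only that each bucket count is modelled as a
Binomial(length * multiplier, 1/length) draw): the mean count per bucket
is multiplier, its standard deviation is about sqrt(multiplier), and the
observed minimum and maximum should usually land within roughly three
standard deviations of the mean.

#------------------------------------------------------------------------
import math

def expected_spread(length, multiplier):
    # Each bucket count is Binomial(n, p) with n = length * multiplier
    # and p = 1 / length: mean = multiplier, sd ~ sqrt(multiplier).
    n = length * multiplier
    p = 1.0 / length
    return n * p, math.sqrt(n * p * (1 - p))

for i in range(1, 7):
    mean, sd = expected_spread(100, 10 ** i)
    print("10**%d: mean %.0f, sd %.1f, range [%.0f, %.0f]"
          % (i, mean, sd, mean - 3 * sd, mean + 3 * sd))
#------------------------------------------------------------------------

For length=100 this envelope brackets the figures above quite well:
multiplier 10 predicts most counts in roughly [1, 19] against the observed
(3, 17), and multiplier 1000 roughly [906, 1094] against (915, 1086).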
Cecil Westerhof wrote:
> I wrote a very simple function to test random:
>
> def test_random(length, multiplier = 10000):
>     number_list = length * [0]
>     for i in range(length * multiplier):
>         number_list[random.randint(0, length - 1)] += 1
>     minimum = min(number_list)
>     maximum = max(number_list)
>     return (minimum, maximum, minimum / maximum)

As there is no guarantee that every number will occur randomly, using a
dictionary at first should be more efficient than a list:

def test_random (length, multiplier=10000):
    number_list = {}
    for i in range(length * multiplier):
        r = random.randint(0, length - 1)
        number_list[r] = number_list[r] + 1 if r in number_list else 1
    values = number_list.values()
    minimum = min(values)
    maximum = max(values)
    return (minimum, maximum, minimum / maximum)

[Tuple literals are going to be introduced with C# 7 ;-)]

Since Python uses indentation as a syntax element, it is better not to
indent Python code by default when posting, in order to make it easier to
test. If necessary, code sections can be delimited in an unambiguous way
that is syntactically correct by lines like

#------------------------------------------------------------------------

The line length then should not exceed 72 to 78 characters, to allow for
quoting. Although the <HT> character can be used for indentation, 4 spaces
appear to be a sensible, device-independent default (8 is too much for
more complex snippets that need to fit within the conventional
80-characters-per-line limit).

--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
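The same dictionary-based counting can be written a little more
idiomatically with collections.defaultdict, which removes the explicit
membership test (a minimal sketch, assuming Python 3 and the standard
library only). As with the version above, a number that never occurs
simply has no key, so the minimum only reflects numbers that did occur.

#------------------------------------------------------------------------
import random
from collections import defaultdict

def test_random(length, multiplier=10000):
    counts = defaultdict(int)  # missing keys start at 0 on first access
    for i in range(length * multiplier):
        counts[random.randint(0, length - 1)] += 1
    values = counts.values()
    return (min(values), max(values), min(values) / max(values))
#------------------------------------------------------------------------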
On 07.06.2015 08:27, Cecil Westerhof wrote:
> I wrote a very simple function to test random:
>
> def test_random(length, multiplier = 10000):
>     number_list = length * [0]
>     for i in range(length * multiplier):
>         number_list[random.randint(0, length - 1)] += 1
>     minimum = min(number_list)
>     maximum = max(number_list)
>     return (minimum, maximum, minimum / maximum)
>
> When running:
>
> for i in range(1, 7):
>     print(test_random(100, 10 ** i))
>
> I get:
>
> (3, 17, 0.17647058823529413)
> (73, 127, 0.5748031496062992)
> (915, 1086, 0.8425414364640884)
> (9723, 10195, 0.9537027954879843)
> (99348, 100620, 0.9873583780560524)
> (997198, 1002496, 0.9947151908835546)
>
> and when running:
>
> for i in range(1, 7):
>     print(test_random(1000, 10 ** i))
>
> I get:
>
> (2, 20, 0.1)
> (69, 138, 0.5)
> (908, 1098, 0.8269581056466302)
> (9684, 10268, 0.9431242695753799)
> (99046, 100979, 0.9808574059953059)
> (996923, 1003598, 0.9933489305478886)
>
> It shows that when the first parameter increases, the deviation
> increases, and when the second parameter increases, the deviation
> decreases. Exactly what you would expect. But what are the ranges you
> would expect with a good random function?

Really depends on the number of samples. Apparently, a good RNG would come
close to 1.0 for the ratio.

> Then it could be used to test a random function.

This is an interesting test (more interesting to me than it looks at first
sight, and certainly better than what I had come up with), but
unfortunately, there is more to testing randomness. The test clearly
suggests that random functions should have a min/max ratio of about 1.0.
Think of a "random" function like this:

def fake_random(minv, maxv, _cache={}):
    try:
        return next(_cache[minv, maxv])
    except (StopIteration, KeyError):
        iterator = iter(range(minv, maxv+1))
        _cache[minv, maxv] = iterator
        return next(iterator)

(If that is hard to read, I agree; it returns the sequence from minv to
maxv (inclusive) over and over again for equal minv and maxv. Don't pass
anything to _cache :), just call it like random.randint.)

This gives a "perfect" ratio of 1.0, but is not very random. This kind of
function would probably be ruled out by a compression or correlation test.

If you want to dig deeper into the algorithms for random testing, the
dieharder test suite [1] is probably worth a look.

It all boils down to the use case. For some use cases, ``fake_random``
might be good enough (unit testing would be such a case: it is certainly
uniformly distributed and allows full coverage of the tested range); for
others it would fail catastrophically (don't generate your cryptographic
keys with this!).

cheers,
Jonas

[1]: http://www.phy.duke.edu/~rgb/General/dieharder.php
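To make the correlation idea concrete, here is a minimal sketch of one
such check (an illustration only, not part of the dieharder suite): count
how often consecutive outputs step up by exactly one. A uniform generator
on 0..9 does that roughly 9% of the time; ``fake_random`` does it almost
always.

#------------------------------------------------------------------------
import random

def fake_random(minv, maxv, _cache={}):
    # Jonas's cycling "random" function from above.
    try:
        return next(_cache[minv, maxv])
    except (StopIteration, KeyError):
        iterator = iter(range(minv, maxv + 1))
        _cache[minv, maxv] = iterator
        return next(iterator)

def step_fraction(rng, n=10000):
    # Fraction of consecutive outputs that increase by exactly 1.
    values = [rng(0, 9) for _ in range(n)]
    return sum(b - a == 1 for a, b in zip(values, values[1:])) / (n - 1)

print(step_fraction(random.randint))  # ~0.09 for a uniform RNG on 0..9
print(step_fraction(fake_random))     # ~0.9: nine of ten steps are +1
#------------------------------------------------------------------------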
On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
<PointedEars at web.de> wrote:
> Cecil Westerhof wrote:
>
>> I wrote a very simple function to test random:
>>
>> def test_random(length, multiplier = 10000):
>>     number_list = length * [0]
>>     for i in range(length * multiplier):
>>         number_list[random.randint(0, length - 1)] += 1
>>     minimum = min(number_list)
>>     maximum = max(number_list)
>>     return (minimum, maximum, minimum / maximum)
>
> As there is no guarantee that every number will occur randomly, using a
> dictionary at first should be more efficient than a list:

Hmm, I'm not sure that's actually so. His code is aiming to get
'multiplier' values in each box; for any serious multiplier (he starts
with 10 in the main code), you can be fairly confident that every number
will come up at least once. The distribution of numbers won't ever be
perfectly even, but you'd expect it to be reasonably close.

I have a similar routine on my Dungeons & Dragons server; since a roll of
a twenty-sided die is crucial to most of the game, I have a simple tester
that proves to people that the in-built dice roller is fair. Its output
looks like this:

> roll test
 1: 10017
 2: 10003
 3: 9966
 4: 9728
 5: 10088
 6: 9888
 7: 10087
 8: 9971
 9: 10052
10: 10061
11: 10130
12: 9942
13: 10062
14: 10075
15: 10050
16: 9948
17: 9880
18: 10052
19: 9995
20: 10005
Standard deviation: 90.18 (0.90%)

This is about equivalent to test_random(20), and as you see, he and I both
picked a multiplier of 10K to use by default. (I call the parameters "max"
and "avg" rather than "length" and "multiplier", but they have the exact
same semantics.)

The hard part is figuring out what "looks reasonable"; a true RNG could
legitimately produce nothing but 7s for the entire run, it's just
extremely unlikely.

ChrisA
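An equivalent tester is straightforward to sketch in Python (a
reconstruction for illustration, not Chris's actual server code; it
assumes Python 3.4+ for the statistics module and uses the population
standard deviation):

#------------------------------------------------------------------------
import random
import statistics
from collections import Counter

def roll_test(sides=20, avg=10000):
    counts = Counter(random.randint(1, sides)
                     for _ in range(sides * avg))
    for face in range(1, sides + 1):
        print("%2d: %d" % (face, counts[face]))
    sd = statistics.pstdev(counts[face] for face in range(1, sides + 1))
    print("Standard deviation: %.2f (%.2f%%)" % (sd, 100.0 * sd / avg))

roll_test()
#------------------------------------------------------------------------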
On Sun, 7 Jun 2015 04:27 pm, Cecil Westerhof wrote:
> I wrote a very simple function to test random:
>
> def test_random(length, multiplier = 10000):
>     number_list = length * [0]
>     for i in range(length * multiplier):
>         number_list[random.randint(0, length - 1)] += 1
>     minimum = min(number_list)
>     maximum = max(number_list)
>     return (minimum, maximum, minimum / maximum)

Putting aside the timing aspects, your frequency calculations are not done
in a very Pythonic manner. A better way might be:

from collections import Counter
from random import randint

def test_random(length, multiplier = 10000):
    freq = Counter(
        randint(0, length - 1) for i in range(length * multiplier)
        )
    minimum = min(freq[i] for i in range(length))
    maximum = max(freq.values())
    return (minimum, maximum, minimum / maximum)

Note carefully the way I calculate the minimum value. That way, if by
chance some number has a frequency of zero, the minimum returned will be
zero. For the maximum, that doesn't matter, since any missing numbers will
not be the most frequent number.

Better than returning the min and max, you should return the counter
object itself. Before doing so, I make sure that any missing numbers are
given a value of zero.

def test_random(size, samples=1000000):
    """Return a sample of `samples` random numbers from 0 to size-1."""
    freq = Counter(randint(0, size-1) for i in range(samples))
    # Avoid missing numbers. (Likely for small samples.)
    freq.subtract(dict.fromkeys(range(size), 0))
    return freq

Now once you have the frequencies, you can look at many interesting
statistics, and verify that you have a valid sample.

py> f = test_random(100)
py> sum(f.values()) == 1000000  # Total number of samples is right?
True
py> sorted(f) == list(range(100))  # Any missing numbers?
True
py> max(f) - min(f)  # statistical range
99
py> min(f.values())  # Lowest frequency?
9775
py> max(f.values())  # And the highest?
10313

The ratio of those two frequencies isn't very interesting, but we can
calculate it:

py> min(f.values())/max(f.values())
0.9478328323475226

For a uniformly distributed population, that ratio will be 1.0 exactly,
but for any finite sample, it could be any number between 0 and 1. We
expect that if we have a large random sample, the ratio should approach 1,
but moderate deviations from that are expected. In fact, we should be
surprised if the ratio is exactly 1, and suspect shenanigans if it is.

We can extract the mode, and its frequency:

py> f.most_common(1)  # mode
[(85, 10313)]

In Python 3.4 or better, we can look at some more statistics:

py> import statistics
py> statistics.median(f.elements())
50.0
py> statistics.mean(f.elements())
49.531949
py> statistics.stdev(f.elements())
28.873178122988367

And verify that the mode is correct:

py> statistics.mode(f.elements())
85

We can see what happens if the data is fiddled with:

py> g = f.copy()
py> g[42] += g[23]
py> g[23] = 0
py> statistics.median(g.elements())
50.0
py> statistics.mean(g.elements())
49.7263
py> statistics.stdev(g.elements())
28.757647388343212
py> g.most_common(3)
[(42, 20258), (85, 10313), (56, 10208)]

The median is unchanged, the mean shifts slightly higher, and the standard
deviation increases. But as you can see, these are not particularly
powerful tests of randomness: only the mode shows an obvious deviation
from uniformity.

> But what are the ranges you would expect with a good random function?
> Then it could be used to test a random function.

Python's pseudo-random number generator is based on the Mersenne Twister.
This is one of the most high-quality and extensively tested PRNGs
available, with a period of 2**19937-1 before it repeats. Its randomness
properties are very good indeed. (Although of course that isn't to say
that there aren't bugs in the Python implementation.) The Mersenne Twister
is not suitable for cryptographic work, but apart from that, it is pretty
much as indistinguishable from random as any deterministic computer
program can be.

--
Steven
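That determinism is easy to demonstrate with a minimal sketch: seeding the
generator puts it into a known internal state, after which the entire
output stream is reproducible. (This reproducibility is a feature for
simulations and tests, and part of why the Mersenne Twister is unsuitable
for cryptography.)

#------------------------------------------------------------------------
import random

random.seed(12345)
first = [random.randint(0, 99) for _ in range(10)]
random.seed(12345)      # restore the same internal state
second = [random.randint(0, 99) for _ in range(10)]
assert first == second  # the "random" stream is fully deterministic
#------------------------------------------------------------------------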
On 07.06.15 at 08:27, Cecil Westerhof wrote:
> I wrote a very simple function to test random:
>
> def test_random(length, multiplier = 10000):
>     number_list = length * [0]
>     for i in range(length * multiplier):
>         number_list[random.randint(0, length - 1)] += 1
>     minimum = min(number_list)
>     maximum = max(number_list)
>     return (minimum, maximum, minimum / maximum)
>
> It shows that when the first parameter increases, the deviation
> increases, and when the second parameter increases, the deviation
> decreases. Exactly what you would expect. But what are the ranges you
> would expect with a good random function? Then it could be used to
> test a random function.

Random number generators (RNG or PRNG) are usually tested using similar
methods, where the outcome can be assigned a probability distribution.
For example, diehard is a famous sequence of tests for RNGs:

http://en.wikipedia.org/wiki/Diehard_tests

You are only checking for a uniform distribution. This couldn't detect
correlation, for instance if the RNG just returned increasing numbers. A
better check for uniformity could be done with the chi-square test or
Kolmogorov-Smirnov. Then there are tables which relate the deviations to
significance.

Christian
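For illustration, a minimal sketch of the chi-square check applied to the
original counting scheme (assuming scipy is installed; scipy's
stats.chisquare defaults to a uniform expected distribution):

#------------------------------------------------------------------------
import random
from collections import Counter
from scipy import stats

length, multiplier = 100, 10000
counts = Counter(random.randint(0, length - 1)
                 for _ in range(length * multiplier))
observed = [counts[i] for i in range(length)]

# Null hypothesis: all buckets are equally likely.
statistic, pvalue = stats.chisquare(observed)
print("chi2 = %.1f, p = %.3f" % (statistic, pvalue))
#------------------------------------------------------------------------

A fair generator yields p-values spread roughly uniformly over (0, 1);
consistently tiny p-values over repeated runs indicate bias. Like the
min/max ratio, though, this still only tests the distribution, not
correlation between successive values.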
On Sun, 7 Jun 2015 10:52 pm, Steven D'Aprano wrote:
> The median is unchanged, the mean shifts slightly higher, and the
> standard deviation increases. But as you can see, these are not
> particularly powerful tests of randomness: only the mode shows an
> obvious deviation from uniformity.

Oh, I forgot: we can look at the frequencies themselves as well. If our
sample is absolutely perfectly distributed uniformly, then every element
will have a frequency of exactly 10000. We don't necessarily expect this
from a random sample, but if we're too far from that, we should be
concerned. Here are the results from the fair sample:

py> statistics.mean(f.values())
10000.0
py> statistics.median(f.values())
9995.0
py> statistics.stdev(f.values())
91.8027089673141

If I do the same thing with the biased sample, the standard deviation
stands out like a sore thumb:

py> statistics.mean(g.values())
10000.0
py> statistics.median(g.values())
9993.0
py> statistics.stdev(g.values())
1442.527341617181

--
Steven
Steven D'Aprano wrote:
>> I wrote a very simple function to test random:
>>
>> def test_random(length, multiplier = 10000):
>>     number_list = length * [0]
>>     for i in range(length * multiplier):
>>         number_list[random.randint(0, length - 1)] += 1
>>     minimum = min(number_list)
>>     maximum = max(number_list)
>>     return (minimum, maximum, minimum / maximum)
>
> Putting aside the timing aspects, your frequency calculations are not
> done in a very Pythonic manner.

I would agree if the frequency table were sparse, i.e. many indices with

number_list[index] == 0

but that's not the case with on average 10000 hits per index.

> A better way might be:

I'm too lazy to measure, but this will likely be a tad slower. Did you
mean to convey this by "Putting aside the timing aspects"?

> from collections import Counter
> from random import randint
>
> def test_random(length, multiplier = 10000):
>     freq = Counter(
>         randint(0, length - 1) for i in range(length * multiplier)
>         )
>     minimum = min(freq[i] for i in range(length))

How about

if len(freq) < length:
    minimum = 0
else:
    minimum = min(freq.values())

Not as an optimisation (replacing randint() with randrange() is probably
more effective in that department), but because it's closer to being
self-explanatory.

>     maximum = max(freq.values())
>     return (minimum, maximum, minimum / maximum)
Chris Angelico wrote:
> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
> <PointedEars at web.de> wrote:
>> Cecil Westerhof wrote:
>>
>>> I wrote a very simple function to test random:
>>>
>>> def test_random(length, multiplier = 10000):
>>>     number_list = length * [0]
>>>     for i in range(length * multiplier):
>>>         number_list[random.randint(0, length - 1)] += 1
>>>     minimum = min(number_list)
>>>     maximum = max(number_list)
>>>     return (minimum, maximum, minimum / maximum)
>>
>> As there is no guarantee that every number will occur randomly, using a
>> dictionary at first should be more efficient than a list:
>
> Hmm, I'm not sure that's actually so. His code is aiming to get
> 'multiplier' values in each box; for any serious multiplier (he starts
> with 10 in the main code), you can be fairly confident that every
> number will come up at least once.

The wording shows a common misconception: that random distribution would
mean that it is guaranteed or more probable that every element of the set
will occur at least once. It is another common misconception that
increasing the number of trials would increase the probability of that
happening. But that is not so. The law of large numbers only says that as
you increase the number of trials, the relative frequency *approaches* the
probability for each value of the probability variable, or IOW "the
average of the results obtained from a large number of trials should be
close to the expected value, and will *tend to become closer* as more
trials are performed."
(<http://en.wikipedia.org/wiki/Law_of_large_numbers>; emphasis by me)

> [...] a true RNG could legitimately produce nothing but 7s for the
> entire run, it's just extremely unlikely.

That reasoning is precisely a result of the misconception described above.
Because people think that every value must occur, they do not think it
possible that (much) repetition could occur with a (pseudo-)random
generator, and when they want to mince words they say "(extremely)
unlikely" instead.

For example, when people see

6 5 8 7 9 3 7 8 4 7 5 6 8 8 1 2 8 3 5 7 5 4 1 2 4 8 8 7 5 1

and are told that this is a random sequence, they find it hard to believe.
They think something like: "Look at all those repeated 8s, and '7, 5'
occurs twice. 4 occurs more often than 2, and there are many more 5s than
1s. That cannot possibly be random!" Yes, it *can*. I have just produced
it with

#------------------------------------------------------------------
import random
print(" ".join([str(random.randint(1, 9)) for i in range(0, 30)]))
#------------------------------------------------------------------

But if you think this *Mersenne Twister* PRNG, which "generates numbers
with nearly uniform distribution and a large period", is flawed, use a
proper die as RNG, and a sheet of paper to record the outcomes, and do the
experiment. The outcome is not going to be different.

If the distribution is even and the game is fair, every outcome *always*
has the same probability of occurring. As a result, every sequence of
outcomes of equal length *always* has the same probability of occurring,
and the probability for a particular sequence of equal length does _not_
increase or decrease based on the number of occurrences of previous
outcomes. Those are *independent* events.

See also:
<http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/home.html>,
in particular the section "Representativeness".

--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
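Repetition in such a sequence is in fact overwhelmingly likely, as a
back-of-the-envelope sketch shows (assuming independent uniform draws
from 1 to 9):

#------------------------------------------------------------------------
# Probability that 30 independent draws from 1..9 contain at least one
# immediately repeated value: to avoid any repeat, each of the 29
# successive draws must differ from its predecessor.
p_no_adjacent_repeat = (8 / 9) ** 29
print(1 - p_no_adjacent_repeat)  # ~0.967: repeats are the norm
#------------------------------------------------------------------------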
On Mon, Jun 8, 2015 at 1:51 AM, Thomas 'PointedEars' Lahn
<PointedEars at web.de> wrote:
> Chris Angelico wrote:
>
>> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
>> <PointedEars at web.de> wrote:
>>> Cecil Westerhof wrote:
>>>
>>>> I wrote a very simple function to test random:
>>>>
>>>> def test_random(length, multiplier = 10000):
>>>>     number_list = length * [0]
>>>>     for i in range(length * multiplier):
>>>>         number_list[random.randint(0, length - 1)] += 1
>>>>     minimum = min(number_list)
>>>>     maximum = max(number_list)
>>>>     return (minimum, maximum, minimum / maximum)
>>>
>>> As there is no guarantee that every number will occur randomly, using
>>> a dictionary at first should be more efficient than a list:
>>
>> Hmm, I'm not sure that's actually so. His code is aiming to get
>> 'multiplier' values in each box; for any serious multiplier (he starts
>> with 10 in the main code), you can be fairly confident that every
>> number will come up at least once.
>
> The wording shows a common misconception: that random distribution would
> mean that it is guaranteed or more probable that every element of the
> set will occur at least once. It is another common misconception that
> increasing the number of trials would increase the probability of that
> happening. But that is not so.

The greater the multiplier, the lower the chance that any element will
have no hits. With uniform distribution, a length of 10, and a multiplier
of 10, there are 100 attempts with a 90% chance each that any given number
will be avoided - which works out to 0.9**100 == 2.6561398887587544e-05
probability that (say) there'll be no 4s in the list. Invert that and
raise to the 10th power, and you get a probability of 0.9997344177567317
that there'll be at least one in every bucket. Raise the multiplier to
100, and the probability of a single whiff becomes 1.7478712517226947e-46;
invert and raise to the tenth power, and it becomes closer to certainty
than IEEE double precision can represent. Raise the length to 100 and the
numbers get lower again; with multiplier 10, probability
0.9956920878572284 of having one in every bucket (that's about half a
percent chance of a zero anywhere), and at multiplier 100, it still
underflows to certainty.

But you'll notice that I wasn't actually talking about certainty here. I
was talking about confidence, at levels sufficient to make data-type
decisions on. Sure, there's no guarantee that every number will occur; but
if there's a 0.4% chance that any number will be omitted, I think the list
is going to work out more efficient. You'll notice that some of the other
posts have been concerned more about correctness (for instance, using
collections.Counter and then making sure there's a zero slot for every
element - otherwise empty slots would be omitted), and then they _do_
acknowledge the chance that something will underflow. But with the numbers
the OP gave, I would be fully satisfied with optimizing for the case where
every bucket gets at least something.

ChrisA
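Strictly speaking, "invert and raise to the tenth power" treats the
per-bucket misses as independent, which they are not quite; the exact
coverage probability comes from inclusion-exclusion over the set of empty
buckets. A sketch (assuming Python 3.8+ for math.comb) confirms the
approximation is very close at these sizes:

#------------------------------------------------------------------------
from math import comb

def p_all_covered(buckets, draws):
    # Inclusion-exclusion: P(every bucket is hit at least once)
    # = sum_k (-1)**k * C(n, k) * (1 - k/n)**draws
    return sum((-1) ** k * comb(buckets, k) * (1 - k / buckets) ** draws
               for k in range(buckets + 1))

print(p_all_covered(10, 100))    # ~0.99973, matching the estimate above
print(p_all_covered(100, 1000))  # ~0.99569, matching the estimate above
#------------------------------------------------------------------------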
Peter Otten wrote:
> Steven D'Aprano wrote:
>>> I wrote a very simple function to test random:
>>>
>>> def test_random(length, multiplier = 10000):
>>>     number_list = length * [0]
>>>     for i in range(length * multiplier):
>>>         number_list[random.randint(0, length - 1)] += 1
>>>     minimum = min(number_list)
>>>     maximum = max(number_list)
>>>     return (minimum, maximum, minimum / maximum)
>>
>> Putting aside the timing aspects, your frequency calculations are not
>> done in a very Pythonic manner.
>
> I would agree if the frequency table were sparse, i.e. many indices with
>
> number_list[index] == 0
>
> but that's not the case with on average 10000 hits per index.

A common misconception.

--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Chris Angelico wrote:
> On Mon, Jun 8, 2015 at 1:51 AM, Thomas 'PointedEars' Lahn
> <PointedEars at web.de> wrote:
>> Chris Angelico wrote:
>>
>>> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
>>> <PointedEars at web.de> wrote:
>>>> Cecil Westerhof wrote:
>>>>> I wrote a very simple function to test random:
>>>>>
>>>>> def test_random(length, multiplier = 10000):
>>>>>     number_list = length * [0]
>>>>>     for i in range(length * multiplier):
>>>>>         number_list[random.randint(0, length - 1)] += 1
>>>>>     minimum = min(number_list)
>>>>>     maximum = max(number_list)
>>>>>     return (minimum, maximum, minimum / maximum)
>>>>
>>>> As there is no guarantee that every number will occur randomly, using
>>>> a dictionary at first should be more efficient than a list:
>>>
>>> Hmm, I'm not sure that's actually so. His code is aiming to get
>>> 'multiplier' values in each box; for any serious multiplier (he starts
>>> with 10 in the main code), you can be fairly confident that every
>>> number will come up at least once.
>>
>> The wording shows a common misconception: that random distribution
>> would mean that it is guaranteed or more probable that every element of
>> the set will occur at least once. It is another common misconception
>> that increasing the number of trials would increase the probability of
>> that happening. But that is not so.
>
> The greater the multiplier, the lower the chance that any element will
> have no hits.

Wrong.

> [ex falso quodlibet]

--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
<PointedEars at web.de> wrote:
>> The greater the multiplier, the lower the chance that any element will
>> have no hits.
>
> Wrong.
>
>> [ex falso quodlibet]

Huh. Do you want to explain how, mathematically, I am wrong, or do you
want to join the RUE in my ignore list?

ChrisA
Thomas 'PointedEars' Lahn wrote:
> Peter Otten wrote:
>
>> Steven D'Aprano wrote:
>>>> I wrote a very simple function to test random:
>>>>
>>>> def test_random(length, multiplier = 10000):
>>>>     number_list = length * [0]
>>>>     for i in range(length * multiplier):
>>>>         number_list[random.randint(0, length - 1)] += 1
>>>>     minimum = min(number_list)
>>>>     maximum = max(number_list)
>>>>     return (minimum, maximum, minimum / maximum)
>>>
>>> Putting aside the timing aspects, your frequency calculations are not
>>> done in a very Pythonic manner.
>>
>> I would agree if the frequency table were sparse, i.e. many indices
>> with
>>
>> number_list[index] == 0
>>
>> but that's not the case with on average 10000 hits per index.
>
> A common misconception.

A pointless remark. Not counting the ears ;)
On Sun, Jun 7, 2015 at 10:44 AM, Chris Angelico <rosuav at gmail.com> wrote:
> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
> <PointedEars at web.de> wrote:
>>> The greater the multiplier, the lower the chance that any element will
>>> have no hits.
>>
>> Wrong.
>>
>>> [ex falso quodlibet]
>
> Huh. Do you want to explain how, mathematically, I am wrong, or do you
> want to join the RUE in my ignore list?

My best speculation is that he's either objecting to the generality of
your statement (it's false if the probability of some element occurring is
zero or eventually degrades to zero), or misreading the word "multiplier"
to the conclusion that the value of each element is being multiplied
rather than the number of trials. Or trolling; I suppose that's always an
option too.
On Mon, Jun 8, 2015 at 3:07 AM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> On Sun, Jun 7, 2015 at 10:44 AM, Chris Angelico <rosuav at gmail.com>
> wrote:
>> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
>> <PointedEars at web.de> wrote:
>>>> The greater the multiplier, the lower the chance that any element
>>>> will have no hits.
>>>
>>> Wrong.
>>>
>>>> [ex falso quodlibet]
>>
>> Huh. Do you want to explain how, mathematically, I am wrong, or do you
>> want to join the RUE in my ignore list?
>
> My best speculation is that he's either objecting to the generality of
> your statement (it's false if the probability of some element occurring
> is zero or eventually degrades to zero), or misreading the word
> "multiplier" to the conclusion that the value of each element is being
> multiplied rather than the number of trials. Or trolling; I suppose
> that's always an option too.

My first thought was your first option, which is why I specifically
responded with an explanation of how data-type selection can viably be
about expectations rather than certainties; but he responded with a
one-word "Wrong", so I must have been answering the wrong question. I've
no idea about the misreading of "multiplier". A fourth possibility is that
mathematics works differently for him and for us, which I suppose is
possible; when I visited sci.math a while ago, I found some people for
whom everything I'd learned in grade school was clearly wrong, and they
were doing their best to enlighten the world about the new truths of
mathematics that they'd found.

ChrisA
On 6/7/2015 10:20 AM, Chris Angelico wrote:
> A fourth possibility is that mathematics works differently for him and
> for us, which I suppose is possible; when I visited sci.math a while
> ago, I found some people for whom everything I'd learned in grade school
> was clearly wrong, and they were doing their best to enlighten the world
> about the new truths of mathematics that they'd found.

I had the unfortunate luck of taking "Harvard Calculus" in college (circa
1995). The textbook was nothing but word problems from end to end. The
goal was to get students away from the symbolic thinking of traditional
calculus and into thinking about real-world problems that use calculus.
(Never mind that the majority of students don't use calculus after they
get out of school.) I bailed out after two weeks. Taking the class at
7:30 AM probably didn't help.

Chris R.
Chris Angelico wrote:
> On Mon, Jun 8, 2015 at 2:36 AM, Thomas 'PointedEars' Lahn
> <PointedEars at web.de> wrote:
>>> The greater the multiplier, the lower the chance that any element will
>>> have no hits.
>>
>> Wrong.
>>
>>> [ex falso quodlibet]
>
> Huh. Do you want to explain how, mathematically, I am wrong, or do you
> want to join the RUE in my ignore list?

I already did; you have overlooked it. In a nutshell, the probability of

1 1 1 1 1 1 1 1 1 1 1

is *the same* as that of

1 2 3 4 5 6 7 8 9 1 2

and the same as that of

8 3 6 3 1 2 6 8 2 1 6

If the set to choose from is the integer numbers from 1 to 9, then *each*
of those sequences has *the same* probability,
(1/9)^11 ≈ 3.1866355 × 10^-11.

Repeating that random experiment one more time, the probability of
1 1 1 1 1 1 1 1 1 1 1 2 is the same as that of 1 2 3 4 5 6 7 8 9 1 2 2 and
the same as that of 8 3 6 3 1 2 6 8 2 1 6 2 (that the next outcome is
a 2): (1/9)^12 ≈ 3.5407062 × 10^-12. And the probability of
1 1 1 1 1 1 1 1 1 1 1 1 (only 1s) is the *same* as that of
1 1 1 1 1 1 1 1 1 1 1 2 (only 1s and one 2) and the same as that of
1 1 1 1 1 1 1 1 1 1 1 3 (only 1s and a 3) and of 8 3 6 3 1 2 6 8 2 1 6 4
(that the next outcome is a 4, and there are no 5s at all).

AISB, those are *independent* events; the number of occurrences of an
outcome before *does not matter* for the probability of the next one. And
so the probability of getting a certain number does _not_ change depending
on the number of times you repeat the experiment. It is always the same;
in this example, it is always 1/9. And the probability of _not_ getting a
certain number is always the same; in this example, it is always
1 - 1/9 = 8/9.

So, I am sorry to tell you this, but you do _not_ understand probability.
And you *cannot* understand it intuitively, like you tried to. Probability
is counter-intuitive. That should not surprise or worry you. Because as I
and others have pointed out in this thread, the majority of the human
population, including the OP, does not understand probability until it has
been explained to them. I am not even sure I understand probability in
every instance, or rather, I am sure that *intuitively* I do not.

It is why humans play the lottery using their "lucky numbers" whereas
other numbers have exactly the same probability of being drawn; why they
think a baby boy "is due" when they have two girls already; why they are
surprised if two people in the room have the same birthday; why *in the
short run* you can make a fortune by running a casino even if the tables
are not rigged; why the "gate three" trick works (for a while); why they
pay money for horoscopes and soothsayers, not recognizing that their
wording is so ambiguous that it is highly probable to apply to everyone,
and why many of them believe in a deity or fate.

A big part of the reason is psychology, how the human mind works: the
human mind has evolved to recognize patterns. It is an evolutionary
advantage to be able to recognize patterns, to tell a mammoth from a
sabre-tooth lion at a distance, a poisonous plant from an edible one, and
an X from a U. So humans try to see patterns everywhere, even in pure
randomness: clouds, texts, numbers, you name it. And when they are told
that something is random, and they find patterns regardless, they do not
believe that it is random.
An extension of that misconception is emphasized by an anecdote (which may
be apocryphal) told about Richard Feynman (I heard it from Lawrence Krauss
in "A Universe from Nothing"; he can tell it in a much more funny way than
I am able to reproduce it here [1]):

  Richard Feynman used to go up to people all the time and he'd say:
  "You won't believe what happened to me today! You won't believe what
  happened to me today!" And people would say: "What?" And he would
  reply: "Absolutely nothing!"

Because humans are evolutionarily driven to look for patterns, and they
believe that everything that happens to them is important and has
significance in "the greater scheme of things" (whose mere existence is a
misconception, too).

HTH
________
[1] Lawrence KRAUSS (2009). "A Universe from Nothing". The Marriott hotel,
    Burbank. 1:03:12. <https://youtu.be/-EilZ4VY5Vs?t=3792>

--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
On Mon, 8 Jun 2015 01:51 am, Thomas 'PointedEars' Lahn wrote:
> Chris Angelico wrote:
>
>> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
>> <PointedEars at web.de> wrote:
>>> Cecil Westerhof wrote:
>>>
>>>> I wrote a very simple function to test random:
>>>>
>>>> def test_random(length, multiplier = 10000):
>>>>     number_list = length * [0]
>>>>     for i in range(length * multiplier):
>>>>         number_list[random.randint(0, length - 1)] += 1
>>>>     minimum = min(number_list)
>>>>     maximum = max(number_list)
>>>>     return (minimum, maximum, minimum / maximum)
>>>
>>> As there is no guarantee that every number will occur randomly, using
>>> a dictionary at first should be more efficient than a list:
>>
>> Hmm, I'm not sure that's actually so. His code is aiming to get
>> 'multiplier' values in each box; for any serious multiplier (he starts
>> with 10 in the main code), you can be fairly confident that every
>> number will come up at least once.
>
> The wording shows a common misconception: that random distribution would
> mean that it is guaranteed or more probable that every element of the
> set will occur at least once. It is another common misconception that
> increasing the number of trials would increase the probability of that
> happening. But that is not so.

You are badly mistaken. Of course you are correct that a random
distribution does not *guarantee* that every element of the set is seen at
least once. But as soon as you call it "a common misconception" that
increasing the number of trials increases the probability of seeing every
element, you are simply mistaken. It's not a misconception at all; it is
true. See below for calculations.

[...]

> If the distribution is even and the game is fair, every outcome *always*
> has the same probability of occurring. As a result, every sequence of
> outcomes of equal length *always* has the same probability of occurring,
> and the probability for a particular sequence of equal length does _not_
> increase or decrease based on the number of occurrences of previous
> outcomes. Those are *independent* events.

You are confused. Think about it more carefully. To make it simple, let's
randomly generate from the set 1, 2, 3, and do just three samples. That's
small enough to enumerate all the possibilities:

111 112 113 121 122 123 131 132 133
211 212 213 221 222 223 231 232 233
311 312 313 321 322 323 331 332 333

Each event (111 etc.) has a probability of 1/27, but that's not important.
We're comparing the probability of all elements being seen, versus at
least one element being missed. I'm going to tag the events that contain
all three elements:

111 112 113 121 122 123* 131 132* 133
211 212 213* 221 222 223 231* 232 233
311 312* 313 321* 322 323 331 332 333

So there are six events out of 27, or 2/9, where all elements are seen in
a trial, versus 7/9 where at least one element will have a count of zero.
Putting it another way:

Pr(at least one element has zero count)
  = Pr(1 has zero count) + Pr(2 has zero count) + Pr(3 has zero count)
    - Pr(1 and 2 both have zero counts)
    - Pr(1 and 3 both have zero counts)
    - Pr(2 and 3 both have zero counts)
  = (2/3)**3 * 3 - (1/3)**3 * 3

which gives us 7/9, as confirmed by direct enumeration of all the
possibilities.
With four samples per trial:

Pr(at least one element has zero count) = 5/9

With five samples:

Pr(at least one element has zero count) = 31/81

Ten samples:

Pr(at least one element has zero count) = 341/6561, approximately 0.05

Twenty samples:

Pr(at least one element has zero count) = 349525/387420489,
approximately 0.0009

For 100 samples:

Pr(at least one element has zero count) = 7.4e-18 (approx)

which means that with 100 samples from the set (1, 2, 3), if you could run
one million trials every second, it would on average take you almost 4300
years to see a trial that had a zero count for one or more of the
elements.

The probability of any element *not* showing up in a random trial with N
samples decreases rapidly as N increases. Elements with zero count are
only a factor when the number of samples is small, or the number of
elements is large.

--
Steven
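The smaller of those figures can be verified by brute force (a minimal
sketch, standard library only; full enumeration is only feasible for small
sample counts, so the larger figures need the closed formula above):

#------------------------------------------------------------------------
from itertools import product

def p_missing(elements, samples):
    # Fraction of all possible outcome sequences in which at least one
    # element never appears.
    trials = list(product(range(elements), repeat=samples))
    misses = sum(1 for t in trials if len(set(t)) < elements)
    return misses / len(trials)

print(p_missing(3, 3))  # 0.7777... == 7/9
print(p_missing(3, 4))  # 0.5555... == 5/9
print(p_missing(3, 5))  # 0.3827... == 31/81
#------------------------------------------------------------------------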
On Mon, 8 Jun 2015 02:36 am, Thomas 'PointedEars' Lahn wrote:
> Chris Angelico wrote:
>> The greater the multiplier, the lower the chance that any element will
>> have no hits.
>
> Wrong.

No, Chris is correct. The "multiplier" increases the number of samples.
The larger the number of samples, the lower the chance that any element is
not represented.

If my previous post didn't convince you, consider an even simpler random
distribution: tossing a fair coin. The probability of getting a head is
exactly 1/2 whether you toss the coin once or a thousand times. But if you
toss the coin once, your chance of getting "no heads" is 1/2. If you toss
it a thousand times, your chance of getting "no heads" is 1/2**1000.

--
Steven
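The general form of that calculation, as a one-function sketch (assuming
independent samples and a fixed per-sample probability p for the element
in question):

#------------------------------------------------------------------------
def p_never_seen(p, n):
    # Chance that an outcome with per-sample probability p never
    # occurs in n independent samples.
    return (1 - p) ** n

print(p_never_seen(0.5, 1))     # 0.5: one toss, no head
print(p_never_seen(0.5, 1000))  # ~9.3e-302: a thousand tosses, no head
print(p_never_seen(0.1, 100))   # ~2.7e-05: the 0.9**100 figure earlier
#------------------------------------------------------------------------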