Hello,
I'm a newbie to Python. I have a list which contains integers (about 80,000). I want to find a quick way to get the numbers that occur in the list more than once, and how many times that number is duplicated in the list. I've done this right now by
looping through the list, getting a number, querying the list to find out how many times the number exists, then writing it to a new list. On this many records it takes a couple of minutes. What I am looking for is something in python that can grab this info
without looping through a list.
Thanks!
[hidden email] wrote:
> I'm a newbie to Python. I have a list which contains integers (about 80,000). I want to find a quick way to get the numbers that occur in the list more than once, and how many times that number is duplicated in the list. I've done this right now by looping through the list, getting a number, querying the list to find out how many times the number exists, then writing it to a new list. On this many records it takes a couple of minutes. What I am looking for is something in python that can grab this info without looping through a list.

icount = {}
for i in list_of_ints:
    icount[i] = icount.get(i, 0) + 1

Now you have a dictionary of every integer in the list and the count of times it appears.

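For readers on a newer Python, the same single-pass count can probably be written with collections.Counter, which is in the standard library from Python 2.7 and 3.1 onward. This is only a sketch: the short list_of_ints below is illustrative sample data, not the poster's real list.

```python
from collections import Counter

# Illustrative sample data standing in for the 80,000-integer list.
list_of_ints = [999, 2, 3, 999, 2, 2, 8, 42, 999, 42, 5]

# Counter makes one pass over the list, like the icount loop above.
icount = Counter(list_of_ints)

print(icount[2])   # count for a value that is present
print(icount[7])   # a missing value reports 0 instead of raising KeyError
```

A Counter is a dict subclass, so anything that works on the icount dictionary (iterating, filtering on the counts) should work on it unchanged.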
> Hello,
>
> I'm a newbie to Python. I have a list which contains integers (about
> 80,000). I want to find a quick way to get the numbers that occur in
> the list more than once, and how many times that number is duplicated
> in the list. I've done this right now by looping through the list,
> getting a number, querying the list to find out how many times the
> number exists, then writing it to a new list. On this many records it
> takes a couple of minutes. What I am looking for is something in
> python that can grab this info without looping through a list.
>

Why not build a histogram?

$ cat test.py
from random import randint

l = list()
for i in xrange(80000):
    l.append(randint(0,10))

hist = dict()
for i in l:
    hist[i] = hist.get(i, 0) + 1

for i in range(10):
    print "%s: %s" % (i, hist.get(i, 0))

$ time python test.py
0: 7275
1: 7339
2: 7303
3: 7348
4: 7206
5: 7323
6: 7230
7: 7348
8: 7166
9: 7180

real    0m0.533s
user    0m0.518s
sys     0m0.011s

[...]
> $ cat test.py
> from random import randint
>
> l = list()
> for i in xrange(80000):
>     l.append(randint(0,10))
      ^^^^^^^^^^^^^^^^^^^^^^^
should have been:
      l.append(randint(0,9))

>
> hist = dict()
> for i in l:
>     hist[i] = hist.get(i, 0) + 1
>
> for i in range(10):
>     print "%s: %s" % (i, hist.get(i, 0))
>
>
>
> $ time python test.py
> 0: 7275
> 1: 7339
> 2: 7303
> 3: 7348
> 4: 7206
> 5: 7323
> 6: 7230
> 7: 7348
> 8: 7166
> 9: 7180
>
> real    0m0.533s
> user    0m0.518s
> sys     0m0.011s

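The test.py script in this thread is Python 2 (xrange, print statement). A rough Python 3 equivalent, with the randint(0, 9) correction applied, might look like the sketch below. The seed() call and the variable name values are additions for illustration, not part of the original script.

```python
from random import randint, seed

seed(0)  # assumption: fixed seed, added only so repeated runs match

# randint is inclusive at both ends, so (0, 9) yields ten possible values.
values = [randint(0, 9) for _ in range(80000)]

# Same histogram technique as test.py: one pass, one dict update per item.
hist = {}
for v in values:
    hist[v] = hist.get(v, 0) + 1

for i in range(10):
    print("%s: %s" % (i, hist.get(i, 0)))
```

The work is one dictionary update per element, which is why it finishes in well under a second, whereas re-scanning the whole list for every element does on the order of 80,000 x 80,000 comparisons.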
> > l = list()
> > for i in xrange(80000):
> >     l.append(randint(0,10))
>       ^^^^^^^^^^^^^^^^^^^^^^^
> should have been:
>       l.append(randint(0,9))

Or even:

l = [randint(0,9) for x in xrange(80000)]

Hi D'Arcy J.M. Cain,

Thank you. I tried this and my list of 76,979 integers got reduced to a dictionary of 76,963 items, each item listing the integer value from the list, a comma, and a 1. I think what this is doing is finding all integers from my list that are unique (only one instance of it in the list), instead of creating a dictionary with integers that are not unique, with a count of how many times they occur. My dictionary should contain only 11 items listing 11 integer values and the number of times they appear in my original list.

Thanks,

Paul J. Scipione
GIS Database Administrator
work: 602-371-7091
cell: 480-980-4721

-----Original Message-----
From: D'Arcy J.M. Cain
Sent: Thursday, March 26, 2009 12:50 PM
To: Scipione, Paul (ZP5296)
Subject: Re: Find duplicates in a list and count them ...

> I'm a newbie to Python. I have a list which contains integers (about 80,000). I want to find a quick way to get the numbers that occur in the list more than once, and how many times that number is duplicated in the list. I've done this right now by looping through the list, getting a number, querying the list to find out how many times the number exists, then writing it to a new list. On this many records it takes a couple of minutes. What I am looking for is something in python that can grab this info without looping through a list.

icount = {}
for i in list_of_ints:
    icount[i] = icount.get(i, 0) + 1

Now you have a dictionary of every integer in the list and the count of times it appears.

On Thu, Mar 26, 2009 at 5:14 PM, <[hidden email]> wrote:

> Hi D'Arcy J.M. Cain,

Not all of the values are 1. The 11 duplicates will be higher. Just iterate through the dict to find all keys with values > 1.

>>> icounts
{1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 5, 8: 3, 9: 1, 10: 1, 11: 1}

Python 2.x :
>>> dups = {}
>>> for key, value in icounts.iteritems() :
...     if value > 1 :
...         dups[key] = value
...
>>> dups
{8: 3, 1: 2, 7: 5}

Python 3.0 :
>>> dups = {key:value for key, value in icounts.items() if value > 1}
>>> dups
{8: 3, 1: 2, 7: 5}

> On Thu, Mar 26, 2009 at 5:14 PM, <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hi D'Arcy J.M. Cain,
>
>     Thank you. I tried this and my list of 76,979 integers got reduced
>     to a dictionary of 76,963 items, each item listing the integer value
>     from the list, a comma, and a 1. I think what this is doing is
>     finding all integers from my list that are unique (only one instance
>     of it in the list), instead of creating a dictionary with integers
>     that are not unique, with a count of how many times they occur. My
>     dictionary should contain only 11 items listing 11 integer values
>     and the number of times they appear in my original list.
>
>
> Not all of the values are 1. The 11 duplicates will be higher. Just
> iterate through the dict to find all keys with values > 1.
>
> >>> icounts
> {1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 5, 8: 3, 9: 1, 10: 1, 11: 1}
>
> Python 2.x :
> >>> dups = {}
> >>> for key, value in icounts.iteritems() :
> ...     if value > 1 :
> ...         dups[key] = value
> ...
> >>> dups
> {8: 3, 1: 2, 7: 5}
>
>
> Python 3.0 :
> >>> dups = {key:value for key, value in icounts.items() if value > 1}
> >>> dups
> {8: 3, 1: 2, 7: 5}
>

>>> dups = dict((key, value) for key, value in icounts.iteritems() if value > 1)
>>> dups
{8: 3, 1: 2, 7: 5}

> Hi D'Arcy J.M. Cain,
>
> Thank you. I tried this and my list of 76,979 integers got reduced to a dictionary of 76,963 items, each item listing the integer value from the list, a comma, and a 1.

I doubt this very much. Please show:
(a) your implementation of D'Arcy's suggestion
(b) the code you used that led you to the conclusion that all counts were 1.
See example below.

> I think what this is doing is finding all integers from my list that are unique (only one instance of it in the list), instead of creating a dictionary with integers that are not unique, with a count of how many times they occur. My dictionary should contain only 11 items listing 11 integer values and the number of times they appear in my original list.

The only way of getting your desired result is to get a dict of counts and then to keep only the ones where the count is greater than one. D'Arcy appears to have presumed that it was not necessary to show the second stage :-)

[assuming Python 2.6]
>>> list_of_ints = [999, 2, 3, 999, 2, 2, 8, 42, 999, 42, 5]
>>> len(list_of_ints)
11
>>> icount = {}
>>> for i in list_of_ints:
...     icount[i] = icount.get(i, 0) + 1
...
>>> icount
{2: 3, 3: 1, 5: 1, 999: 3, 8: 1, 42: 2}
>>> len(icount)
6
>>> all(count == 1 for count in icount.itervalues())
False
>>> dups = dict((k, v) for k, v in icount.iteritems() if v > 1)
>>> dups
{2: 3, 42: 2, 999: 3}
>>>

HTH,
John

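Putting the two stages from this thread together (count everything, then keep only counts above one), a Python 3 sketch could look like the following. The function name find_duplicates is hypothetical, and using collections.Counter in place of the explicit icount loop is a substitution, not what the posters wrote.

```python
from collections import Counter


def find_duplicates(values):
    """Return {value: count} for every value that occurs more than once."""
    counts = Counter(values)  # stage 1: one pass to count every value
    # stage 2: keep only the entries whose count is greater than one
    return {value: n for value, n in counts.items() if n > 1}


list_of_ints = [999, 2, 3, 999, 2, 2, 8, 42, 999, 42, 5]
dups = find_duplicates(list_of_ints)
print(sorted(dups.items()))  # [(2, 3), (42, 2), (999, 3)]
```

Checking len(dups) against the expected number of duplicated values (11 in the poster's data) is a quick way to confirm the filter stage actually ran.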
or

l = ( randint(0,9) for x in xrange(80000) )

> On Thu, 26 Mar 2009 16:00:01 -0400
> Albert Hopkins <[hidden email]> wrote:
>
> > > l = list()
> > > for i in xrange(80000):
> > >     l.append(randint(0,10))
> >       ^^^^^^^^^^^^^^^^^^^^^^^
> > should have been:
> >       l.append(randint(0,9))
>
> Or even:
>
> l = [randint(0,9) for x in xrange(80000)]
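One caveat on the parenthesized form in the last message: it builds a generator expression, not a list. That is fine for a single counting pass, but a generator has no len() and is exhausted after one iteration. A small Python 3 sketch of the difference (the names as_list and as_gen are illustrative):

```python
from random import randint

as_list = [randint(0, 9) for _ in range(80000)]  # list comprehension
as_gen = (randint(0, 9) for _ in range(80000))   # generator expression

print(len(as_list))  # a list has a length and can be iterated repeatedly

first_pass = sum(1 for _ in as_gen)   # consumes the generator
second_pass = sum(1 for _ in as_gen)  # nothing is left to iterate
print(first_pass, second_pass)
```

If the values need to be read more than once (for example to count them and then print them), the list form is the safer choice.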
