# Basic dictionary question

10 messages
Open this post in threaded view
|

## Basic dictionary question

 So here's my situation.  I have a set of XYZ tuples that'll come very close to matching, most of them in pairs, but thanks to floating point, I can only rely on "fuzzy equality" i.e. within a tolerance of say |e| per each x,y,z. In other words, for all intents and purposes, I could consider (1.0, 0.0, 0.0) and (1.001, 0.0, 0.0) to be the same. My thought was to use a dictionary, indexing with these tuples.  Except they wouldn't be primitive tuples, I'd define an object with fuzzy equality implemented as __eq__.  Then (1.0, 0.0, 0.0) and (1.001, 0.0, 0.0) would both key to the same value, cuz they're equal i.e. are both the same key, right? Wrong I think, per the session below.  The one object (1.0, 0.0, 0.0) is "in" a list of other objects (o1), thanks to fuzzy equality, but in adding to a dictionary, they key separately.  Instead of figuring it all out, lemme just put this out and there see if I fish up any expert patter. Kirby >> class Sometuple(object):         e = 0.01         def __init__(self,value):                 self.thetuple = value         def __eq__(self, other):                 if abs(self.thetuple[0]-other.thetuple[0])>> o0 = Sometuple((1,0,0)) >>> o1 = Sometuple((1.001,0,0)) >>> o0==o1 True >>> thedict = {o0:'a'} >>> thedict {<__main__.Sometuple object at 0x00C93B90>: 'a'} >>> thedict[o1]='b' >>> thedict {<__main__.Sometuple object at 0x00C93B90>: 'a', <__main__.Sometuple object at 0x00C93D10>: 'b'} >>> o0 in [o1,o1,01] True >>> o0 in [o1,o1,o1] True >>> o1 in thedict.keys() True >>> o0 in thedict.keys() True _______________________________________________ Edu-sig mailing list [hidden email] http://mail.python.org/mailman/listinfo/edu-sig
Open this post in threaded view
|

## Re: Basic dictionary question

 Alas, you can't do that, and the problem isn't a deficiency of dicts, but the fact that "almost equal" is not a transitive relationship. Think about it. (Boy that brings back memories of math classes in college, way, way, back, where a professor pointed this out casually as an counter-illustration of transitive relationships.) Superficially, the problem you have is that you're overriding __eq__ without __hash__. The dict implementation uses the __hash__ method partitions objects into (conceptual) buckets with the same hash value, so that if two objects are equal, they occur in the same hash bucket, but not necessarily the other way around -- two objects in the same hash bucket may still be unequal. But because you're defining equality in the peculiar way that you do, it's impossible to create a non-trivial __hash__ method that satisfies this requirement. A near-solution would be to make __hash__ return a constant; that will cause you a serious degradation of the dict performance, but more seriously, it will cause undefined behavior. Here's why. Consider three numbers A, B and C such that A == B == B but A != C. In your example, where e is 0.01, we could pick A = 1, B = 1.007, C = 1.014 in one dimension, and equal in the others. Now if you put A and C in the dict, the dict will have two keys. But when you add B, will it map to A or to C? That depends on the order in which the dict probes the values! You never said what you wanted to do with these points -- are you looking to find which points are in fact close enough to be considered the same point? I suppose you could sort them by the vector length (sqrt(x**2 + y**2 + z**2), that sort of thing) and then you only have to compare points that are close together in the sorted list (as long as the abs values are closer than your epsilon). Or you could divide 3-space up into cubes with sides equal epsilon, and then you'd only have to look for points that end up in the same or in adjacent cubes. --Guido On 10/8/05, Kirby Urner <[hidden email]> wrote: > > So here's my situation.  I have a set of XYZ tuples that'll come very close > to matching, most of them in pairs, but thanks to floating point, I can only > rely on "fuzzy equality" i.e. within a tolerance of say |e| per each x,y,z. > In other words, for all intents and purposes, I could consider (1.0, 0.0, > 0.0) and (1.001, 0.0, 0.0) to be the same. > > My thought was to use a dictionary, indexing with these tuples.  Except they > wouldn't be primitive tuples, I'd define an object with fuzzy equality > implemented as __eq__.  Then (1.0, 0.0, 0.0) and (1.001, 0.0, 0.0) would > both key to the same value, cuz they're equal i.e. are both the same key, > right? > > Wrong I think, per the session below.  The one object (1.0, 0.0, 0.0) is > "in" a list of other objects (o1), thanks to fuzzy equality, but in adding > to a dictionary, they key separately.  Instead of figuring it all out, lemme > just put this out and there see if I fish up any expert patter. > > Kirby > > > > >> class Sometuple(object): >         e = 0.01 >         def __init__(self,value): >                 self.thetuple = value >         def __eq__(self, other): >                 if abs(self.thetuple[0]-other.thetuple[0])                    and abs(self.thetuple[1]-other.thetuple[1])                    and abs(self.thetuple[2]-other.thetuple[2])                         return True >                 else: >                         return False > > > >>> o0 = Sometuple((1,0,0)) > >>> o1 = Sometuple((1.001,0,0)) > >>> o0==o1 > True > >>> thedict = {o0:'a'} > >>> thedict > {<__main__.Sometuple object at 0x00C93B90>: 'a'} > >>> thedict[o1]='b' > >>> thedict > {<__main__.Sometuple object at 0x00C93B90>: 'a', <__main__.Sometuple object > at 0x00C93D10>: 'b'} > > >>> o0 in [o1,o1,01] > True > >>> o0 in [o1,o1,o1] > True > > >>> o1 in thedict.keys() > True > >>> o0 in thedict.keys() > True > > > _______________________________________________ > Edu-sig mailing list > [hidden email] > http://mail.python.org/mailman/listinfo/edu-sig> -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Edu-sig mailing list [hidden email] http://mail.python.org/mailman/listinfo/edu-sig
Open this post in threaded view
|

## Re: Basic dictionary question

 Talk about fishing for expert help!   Thanks Guido. It's a pruning algorithm where I strip way pieces that don't meet up at (x,y,z) bridge points.  Lots of symmetry about the origin so just pure distance won't work (not unique enough).   I think might still get away with a tuple-indexed dict (tallying for pairs, pruning the rest) if I boil my floating points down to fixed decimals -- a chance to play with the new decimal type perhaps. I appreciate the quick lecture on intransitive nature of fuzzy equality and __hash__ vs. __eq__.  Helps me think. Kirby > You never said what you wanted to do with these points -- are you > looking to find which points are in fact close enough to be considered > the same point? I suppose you could sort them by the vector length > (sqrt(x**2 + y**2 + z**2), that sort of thing) and then you only have > to compare points that are close together in the sorted list (as long > as the abs values are closer than your epsilon). Or you could divide > 3-space up into cubes with sides equal epsilon, and then you'd only > have to look for points that end up in the same or in adjacent cubes. > > --Guido _______________________________________________ Edu-sig mailing list [hidden email] http://mail.python.org/mailman/listinfo/edu-sig
Open this post in threaded view
|

## Re: Basic dictionary question

 Kirby Urner wrote: > Talk about fishing for expert help!   > > Thanks Guido. > > It's a pruning algorithm where I strip way pieces that don't meet up at > (x,y,z) bridge points.  Lots of symmetry about the origin so just pure > distance won't work (not unique enough).   > > I think might still get away with a tuple-indexed dict (tallying for pairs, > pruning the rest) if I boil my floating points down to fixed decimals -- a > chance to play with the new decimal type perhaps. Beware:      1.999999999999999 and 2.000000000001 likely won't lie in the same bin.  You'll need to check neighbors for candidates, unless your fixed point stuff reflects an underlying granularity. -- Scott David Daniels [hidden email] _______________________________________________ Edu-sig mailing list [hidden email] http://mail.python.org/mailman/listinfo/edu-sig
Open this post in threaded view
|

## Re: Basic dictionary question

 > Beware: >      1.999999999999999 and 2.000000000001 likely won't lie in the same > bin.  You'll need to check neighbors for candidates, unless your fixed > point stuff reflects an underlying granularity. > > -- Scott David Daniels > [hidden email] Thanks for the heads up Scott. I'm translating a volume 3 cube around in a 3D checkerboard, trying to flag all kitty-corner mid-edges (my bridge points), but not flagging border-line edges where the connector tabs shouldn't appear -- some large borg-like array of icosahedra connected by zig-zag tensed bands (Lanahan's patent pending). See http://www.4dsolutions.net/satacad/pythonicmath/icosa_in_holder.jpg for a picture of a unit cell (the array might contain thousands). So the XYZ tuples of interest are indeed highly distinct and in a regular grid pattern.  It's just that there're a whole lot of them, and if I could find a hash key that'd keep only the unique ones unique and ignore floating point fuzz, I'd be set.  Any pairing/doubling would define a bridge point, and therefore a green light to render a tab connector. I think the key is probably to transform the grid by scaling, such that my mid-edge points of interest get expressed using all integer coordinates. Even though my actual map is floating point, the isomorphism will work for pruning, and the tuples'll be easy to manage (so maybe scratch the decimal type idea for now). I'll post again when I have more research behind me and either a solution or concrete frustration in sight -- not before. Kirby _______________________________________________ Edu-sig mailing list [hidden email] http://mail.python.org/mailman/listinfo/edu-sig
Open this post in threaded view
|

## Re: Basic dictionary question

 On 10/9/05, Kirby Urner <[hidden email]> wrote: > I'll post again when I have more research behind me and either a solution or > concrete frustration in sight -- not before. I think you should just sort them by (x, y, z) and then comb through the list looking for near-adjacent matches. Basically you have to compare each point to all points following it in the list until the x coordinate is more than epsilon away. -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Edu-sig mailing list [hidden email] http://mail.python.org/mailman/listinfo/edu-sig
Open this post in threaded view
|

## Re: Basic dictionary question

 In reply to this post by Kirby Urner Kirby Urner wrote: > I'm translating a volume 3 cube around in a 3D checkerboard, trying to flag > all kitty-corner mid-edges (my bridge points), but not flagging border-line > edges where the connector tabs shouldn't appear -- some large borg-like > array of icosahedra connected by zig-zag tensed bands (Lanahan's patent > pending). > > See http://www.4dsolutions.net/satacad/pythonicmath/icosa_in_holder.jpg for > a picture of a unit cell (the array might contain thousands). > > So the XYZ tuples of interest are indeed highly distinct and in a regular > grid pattern.  It's just that there're a whole lot of them, and if I could > find a hash key that'd keep only the unique ones unique and ignore floating > point fuzz, I'd be set.  Any pairing/doubling would define a bridge point, > and therefore a green light to render a tab connector. > > I think the key is probably to transform the grid by scaling, such that my > mid-edge points of interest get expressed using all integer coordinates. > Even though my actual map is floating point, the isomorphism will work for > pruning, and the tuples'll be easy to manage (so maybe scratch the decimal > type idea for now). > > I'll post again when I have more research behind me and either a solution or > concrete frustration in sight -- not before. > > Kirby Does something like this work?  The idea is two exact matches on the discretization and one off-by-one on the third discretization merits a check.  discrete(point) gives the "integer-ized" coordinates.  There is some fancy dancing to avoid the distance check (done as the square just to avoid a few square roots) unless possibly helpful.  It only tries to solve "near-misses" (skips exact matches) on the theory you can already get the exact matches from a dictionary. def clustered(source_of_points, threshold2)      xy = {}      xz = {}      yz = {}      for point in source_of_points:          ix, iy, iz = discrete(point)          xy.setdefault([]).append((iz, point))          xz.setdefault([]).append((iy, point))          yz.setdefault([]).append((ix, point))      for collection in (xy, xz, yz):          for parts in collection.itervalues():              if len(parts) > 1:                  parts.sort()                  for (da, a), (db, b) in zip(parts, parts[1:]):                      if 0 <= db - da <= 1:                          if 0 < dist2(a, b) < threshold2:                              yield a, b -- -- Scott David Daniels [hidden email] _______________________________________________ Edu-sig mailing list [hidden email] http://mail.python.org/mailman/listinfo/edu-sig
Open this post in threaded view
|