Python implementation of the Ramer-Douglas-Peucker Algorithm

Question

I recently implemented the RDP polygon approximation algorithm in Python and I'm unsure whether I implemented it correctly or with the greatest efficiency. The algorithm runs in around 0.0003 seconds for a polygon with 30 points on my computer (an Intel Core i5 with 3.8 GHz of RAM), so I'm worried about how it may run on a slower computer. Also, there seems to be a cap on the number of points that the algorithm can remove, or at least there's a cap in my implementation. No matter how high I set the tolerance, the approximation always bottoms out at about \$\frac{2N}{3}\$ points, where \$N\$ is the number of points in the input polygon. Could I be doing something wrong?

import math

NegInf = float('-inf')

def distance(v1, v2):
    """
    Calculate the distance between two points.

    PARAMETERS
    ==========
        v1, v2 >> The first and second vertices respectively.
    """
    dx = v2[0] - v1[0]
    dy = v2[1] - v1[1]
    return math.sqrt(dx*dx + dy*dy)

def perpendicular_distance(point, line_start, line_end):
    """
    Calculate the perpendicular distance from a point to a line.

    PARAMETERS
    ==========
        point >> The point of which to calculate the distance from the line
            (must be an (x, y) tuple).

        line_start, line_end >> Start and end points defining the line respectively
            (must each be an (x, y) tuple).
    """
    x1, y1 = line_start
    x2, y2 = line_end
    vx, vy = point
    if x1 == x2:
        return abs(x1 - vx)
    m = (y2 - y1)/(x2 - x1)
    b = y1 - m*x1
    return abs(m * vx - vy + b)/math.sqrt(m*m + 1)

def _rdp_approx(points, tolerance, depth):
    """
    Internal Function: Recursively perform the RDP algorithm.
    """
    if not points:
        # In case the furthest point index discovered is equal to the length of the
        # list of points, leading to points[furthest:] sending in an empty list.
        return []
    elif len(points) <= 2:
        # BASE CASE:: No points to remove, only the start and the end points of the line.
        # Return it.
        return points
    elif len(points) == 3:
        # BASE CASE:: Our decomposition of the polygon has reached a minimum of 3 points.
        # Now all that is left is to remove the point in the middle (assuming its distance
        # from the line is less than the set tolerance).
        dist = perpendicular_distance(points[1],
                                      points[0],
                                      points[2]
                                      )
        if dist < tolerance:
            return [points[0], points[-1]]
        return points

    max_dist = NegInf
    furthest = None

    start = 0
    start_point = points[start]

    if depth == 1:
        # In the initial approximation, we are given an entire polygon to approximate. This
        # means that the start and end points are the same, thus we cannot use the perpendicular
        # distance equation to calculate the distance a point is from the start since the start is
        # not a line. We have to use ordinary distance formula instead.
        get_distance = lambda point: distance(point, start_point)
    else:
        end_point = points[-1]
        get_distance = lambda point: perpendicular_distance(point, start_point, end_point)

    # Find the farthest point from the norm.
    for i, point in enumerate(points[1:], 1):
        dist = get_distance(point)
        if dist > max_dist:
            max_dist = dist
            furthest = i

    # Recursively calculate the RDP approximation of the two polygonal chains formed by
    # slicing at the index of the furthest discovered point.
    prev_points = _rdp_approx(points[:furthest+1], tolerance, depth+1)
    next_points = _rdp_approx(points[furthest:], tolerance, depth+1)

    new_points = []
    for point in prev_points + next_points:
        # Filter out the duplicate points whilst maintaining order.
        # TODO:: There's probably some fancy slicing trick I just haven't figured out
        # that can be applied to prev_points and next_points so that we don't have to
        # do this, but it's not a huge bottleneck so we needn't worry about it now.
        if point not in new_points:
            new_points.append(point)

    return new_points 

def rdp_polygon_approximate(coordinates, tolerance):
    """
    Use the Ramer-Douglas-Peucker algorithm to approximate the points on a polygon.

    The RDP algorithm recursively cuts away parts of a polygon that stray from the
    average of the edges. It is a great algorithm for maintaining the overall form
    of the input polygon, however one should be careful when using this for larger
    polygons as the algorithm has an average complexity of O(n log n), from the
    recurrence T(n) = 2T(n/2) + O(n), and a worst case complexity of O(n^2).

    PARAMETERS
    ==========
        coordinates >> The coordinates of the polygon to approximate.

        tolerance >> The amount of tolerance the algorithm will use. The tolerance
            determines the minimum distance a point has to sway from the average
            before it gets deleted from the polygon. Thus, setting the tolerance to
            be higher should delete more points on the final polygon.

            That said, due to how the algorithm works there is a limit to the number
            of vertices that can be removed on a polygon. Setting the tolerance to
            float('inf') or sys.maxsize will not approximate the polygon to a single
            point. Usually the minimum points an approximated polygon can have if the
            original polygon had N points is between 2N/3 and N/3.

    FURTHER READING
    ===============
    For further reading on the Ramer-Douglas-Peucker algorithm, see
    http://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm
    """
    return _rdp_approx(coordinates, tolerance, 1)

if __name__ == '__main__':
    poly = [(3, 0), (4, 2), (5, 2), (5.5, 3), (5, 4), (4, 5), (5, 6),
      (7, 5), (7, 3), (8, 2.5), (8, 4), (9, 5), (8, 7), (7, 8), (6, 7),
      (4, 7.75), (3.5, 7.5), (3, 8), (3, 8.5), (2.5, 9), (1, 9), (0, 8),
      (2, 7), (1, 7), (0, 6), (1, 4), (2, 5), (2, 2), (3, 3), (2, 1)]
    print(rdp_polygon_approximate(poly, 3))
    print(rdp_polygon_approximate(poly, float('inf')))

You probably forgot to close your parenthesis at the end of your prints at the end of your program. Which version of Python are you using? — Morwenn, May 15 '14 at 8:42
My bad! Thanks for catching that, I'm using python 3 so that indeed would have failed. — user3002473, May 15 '14 at 8:47

Morwenn · Accepted Answer · 2014-05-15 11:18:55Z


If you only want to find the farthest point, you don't need the square root of the 'real' distance calculation. Since the distance is used only for comparison and its actual value is never needed, it is sufficient to compare x*x + y*y instead of sqrt(x*x + y*y).
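A minimal sketch of that idea, assuming points are (x, y) tuples as in the question. The helper names squared_distance and farthest_index are hypothetical and not part of the original code. Squaring preserves the ordering of non-negative values, so the farthest point found is the same one the true distance would give.

```python
def squared_distance(v1, v2):
    """Squared Euclidean distance between two (x, y) points (no sqrt)."""
    dx = v2[0] - v1[0]
    dy = v2[1] - v1[1]
    return dx * dx + dy * dy


def farthest_index(points, origin):
    """Index of the point in `points` that lies farthest from `origin`."""
    # Comparing squared distances gives the same ordering as comparing
    # true distances, so the square root can be skipped entirely.
    return max(range(len(points)),
               key=lambda i: squared_distance(points[i], origin))


points = [(3, 0), (4, 2), (5, 6), (1, 9), (2, 1)]
print(farthest_index(points, points[0]))
```

If a squared distance is later compared against the tolerance, the tolerance has to be squared as well (dist_sq < tolerance * tolerance).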

answered May 15 '14 at 10:59 by anhoppe, edited May 15 '14 at 11:18 by Morwenn

In this case, the perpendicular_distance also probably needs a simpler way to produce a squared distance. – Morwenn May 15 '14 at 11:20

That's a good suggestion, I'll start looking into a way to implement a squared perpendicular distance function. Thanks! – user3002473 May 15 '14 at 22:03
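One possible shape for such a function, offered as an untested-in-context sketch rather than the asker's eventual solution: it uses the cross-product form of the point-to-line distance instead of the slope-intercept form, so no square root is needed and vertical lines need no special case. The name squared_perpendicular_distance and the handling of a zero-length line (falling back to the squared point-to-point distance) are assumptions, not taken from the question's code.

```python
def squared_perpendicular_distance(point, line_start, line_end):
    """Squared distance from `point` to the infinite line through
    `line_start` and `line_end` (all (x, y) tuples)."""
    x1, y1 = line_start
    x2, y2 = line_end
    px, py = point
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Assumption: a zero-length line is treated as a single point.
        return (px - x1) ** 2 + (py - y1) ** 2
    # |cross| is the area of the parallelogram spanned by the line
    # direction and the vector from line_start to the point.
    cross = dx * (py - y1) - dy * (px - x1)
    return cross * cross / length_sq


print(squared_perpendicular_distance((0, 5), (0, 0), (10, 0)))
print(squared_perpendicular_distance((5, 1), (2, 0), (2, 4)))
```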


Morwenn · Answer 2 · 2014-05-15 08:58:57Z

You can simplify your distance function with the function math.hypot(x, y):

dx = v2[0] - v1[0]
dy = v2[1] - v1[1]
return math.hypot(dx, dy)

math.hypot(x, y) directly computes math.sqrt(x*x + y*y). Moreover, the CPython implementation should be based on the underlying C function hypot; therefore, this function is safer than the naive implementation since it does its best to avoid overflows and underflows at an intermediate stage of the computation.
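A small illustration of the overflow point, with input values chosen purely for demonstration: the intermediate product x*x exceeds the float range and becomes infinity in the naive formula, while math.hypot returns a finite result.

```python
import math

x = y = 1e200

# x * x is larger than the biggest representable float, so the sum is
# already inf before the square root is taken.
naive = math.sqrt(x * x + y * y)

# math.hypot avoids the overflowing intermediate value.
safe = math.hypot(x, y)

print(naive)
print(safe)
```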

From a design point of view, one thing I would probably do to avoid passing coordinates everywhere is to create at least two classes, Point and Line:

class Point:

    def __init__(self, x, y):
        self.x = x
        self.y = y

class Line:

    def __init__(self, start, end):
        self.start = start
        self.end = end

Then, I would write a "generic" distance function that can compute the distance between a point and virtually anything. With the singledispatch decorator from Python 3.4, it would be something like this (I did not test the code, it is merely here to give you an idea):

from functools import singledispatch

@singledispatch
def distance(shape, point):
    pass

@distance.register(Point)
def _(arg, point):
    """distance from point to point"""
    # implementation

@distance.register(Line)
def _(arg, point):
    """distance from point to line"""
    # implementation

Unfortunately, this is only a design idea, not an optimization, but it would at least make the code easier to write and read, with an API based on actual types rather than on bare coordinates.
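To make the design idea concrete, here is one way the pieces could fit together. It is a sketch under assumptions, not the definitive implementation: the bodies of the two registered functions, the NotImplementedError fallback, and the treatment of a zero-length line are all illustrative choices that the outline above leaves open. Note that singledispatch dispatches on the type of the first argument only.

```python
import math
from functools import singledispatch


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Line:
    def __init__(self, start, end):
        self.start = start
        self.end = end


@singledispatch
def distance(shape, point):
    """Distance from `point` to `shape`, chosen by the type of `shape`."""
    raise NotImplementedError("unsupported shape: %r" % type(shape))


@distance.register(Point)
def _(shape, point):
    """Distance from point to point."""
    return math.hypot(point.x - shape.x, point.y - shape.y)


@distance.register(Line)
def _(shape, point):
    """Perpendicular distance from point to the line through start and end."""
    dx = shape.end.x - shape.start.x
    dy = shape.end.y - shape.start.y
    length = math.hypot(dx, dy)
    if length == 0:
        # Assumption: a zero-length line is treated as its start point.
        return distance(shape.start, point)
    cross = dx * (point.y - shape.start.y) - dy * (point.x - shape.start.x)
    return abs(cross) / length


p = Point(0, 5)
print(distance(Point(3, 1), p))
print(distance(Line(Point(0, 0), Point(10, 0)), p))
```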

I did consider using math.hypot, however running some tests with timeit gave some interesting results: distance with math.sqrt: 7.09003095935727e-07s, distance with math.hypot: 8.529042939559872e-07s. I'm pretty sure it has something to do with the fact that math.hypot is not just a square root function in C, it uses a different implementation (as seen at en.wikipedia.org/wiki/Hypot). – user3002473, May 15 '14 at 8:44
@user3002473 That's actually rather interesting. It shows again that there may be an actual trade-off between safe vs fast :) — Morwenn, May 15 '14 at 9:02
