Lloyd's algorithm in normed vector spaces

Question

How do I run Lloyd's algorithm in a normed vector space?

The space:

L*a*b* color space, finite sRGB segment, $R^3$

The distance metric:

CIE94 using L*C*h* information derived from the L*a*b* coordinates.
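
For concreteness, here is a minimal sketch of the CIE94 difference computed directly from L*a*b* coordinates. The function name `cie94` and its argument names are made up for illustration, and the defaults (`k_L = 1`, `K1 = 0.045`, `K2 = 0.015`, with $k_C = k_H = 1$) are the commonly quoted graphic-arts weights, assumed here rather than taken from the question.

```python
import math

def cie94(lab_ref, lab_sample, k_L=1.0, K1=0.045, K2=0.015):
    """CIE94 colour difference between a reference and a sample L*a*b* colour.

    Assumes graphic-arts weights and k_C = k_H = 1 (illustrative defaults).
    """
    L1, a1, b1 = lab_ref
    L2, a2, b2 = lab_sample
    C1 = math.hypot(a1, b1)          # chroma of the reference colour
    C2 = math.hypot(a2, b2)          # chroma of the sample colour
    dL = L1 - L2
    dC = C1 - C2
    da, db = a1 - a2, b1 - b2
    # Squared hue difference; clamp tiny negatives caused by rounding.
    dH2 = max(da * da + db * db - dC * dC, 0.0)
    S_L = 1.0
    S_C = 1.0 + K1 * C1              # weights depend on the reference chroma only
    S_H = 1.0 + K2 * C1
    return math.sqrt((dL / (k_L * S_L)) ** 2 + (dC / S_C) ** 2 + dH2 / S_H ** 2)
```

Because `S_C` and `S_H` use the chroma of the reference colour only, `cie94(x, y)` and `cie94(y, x)` generally differ, so CIE94 is not symmetric and is not a metric in the strict sense.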

The sub-problems (so far):

How to compute a Voronoi diagram in a normed vector space.
How to compute the centroid, and therefore the volume, in such a space.

If I am working with a polygon (i.e., a Voronoi cell) whose points are computed relative to the previous point using the distance metric, I don't believe I can represent the points contained within the polygon using straight lines. In other words, while the triangle inequality may hold, the shortest path between two points may not be a straight line. This means, as far as I know, I can't depend on line-based algorithms to calculate Voronoi diagrams. Furthermore, this complicates calculating the volume of geometric primitives such as spheres and prisms.

Additionally, does the concept of volume (necessary for calculating the centre of mass) carry over from Euclidean space? For example, is the volume of the rectangular prism formed from two points on opposing corners, p1 and p2, still calculated as d(p1.x, p2.x) * d(p1.y, p2.y) * d(p1.z, p2.z)?

Alternatively:

I am working only with 2**24 different points: one point for each unique colour in a 3-channel 8-bit unsigned integer image. When working with voxels (a unit cube, one per point) in Euclidean space, the Voronoi diagram can be created in O(n) time (n == # of voxels) using a flood-fill algorithm, which has the benefit of dynamically updating the centroid for each Voronoi cell (the centre of mass is updated each time a voxel is added).
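
A minimal sketch of that flood fill, as a multi-source breadth-first search over a 6-connected voxel grid. The name `flood_fill_voronoi` and its parameters are illustrative, and the seeds are assumed to be distinct voxels inside the grid. Note that a plain BFS measures distance in grid steps, so this only approximates the Euclidean Voronoi diagram.

```python
from collections import deque

def flood_fill_voronoi(shape, seeds):
    """Label each voxel with the seed that reaches it first (in grid steps)
    and accumulate every cell's centroid while filling."""
    nx, ny, nz = shape
    label = {}                                  # voxel -> index of owning seed
    sums = [[0, 0, 0] for _ in seeds]           # running coordinate sums per cell
    counts = [0] * len(seeds)                   # running voxel counts per cell
    queue = deque()
    for i, seed in enumerate(seeds):
        label[seed] = i
        queue.append(seed)
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        x, y, z = queue.popleft()
        i = label[(x, y, z)]
        # Update the running centre of mass as the voxel joins cell i.
        sums[i][0] += x
        sums[i][1] += y
        sums[i][2] += z
        counts[i] += 1
        for dx, dy, dz in steps:
            n = (x + dx, y + dy, z + dz)
            inside = 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz
            if inside and n not in label:
                label[n] = i                    # claim the voxel for cell i
                queue.append(n)
    centroids = [tuple(s / counts[i] for s in sums[i]) for i in range(len(seeds))]
    return label, centroids
```

Each voxel is enqueued and dequeued once, which is where the O(n) bound comes from. For example, `flood_fill_voronoi((4, 1, 1), [(0, 0, 0), (3, 0, 0)])` splits the four voxels into two cells of two voxels each.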

As an alternative to working in a non-Euclidean space, I would accept any answer that demonstrates how to map points in the CIE94 space back to the L*a*b* (Euclidean) space preserving both distance and volume. That is, for each of the 2**24 points p in L*a*b* space, map p to a new point p1 in L*a*b* space that preserves volume and distance per the CIE94 distance metric.

Overview:

My goal is to find the N most perceptually distinguishable colours, given P pre-existing colours. Since the colour difference in L*a*b* is equivalent to distance, this problem generalizes to the sphere-packing problem in an irregular (sRGB) space.

To handle P pre-existing colours I will modify Lloyd's algorithm such that only the points in N will be shifted.
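
A minimal sketch of that modification, using squared Euclidean distance as a stand-in for the real colour difference. The function name `lloyd_with_fixed` and its parameters are hypothetical: `fixed` plays the role of P, `movable` the role of N, and `samples` is whatever set of points is being quantized.

```python
def lloyd_with_fixed(samples, fixed, movable, iterations=10):
    """Lloyd iterations in which the `fixed` centres never move.

    Uses squared Euclidean distance purely for illustration.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    movable = [tuple(m) for m in movable]
    n_fixed = len(fixed)
    for _ in range(iterations):
        centres = list(fixed) + movable
        cells = [[] for _ in centres]
        # Assignment step: every centre, fixed or not, competes for samples.
        for x in samples:
            j = min(range(len(centres)), key=lambda k: sq_dist(x, centres[k]))
            cells[j].append(x)
        # Update step: only the movable centres are moved to their cell mean.
        for m in range(len(movable)):
            cell = cells[n_fixed + m]
            if cell:
                movable[m] = tuple(sum(c) / len(cell) for c in zip(*cell))
    return movable
```

The fixed centres still claim the samples nearest to them, so they push the movable centres away without moving themselves.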

I think (probably) that Lloyd's algorithm will only get me a Random Close Pack configuration. Therefore the starting positions of the points in N will be as closely aligned to the centres of spheres in a Hexagonal Close Pack arrangement as the points in P will allow. However, this will only be possible using the alternative solution (mapping); I can't do this in the CIE94-norm vector space.

Update: Optimizations

As has been mentioned below, Lloyd's algorithm can be generalized over a set of points using only the distance metric. While Lloyd's algorithm inherently generates a Voronoi diagram, in my question I failed to distinguish between the algorithm itself and algorithms for computing Voronoi diagrams in order to optimize the speed of Lloyd's algorithm.

These are some algorithms I have considered:

Voronoi + Centroid: Calculate the Voronoi diagram via Delaunay triangulation, and the centroid by integrating the points forming the boundaries of the Voronoi cells. Complexity is $O(M \log M) + O(M) = O(n \log n)$, where $M$ is the number of Voronoi cells, or reproduction points in Lloyd's algorithm. This method works on a continuous space containing infinitely many points (for example the bounded $R^3$ sRGB space), so I can extend my algorithm to handle an arbitrary number of unique colours, rather than simply 2**24. This would be useful if I ever have to process floating-point images. Unfortunately, I don't know how to extend this to non-Euclidean spaces.

Floodfill algorithm: Calculate the Voronoi diagram by flooding the voxel grid (N voxels, one per unique colour, in my case 2**24) outwards from each of the M points (the Voronoi cell sites, or reproduction points in Lloyd's algorithm). Complexity is $O(n)$. However, this method is only applicable to a set of points and not to generic spaces. It can't be extended to non-Euclidean spaces (without varying the size of a voxel).

Therefore my question is more appropriately phrased as: How do I efficiently run Lloyd's algorithm in $R^3$, on either a set of points or a bounded space, when the distance metric is not Euclidean?

Are you (basically) asking how to run the Lloyd algorithm if the "distance" is not the usual Euclidean distance?
Yes, either that or remap points in a Euclidean space using the CIE94 distance metric. I also present my end goal; if you think Lloyd's algorithm is not the best approach, feel free to comment.

Lord Soth · Accepted Answer · 2013-04-01 23:12:47Z

You do not need remapping or any such step, provided that you know your distance (distortion) function and have a way to generate samples of your original distribution. The Lloyd algorithm is a pretty powerful algorithm that can be applied (in practice) to any distance function that one can imagine. For example, consider an arbitrary distance function $d(x,\hat{x})$, where $x$ represents your original sample and $\hat{x}$ is its reproduction. $x$ can be a scalar, vector, matrix, whatever you want (your color coordinates), and in fact $x$ and $\hat{x}$ may even lie in different spaces. All of these are irrelevant for the discussion below, which is general.

Now, suppose your original samples have a PDF $f(x)$. In practice, you generate a sufficiently large amount of training vectors, say $x_1,\ldots,x_N$, where $N$ is large. At this stage, you also decide how precise you want to be in the process of reconstructing the $x$s, and suppose you will allow $M$ reproduction points $\hat{x}_1,\ldots,\hat{x}_M$. Then, the Lloyd algorithm will work as follows:

1) Assign each $x_i$ to its corresponding reproduction point that will minimize the distance, and call this assignment, say $\alpha_i\in\{1,\ldots,M\}$. We have $\alpha_i = \arg\min_{j\in\{1,\ldots,M\}} d(x_i,\hat{x}_j)$.

2) Recalculate the reproduction points $\hat{x}_i = \arg\min_{z} \sum_{j:\alpha_j = i}d(x_j, z)$, and loop.
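
The two steps above can be sketched for an arbitrary distance function as follows. The name `lloyd` and its parameters are illustrative. As a simplifying assumption, step 2 here searches only over the members of each cell (a medoid update, as in k-medoids) instead of solving the continuous $\arg\min$ over all $z$, so it is a stand-in for the general optimization rather than the exact update.

```python
def lloyd(samples, reproductions, d, iterations=20):
    """Generalized Lloyd iteration for an arbitrary distance d(x, x_hat).

    Step 2 is restricted to candidates drawn from the cell itself (medoid
    update), which is an assumption made to avoid a numerical optimizer.
    """
    reproductions = list(reproductions)

    def assign():
        # Step 1: alpha_i = argmin_j d(x_i, x_hat_j)
        return [min(range(len(reproductions)),
                    key=lambda j: d(x, reproductions[j]))
                for x in samples]

    for _ in range(iterations):
        alpha = assign()
        for i in range(len(reproductions)):
            cell = [x for x, a in zip(samples, alpha) if a == i]
            if cell:
                # Step 2: choose z in the cell minimizing sum_j d(x_j, z).
                reproductions[i] = min(
                    cell, key=lambda z: sum(d(x, z) for x in cell))
    return reproductions, assign()
```

The sample is always passed as the first argument of `d` and the reproduction as the second, so an asymmetric distance is handled consistently. The medoid search costs a number of distance evaluations quadratic in the cell size, so it only suits modest cells.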

Step 1 is trivial and will not pose any difficulty. For step 2, if we were working with squared-error distance, say $d(x,\hat{x}) = \|x-\hat{x}\|^2$, then we would have $\hat{x}_i = \frac{\sum_{j:\alpha_j = i} x_j}{|\{j:\alpha_j = i\}|}$. For a general $d$, you will not have such a neat closed-form solution to the $\arg\min$ problem in the second step, and therefore you will have to resort to some optimization tools which will give you the optimal $z$, but this is also a minor headache in my opinion.
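
As a brief worked derivation of the squared-error case, write $S_i = \{j : \alpha_j = i\}$ for the index set of cell $i$ (a shorthand introduced here) and assume the cell is non-empty. Setting the gradient of the cell's total distortion to zero gives

$$J(z) = \sum_{j \in S_i} \|x_j - z\|^2, \qquad \nabla_z J(z) = -2 \sum_{j \in S_i} (x_j - z) = 0 \;\Longrightarrow\; |S_i|\, z = \sum_{j \in S_i} x_j \;\Longrightarrow\; \hat{x}_i = \frac{\sum_{j:\alpha_j = i} x_j}{|\{j:\alpha_j = i\}|}.$$

Since $J$ is strictly convex in $z$, this stationary point is the unique minimizer, which is why the update reduces to the cell mean.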

I like this answer because it generalizes Lloyd's algorithm using only the distance function over a set of N vectors (more useful than the overview at Wikipedia), and I learned something new. However, the complexity of this solution is $O(NM) = O(n^2)$ for step 1 and $O((N^2 - MN)/M) = O(n^2)$ for step 2. Computing the Voronoi diagram and calculating the centroid for each cell is simply an approach to optimizing Lloyd's algorithm, which then runs in $O(M \log M + M) = O(n \log n)$. I'm asking this question to apply the Voronoi optimizations when the distance metric is non-Euclidean.
My point is, I would select this answer if it provided some optimization techniques. I'll update my question shortly.
I also have some questions regarding the notation. Does $j:\alpha_j = i$ mean "j such that $x_j$ has been assigned to the reproduction point i"? If so, I interpret $\hat{x}_i = \arg\min_{z} \sum_{j:\alpha_j = i}d(x_j, z)$ as "the z that minimizes the sum, over every such j, of the distance between $x_j$ and z".
Also I would appreciate if you could illustrate how to derive $\frac{\sum_{j:\alpha_j = i} x_j}{|\{j:\alpha_j = i\}|}$ from $\|x-\hat{x}\|^2$.
@user19087 Well, your updated question is very general, and I do not think a very general answer is known. But, one thing that comes to my mind is what is called "high resolution approximations" in the source coding literature. When you have a very large number of reproduction points, your Voronoi cells will be very small, in which case you may approximate your distance function via its Taylor expansion around the centroid. Distance functions that admit such representation are called "locally quadratic." For all these details I suggest you read "Quantization" by Gray and Neuhoff.

