====== k-means clustering ====== * [작성중] * 이걸 써서 푼 문제는 [[ps:boj:problems:17978]]이 유일한데, 이것도 사실 정해는 아니다. * k-means clustering으로 나눠진 decision boundary는 linear하다 * 보로노이 다이어그램이 만들어진다 * 알고리즘은 다양하다 * [[https://towardsdatascience.com/three-versions-of-k-means-cf939b65f4ea]] * 가장 기본적인 것은 Lloyd's algorithm이다 * 수렴 속도를 빠르게 하기 위해서 다음의 방법들이 나왔다. * MacQueen’s k-Means * Hartingan-Wong k-Means ===== 구현 ===== * 3차원 좌표계, k=2 일때 def k_means(points, init_centeroids): """LLoyd's algorithm on 3d-points.""" dist = INF (x0, y0, z0), (x1, y1, z1) = init_centeroids for _ in range(MAX_ITER): prev_dist, dist = dist, 0.0 cluster0, cluster1 = [0, 0, 0], [0, 0, 0] count0 = count1 = 0 for x, y, z in points: d0 = (x0 - x) * (x0 - x) + (y0 - y) * (y0 - y) + (z0 - z) * (z0 - z) d1 = (x1 - x) * (x1 - x) + (y1 - y) * (y1 - y) + (z1 - z) * (z1 - z) if d0 < d1: dist += d0 count0 += 1 c = cluster0 else: dist += d1 count1 += 1 c = cluster1 c[0] += x c[1] += y c[2] += z if abs(dist - prev_dist) < EPS: break x0, y0, z0 = [x / count0 for x in cluster0] x1, y1, z1 = [x / count1 for x in cluster1] return dist * 사용된 문제 : [[ps:problems:boj:17978]]