How to retrieve the cluster number and cluster center for each customer in R

Question

I have a dataset with more than 20,000 rows, where each row is a unique customer. I ran k-means clustering, and the output looks like this:

str(km.out.best)

List of 9
 $ cluster     : Named int [1:24] 2 1 1 3 4 2 6 4 5 2 ...
  ..- attr(*, "names")= chr [1:24] "nr_pxx_sxx" "sxxxxxxxx
 $ centers     : num [1:10, 1:20000] -0.1806 -0.3596 -0.7953 0.0781 -0.5887 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ : NULL
 $ totss       : num 618756
 $ withinss    : num [1:10] 1294 68340 0 4363 2530 ...
 $ tot.withinss: num 184130
 $ betweenss   : num 434625
 $ size        : int [1:10] 2 4 1 3 2 2 2 2 2 4
 $ iter        : int 3
 $ ifault      : int 0
 - attr(*, "class")= chr "kmeans"

I would like to know how to get the cluster number next to the center value for each customer, something like this:

# Example output

cust_id    centers  cluster_number 
 1         -0.1806      1
 2         -0.3596      1
 3        -0.7953       2
 4         0.0781       ..
 5        -0.5887       3

Many thanks.

Answer 1

Suppose your data looks like this:

dat = matrix(runif(20000*24),nrow=20000)
dim(dat)
[1] 20000    24

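Do not transpose it: kmeans clusters the rows of its input, so each customer has to be a row. In your str output, cluster has length 24 and centers is 10 x 20000, which suggests that the 24 variables were clustered rather than the 20,000 customers. Then run kmeans; you will most likely need to switch the algorithm to MacQueen or Lloyd and raise the maximum number of iterations.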

km.out.best = kmeans(dat,10,algorithm="MacQueen",iter.max=200)
result = data.frame(id=1:nrow(dat),cluster=km.out.best$cluster)
head(result)

  id cluster
1  1       5
2  2      10
3  3       7
4  4       3
5  5       7
6  6       6
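To illustrate why transposing matters, here is a minimal sketch on a small made-up matrix (`toy`, `km_rows`, and `km_cols` are hypothetical names, not part of the question). It only compares the shapes of the results, which do not depend on the random start:

```r
# Hypothetical toy data: 200 "customers" (rows) by 8 variables (columns).
set.seed(1)
toy <- matrix(runif(200 * 8), nrow = 200)

# Clustering the rows: one cluster label per customer.
km_rows <- kmeans(toy, centers = 3, algorithm = "MacQueen", iter.max = 200)

# Clustering the transposed matrix: one cluster label per variable instead.
km_cols <- kmeans(t(toy), centers = 3, algorithm = "MacQueen", iter.max = 200)

length(km_rows$cluster)  # 200: one label per customer
dim(km_rows$centers)     # 3 8: one center per cluster, one column per variable
length(km_cols$cluster)  # 8: one label per variable
dim(km_cols$centers)     # 3 200: same pattern as the str() output in the question
```

If `length(km$cluster)` does not equal the number of customers, the input was most likely transposed.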

Your centers look like this:

head(km.out.best$centers)
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
1 0.3775496 0.2755110 0.5222402 0.5884940 0.4679775 0.6600569 0.4986263
2 0.7126183 0.2803162 0.3942072 0.6419705 0.5341550 0.5711218 0.5053729
3 0.6413244 0.6578503 0.5333248 0.4661831 0.5552559 0.5561365 0.4451808
4 0.3234074 0.6514881 0.4079006 0.6715400 0.4791075 0.4223853 0.6221334
5 0.6473756 0.6532055 0.6182789 0.5097219 0.5376246 0.5365016 0.4391964
6 0.6970183 0.4965848 0.5065735 0.3036086 0.4303340 0.3970691 0.5170568
       [,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
1 0.4594594 0.4345581 0.5701588 0.5906317 0.4385964 0.5218407 0.5516426
2 0.4628033 0.4235150 0.3608926 0.5285110 0.5168564 0.4346563 0.4062454
3 0.5265977 0.5334992 0.5376332 0.4512221 0.4647484 0.4902010 0.4676214
4 0.5939197 0.4694504 0.3937454 0.3384044 0.5686476 0.6172650 0.5186179
5 0.4654073 0.6234457 0.4909938 0.5596412 0.4936359 0.4770979 0.6025122
6 0.5156159 0.4322397 0.5056121 0.5290063 0.5568705 0.4741198 0.5276150
      [,15]     [,16]     [,17]     [,18]     [,19]     [,20]     [,21]
1 0.5504851 0.2829263 0.5801165 0.4646302 0.6408827 0.4199201 0.5407101
2 0.5626282 0.6359599 0.5034993 0.4243469 0.3807163 0.5950345 0.4706131
3 0.3517145 0.2888798 0.6448517 0.3631902 0.5299283 0.4487787 0.4675805
4 0.4331985 0.4305047 0.4862307 0.4381856 0.3399696 0.4781299 0.5236181
5 0.6830292 0.6005151 0.5231041 0.5242238 0.4303912 0.3199860 0.3725459
6 0.2797726 0.4564681 0.5102230 0.6247973 0.4563937 0.6386731 0.5464769
      [,22]     [,23]     [,24]
1 0.5655326 0.5366878 0.6097194
2 0.4910263 0.3989447 0.4676507
3 0.4119647 0.3304486 0.3322215
4 0.5843183 0.4549804 0.6379758
5 0.6010346 0.6001782 0.6310740
6 0.5110444 0.6080165 0.6967485

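The centers matrix has as many columns as your data. If you want to attach the centers and build one large data.frame in which the same center values are repeated for every customer in a cluster, here you go: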

head(cbind(result,km.out.best$centers[result$cluster,]))
     id cluster         1         2         3         4         5         6
X5    1       5 0.6473756 0.6532055 0.6182789 0.5097219 0.5376246 0.5365016
X10   2      10 0.4280159 0.5213989 0.6012614 0.6827887 0.4621622 0.4026403
X7    3       7 0.3671682 0.5811399 0.4086544 0.3584764 0.4406988 0.5859552
X3    4       3 0.6413244 0.6578503 0.5333248 0.4661831 0.5552559 0.5561365
X7.1  5       7 0.3671682 0.5811399 0.4086544 0.3584764 0.4406988 0.5859552
X6    6       6 0.6970183 0.4965848 0.5065735 0.3036086 0.4303340 0.3970691
             7         8         9        10        11        12        13
X5   0.4391964 0.4654073 0.6234457 0.4909938 0.5596412 0.4936359 0.4770979
X10  0.4308780 0.5798660 0.6022418 0.5895790 0.6293778 0.4796867 0.5552222
X7   0.3682988 0.6069791 0.3902141 0.6102076 0.3622590 0.5181898 0.5504739
X3   0.4451808 0.5265977 0.5334992 0.5376332 0.4512221 0.4647484 0.4902010
X7.1 0.3682988 0.6069791 0.3902141 0.6102076 0.3622590 0.5181898 0.5504739
X6   0.5170568 0.5156159 0.4322397 0.5056121 0.5290063 0.5568705 0.4741198
            14        15        16        17        18        19        20
X5   0.6025122 0.6830292 0.6005151 0.5231041 0.5242238 0.4303912 0.3199860
X10  0.5755699 0.3837531 0.6864855 0.3524426 0.5525500 0.6080231 0.6136993
X7   0.3925091 0.6750364 0.6796406 0.5637069 0.4988824 0.5664360 0.5727071
X3   0.4676214 0.3517145 0.2888798 0.6448517 0.3631902 0.5299283 0.4487787
X7.1 0.3925091 0.6750364 0.6796406 0.5637069 0.4988824 0.5664360 0.5727071
X6   0.5276150 0.2797726 0.4564681 0.5102230 0.6247973 0.4563937 0.6386731
            21        22        23        24
X5   0.3725459 0.6010346 0.6001782 0.6310740
X10  0.5897833 0.5092839 0.4041542 0.4247683
X7   0.4674218 0.5450985 0.5607961 0.4179112
X3   0.4675805 0.4119647 0.3304486 0.3322215
X7.1 0.4674218 0.5450985 0.5607961 0.4179112
X6   0.5464769 0.5110444 0.6080165 0.6967485
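As a possible alternative to indexing `centers` by hand, `fitted()` on a kmeans object returns the center of each row's cluster. The sketch below uses a small made-up matrix and hypothetical names (`toy`, `row_centers`, `combined`, and the `center_` column prefix are assumptions for illustration), and it also resets the row names to avoid labels such as `X7.1`:

```r
# Hypothetical toy data: 200 "customers" (rows) by 4 variables (columns).
set.seed(1)
toy <- matrix(runif(200 * 4), nrow = 200)
km <- kmeans(toy, centers = 3, algorithm = "MacQueen", iter.max = 200)

# fitted() with method = "centers" gives one row per observation,
# holding the center of the cluster that observation belongs to.
row_centers <- fitted(km, method = "centers")

# Drop the repeated cluster labels used as row names and name the columns.
rownames(row_centers) <- NULL
colnames(row_centers) <- paste0("center_", seq_len(ncol(row_centers)))

# One row per customer: id, cluster number, then that cluster's center.
combined <- data.frame(id = seq_len(nrow(toy)),
                       cluster = km$cluster,
                       row_centers)
head(combined)
```

This yields the same values as `km$centers[km$cluster, ]`; the difference is only in how the row and column names are handled.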
