Process a matrix with a condition and combine the results

I have an 8x8 matrix of cities and the distances between them:

+--------------+------+--------+------+--------------+---------+------+------+----------+
|              | NYC  | BOSTON |  DC  | PHILADELPHIA | CHICAGO |  SF  |  LA  | SAN JOSE |
+--------------+------+--------+------+--------------+---------+------+------+----------+
| NYC          |    0 |    200 |  300 |          500 |     600 | 1500 | 1800 |     2000 |
| BOSTON       |  200 |      0 |  300 |          200 |     700 | 1600 | 1900 |     2100 |
| DC           |  300 |    300 |    0 |          250 |     550 | 1400 | 1850 |     2200 |
| PHILADELPHIA |  500 |    200 |  250 |            0 |     650 | 1300 | 1700 |     1900 |
| CHICAGO      |  600 |    700 |  550 |          650 |       0 | 1250 | 1600 |     1500 |
| SF           | 1500 |   1600 | 1400 |         1300 |    1250 |    0 |  300 |      400 |
| LA           | 1800 |   1900 | 1850 |         1700 |    1600 |  300 |    0 |      250 |
| SAN JOSE     | 2000 |   2100 | 2200 |         1900 |    1500 |  400 |  250 |        0 |
+--------------+------+--------+------+--------------+---------+------+------+----------+

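I am trying to keep only the pairs whose distance is at most 500 (that is, filter out anything greater than 500, ignoring each city's zero distance to itself) and then concatenate the results per city, as follows: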

+--------------+---------------------------+---------------+
|     FROM     |            TO             |   DISTANCE    |
+--------------+---------------------------+---------------+
| NYC          | BOSTON, DC, PHILADELPHIA  | 200, 300, 500 |
| BOSTON       | NYC, DC, PHILADELPHIA     | 200, 300, 200 |
| DC           | NYC, BOSTON, PHILADELPHIA | 300, 300, 250 |
| PHILADELPHIA | NYC, BOSTON, DC           | 500, 200, 250 |
| CHICAGO      |                           |               |
| SF           | LA, SAN JOSE              |      300, 400 |
| LA           | SF, SAN JOSE              |      300, 250 |
| SAN JOSE     | SF, LA                    |      400, 250 |
+--------------+---------------------------+---------------+

I found a similar example here:

https://stackoverflow.com/questions/20210787/r-getting-the-minimum-value-for-each-row-in-a-matrix-and-returning-the-row-and/20214579#20214579

I also know that I can use an aggregate function to do the concatenation.

I came up with a working solution, but I would like to know whether there is a simpler way to achieve this.

Here is my approach:

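# X is the distance matrix above, with city names as row and column names
result <- t(sapply(seq_len(nrow(X)), function(i) {
  # which.min() returns only the single smallest entry in the row
  # (here the zero on the diagonal), not every city within 500
  j <- which.min(X[i, ])
  c(paste(rownames(X)[i], colnames(X)[j], sep = '/////'), X[i, j])
}))

# result is a character matrix, so index its columns with [, k] rather than $
a <- data.frame(do.call('rbind', strsplit(result[, 1], '/////', fixed = TRUE)), result[, 2])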
r matrix dplyr plyr
1 Answer

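Using dplyr (together with tidyr and tibble), we can reshape the data into long format, keep the rows where the distance is at most 500 while excluding each city's distance to itself, and then summarise the values for each city.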

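library(dplyr)

df %>%
  # rownames_to_column() lives in the tibble package
  tibble::rownames_to_column('from') %>%
  # one row per (from, name) pair
  tidyr::pivot_longer(cols = -from) %>%
  # keep pairs within 500, excluding each city's distance to itself
  filter(value <= 500 & from != name) %>%
  group_by(from) %>%
  summarise(to = toString(name), 
            distance = toString(value))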

# A tibble: 7 x 3
#  from         to                        distance     
#  <chr>        <chr>                     <chr>        
#1 BOSTON       NYC, DC, PHILADELPHIA     200, 300, 200
#2 DC           NYC, BOSTON, PHILADELPHIA 300, 300, 250
#3 LA           SF, SANJOSE               300, 250     
#4 NYC          BOSTON, DC, PHILADELPHIA  200, 300, 500
#5 PHILADELPHIA NYC, BOSTON, DC           500, 200, 250
#6 SANJOSE      SF, LA                    400, 250     
#7 SF           LA, SANJOSE               300, 400     
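
The output above has seven rows because CHICAGO has no other city within 500, so it drops out once the rows are filtered. If the empty CHICAGO row from the question's expected output is needed, one option is to build the result row by row instead. The following is only a base R sketch under some assumptions: the data is held as a numeric matrix `X` with city names as dimnames, and `near_cities` is a hypothetical helper name, not something from dplyr or the original post.

```r
# Distance matrix from the question (city names as dimnames).
cities <- c("NYC", "BOSTON", "DC", "PHILADELPHIA",
            "CHICAGO", "SF", "LA", "SAN JOSE")
X <- matrix(c(
     0,  200,  300,  500,  600, 1500, 1800, 2000,
   200,    0,  300,  200,  700, 1600, 1900, 2100,
   300,  300,    0,  250,  550, 1400, 1850, 2200,
   500,  200,  250,    0,  650, 1300, 1700, 1900,
   600,  700,  550,  650,    0, 1250, 1600, 1500,
  1500, 1600, 1400, 1300, 1250,    0,  300,  400,
  1800, 1900, 1850, 1700, 1600,  300,    0,  250,
  2000, 2100, 2200, 1900, 1500,  400,  250,    0
), nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

# Hypothetical helper: one output row per city, with the neighbours
# within `max_dist` collapsed into comma-separated strings.
near_cities <- function(X, max_dist = 500) {
  rows <- lapply(seq_len(nrow(X)), function(i) {
    # columns within max_dist, excluding the city itself
    keep <- X[i, ] <= max_dist & colnames(X) != rownames(X)[i]
    data.frame(FROM = rownames(X)[i],
               TO = toString(colnames(X)[keep]),
               DISTANCE = toString(X[i, keep]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res <- near_cities(X)
print(res)
```

Because every city produces exactly one row, a city with no close neighbours ends up with empty strings in `TO` and `DISTANCE` rather than disappearing.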

Data

df <- structure(list(NYC = c(0L, 200L, 300L, 500L, 600L, 1500L, 1800L, 
2000L), BOSTON = c(200L, 0L, 300L, 200L, 700L, 1600L, 1900L, 
2100L), DC = c(300L, 300L, 0L, 250L, 550L, 1400L, 1850L, 2200L
), PHILADELPHIA = c(500L, 200L, 250L, 0L, 650L, 1300L, 1700L, 
1900L), CHICAGO = c(600L, 700L, 550L, 650L, 0L, 1250L, 1600L, 
1500L), SF = c(1500L, 1600L, 1400L, 1300L, 1250L, 0L, 300L, 400L
), LA = c(1800L, 1900L, 1850L, 1700L, 1600L, 300L, 0L, 250L), 
SANJOSE = c(2000L, 2100L, 2200L, 1900L, 1500L, 400L, 250L, 
0L)), row.names = c("NYC", "BOSTON", "DC", "PHILADELPHIA", 
"CHICAGO", "SF", "LA", "SANJOSE"), class = "data.frame")
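
For readers who want to see the long-format idea without tidyr, here is a rough base R sketch of the same reshape, filter, and collapse steps, using `as.data.frame(as.table())` and `aggregate()`. To keep it short it assumes a three-city subset of the data above stored as a matrix; the object names `X`, `long`, `near`, and `res` are illustrative only.

```r
# A three-city subset of the data above, as a matrix with named dimnames.
cities <- c("NYC", "BOSTON", "CHICAGO")
X <- matrix(c(  0, 200, 600,
              200,   0, 700,
              600, 700,   0),
            nrow = 3, byrow = TRUE,
            dimnames = list(from = cities, to = cities))

# as.table() followed by as.data.frame() yields one row per (from, to) pair.
long <- as.data.frame(as.table(X), responseName = "distance",
                      stringsAsFactors = FALSE)

# Keep pairs within 500, dropping each city's zero distance to itself.
near <- long[long$distance <= 500 & long$from != long$to, ]

# Collapse the remaining rows per origin city into comma-separated strings.
res <- aggregate(near[c("to", "distance")],
                 by = list(from = near$from),
                 FUN = toString)
print(res)
```

As with the dplyr version, a city with no neighbour within 500 (CHICAGO here) does not appear in `res`, because none of its rows survive the filter.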