How can I compare the elements of two lists in R (in terms of overall similarity and order)?


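I have a list of the "actual" top 25 authors that I want to compare with a list of the predicted top 25 authors. I want to compare both the proportion of predicted authors that also appear among the "actual" authors and the order of the two lists. What is the best approach?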

Example data:

actual_authors <- list("Tom", "Dick", "Harry", "Edward", "Fred")

predicted_authors <- list("Ian", "Liam", "Harry", "Toby", "Tom") 

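Overall, the predicted list contains 40% of the actual list, but I would also like to look at the order of the lists: Harry and Tom are both in the predicted list, but only Harry is in the same position as in the actual list. If possible, a percentage similarity would be nice.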

What is the best way to do this?

r list compare rank
1 answer

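Are you looking for something like this? See the inline comments for a step-by-step explanation of what the code does.

This is a very basic solution that only works when both lists have the same length.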

library(dplyr)    #mutate(), group_by(), n(), n_distinct(), row_number()
library(magrittr) #The %<>% assignment pipe
library(tidyr)    #pivot_longer()

#Your data.
actual_authors <- list("Tom", "Dick", "Harry", "Edward", "Fred")
predicted_authors <- list("Ian", "Liam", "Harry", "Toby", "Tom") 

#Put everything into a data.frame.
df <- data.frame(act = unlist(actual_authors), 
                 pre = unlist(predicted_authors), 
                 stringsAsFactors = FALSE)

#Store the position of each name in its own column.
df %<>% mutate(ord = row_number())

#Pivot the data longer to get the source of the name into a column ("cat")
#and the name itself into another ("val")
df %<>%
  pivot_longer(cols = -ord, names_to = "cat", values_to = "val")

#Group the data by the names
df %<>%  group_by(val) %>%
  mutate(id = ifelse(n() == 2, 1, 0), #Set a score of 1 if the names appear in both lists (i.e., both "cats")
         pos = ifelse((n_distinct(ord) == 1) & n() > 1, 1, 0) #Set a score of 1 if the names appear in the same positions (i.e., same "ord")
         )

#Calculate the similarity score as the sum of the "id" and "pos" scores calculated above
#divided by the maximum possible score for the data at hand (all "id" and "pos" have a value of 1).
simscore <- ((sum(df$id) + sum(df$pos)) / (2*nrow(df)))
simscore
#0.3
#Scale by 100 to get a percentage.
simscore <- simscore*100
simscore
#30
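
As a cross-check, the same figures can be sketched in base R. This is only a minimal sketch and not part of the code above: the names `actual`, `predicted`, `overlap`, `same_position` and `displacement` are illustrative, the lists are written as plain character vectors, and it assumes that no name is repeated within a list. Under that assumption the score above works out to the average of two proportions: the share of predicted names that occur in the actual list (2 of 5) and the share of positions holding the same name in both lists (1 of 5, Harry).

```r
#Same data as above, written as character vectors (an assumption of this sketch).
actual    <- c("Tom", "Dick", "Harry", "Edward", "Fred")
predicted <- c("Ian", "Liam", "Harry", "Toby", "Tom")

#Share of predicted names that also occur in the actual list.
overlap <- mean(predicted %in% actual)
overlap
#0.4

#Share of positions where both lists hold the same name (needs equal lengths).
same_position <- mean(actual == predicted)
same_position
#0.2

#Averaging the two proportions gives the same 30% score as the pipeline above.
combined <- (overlap + same_position) / 2 * 100

#For names found in both lists: how many places apart they sit.
shared <- intersect(actual, predicted)
displacement <- abs(match(shared, actual) - match(shared, predicted))
names(displacement) <- shared
displacement
#  Tom Harry
#    4     0
```

Keeping the overlap and the positional agreement as separate numbers, together with the per-name displacement, may be easier to interpret than a single combined percentage, since it shows that Tom was predicted but is four places away from his actual rank.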