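I have a dataset of CVs for various people in a data frame. Each row is a new person entry, and there are multiple columns (school, positions held, city of birth, etc.). I want to build an adjacency matrix for these people, so I am looking for a way to "flatten" the column variables into yes/no indicators.
For example, a snippet of the data frame looks like this: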
Name:      City_of_birth:   Job Title:
Person1    'New York',      'Librarian'
Person2    'Shanghai',      'Secretary'
Person3    'Tokyo',         'Engineer'
Person4    'Lagos',         'CEO'
Person5    'Atlanta',       'Mayor'
I would like to transform the data frame so that the new column headers are "New York", "Shanghai", "Tokyo", ... with a yes/no value for each row (person):
Name:      New York?:   Shanghai?:   ...   Librarian?:
Person1    Yes          No                 Yes
Person2    No           Yes                No
Person3    No           No                 No
Person4    ...
Person5
I am very new to R, so I am open to using any tool for this. Thanks in advance!
Here is an option using dplyr and tidyr:
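library(dplyr)
library(tidyr)

df %>%
  # Spread both columns wide; cells with no matching value are filled with NA.
  # Passing two columns to names_from gives one column per observed
  # city/job combination, not one per individual value.
  pivot_wider(names_from = c(City_of_birth, JobTitle),
              values_from = c(City_of_birth, JobTitle)) %>%
  # Recode every column except Name: NA becomes "No", anything else "Yes"
  mutate(across(-Name, ~ if_else(is.na(.x), "No", "Yes")))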
Sample data (df) used above:
df <- structure(list(
  Name = structure(1:5, .Label = c("Person1", "Person2", "Person3",
    "Person4", "Person5"), class = "factor"),
  City_of_birth = structure(c(3L, 4L, 5L, 2L, 1L), .Label = c("Atlanta",
    "Lagos", "New York", "Shanghai", "Tokyo"), class = "factor"),
  JobTitle = structure(c(3L, 5L, 2L, 1L, 4L), .Label = c("CEO",
    "Engineer", "Librarian", "Mayor", "Secretary"), class = "factor")),
  class = "data.frame", row.names = c(NA, -5L))
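If the goal is one column per distinct city and per distinct job title, as in the layout sketched in the question, one possible variation is to reshape to long form first and then widen on the values. This is only a sketch under some assumptions: the columns are plain character vectors rather than factors, the helper names `attribute`, `present`, `wide`, `ind` and `adj` are made up for illustration, and a hypothetical Person6 is added so that at least two people share a value. The last lines show one way the yes/no table might feed a person-by-person adjacency matrix.

```r
library(dplyr)
library(tidyr)

# Sample data mirroring the question; Person6 is hypothetical and only
# added so that some people share a city or a job title
df <- data.frame(
  Name = paste0("Person", 1:6),
  City_of_birth = c("New York", "Shanghai", "Tokyo", "Lagos", "Atlanta", "Tokyo"),
  JobTitle = c("Librarian", "Secretary", "Engineer", "CEO", "Mayor", "Librarian"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Stack the attribute columns: one row per (person, attribute value)
  pivot_longer(cols = -Name, names_to = "attribute", values_to = "value") %>%
  # Marker column (name is arbitrary) recording that the value applies
  mutate(present = "Yes") %>%
  # One column per distinct value; id_cols = Name drops the helper
  # "attribute" column so each person stays on a single row
  pivot_wider(id_cols = Name, names_from = value, values_from = present,
              values_fill = list(present = "No"))

# 0/1 indicator matrix with one row per person
ind <- (as.matrix(wide[, -1]) == "Yes") * 1
rownames(ind) <- wide$Name

# Person-by-person matrix: entry [i, j] counts the values shared by i and j
adj <- tcrossprod(ind)
diag(adj) <- 0  # ignore self-matches
```

Here `wide` holds the Yes/No columns and `adj` is symmetric, with a nonzero entry wherever two people share a birth city or job title. Note that if a city and a job title ever had the same text, they would collapse into one column, so prefixing the values with the attribute name may be safer on real data.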