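I frequently analyze data in an "org tree" format to understand how often an activity occurs under a given leader within an organization. I need to produce a wide hierarchy from two columns of data: the employee's name and the supervisor's name.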
----------
df <- data.frame("Employee"=c("Bill","James","Amy","Jen","Henry"),
"Supervisor"=c("Jen","Jen","Steve","Amy","Amy"))
df
# Employee Supervisor
# 1 Bill Jen
# 2 James Jen
# 3 Amy Steve
# 4 Jen Amy
# 5 Henry Amy
I want to end up with a wide data frame that spells out the org chart, starting with the CEO (or the top-most employee):
# Employee H1 H2 H3
# 1 Bill Steve Amy Jen
# 2 James Steve Amy Jen
# 3 Amy Steve NA NA
# 4 Jen Steve Amy NA
# 5 Henry Steve Amy NA
After a lot of research, the data.tree package seems to offer the most help. How can I do this?
Try this:
library(data.table)
setDT(df)
setnames(df, 'Supervisor', 'Supervisor.1')
j=1
while (df[, any(get(paste0('Supervisor.',j)) %in% Employee)]) {
df[df, on=paste0('Supervisor.',j,'==Employee'),
paste0('Supervisor.',j+1):= i.Supervisor.1]
j = j + 1
}
df
# Employee Supervisor.1 Supervisor.2 Supervisor.3
# 1: Bill Jen Amy Steve
# 2: James Jen Amy Steve
# 3: Amy Steve NA NA
# 4: Jen Amy Steve NA
# 5: Henry Amy Steve NA
To reverse the order within each row, so that the top-most supervisor comes first:
df = cbind(df[, 1], t(apply(df[, -1], 1, function(r) c(rev(r[!is.na(r)]), r[is.na(r)]))))
df
# Employee V1 V2 V3
# 1: Bill Steve Amy Jen
# 2: James Steve Amy Jen
# 3: Amy Steve NA NA
# 4: Jen Steve Amy NA
# 5: Henry Steve Amy NA
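For comparison, here is a minimal base-R sketch of the same idea: walk up each employee's supervisor chain and pad the result with NA. It is not part of the data.table approach above. The helper `chain_of()` and the column names `H1`, `H2`, ... are names chosen for illustration, and the sketch assumes every employee appears at most once in the `Employee` column.

```r
# Same sample data as in the question, with strings kept as character
df <- data.frame(Employee   = c("Bill", "James", "Amy", "Jen", "Henry"),
                 Supervisor = c("Jen", "Jen", "Steve", "Amy", "Amy"),
                 stringsAsFactors = FALSE)

# chain_of() is a hypothetical helper: it returns the supervisors of `emp`,
# top-most first, by repeatedly looking up the current person's supervisor.
chain_of <- function(emp, employees, supervisors) {
  chain <- character(0)
  current <- emp
  repeat {
    idx <- match(current, employees)
    if (is.na(idx)) break                 # no row for `current`: top of the tree
    current <- supervisors[idx]
    if (current %in% chain) stop("cycle detected in supervisor data")
    chain <- c(current, chain)            # prepend so the top-most ends up first
  }
  chain
}

chains <- lapply(df$Employee, chain_of,
                 employees = df$Employee, supervisors = df$Supervisor)
depth <- max(lengths(chains))

# Pad every chain with NA on the right so all rows have `depth` entries
padded <- do.call(rbind, lapply(chains, function(ch) {
  c(ch, rep(NA_character_, depth - length(ch)))
}))
colnames(padded) <- paste0("H", seq_len(depth))

wide <- data.frame(Employee = df$Employee, padded, stringsAsFactors = FALSE)
wide
```

This does one lookup per level per employee, so it is likely slower than the join-based loop on large tables, but it is easy to step through when debugging.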
If you don't insist on that output format but want to work with the hierarchy itself, then data.tree is a good option. Here are some examples:
library(data.tree)
df <- data.frame("Employee"=c("Bill","James","Amy","Jen","Henry"),
"Supervisor"=c("Jen","Jen","Steve","Amy","Amy"))
dt <- FromDataFrameNetwork(df)
#here's your org chart:
print(dt)
Let's find Jen's subordinates, along with their level in the hierarchy:
Get(FindNode(dt, 'Jen')$leaves, 'level')
This returns the following:
Bill James
4 4
Just for fun, let's add salaries:
dt$Set(salary = c(100000, 80000, 60000, 40000, 35000, 70000))
Print each salary together with the aggregated salary of that person's subordinates:
print(dt, 'salary', sal_subordinates = function(node) Aggregate(node, 'salary', sum))
This prints something like:
  levelName         salary sal_subordinates
1 Steve             100000            80000
2  °--Amy            80000           130000
3      ¦--Jen        60000            75000
4      ¦   ¦--Bill   40000            40000
5      ¦   °--James  35000            35000
6      °--Henry      70000            70000
The data.tree vignettes have many more examples of working with hierarchical data and aggregation.