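This is my first time asking a question in this great community. I am trying to calculate indices on a data.frame, by borough or by neighborhood, and then plot them. What code would work best for this?
Here is a sample of my dataset. albo, aegyp = mosquito species; house = a prospected house. The house index (HI) is calculated as (number of positive houses / number of houses prospected) * 100. A positive house is one in which at least one mosquito was found (value != 0). For example, HI = (7/11) * 100 = 63.63 in total (11 = number of houses prospected, 7 = number of positive houses).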
borough  neighborhood  concession  albo  aegyp  Total_albo_aegyp
a1       mendong       1           1     5      6
a1       mendong       2           5     2      7
a1       mendong       3           2     1      3
a1       tam tam       4           0     0      0
a2       tam tam       5           4     6      10
a2       obili         6           0     1      1
a2       obili         7           0     0      0
a3       acacia        8           3     7      10
a4       melen         9           1     1      2
a4       melen         10          0     5      5
a4       polytech      11          8     0      10
HIcommune <- concessiondata %>%
    group_by(commune) %>%
    summarise(
        Mean = mean(concessiondata$total_aedes_albo_aegypti!=0),
        HIY = sum(concessiondata1$total_aedes_albo_aegypti!=0)/length(concessiondata1$total_aedes_albo_aegypti))
Houseindex_total <- concessiondata1[, Mean := mean(total_aedes_albo_aegypti!=0), by = "commune"]
## This is how the results should look
borough  albo_HI  aegyp_HI  Total_albo_aegyp_HI
a1       75       75        75
a2       33.33    66.66     66.66
a3       100      100       100
a4       66.66    66.66     100
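First off, there are a few general coding/syntax issues with your code. You are mixing dplyr and data.table syntax, and you are using $-indexing of columns inside dplyr verbs. Inside summarise(), an expression such as concessiondata$total_aedes_albo_aegypti refers to the whole column and ignores the grouping, so every group receives the same value; use the bare column name instead. I recommend working through one of the many freely available tidyverse tutorials to learn the basics of reshaping and manipulating data with dplyr/tidyr.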
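To see why $-indexing inside a dplyr verb gives the wrong result, here is a minimal sketch. The data frame toy and its columns g and x are hypothetical and exist only for this illustration; they are not part of your data.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
    g = c("a", "a", "b", "b"),
    x = c(0, 1, 2, 3)
)

res <- toy %>%
    group_by(g) %>%
    summarise(
        wrong = mean(toy$x != 0),  # toy$x is the full column: all 4 rows, in every group
        right = mean(x != 0)       # bare x is only the rows of the current group
    )

res
```

The wrong column holds the same overall proportion for both groups, while right differs per group, which is what a per-borough index needs.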
That aside, the following reproduces your expected output:
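library(dplyr)

# House index: percentage of houses with at least one mosquito (value != 0)
calc_index <- function(x) sum(x != 0) / length(x) * 100

# df is the sample data frame from the question
df %>%
    group_by(borough) %>%
    summarise(
        albo_HI = calc_index(albo),
        aegyp_HI = calc_index(aegyp),
        Total_albo_aegyp = calc_index(Total_albo_aegyp))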
## A tibble: 4 x 4
#  borough albo_HI aegyp_HI Total_albo_aegyp
#  <fct>     <dbl>    <dbl>            <dbl>
#1 a1         75       75               75
#2 a2         33.3     66.7             66.7
#3 a3        100      100              100
#4 a4         66.7     66.7            100
Alternatively, you can use summarise_all, which applies the same function to every non-grouping column:
df %>%
    group_by(borough) %>%
    select(-neighborhood, -concession) %>%
    summarise_all(~calc_index(.x))
## A tibble: 4 x 4
#  borough  albo aegyp Total_albo_aegyp
#  <fct>   <dbl> <dbl>            <dbl>
#1 a1       75    75               75
#2 a2       33.3  66.7             66.7
#3 a3      100   100              100
#4 a4       66.7  66.7            100
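For completeness, here is a self-contained sketch of the same calculation that also covers the per-neighborhood case from the question. It assumes dplyr 1.0.0 or later, where across() supersedes summarise_all, and it rebuilds the sample data by hand from the table in the question; the names hi_borough and hi_neighborhood are my own and purely illustrative.

```r
library(dplyr)

# Sample data retyped from the table in the question
df <- data.frame(
    borough = c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a3", "a4", "a4", "a4"),
    neighborhood = c("mendong", "mendong", "mendong", "tam tam", "tam tam",
                     "obili", "obili", "acacia", "melen", "melen", "polytech"),
    concession = 1:11,
    albo = c(1, 5, 2, 0, 4, 0, 0, 3, 1, 0, 8),
    aegyp = c(5, 2, 1, 0, 6, 1, 0, 7, 1, 5, 0),
    Total_albo_aegyp = c(6, 7, 3, 0, 10, 1, 0, 10, 2, 5, 10)
)

# House index: percentage of houses with at least one mosquito (value != 0)
calc_index <- function(x) sum(x != 0) / length(x) * 100

# Index per borough, one column per species plus the total
hi_borough <- df %>%
    group_by(borough) %>%
    summarise(across(c(albo, aegyp, Total_albo_aegyp), calc_index))

# Index per neighborhood within each borough
hi_neighborhood <- df %>%
    group_by(borough, neighborhood) %>%
    summarise(across(c(albo, aegyp, Total_albo_aegyp), calc_index),
              .groups = "drop")

hi_borough
hi_neighborhood
```

Naming the count columns explicitly in across() avoids the select() step and keeps concession out of the summary without relying on column order.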