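I'm new to R programming and have just finished my first data analysis project in RStudio. This is my first question on Stack Overflow, and I'm not sure whether I've provided enough detail; I'm happy to edit it if someone more experienced can guide me. Should I rename the data frame and/or variables to make this easier?
Problem: Two weeks ago, I used the code below to create a plot with % labels showing the member vs. casual split. An image is attached to illustrate this.
Bar chart showing member vs. casual percentage values:
I was surprised to see that the same code now produces a plot in which the % values are by month rather than by member/casual.
Bar chart showing percentage values by month: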
Divvy22_clean %>%
  group_by(start_mth, member_casual) %>%
  dplyr::summarise(count = n()) %>%
  mutate(percent = count / sum(count) * 100) %>%
  ggplot(aes(start_mth, count, fill = member_casual)) +
  geom_col(position = "stack") +
  geom_text(aes(label = paste0(round(percent), "%")),
            position = position_stack(vjust = 0.5), size = 3, color = "white") +
  scale_y_continuous(limits = c(0, 900000), expand = c(0, 0)) +
  labs(x = "Months, 2022", y = "Total Rentals",
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap(~member_casual)
If you need to reproduce the problem, I hope the code below helps:
library(dplyr)
library(ggplot2)

set.seed(123)
start_mth <- rep(c("Jan", "Feb", "Mar"), each = 2)
member_casual <- rep(c("member", "casual"), times = 3)
count <- round(runif(6, 0, 900000))
Divvy_example <- data.frame(start_mth, member_casual, count)
# Calculate percentages
Divvy_example <- Divvy_example %>%
  group_by(start_mth, member_casual) %>%
  mutate(percent = count / sum(count) * 100)
# Plot
ggplot(Divvy_example, aes(start_mth, count, fill = member_casual)) +
  geom_col(position = "stack") +
  geom_text(aes(label = paste0(round(percent), "%")),
            position = position_stack(vjust = 0.5), size = 3, color = "white") +
  scale_y_continuous(limits = c(0, 900000), expand = c(0, 0)) +
  labs(x = "Months, 2022", y = "Total Rentals",
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap(~member_casual)
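Also, I updated R to the latest version two days ago. That is the only change I made. One more detail that may be relevant: back when this code produced the first plot (with member/casual values) in RStudio, it was already producing the plot with by-month percentage values on Kaggle, just as it now does in the updated RStudio.
I'm trying to learn more about R. It would be very helpful to understand what causes this inconsistency and how to get the chart with % values by member/casual. Thanks for your help!
I am trying to understand the reason behind this inconsistency, because this code has run several times over the past month. Was the syntax affected by the R update, and if so, is there a way to avoid it? Or is something else causing the inconsistency?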
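Here is some mock data that shows the problem (the issue arises in the summarise step that creates count, so this starts from fresh data before that calculation).
In the top chart, which reproduces your second, incorrectly calculated chart, the .groups = "drop" argument gives ungrouped output, so each percentage is a share of all rides. The second chart, with .groups = "drop_last", drops the member_casual grouping and then calculates the percentages within each start_mth, which I think is what you intended? These defaults may have changed between the runs of your two charts, so if you explicitly specify .groups = "drop_last" the result will stay consistent.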
library(tidyverse)
df <-
  tibble(
    start_mth = sample(
      month.abb,
      size = 1000,
      replace = TRUE,
      prob = c(1:6, 6:1)
    ),
    member_casual = sample(
      c("member", "casual"),
      1000,
      replace = TRUE
    )
  ) |>
  mutate(start_mth = factor(start_mth, levels = month.abb))
# Chart 1: .groups = "drop" leaves the summary ungrouped,
# so percent is each bar's share of all rides.
df |>
  group_by(start_mth, member_casual) %>%
  summarise(count = n(), .groups = "drop") %>%
  mutate(percent = count / sum(count) * 100) %>%
  ggplot(aes(start_mth, count, fill = member_casual)) +
  geom_col(position = "stack") +
  geom_text(aes(label = paste0(round(percent), "%")),
            position = position_stack(vjust = 0.5), size = 3, color = "white") +
  scale_y_continuous(expand = expansion(add = c(0, 10))) +
  labs(x = "Months, 2022", y = "Total Rentals",
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap(~member_casual)
# Chart 2: .groups = "drop_last" keeps the grouping by start_mth,
# so percent is the member/casual split within each month.
df |>
  group_by(start_mth, member_casual) %>%
  summarise(count = n(), .groups = "drop_last") %>%
  mutate(percent = count / sum(count) * 100) %>%
  ggplot(aes(start_mth, count, fill = member_casual)) +
  geom_col(position = "stack") +
  geom_text(aes(label = paste0(round(percent), "%")),
            position = position_stack(vjust = 0.5), size = 3, color = "white") +
  scale_y_continuous(expand = expansion(add = c(0, 10))) +
  labs(x = "Months, 2022", y = "Total Rentals",
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap(~member_casual)
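To see which grouping is left after summarise, and to make the percentage independent of it, something like the following may help. This is only a sketch: the toy tibble and its rides column are made up for illustration (they are not the Divvy data), and it assumes R 4.1 or later for the |> pipe. It inspects the result with group_vars() and then computes the within-month split with an explicit group_by(start_mth), so the result does not depend on what summarise leaves behind.

```r
library(dplyr)

# Hypothetical toy data (not the Divvy data): 3 months x 2 rider types.
toy <- tibble(
  start_mth     = rep(c("Jan", "Feb", "Mar"), each = 2),
  member_casual = rep(c("member", "casual"), times = 3),
  rides         = c(30, 10, 45, 15, 60, 40)
)

# .groups = "drop": the summary is ungrouped.
dropped <- toy |>
  group_by(start_mth, member_casual) |>
  summarise(count = sum(rides), .groups = "drop")
group_vars(dropped)   # character(0)

# .groups = "drop_last": the summary is still grouped by start_mth.
kept <- toy |>
  group_by(start_mth, member_casual) |>
  summarise(count = sum(rides), .groups = "drop_last")
group_vars(kept)      # "start_mth"

# Explicit alternative: count() returns ungrouped output, then the
# grouping used for the percentage is stated directly.
pct <- toy |>
  count(start_mth, member_casual, wt = rides, name = "count") |>
  group_by(start_mth) |>
  mutate(percent = count / sum(count) * 100) |>
  ungroup()
```

With the grouping stated explicitly, the percentages in pct add up to 100 within each month, which is the member/casual split the question asks for. The pct data frame can be piped into the same ggplot call as above.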