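I am having trouble drawing a violin plot with box plots that shows descriptive statistics for my data sample. My data has 3 variables (columns): channel, Year and durationMins, with 20 observations.
I want to build the violin plot with box plots in a Shiny app.
Here is my code: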
# Simulate data for reproducible code
# Set the number of observations
n <- 512
# Define channel names
channels <- c("Channel_A", "Channel_B", "Channel_C", "Channel_D")
# Define months and days of the week
months <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
# Create a data frame to store the data
df1 <- data.frame()
# Generate data for each channel
for (channel in channels) {
  # Generate data for each year
  for (year in 2017:2024) {
    # Generate data for each month
    for (month in months) {
      # Sample durations for each day of the month
      for (day in 1:30) { # Assuming 30 days per month
        # Sample duration for the specific channel, year, month, and day
        durations <- round(runif(n = 1, min = 4.59, max = 30.7), 1)
        # Sample a random day of the week
        day_of_week <- sample(days, 1)
        # Append the data to the data frame
        df1 <- rbind(df1, data.frame(channel = channel, Year = year, Month = month, Day = day, publishedDayName = day_of_week, durationMins = durations))
      }
    }
  }
}
# Randomly shuffle the rows of the data frame
df1 <- df1[sample(nrow(df1)), ]
# View the first few rows of the data frame
head(df1)
# Shiny app
library(shiny)

ui <- fluidPage(
tags$script(HTML('$(document).ready(function() {
// Get the position of the sidebar
var sidebarPosition = $(".sidebar").offset().top;
// Function to fix or unfix the sidebar based on scrolling
function fixSidebar() {
var scrollTop = $(window).scrollTop();
if (scrollTop > sidebarPosition) {
$(".sidebar").addClass("fixed-sidebar");
} else {
$(".sidebar").removeClass("fixed-sidebar");
}
}
// Attach the function to the scroll event
$(window).scroll(fixSidebar);
// Call the function once to set the initial state
fixSidebar();
});
')),
titlePanel("YouTube Data Science Channels Video Duration Analytics"),
mainPanel(plotOutput('dur_Plot'))
)
server <- function(input, output) {
  library(dplyr)
  library(ggplot2)
  library(lubridate)

  # Duration plot
  duration_pipeline_f <- reactive({
    df1 %>%
      group_by(channel, Year) %>%
      summarize(durationMins = mean(durationMins)) %>%
      arrange(Year, .by_group = TRUE)
  })

  output$dur_Plot <- renderPlot({
    ggplot(duration_pipeline_f(), aes(x = Year, y = durationMins, fill = channel)) +
      geom_violin(position = 'dodge', trim = FALSE, linewidth = .5, alpha = .9) +
      geom_boxplot(width = 0.7, position = position_dodge(width = 0.9), fill = 'white', color = 'black') +
      labs(title = 'Video duration distribution', y = 'Duration in mins') +
      scale_fill_brewer(palette = 'Spectral') +
      theme(text = element_text(size = 12))
  })
}
shinyApp(ui, server)
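In the server part of this code I filter the data, which is why I end up with fewer observations than in the original sample. Maybe this simulation will help you understand my problem.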
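Actually, your issue has nothing to do with Shiny and is easier to solve by focusing on the ggplot2 code.
The issue is simply the grouping. First, because you map channel to fill, your violins are grouped implicitly by the fill aesthetic, whereas for the boxplot you override that grouping by setting fill = "white". Hence, for the boxplot you have to map explicitly on the group aesthetic. Additionally, because your Year variable is numeric, you get violins (and/or boxplots) showing the distribution over all years. Since you map Year to x, you probably want a violin/boxplot for each year. In that case you also have to convert Year to a factor or group explicitly by Year, as in the code below.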
Note: I use the non-aggregated dataset. IMHO an aggregated dataset makes no sense if you want to show the distribution.
library(ggplot2)
library(shiny)

server <- function(input, output) {
  duration_pipeline_f <- reactive({
    df1
  })

  output$dur_Plot <- renderPlot({
    ggplot(duration_pipeline_f(), aes(
      x = Year,
      y = durationMins,
      fill = channel,
      group = interaction(Year, channel)
    )) +
      geom_violin(
        position = position_dodge(width = 0.9),
        trim = FALSE,
        linewidth = .5, alpha = .9
      ) +
      geom_boxplot(
        aes(group = interaction(Year, channel)),
        width = 0.7,
        position = position_dodge(width = 0.9),
        fill = "white", color = "black"
      ) +
      labs(title = "Video duration distribution", y = "Duration in mins") +
      scale_fill_brewer(palette = "Spectral") +
      theme(text = element_text(size = 12))
  })
}

shinyApp(ui, server)
#>
#> Listening on http://127.0.0.1:6007
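To check the grouping explanation outside Shiny, one option is to count how many groups ggplot2 assigns to each layer. The sketch below is only an illustration: toy is a small made-up data set (not the asker's data) and n_groups is a hypothetical helper built on ggplot2's layer_data(). It compares the original setup, the explicit interaction(Year, channel) grouping, and converting Year to a factor without an explicit group.

```r
library(ggplot2)

# Made-up toy data (assumption, not the asker's data):
# 2 channels x 2 years x 20 videos each
set.seed(42)
toy <- expand.grid(
  video = 1:20,
  Year = c(2017, 2018),
  channel = c("Channel_A", "Channel_B"),
  stringsAsFactors = FALSE
)
toy$durationMins <- round(runif(nrow(toy), min = 4.59, max = 30.7), 1)

# Hypothetical helper: number of distinct groups in one layer of a built plot
n_groups <- function(plot, layer) {
  length(unique(layer_data(plot, layer)$group))
}

dodge <- position_dodge(width = 0.9)

# Original setup: numeric Year, boxplot fill set to a constant
p_orig <- ggplot(toy, aes(x = Year, y = durationMins, fill = channel)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.2, position = dodge, fill = "white")

# Explicit grouping by Year and channel for both layers
p_group <- ggplot(toy, aes(
  x = Year, y = durationMins, fill = channel,
  group = interaction(Year, channel)
)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.2, position = dodge, fill = "white")

# Year as a factor, but no explicit group
p_factor <- ggplot(toy, aes(x = factor(Year), y = durationMins, fill = channel)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.2, position = dodge, fill = "white")

# Groups per layer: layer 1 is the violin, layer 2 is the boxplot
group_counts <- data.frame(
  plot    = c("original", "explicit group", "factor Year"),
  violin  = c(n_groups(p_orig, 1), n_groups(p_group, 1), n_groups(p_factor, 1)),
  boxplot = c(n_groups(p_orig, 2), n_groups(p_group, 2), n_groups(p_factor, 2))
)
print(group_counts)
```

With ggplot2's default grouping rules, the original plot should report one violin group per channel but a single boxplot group, while the explicit grouping gives one group per Year and channel combination in both layers. Converting Year to a factor alone fixes the violins, but the white boxplots are then split by year only, so they would still need the explicit group mapping.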