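I have a bunch of separate code chunks for running normality tests in R, and I would like to combine them so that I can test a specific variable without copying the code each time. So far, all of the individual chunks work (using the iris dataset as an example):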
library(datasets)
library(tidyverse)
library(skimr)
library(pastecs)  # provides stat.desc()
data(iris)
iris$Species <- NULL
# descriptive statistics and normality tests
skim(iris$Sepal.Length)
round(stat.desc(iris$Sepal.Length, basic = FALSE, norm = TRUE), digits = 3)
# histogram with normality curve
hist_sepal_length <- ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(aes(y = ..density..), bins = 10, colour = "black", fill = "white") +
  labs(x = "Sepal.Length", y = "Density") +
  stat_function(fun = dnorm, args = list(mean = mean(iris$Sepal.Length), sd = sd(iris$Sepal.Length)), colour = "black", size = 1)
hist_sepal_length
# qqplot
qqplot_sepal_length <- qplot(sample = iris$Sepal.Length)
qqplot_sepal_length
I can do the first step, the descriptive statistics, with sapply:
round(sapply(iris, stat.desc, basic = FALSE, norm = TRUE), digits = 3)
However, I am not sure how to use any of the apply functions with ggplot2. I have looked at the following questions:
How to use lapply with ggplot2 while indexing variables
using an apply function with ggplot2 to create bar plots for more than one variable in a data.frame
Using apply functions with ggplot to plot a subset of dataframe columns
Using lapply to make boxplots of a variable list
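However, none of them covers my case, because my ggplot call also includes a stat_function that references the variable. I would also like each plot output as a separate figure. Is there a way to write the ggplot code so that it runs over all the variables at once (that is, Sepal.Length, Sepal.Width, Petal.Length and Petal.Width)? The variables I want to test for normality are already saved in their own data frame, so no subsetting is needed.
Finally, is there a way to wrap the three steps (normality tests, histogram and Q-Q plot) into a single function?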
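The goal here is to replace Sepal.Length, wherever it appears, with a generic variable. Once that is done, you can create a function and call it for each variable. From there it is easy to generalise to a loop that returns all the results at once.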
library(datasets)
library(tidyverse)
library(skimr)
library(pastecs)
data(iris)
#-- Function
testVarNormality <- function(var, data) {
  # `var` is a column name given as a string; data[[var]] extracts it as a vector
  x <- data[[var]]
  # descriptive statistics and normality tests
  skim_res <- skim(x)
  desc_stats <- round(stat.desc(x, basic = FALSE, norm = TRUE), digits = 3)
  # histogram with normality curve (written for ggplot2 >= 3.4.0)
  hist <- ggplot(data, aes(.data[[var]])) +
    geom_histogram(aes(y = after_stat(density)), bins = 10, colour = "black", fill = "white") +
    labs(x = var, y = "Density") +
    stat_function(fun = dnorm, args = list(mean = mean(x), sd = sd(x)), colour = "black", linewidth = 1)
  # qqplot
  qqplot <- ggplot(data, aes(sample = .data[[var]])) + stat_qq()
  # return all results in one named list
  list(skim_res = skim_res, desc_stats = desc_stats, histogram = hist, qqplot = qqplot)
}
#-- 1 function call
sepal_length_res <- testVarNormality("Sepal.Length", iris)
sepal_length_res$histogram
sepal_length_res$qqplot
#-- Calling for all columns (except species)
all_res <- lapply(colnames(iris)[1:4], testVarNormality, iris)
names(all_res) <- colnames(iris)[1:4]
#-- Get a result example
all_res$Sepal.Width$histogram
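The question also asks for each plot as a separate figure. A minimal sketch of one way to do that is below. It assumes a named list of ggplot objects, builds a simplified stand-in list called plots so that the block runs on its own, and writes one PNG per variable with ggsave. The hist_ file prefix and the temporary output directory are illustrative choices, not part of the original answer.

```r
library(ggplot2)

data(iris)
vars <- colnames(iris)[1:4]

# Stand-in for the histograms stored in `all_res`: one plain histogram per column.
plots <- lapply(vars, function(v) {
  ggplot(iris, aes(.data[[v]])) +
    geom_histogram(bins = 10, colour = "black", fill = "white") +
    labs(x = v, y = "Count")
})
names(plots) <- vars

# Save each plot to its own file (hypothetical naming scheme, temporary directory).
out_files <- file.path(tempdir(), paste0("hist_", vars, ".png"))
for (i in seq_along(plots)) {
  ggsave(out_files[i], plot = plots[[i]], width = 5, height = 4)
}
```

With the real results, the same loop could run over lapply(all_res, `[[`, "histogram"). To show the plots on screen instead of saving them, call print() on each element, since ggplot objects are not auto-printed inside a loop.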
How to do the same by species:
irisBySpecies <- split(iris, iris$Species)
#-- Nested list: species -> variable -> results
res_byGroup <- lapply(
  irisBySpecies,
  function(species_data) {
    vars <- colnames(species_data)[1:4]
    res4species <- lapply(vars, testVarNormality, species_data)
    names(res4species) <- vars
    return(res4species)
  }
)
names(res_byGroup) <- names(irisBySpecies)
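Note that I had to use an anonymous function to do this. There may be a more elegant way to write the original function that would make it easier to apply per group, but this approach generalises.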
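One possible shape for a tidier version is sketched below, as an illustration only. It moves the inner lapply into a small named helper, so that the per-group call no longer needs an anonymous function. Both testAllVars and testVarNormalityLite are hypothetical names introduced here. The latter is a simplified stand-in that runs only a Shapiro-Wilk test from base R, so the block works without extra packages. The full testVarNormality could be passed as fun instead.

```r
data(iris)

# Simplified stand-in for testVarNormality(): Shapiro-Wilk test only (base R).
testVarNormalityLite <- function(var, data) {
  shapiro.test(data[[var]])
}

# Hypothetical helper: apply a per-variable function to several columns
# and return a list named by column.
testAllVars <- function(data, vars, fun = testVarNormalityLite) {
  res <- lapply(vars, fun, data)
  names(res) <- vars
  res
}

num_vars <- colnames(iris)[1:4]

# Whole data set
all_res_lite <- testAllVars(iris, num_vars)

# Per species: extra arguments are passed through lapply(), so no anonymous function
by_species_lite <- lapply(split(iris, iris$Species), testAllVars, vars = num_vars)

# Example lookup: Shapiro-Wilk p-value for Sepal.Width within setosa
by_species_lite$setosa$Sepal.Width$p.value
```

The result is the same nested species-then-variable list as before, and lapply keeps the names produced by split, so no separate names() assignment is needed at the outer level.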