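I have a medium-sized dataset that I am trying to visualize, nrow(df) = 7810. To reduce overplotting, I set alpha = .3. This greatly increases the time R takes to draw the plot. Here are my specs: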
OS Name Microsoft Windows 10 Home
Version 10.0.18362 Build 18362
Processor Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz, 3401 Mhz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM) 32.0 GB
System Type x64-based PC
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
ggplot2 version 3.2.1
Here is an example of what is happening:
> p <-ggplot(df, aes(x=x))
> t1<-function(){p + geom_point(aes(y=y), shape=4, size=.5)}
> t2<-function(){p + geom_point(aes(y=y), shape=4, size=.5, alpha=.3)}
> system.time(print(t1()))
user system elapsed
0.14 0.37 0.53
> system.time(print(t2()))
user system elapsed
0.25 29.69 30.04
Does anyone know what causes this script to run so slowly?
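The alpha value alone is not responsible for the slowdown. It is the combination of an alpha value with the shape that seems to slow things down.
Complex vector shapes such as the "x" drawn by shape = 4 seem to greatly increase rendering time when they are combined with an alpha value. If you are not committed to shape = 4, using something like shape = 16 speeds things up while keeping the alpha value you want. Example below: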
library(dplyr)
library(ggplot2)
df <- tibble(x = rnorm(n = 7810),
             y = rnorm(n = 7810))
p1 <- function() {
  p <- ggplot(df) +
    geom_point(aes(x, y), shape = 4, size = .5)
  print(p)
}
p2 <- function() {
  p <- ggplot(df) +
    geom_point(aes(x, y), shape = 4, size = .5, alpha = 0.3)
  print(p)
}
p3 <- function() {
  p <- ggplot(df) +
    geom_point(aes(x, y), shape = 16, size = .5, alpha = 0.3)
  print(p)
}
p4 <- function() {
  p <- ggplot(df) +
    geom_point(aes(x, y), shape = 22, size = .5, alpha = 0.3)
  print(p)
}
test <- microbenchmark::microbenchmark(no_alpha = p1(),
                                       alpha = p2(),
                                       alpha_circle = p3(),
                                       alpha_square = p4(),
                                       times = 10)
print(test)
Unit: milliseconds
expr min lq mean median uq max neval
no_alpha 837.5163 851.7994 1025.0569 910.3687 1173.8753 1403.087 10
alpha 41456.3393 41708.0781 45831.6033 42589.4998 45219.8180 59578.347 10
alpha_circle 429.7718 536.9076 719.5507 549.7952 555.9002 1780.282 10
alpha_square 800.1380 806.5523 882.0163 815.6232 842.4669 1450.395 10
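To put the table above in relative terms, here is a small sketch that divides each median by the median of the no_alpha case. The numbers are copied from the benchmark output; the variable names median_ms and slowdown are only illustrative.

```r
# Median render times in milliseconds, copied from the benchmark table above
median_ms <- c(no_alpha     = 910.3687,
               alpha        = 42589.4998,
               alpha_circle = 549.7952,
               alpha_square = 815.6232)

# Ratio of each variant to shape = 4 drawn without alpha
slowdown <- median_ms / median_ms[["no_alpha"]]
print(round(slowdown, 2))
```

On these numbers, shape = 4 with alpha is roughly 47 times slower than shape = 4 without alpha, while shapes 16 and 22 with alpha come in below the no_alpha median.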
Edit:
We can use microbenchmark and purrr to see which shapes give the fastest plotting times.
library(purrr)
library(microbenchmark)
df <- tibble(x = rnorm(n = 7810),
             y = rnorm(n = 7810))
s <- tibble(shape = c(0:24))
plot_fun <- function(shape) {
  p <- ggplot(df) +
    geom_point(aes(x, y),
               shape = shape,
               alpha = 0.3)
  print(p)
}
test_fun <- function(shape) {
  microbenchmark(plot_fun(shape = shape),
                 times = 10)
}
# Benchmark each shape once and store the result next to its shape value
s <- s %>%
  mutate(test = map(shape,
                    ~test_fun(shape = .x)))
s %>%
  tidyr::unnest(test) %>%
  # microbenchmark records times in nanoseconds; convert to milliseconds
  mutate(time = time / 1e6) %>%
  ggplot() +
  geom_boxplot(aes(x = shape, y = time, group = shape), outlier.shape = NA) +
  scale_x_continuous(breaks = c(0:24)) +
  scale_y_log10() +
  coord_flip()
It appears that shape values 0, 1, and 15 through 22 render faster than the rest.
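If you would rather read the ranking as numbers than off the boxplot, a minimal sketch along these lines could help. The helper median_ms_by_shape is hypothetical (it is not part of microbenchmark or purrr), and it assumes a data frame with a shape column and a time column in nanoseconds, which is what unnesting the microbenchmark results should give.

```r
# Hypothetical helper: `timings` is assumed to have a `shape` column and a
# `time` column in nanoseconds, e.g. the result of tidyr::unnest(s, test).
median_ms_by_shape <- function(timings) {
  # One median per shape
  out <- aggregate(time ~ shape, data = timings, FUN = median)
  # Nanoseconds to milliseconds
  out$median_ms <- out$time / 1e6
  out$time <- NULL
  # Fastest shapes first
  out[order(out$median_ms), ]
}
```

Calling median_ms_by_shape(tidyr::unnest(s, test)) after the benchmark has run should list the shapes from fastest to slowest.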