How to reproduce smoothScatter's outlier plotting in ggplot?

Question

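I am trying to get functionality similar to the smoothScatter function in ggplot. I have figured out everything except plotting the N most sparse points. Can anyone help me with this?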

library(grDevices)
library(ggplot2)

# Make two new devices
dev.new()
dev1 <- dev.cur()
dev.new()
dev2 <- dev.cur()

# Make some data that needs to be plotted on log scales
mydata <- data.frame(x=exp(rnorm(10000)), y=exp(rnorm(10000)))

# Plot the smoothScatter version
dev.set(dev1)
with(mydata, smoothScatter(log10(y)~log10(x)))

# Plot the ggplot version
dev.set(dev2)
ggplot(mydata) + aes(x=x, y=y) + scale_x_log10() + scale_y_log10() + 
  stat_density2d(geom="tile", aes(fill=..density..^0.25), contour=FALSE) +
  scale_fill_gradientn(colours = colorRampPalette(c("white", blues9))(256))

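Notice how, in the base graphics version, the 100 most "sparse" points are plotted over the smoothed density plot. Sparseness is defined by the value of the kernel density estimate at the point's coordinates, and, importantly, the kernel density estimate is calculated after the log transform (or whatever other coordinate transform is applied). I can plot all points by adding + geom_point(size=0.5), but I only want the sparse ones. Is there any way to do this with ggplot? There are really two parts to this: the first is to work out which points are the outliers after the coordinate transform, and the second is to plot only those points.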


Answer 1

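First, we compute the density value at each observation from the density estimate calculated by KernSmooth::bkde2D. For convenience we call it through grDevices:::.smoothScatterCalcDensity, so that a suitable guess is made when no bandwidth is provided. This function is useful for other problems as well.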

densVals <- function(x, y = NULL, nbin = 128, bandwidth, range.x) {
  dat <- cbind(x, y)
  # limit dat to strictly finite values
  sel <- is.finite(x) & is.finite(y)
  dat.sel <- dat[sel, ]
  # density map with arbitrary graining along x and y
  map   <- grDevices:::.smoothScatterCalcDensity(dat.sel, nbin, bandwidth)
  map.x <- findInterval(dat.sel[, 1], map$x1)
  map.y <- findInterval(dat.sel[, 2], map$x2)
  # weighted mean of the fitted density map according to how close x and y are
  # to the arbitrary grain of the map
  den <- mapply(function(x, y) weighted.mean(x = c(
    map$fhat[x, y], map$fhat[x + 1, y + 1],
    map$fhat[x + 1, y], map$fhat[x, y + 1]), w = 1 / c(
    map$x1[x] + map$x2[y], map$x1[x + 1] + map$x2[y + 1],
    map$x1[x + 1] + map$x2[y], map$x1[x] + map$x2[y + 1])),
    map.x, map.y)
  # replace missing density estimates with NaN
  res <- rep(NaN, length(sel))
  res[sel] <- den
  res
}

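I am using the weighted mean as a (linear) approximation of the "true" density value. Possibly, a simple lookup would do as well.
Here is the actual computation.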
mydata <- data.frame(x = exp(rnorm(10000)), y = exp(rnorm(10000)))
# the transformation applied will affect the local density estimate
mydata$point_density <- densVals(log10(mydata$x), log10(mydata$y))
Now, let's plot. (Building on Troy's answer.)
library(ggplot2)
ggplot(mydata, aes(x = x, y = y)) +
  stat_density2d(geom = "raster", aes(fill = ..density.. ^ 0.25), contour = FALSE) +
  scale_x_log10() + scale_y_log10() +
  scale_fill_gradientn(colours = colorRampPalette(c("white", blues9))(256)) +
  # select only the 100 sparsest points
  geom_point(data = dplyr::top_n(mydata, 100, -point_density), size = .5)
(final plot; sorry, I can't embed images yet.)
No overplotting required. :)
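The "simple lookup" mentioned in this answer could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: dens_lookup is a hypothetical helper name, it calls KernSmooth::bkde2D directly rather than going through the internal grDevices function, and its bandwidth guess (5% to 95% quantile spread divided by 25) is an assumption intended to resemble the default used by smoothScatter. Each point simply takes the fitted density of the grid cell it falls into, with no interpolation.

```r
library(KernSmooth)

# Hypothetical helper: look up the fitted 2D density at each point's grid cell.
dens_lookup <- function(x, y, nbin = 128) {
  # keep only strictly finite pairs (e.g. log10(0) gives -Inf)
  ok  <- is.finite(x) & is.finite(y)
  dat <- cbind(x[ok], y[ok])
  # bandwidth guess per axis; an assumption, not a documented default
  bw <- as.numeric(diff(apply(dat, 2, quantile,
                              probs = c(0.05, 0.95), names = FALSE))) / 25
  bw[bw == 0] <- 1
  est <- bkde2D(dat, bandwidth = bw, gridsize = c(nbin, nbin))
  # index of the grid cell containing each point, clamped to the grid
  ix <- pmin(pmax(findInterval(dat[, 1], est$x1), 1), nbin)
  iy <- pmin(pmax(findInterval(dat[, 2], est$x2), 1), nbin)
  # non-finite inputs get NaN, as in densVals
  res <- rep(NaN, length(ok))
  res[ok] <- est$fhat[cbind(ix, iy)]
  res
}

set.seed(1)
mydata <- data.frame(x = exp(rnorm(10000)), y = exp(rnorm(10000)))
# density is estimated on the log10 scale, matching the plot axes
mydata$point_density <- dens_lookup(log10(mydata$x), log10(mydata$y))
# the 100 rows with the lowest estimated density
sparsest <- mydata[order(mydata$point_density)[1:100], ]
```

The sparsest data frame can then be passed to geom_point(data = sparsest, size = .5) in place of the dplyr::top_n call. Whether the coarser lookup picks noticeably different points than the weighted mean depends on nbin; with a fine grid the two should mostly agree.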
