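I have two data frames, Sales and Clients. I want to perform a cross join on them with both sqldf::sqldf() and merge(), and get exactly the same result from both methods. So far I can only get two data frames whose rows are in a different order.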
Here is the code that generates the Sales and Clients data frames:
set.seed(1)
Sales <- data.frame(
  Product = sample(c("Toaster", "Radio", "TV"), size = 7, replace = TRUE),
  CustomerID = c(rep("1_2019", 2), paste(2:3, "2019", sep = "_"), paste(1:3, "2020", sep = "_"))
)
Sales$Price <- round(ifelse(Sales$Product == "TV", rnorm(1, 400, 20),
                     ifelse(Sales$Product == "Toaster", rnorm(1, 40, 2),
                            rnorm(1, 35, 2))))
Clients <- data.frame(
  CustomerID = c(paste(2:4, "2019", sep = "_"), paste(1:2, "2020", sep = "_")),
  State = sample(c("CA", "AZ", "IL", "MA"), size = 5, replace = TRUE)
)
This is what I have so far:
library(sqldf)
# cross join with base R
out1 <- merge(x = Sales, y = Clients, by = NULL)
# cross join with sqldf
out2 <- sqldf("SELECT *
FROM Sales
CROSS JOIN Clients")
out1 and out2 have their rows in a different order. How can I adjust the sqldf() call so that out1 and out2 are identical?
This is the closest I have gotten:
merge(x = Sales, y = Clients, by = NULL)
sqldf("SELECT *
FROM Sales
CROSS JOIN Clients
ORDER BY State DESC, Clients.CustomerID")
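For what it's worth, the row order that merge() produces with by = NULL can be reproduced by hand in base R, which shows what the SQL side has to match. The sketch below uses two small made-up data frames, x and y (not the Sales and Clients from the question). As far as I can tell, merge() pairs the rows with expand.grid(), so the rows of x cycle fastest and the rows of y slowest:

```r
# Toy stand-ins (hypothetical values, not the question's data)
x <- data.frame(a = 1:3)
y <- data.frame(b = c("u", "v"), stringsAsFactors = FALSE)

# Cross join with base R
m <- merge(x, y, by = NULL)

# Rebuild the same row pairing by hand:
# the first factor of expand.grid() (the x index) varies fastest
ij <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
manual <- data.frame(a = x$a[ij$i], b = y$b[ij$j], stringsAsFactors = FALSE)

# Compare the two results
all.equal(m, manual)
```

So to line up with merge(), a SQL result would need the row position in the second table as the primary sort key and the row position in the first table as the secondary one.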
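Including ORDER BY in the sqldf() call is important, because it makes it clear that in SQL the row order is not guaranteed unless you explicitly ask for one.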
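A minimal sketch of one way to ask for that order, assuming sqldf is running on its default SQLite backend: SQLite tables carry an implicit rowid column, and sorting by Clients.rowid first and Sales.rowid second should mimic the order merge() uses. The data frames here are small made-up stand-ins, and the renaming step at the end is my own addition, since merge() appends .x/.y suffixes to the shared CustomerID column while SELECT * does not:

```r
library(sqldf)

# Toy stand-ins for the question's data frames (values are made up)
Sales <- data.frame(
  Product    = c("TV", "Radio", "Toaster"),
  CustomerID = c("1_2019", "2_2019", "1_2020"),
  Price      = c(400, 35, 40),
  stringsAsFactors = FALSE
)
Clients <- data.frame(
  CustomerID = c("2_2019", "1_2020"),
  State      = c("CA", "IL"),
  stringsAsFactors = FALSE
)

# Cross join with base R: rows of Sales cycle fastest
out1 <- merge(x = Sales, y = Clients, by = NULL)

# Cross join with sqldf, ordered by the implicit SQLite rowid of each table
# (assumption: default SQLite backend, where rowid follows the original row order)
out2 <- sqldf("SELECT *
               FROM Sales
               CROSS JOIN Clients
               ORDER BY Clients.rowid, Sales.rowid")

# merge() renames the duplicated CustomerID columns; copy those names over
names(out2) <- names(out1)

# Compare the two results
all.equal(out1, out2)
```

If the comparison still reports differences on the real data, column classes (for example factor versus character in older R versions) would be the first thing I would check.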