I am using R Services on SQL Server. Below is a sample of the code I use to compute the maximum of several columns in R:
EXECUTE sp_execute_external_script @language = N'R'
, @script = N'
r = order(InputDataSet$Id)
InputDataSet = InputDataSet[r,]
library(dplyr)
OutputDataSet <- InputDataSet %>% group_by(Id) %>% mutate(
Max_Col1 = max(Col1, na.rm = TRUE),
Max_Col2 = max(Col2, na.rm = TRUE),
Max_Col3 = max(Col3, na.rm = TRUE),) %>% slice(1)
'
, @input_data_1 = N'SELECT * FROM table_name;'
This gives me the following error:
Msg 39004, Level 16, State 20, Line 26
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 26
An external script error occurred:
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Error in mutate_impl(.data, dots) :
attempt to use zero-length variable name
Calls: source ... mutate -> mutate_ -> mutate_.tbl_df -> mutate_impl -> .Call
Error in ScaleR. Check the output for more information.
Error in eval(expr, envir, enclos) :
Error in ScaleR. Check the output for more information.
Calls: source -> withVisible -> eval -> eval -> .Call
Execution halted
When I run the same code in RStudio it works perfectly, but on SQL Server it fails. I don't understand what this error is about.
R version on my SQL Server: 3.2.2 (Fire Safety)
packageVersion("dplyr") on SQL Server: 0.4.3
The issue is probably related to the class of the columns. If they are not numeric, convert them to numeric and it should work:
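OutputDataSet <- InputDataSet %>%
group_by(Id) %>%
mutate(
# Go through character first so that factor columns yield their values, not level codes
Max_Col1 = max(as.numeric(as.character(Col1)), na.rm = TRUE),
Max_Col2 = max(as.numeric(as.character(Col2)), na.rm = TRUE),
# No comma after the last argument: some older dplyr releases may reject an empty trailing argument
Max_Col3 = max(as.numeric(as.character(Col3)), na.rm = TRUE)) %>%
slice(1)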
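To see why the conversion goes through as.character() first, here is a minimal sketch in base R. The data frame and its values are made up for illustration and are not the asker's table; the point is only that as.numeric() applied directly to a factor returns the internal level codes rather than the numbers the labels spell out.

```r
# Hypothetical data: Col1 holds numbers but arrived as a factor
InputDataSet <- data.frame(
  Id   = c(1, 1, 2),
  Col1 = factor(c("10", "2", "33")),
  Col2 = c(1.5, 2.5, 3.5)
)

# Inspect the class of every column to spot non-numeric ones
col_classes <- sapply(InputDataSet, class)
print(col_classes)

# Direct conversion gives the level codes (position of each label among the levels)
codes <- as.numeric(InputDataSet$Col1)

# Converting to character first recovers the actual values
values <- as.numeric(as.character(InputDataSet$Col1))

print(codes)   # 1 2 3
print(values)  # 10 2 33
print(max(values, na.rm = TRUE))  # 33
```

Running sapply(InputDataSet, class) inside the SQL Server script (for example with print()) is one way to check whether the columns arrive with the classes you expect.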
If we are using a newer version of dplyr:
OutputDataSet <- InputDataSet %>%
type.convert(as.is = TRUE) %>% # re-infer column types, e.g. character to numeric
group_by(Id) %>%
mutate_at(vars(starts_with("Col")), list(Max = ~ max(., na.rm = TRUE))) %>%
slice(1)
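If upgrading dplyr on the server is not an option, the per-group maximum can also be computed without dplyr at all. The following is only a sketch under assumptions: the sample data is invented, the value columns are assumed to be numeric and to start with "Col", and unlike the mutate() plus slice(1) approach it returns just Id and the maxima, not the other columns of the first row in each group.

```r
# Hypothetical stand-in for the data SQL Server would pass in
InputDataSet <- data.frame(
  Id   = c(2, 1, 1, 2),
  Col1 = c(5, 3, NA, 7),
  Col2 = c(1, 9, 4, NA),
  Col3 = c(2, 2, 8, 6)
)

# Pick the columns to summarise by name prefix
value_cols <- grep("^Col", names(InputDataSet), value = TRUE)

# One row per Id; na.rm = TRUE is forwarded to max()
OutputDataSet <- aggregate(
  InputDataSet[value_cols],
  by  = list(Id = InputDataSet$Id),
  FUN = max,
  na.rm = TRUE
)

# Rename the summarised columns to Max_Col1, Max_Col2, ...
names(OutputDataSet)[-1] <- paste0("Max_", value_cols)

print(OutputDataSet)
```

Note that max() with na.rm = TRUE returns -Inf (with a warning) for a group whose values are all NA, so such groups may need separate handling.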