R: Skipping empty elements when vectorizing a function


Hi, I am trying to learn vectorization in R.

I have the following code:

library(truncnorm)  # provides rtruncnorm()
set.seed(23)
obs_num=100
Observation=seq(1,obs_num)
Location_Type1=sample(1:2, obs_num, replace=TRUE)
Location_Type2=sample(1:2, obs_num, replace=TRUE)
# The above does not lead to any errors

#Location_Type2=sample(1, obs_num, replace=T) 
##Error occurs when I use this formula instead.

low_bound = runif(obs_num,0,1)
mean = runif(obs_num,10,15)
df1= data.frame(Observation,Location_Type1,Location_Type2,mean,low_bound)

Vectorized_function=function(data){
  #Create groups
  i1= data[["Location_Type1"]] == 1 & data[["Location_Type2"]] == 1
  i2= data[["Location_Type1"]] == 2 & data[["Location_Type2"]] == 1
  i3= data[["Location_Type1"]] == 1 & data[["Location_Type2"]] == 2
  i4= data[["Location_Type1"]] == 2 & data[["Location_Type2"]] == 2
  #Draw values
  data[i1, "draw_value"] <- rtruncnorm(sum(i1),a=data[i1,'low_bound'],mean = data[i1, "mean"])
  data[i2, "draw_value"] <- rtruncnorm(sum(i2),a=data[i2,'low_bound'],mean = data[i2, "mean"])
  data[i3, "draw_value"] <- rtruncnorm(sum(i3),a=data[i3,'low_bound'],mean = data[i3, "mean"])
  data[i4, "draw_value"] <- rtruncnorm(sum(i4),a=data[i4,'low_bound'],mean = data[i4, "mean"])
  data
}

getvalue = Vectorized_function(data=df1)

In df1, the two columns Location_Type1 and Location_Type2 can each take the value 1 or 2. The code above works when all four combinations are present:

a) Location_Type1 = 1 & Location_Type2 = 1;

b) Location_Type1 = 1 & Location_Type2 = 2;

c) Location_Type1 = 2 & Location_Type2 = 1;

d) Location_Type1 = 2 & Location_Type2 = 2.

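I am trying to draw from a truncated normal distribution for each of these four conditions. In my actual data, however, not all four combinations will always occur.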
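Before drawing, it can help to check which of the four combinations actually occur in the data. The following is a minimal base-R sketch for illustration only: the names df_check, combo_counts and has_rows are hypothetical and not part of the original code, and the data is built so that Location_Type2 takes a single value.

```r
# Illustrative data (df_check is a hypothetical name, not from the post)
set.seed(23)
obs_num <- 100
df_check <- data.frame(
  Location_Type1 = sample(1:2, obs_num, replace = TRUE),
  Location_Type2 = rep(1L, obs_num)  # only one type present
)

# Fix the factor levels so absent combinations show up as zero counts
combo_counts <- table(
  Location_Type1 = factor(df_check$Location_Type1, levels = 1:2),
  Location_Type2 = factor(df_check$Location_Type2, levels = 1:2)
)
print(combo_counts)

# TRUE for each combination that has at least one observation
has_rows <- combo_counts > 0
print(has_rows)
```

Fixing the levels with factor() is a deliberate choice here: without it, table() would simply omit the missing Location_Type2 value, and the empty combinations would not be visible.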

To reproduce this situation, suppose we change the following line in the code above:

Location_Type2=sample(1, obs_num, replace=TRUE) #This implies Location_Type2 has only one type

In this case, I get an error message:

Error in rtruncnorm(sum(i3), a = data[i3, "low_bound"], mean = data[i3,  : length(a) > 0 is not TRUE

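I can see what is happening. Essentially, no observations satisfy conditions i3 and i4 (i.e. sum(i3) and sum(i4) are both 0). In that case the lower-bound argument ("a" in the code) is an empty vector, and that is what causes the problem.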
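The cause described above can be checked directly in base R without calling rtruncnorm() at all. This is a small sketch on made-up data (the data frame toy and the names n_matches and a_empty are hypothetical): when no row matches, the logical mask selects nothing, so the vector that would be passed as "a" has length zero, which is what the length(a) > 0 check in the error message rejects.

```r
# Hypothetical toy data: Location_Type2 only ever takes the value 1
toy <- data.frame(
  Location_Type1 = c(1, 2, 1, 2),
  Location_Type2 = c(1, 1, 1, 1),
  low_bound      = c(0.1, 0.2, 0.3, 0.4)
)

# Mask for the combination Location_Type1 == 1 & Location_Type2 == 2
i3 <- toy[["Location_Type1"]] == 1 & toy[["Location_Type2"]] == 2

n_matches <- sum(i3)               # number of matching rows
a_empty   <- toy[i3, "low_bound"]  # lower bounds for those rows

print(n_matches)        # 0
print(length(a_empty))  # 0
```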

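Can someone suggest how I can handle these cases in the code? I would like the vectorized function to cope with any of the conditions being empty.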

r vectorization data-cleaning
1 Answer

Following @akrun's comment, I adjusted the function as follows:

Vectorized_function=function(data){
  #Create groups
  i1= data[["Location_Type1"]] == 1 & data[["Location_Type2"]] == 1
  i2= data[["Location_Type1"]] == 2 & data[["Location_Type2"]] == 1
  i3= data[["Location_Type1"]] == 1 & data[["Location_Type2"]] == 2
  i4= data[["Location_Type1"]] == 2 & data[["Location_Type2"]] == 2
  #Draw values
  data[i1, "draw_value"] <- try(rtruncnorm(sum(i1),a=data[i1,'low_bound'],mean = data[i1, "mean"]),silent = T)
  data[i2, "draw_value"] <- try(rtruncnorm(sum(i2),a=data[i2,'low_bound'],mean = data[i2, "mean"]),silent = T)
  data[i3, "draw_value"] <- try(rtruncnorm(sum(i3),a=data[i3,'low_bound'],mean = data[i3, "mean"]),silent = T)
  data[i4, "draw_value"] <- try(rtruncnorm(sum(i4),a=data[i4,'low_bound'],mean = data[i4, "mean"]),silent = T)
  data
}

This seems to work for now, and it gets past the errors that arise when a group has no observations.
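An alternative to wrapping each call in try() is to skip empty groups explicitly, so that rtruncnorm() is never called with zero-length arguments. The sketch below is only an illustration under assumptions, not the method from the answer: the helper draw_by_group, its sampler argument and the data frame df_one_type are hypothetical names, and the default sampler assumes the truncnorm package is installed.

```r
# Hypothetical helper (not from the original post): draw one value per row,
# group by group, skipping any group that has no observations.
# `sampler` defaults to truncnorm::rtruncnorm; the default is only evaluated
# when no sampler is supplied, so the package is needed only in that case.
draw_by_group <- function(data, sampler = truncnorm::rtruncnorm) {
  data[["draw_value"]] <- NA_real_  # numeric column, NA until filled

  # One logical mask per (Location_Type1, Location_Type2) combination
  groups <- list(
    data[["Location_Type1"]] == 1 & data[["Location_Type2"]] == 1,
    data[["Location_Type1"]] == 2 & data[["Location_Type2"]] == 1,
    data[["Location_Type1"]] == 1 & data[["Location_Type2"]] == 2,
    data[["Location_Type1"]] == 2 & data[["Location_Type2"]] == 2
  )

  for (idx in groups) {
    if (!any(idx)) next  # empty group: nothing to draw, skip it
    data[idx, "draw_value"] <- sampler(
      sum(idx),
      a    = data[idx, "low_bound"],
      mean = data[idx, "mean"]
    )
  }
  data
}

# Example data in which Location_Type2 takes a single value,
# so two of the four groups are empty
set.seed(23)
obs_num <- 100
df_one_type <- data.frame(
  Observation    = seq_len(obs_num),
  Location_Type1 = sample(1:2, obs_num, replace = TRUE),
  Location_Type2 = rep(1, obs_num),
  mean           = runif(obs_num, 10, 15),
  low_bound      = runif(obs_num, 0, 1)
)

# Only run the draw if truncnorm is installed
if (requireNamespace("truncnorm", quietly = TRUE)) {
  result <- draw_by_group(df_one_type)
  print(summary(result$draw_value))
}
```

The explicit guard keeps draw_value numeric and leaves genuine errors visible, whereas try() hides every error from the call, not only the one caused by an empty group. Since all four groups in the posted code use the same arguments and rtruncnorm() accepts vectors for a and mean, a single call over all rows, such as rtruncnorm(nrow(data), a = data$low_bound, mean = data$mean), may also be enough as long as the data frame has at least one row.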
