Using lapply with lm and wanting to reduce the number of iterations: is a variable name going missing?

Question

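I have used melt to collapse all 32 of my columns into long format: one column holds the values, and one column holds the variable names.

I then want to use lapply to produce an lm over the following columns: Years, Species, Farmland.

I can see two ways to do this: 1. fit an lm for a single variable name, i.e. the Starling values for all years (1994:2013); 2. fit an lm for all variable names, i.e. Starling, Skylark, ..., with the Farmland values for every year together.

An example of my data: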

structure(list(Years = c(1994L, 1994L, 1995L, 1996L, 1997L, 1998L
), Species = structure(1:6, .Label = c("Starling", "Skylark", 
"YellowWagtail", "Kestrel", "Yellowhammer", "Greenfinch"), class = "factor"), 
    Farmland = c(13260L, 13520L, 8129L, 15575L, 18686L, 18541L
    )), row.names = c(1L, 20L, 40L, 60L, 80L, 100L), class = "data.frame")

Another example:

'data.frame':   570 obs. of  3 variables:
 $ Years   : int  1994 1995 1996 1997 1998 1999 2000 2002 2003 2004 ...
 $ Species : Factor w/ 30 levels "Starling","Skylark",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Farmland: int  13260 15551 16335 18997 18571 18376 15770 16054 15101 16276 ...

Code for the lm for Q.1:

df_try <- lapply(1:n, function(x) lm(Farmland ~ Years + Species, work_practice))

Output:

Call:
lm(formula = Farmland ~ Years + Species, data = work_practice)

Coefficients:
         (Intercept)                 Years        SpeciesSkylark  
           -708278.6                 363.0                 578.8  
SpeciesYellowWagtail        SpeciesKestrel   SpeciesYellowhammer  
             -9329.8                -744.4                -238.7  
   SpeciesGreenfinch        SpeciesSwallow    SpeciesHousemartin  
               246.3                 506.6               -3928.5  
       SpeciesLinnet  SpeciesGreyPartridge     SpeciesTurtleDove  
              -680.2               -5825.1               -5417.4  
  SpeciesCornbunting      SpeciesBullfinch     SpeciesSongthrush  
            -12187.9               -5688.7                -279.1  
    SpeciesBlackbird        SpeciesDunnock    SpeciesWhitethroat  
               490.2                 299.0                 231.6  
         SpeciesRook    SpeciesReedBunting      SpeciesStockdove  
              -653.9               -6864.5               -1788.0  
    SpeciesGoldfinch        SpeciesJackdaw           SpeciesWren  
               156.6                -637.3                 553.1  
        SpeciesRobin        SpeciesBluetit       SpeciesGreatTit  
               328.7                 460.3                 384.3  
SpeciesLongtailedTit      SpeciesChaffinch        SpeciesBuzzard  
             -1359.8                 499.7               -6888.2  
  SpeciesSparrowhawk  
             -4458.5

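The problems with this: Starling (the first variable name) is missing, and I do not need Years in the result (how do I remove it?). The call is also repeated 19 times, which I think is due to the data frame. Is there a way to call it only once?

I have also tried this with the variables (species) in separate columns, but the output just fits one variable 19 times...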

Answer 1

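The issue here is that if R had to estimate the model with a regressor for every level of Species, we would run into perfect collinearity. I will use data.table::dcast to turn Species into dummy variables: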

df <- structure(list(Years = c(1994L, 1994L, 1995L, 1996L, 1997L, 1998L
), Species = structure(1:6, .Label = c("Starling", "Skylark", 
                                       "YellowWagtail", "Kestrel", "Yellowhammer", "Greenfinch"), class = "factor"), 
Farmland = c(13260L, 13520L, 8129L, 15575L, 18686L, 18541L
)), row.names = c(1L, 20L, 40L, 60L, 80L, 100L), class = "data.frame") 

dfDummies <- suppressWarnings(data.table::dcast(df, Years + Farmland ~ Species, fun.aggregate=function(x) 1, fill=0))

In the console:

> dfDummies
  Years Farmland Starling Skylark YellowWagtail Kestrel Yellowhammer Greenfinch
1  1994    13260        1       0             0       0            0          0
2  1994    13520        0       1             0       0            0          0
3  1995     8129        0       0             1       0            0          0
4  1996    15575        0       0             0       1            0          0
5  1997    18686        0       0             0       0            1          0
6  1998    18541        0       0             0       0            0          1

Note how the dummies are mutually exclusive:

> rowSums(dfDummies[, as.character(df[["Species"]])])
[1] 1 1 1 1 1 1

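This means the intercept (a column of ones) can be written as a linear combination of the Species binary variables: it is exactly their sum. R knows this and silently leaves one of those columns out of the estimation, because perfect collinearity makes it impossible to find a unique solution to the OLS problem. With the default treatment contrasts, the column left out is the first factor level, which is why Starling does not appear in your output: it is the baseline absorbed into the intercept, and every other Species coefficient is a difference from Starling.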
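The collinearity can be checked directly in base R. The sketch below rebuilds the six-row sample from the question with factor() (assumed equivalent to the dput output), builds the dummies with model.matrix() rather than dcast, and compares the number of columns with the rank of the design matrix. The object names `species`, `dummies` and `full` are only illustrative.

```r
# Six-row sample from the question, rebuilt with factor().
species <- c("Starling", "Skylark", "YellowWagtail",
             "Kestrel", "Yellowhammer", "Greenfinch")
df <- data.frame(
  Years    = c(1994L, 1994L, 1995L, 1996L, 1997L, 1998L),
  Species  = factor(species, levels = species),
  Farmland = c(13260L, 13520L, 8129L, 15575L, 18686L, 18541L)
)

# One indicator column per species, with no intercept column.
dummies <- model.matrix(~ 0 + Species, data = df)

# Adding a column of ones (the intercept) makes the columns linearly
# dependent: the ones column equals the sum of the six dummy columns.
full <- cbind(Intercept = 1, dummies)
ncol(full)       # 7 columns
qr(full)$rank    # rank 6, so one column is redundant

# The default design matrix avoids this by leaving out the first level.
colnames(model.matrix(~ Species, data = df))
```

The last line lists "(Intercept)" and the five non-Starling dummies, matching the pattern of coefficient names in the lm output from the question.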

More on perfect collinearity here.

I am not sure why you are using lapply() here. If you have a single data.frame and want to estimate one particular model on all observations, you can just run:

lm(formula = Farmland ~ Years + Species, df)
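In the question's code, function(x) never uses x, so lapply(1:n, ...) simply refits the same model n times. As a rough sketch of how the two options from the question might look, the example below uses a small toy data set invented for illustration (3 species, 5 years, made-up numbers), not the real survey data. It fits the pooled model once, fits one model per species by iterating over split() subsets, and shows that dropping Years and the intercept yields one coefficient per species, Starling included. Whether a model without Years is appropriate for the real data is a modelling decision this sketch does not settle.

```r
# Toy data, invented for illustration only (not the real survey counts).
toy <- expand.grid(Years = 2000:2004,
                   Species = c("Starling", "Skylark", "Kestrel"),
                   stringsAsFactors = TRUE)
toy$Farmland <- c(100, 110, 120, 130, 140,   # Starling
                  200, 195, 190, 185, 180,   # Skylark
                   50,  52,  51,  53,  54)   # Kestrel

# Option 2: one model over all species, fitted a single time (no lapply).
fit_all <- lm(Farmland ~ Years + Species, data = toy)

# Option 1: one model per species; lapply iterates over per-species subsets,
# so each iteration sees different data.
fits_by_species <- lapply(split(toy, toy$Species),
                          function(d) lm(Farmland ~ Years, data = d))
names(fits_by_species)            # one fit per species, named by species
coef(fits_by_species$Starling)    # intercept and yearly trend for Starling

# Dropping Years and the intercept gives one coefficient per species,
# including Starling; each coefficient is that species' mean Farmland.
fit_means <- lm(Farmland ~ 0 + Species, data = toy)
coef(fit_means)
```

With the real data, `toy` would be replaced by the melted data frame (`work_practice` in the question).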
