How do I apply a function I wrote to the values in a data frame and replace those values with the function's results?

Problem description (votes: 0, answers: 2)

I wrote a function called getExpressionLevel, and the exercise asks me to use it to replace the numbers with the labels listed below. What do I need to use to do that?

The getExpressionLevel function:

getExpressionLevel <- function(a) {
  if (a < 5) {
    cat("none")
  }

  if (a >= 5 & a < 20) {
    cat("low")
  }

  if (a >= 20 & a < 60) {
    cat("medium")
  }

  if (a >= 60) {
    cat("high")
  }
}
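  • "none" when the expression level is below 5
  • "low" when the expression level is at least 5 and below 20
  • "medium" when the expression level is at least 20 and below 60
  • "high" when the expression level is at least 60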

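The exercise is:

Create a data.frame named expression_levels with 10 rows (one per gene) and 3 columns (one per cell line). Then compute the mean expression of each gene in each cell line, and use the getExpressionLevel function to label the corresponding expression.

This is my current data.frame. Its values need to be replaced with the results of the getExpressionLevel function.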

   genename      Kc167       BG3        S2
1  Clic       7.333333  48.33333  75.00000
2  Treh      24.666667  12.66667  52.33333
3  bib       31.333333  79.33333  82.00000
4  CalpC     65.000000  69.33333  63.66667
5  tud       59.666667  81.66667  16.33333
6  cort      74.333333  50.66667  28.66667
7  S2P       72.000000  39.66667  50.66667
8  Mitofilin 38.333333  29.00000  54.66667
9  Oxp       73.666667  49.33333  42.66667
10 Ada1-2    87.333333  42.00000  28.00000

This is the expected data.frame:

           Kc167   BG3     S2
Clic       low     medium  high
Treh       medium  low     medium
bib        medium  high    high
CalpC      high    high    high
tud        medium  high    low
cort       high    medium  medium
S2P        high    medium  medium
Mitofilin  medium  medium  medium
Oxp        high    medium  medium
Ada1-2     high    medium  medium
r function dataframe rstudio
2 Answers
0
votes

A function-based approach. Knowing how to use functions always helps.

library(data.table)

## sample data
df <- data.table(genename = c('Clic','Treh','bib','CalpC'),
                 Kc167 = c(7.333,24.666,31.3333,65),
                 BG3 = c(48.33,12.66,79.33,69.33),
                 S2 = c(75.00,52.33,82.00,63.66))

## this function returns a label for a single value, based on the criteria above
get_values <- function(x)
{
    if (x < 5) return('none')
    else if ((x >= 5) && (x < 20)) return('low')
    else if ((x >= 20) && (x < 60)) return('medium')
    else if (x >= 60) return('high')
}

## creating a new data.table with the answers;
## start from a table holding genename so the `$<-` assignments add columns
df2 <- data.table(genename = df$genename)
df2$Kc167 <- sapply(df$Kc167, get_values)
df2$BG3 <- sapply(df$BG3, get_values)
df2$S2 <- sapply(df$S2, get_values)
df2

  genename  Kc167    BG3     S2
1:     Clic    low medium   high
2:     Treh medium    low medium
3:      bib medium   high   high
4:    CalpC   high   high   high
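
The function in the question calls cat(), which prints its argument and returns NULL, so its result cannot be stored in a data frame. Below is a minimal sketch of a version that returns the label and is applied to every value column in one step. It assumes a base R data.frame; the names label_one, expr and value_cols are hypothetical and do not appear in the original posts.

```r
# Hypothetical helper: returns (rather than prints) the label for one value.
label_one <- function(a) {
  if (a < 5) {
    "none"
  } else if (a < 20) {
    "low"
  } else if (a < 60) {
    "medium"
  } else {
    "high"
  }
}

# Small illustrative subset of the question's data.
expr <- data.frame(
  genename = c("Clic", "Treh", "bib", "CalpC"),
  Kc167 = c(7.333333, 24.666667, 31.333333, 65),
  BG3 = c(48.33333, 12.66667, 79.33333, 69.33333),
  S2 = c(75, 52.33333, 82, 63.66667),
  stringsAsFactors = FALSE
)

# Apply the scalar function to each element of each value column.
# vapply checks that every call returns exactly one character string.
value_cols <- setdiff(names(expr), "genename")
expr[value_cols] <- lapply(
  expr[value_cols],
  function(col) vapply(col, label_one, character(1))
)

print(expr)
```

Looping over the column names avoids repeating one sapply line per cell line, which matters little for three columns but keeps the code unchanged if more cell lines are added.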

1
vote

Hope this helps!

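# df is the sample data frame given at the end of this answer
bin_breaks <- c(-Inf, 5, 20, 60, Inf)
bin_labels <- c("none", "low", "medium", "high")
# right = FALSE makes each bin closed on the left: [5, 20), [20, 60), ...
df[,-1] <- sapply(df[,-1], function(x) cut(x, 
                                           breaks = bin_breaks, 
                                           labels = bin_labels, 
                                           right = FALSE))
df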

The output is:

    genename  Kc167    BG3     S2
1       Clic    low medium   high
2       Treh medium    low medium
3        bib medium   high   high
4      CalpC   high   high   high
5        tud medium   high    low
6       cort   high medium medium
7        S2P   high medium medium
8  Mitofilin medium medium medium
9        Oxp   high medium medium
10    Ada1-2   high medium medium
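
As an alternative to cut(), base R's findInterval() can do the same binning. The sketch below shows that swapped-in technique, not the answerer's method; the names thresholds, level_labels and label_vec are hypothetical. findInterval() counts how many thresholds are less than or equal to each value, and its intervals are closed on the left by default, which matches the rules in the question.

```r
# Hypothetical names, chosen for illustration only.
thresholds <- c(5, 20, 60)
level_labels <- c("none", "low", "medium", "high")

# findInterval() returns 0 for values below 5, 1 for [5, 20),
# 2 for [20, 60) and 3 for values of 60 or more; adding 1 turns
# that count into an index into level_labels.
label_vec <- function(x) level_labels[findInterval(x, thresholds) + 1]

# Values on either side of each threshold.
probe <- c(4.9, 5, 19.9, 20, 59.9, 60)
probe_labels <- label_vec(probe)
print(probe_labels)
```

Because label_vec is vectorised, it can be applied to a whole column at a time, for example with lapply over the value columns of a data frame.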

Sample data:

df <- structure(list(genename = c("Clic", "Treh", "bib", "CalpC", "tud", 
"cort", "S2P", "Mitofilin", "Oxp", "Ada1-2"), Kc167 = c(7.333333, 
24.666667, 31.333333, 65, 59.666667, 74.333333, 72, 38.333333, 
73.666667, 87.333333), BG3 = c(48.33333, 12.66667, 79.33333, 
69.33333, 81.66667, 50.66667, 39.66667, 29, 49.33333, 42), S2 = c(75, 
52.33333, 82, 63.66667, 16.33333, 28.66667, 50.66667, 54.66667, 
42.66667, 28)), .Names = c("genename", "Kc167", "BG3", "S2"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

Edit: added the appropriate right argument to the code so that it meets the boundary conditions and the OP's requirements (credit to @drf).
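
To see why the right argument matters, here is a small sketch that runs cut() on the three boundary values with both settings. The variable names boundary_values, left_closed and right_closed are hypothetical; the breaks and labels are the ones used in the answer.

```r
bin_breaks <- c(-Inf, 5, 20, 60, Inf)
bin_labels <- c("none", "low", "medium", "high")
boundary_values <- c(5, 20, 60)

# right = FALSE: bins are [lower, upper), so 5 falls in [5, 20).
left_closed <- as.character(
  cut(boundary_values, breaks = bin_breaks, labels = bin_labels, right = FALSE)
)

# right = TRUE (the default): bins are (lower, upper], so 5 falls in (-Inf, 5].
right_closed <- as.character(
  cut(boundary_values, breaks = bin_breaks, labels = bin_labels, right = TRUE)
)

print(left_closed)   # "low" "medium" "high"
print(right_closed)  # "none" "low" "medium"
```

Only right = FALSE agrees with the question's rules, where a value of exactly 5 is "low", exactly 20 is "medium", and exactly 60 is "high".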
