Separate comma-separated values into different rows [duplicate]

I have data like this:

structure(list(structureId = c("1JDN", "1DP4", "1XS5", "1SW1", 
"1P99", "1IXH"), structureTitle = c("Crystal Structure of Hormone Receptor", 
"DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRIURETIC PEPTIDE RECEPTOR", 
"The Crystal Structure of Lipoprotein Tp32 from Treponema pallidum", 
"Crystal structure of ProX from Archeoglobus fulgidus in complex with proline betaine", 
"1.7A crystal structure of protein PG110 from Staphylococcus aureus", 
"PHOSPHATE-BINDING PROTEIN (PBP) COMPLEXED WITH PHOSPHATE"), 
    chainId = c("A", "A", "A", "A", "A", "A"), ligandId = c("BMA,CL,FUC,MAN,NAG,NDG", 
    "CL,NAG,SO4", "MET", "MSE,PBE,ZN", "GLY,MET", "PO4"), ligandName = c("BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE", 
    "CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION", "METHIONINE", 
    "SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION", "GLYCINE,METHIONINE", 
    "PHOSPHATE ION")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to split the values of ligandId and ligandName into separate rows. That is, each row should contain exactly one ligandId and its matching ligandName.

I tried separate_rows, but it does not handle my two columns properly.

df %>% separate_rows(ligandId, ligandName, sep = ",")

But I got this error:

> df %>% separate_rows(ligandId, ligandName, sep = ",")
Error: All nested columns must have the same number of elements.
Call `rlang::last_error()` to see a backtrace
> rlang::last_error()
<error>
message: All nested columns must have the same number of elements.
class:   `rlang_error`
backtrace:
  1. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
 10. tidyr:::unnest.data.frame(data, !!!syms(vars), .drop = FALSE)
 12. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
Call `rlang::last_trace()` to see the full backtrace

I also tried the approach from "Split comma-separated strings in a column into separate rows", but without success.

In the end I want something like this:

1JDN   A   BMA   BETA-D-MANNOSE
1JDN   A   CL    CHLORIDE ION
1JDN   A   FUC   ALPHA-L-FUCOSE
1JDN   A   MAN   ALPHA-D-MANNOSE
1JDN   A   NAG   N-ACETYL-D-GLUCOSAMINE
1JDN   A   NDG   2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE
...
r split dplyr row tidyverse
1 Answer (3 votes)

We can use separate_rows on a single column (df1 here is the data frame from the question):

library(tidyverse)
df1 %>% 
    separate_rows(ligandId, sep=",")
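Splitting both columns in one call fails because separate_rows needs the same number of pieces in every column of a given row. As a rough check, the base R sketch below counts the comma-separated pieces per row. The vectors are copied from the question's data; the names `n_id`, `n_name` and `mismatch` are only illustrative.

```r
# The two ligand columns from the question, as plain character vectors.
ligandId <- c("BMA,CL,FUC,MAN,NAG,NDG", "CL,NAG,SO4", "MET",
              "MSE,PBE,ZN", "GLY,MET", "PO4")
ligandName <- c(
  "BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE",
  "CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION",
  "METHIONINE",
  "SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION",
  "GLYCINE,METHIONINE",
  "PHOSPHATE ION")

# Number of comma-separated pieces per row in each column.
n_id   <- lengths(strsplit(ligandId, ",", fixed = TRUE))
n_name <- lengths(strsplit(ligandName, ",", fixed = TRUE))

# Rows where the two counts disagree.
mismatch <- which(n_id != n_name)
print(data.frame(row = mismatch, n_id = n_id[mismatch], n_name = n_name[mismatch]))
```

Only the fourth row (1SW1) disagrees: the name "1,1-DIMETHYL-PROLINIUM" itself contains a comma, so a plain comma split yields four name pieces for three IDs.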

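Because 'ligandId' and 'ligandName' do not have the same number of comma-separated items in every row, one option is to gather the two columns into 'long' format, run separate_rows on the 'val' column, and finally spread back to 'wide' format: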

df1 %>% 
  gather(key, val, ligandId, ligandName) %>%
  separate_rows(val, sep=",") %>% 
  group_by(structureId, key) %>% 
  mutate(rn = row_number()) %>% 
  spread(key, val) %>%
  select(-rn)
# A tibble: 17 x 5
# Groups:   structureId [6]
#   structureId structureTitle                                        chainId ligandId ligandName                
#   <chr>       <chr>                                                 <chr>   <chr>    <chr>                     
# 1 1DP4        DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A       CL       CHLORIDE ION              
# 2 1DP4        DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A       NAG      N-ACETYL-D-GLUCOSAMINE    
# 3 1DP4        DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A       SO4      SULFATE ION               
# 4 1IXH        PHOSPHATE-BINDING PROTEIN (PBP) COMPLEXED WITH PHOSP… A       PO4      PHOSPHATE ION             
# 5 1JDN        Crystal Structure of Hormone Receptor                 A       BMA      BETA-D-MANNOSE            
# 6 1JDN        Crystal Structure of Hormone Receptor                 A       CL       CHLORIDE ION              
# 7 1JDN        Crystal Structure of Hormone Receptor                 A       FUC      ALPHA-L-FUCOSE            
# 8 1JDN        Crystal Structure of Hormone Receptor                 A       MAN      ALPHA-D-MANNOSE           
# 9 1JDN        Crystal Structure of Hormone Receptor                 A       NAG      N-ACETYL-D-GLUCOSAMINE    
#10 1JDN        Crystal Structure of Hormone Receptor                 A       NDG      2-(ACETYLAMINO)-2-DEOXY-A…
#11 1P99        1.7A crystal structure of protein PG110 from Staphyl… A       GLY      GLYCINE                   
#12 1P99        1.7A crystal structure of protein PG110 from Staphyl… A       MET      METHIONINE                
#13 1SW1        Crystal structure of ProX from Archeoglobus fulgidus… A       MSE      SELENOMETHIONINE          
#14 1SW1        Crystal structure of ProX from Archeoglobus fulgidus… A       PBE      1                         
#15 1SW1        Crystal structure of ProX from Archeoglobus fulgidus… A       ZN       1-DIMETHYL-PROLINIUM      
#16 1SW1        Crystal structure of ProX from Archeoglobus fulgidus… A       <NA>     ZINC ION                  
#17 1XS5        The Crystal Structure of Lipoprotein Tp32 from Trepo… A       MET      METHIONINE            
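Note that rows 14 to 16 of this output are misaligned: the name "1,1-DIMETHYL-PROLINIUM" was cut at its internal comma, so PBE is paired with "1" and ZINC ION ends up with an NA id. One possible workaround, sketched here in base R, is to split the names only at commas that do not sit between two digits. This is a heuristic that happens to fit this data, not a general rule for chemical names; `split_names` and `long` are hypothetical names, and the structureTitle and chainId columns are left out for brevity.

```r
# Subset of the question's data (structureTitle and chainId omitted).
df <- data.frame(
  structureId = c("1JDN", "1DP4", "1XS5", "1SW1", "1P99", "1IXH"),
  ligandId = c("BMA,CL,FUC,MAN,NAG,NDG", "CL,NAG,SO4", "MET",
               "MSE,PBE,ZN", "GLY,MET", "PO4"),
  ligandName = c(
    "BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE",
    "CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION",
    "METHIONINE",
    "SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION",
    "GLYCINE,METHIONINE",
    "PHOSPHATE ION"),
  stringsAsFactors = FALSE)

# Heuristic (assumption): a comma with a digit on BOTH sides belongs to the
# name itself (as in "1,1-DIMETHYL..."); every other comma is a separator.
split_names <- function(x) {
  strsplit(x, "(?<![0-9]),|,(?![0-9])", perl = TRUE)
}

ids <- strsplit(df$ligandId, ",", fixed = TRUE)
nms <- split_names(df$ligandName)

# Stop early if any row still has unequal numbers of ids and names.
stopifnot(lengths(ids) == lengths(nms))

# One output row per (id, name) pair, repeating structureId as needed.
long <- data.frame(
  structureId = rep(df$structureId, lengths(ids)),
  ligandId    = unlist(ids),
  ligandName  = unlist(nms),
  stringsAsFactors = FALSE)

print(long[long$structureId == "1SW1", ])
```

Since the `sep` argument of separate_rows is a regular expression, the same pattern might also work there for both columns at once, but that depends on the regex engine used by the installed tidyr version and is untested here.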

For multiple columns whose rows hold differing numbers of items, use cSplit:

library(splitstackshape)
na.omit(cSplit(df1, c("ligandId", "ligandName"), sep=",", "long"))