Splitting chemical sum formulas into their constituent elements


I have chemical sum formulas, for example C6H12ON2PS.

I would like to arrange them like this:

Sum formula   C   H  O  N  P  S
C6H12ON2PS    6  12  1  2  1  1
C6H12NP       6  12  0  1  1  0

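My main problems are the two special cases: when a letter/element is absent, its column should be 0, and when an element has no number after it, I need to put 1 in that column.

I'm not very good at R because I've only just started. I'm using someone else's script that expects this format, but I only have the formulas as text.

I tried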

str_split(strsplit(as.character(Form), '(?<=.)(?=[A-Z])', perl=TRUE))

but this doesn't work when a letter (element) is missing.

r split
1 Answer

This may be overkill and not very efficient:

library(tidyverse)

# define sample data
x <- c("C6H12ON2PS", "C6H12NP")

# get list of elements in periodic table
elements <- PeriodicTable::symb(1:116)

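# define regex to spot elements without quantity (implicitly "1")
# caveat: alternatives are tried in order, so "H" wins over "He" and "C" over "Cl";
# two-letter symbols like these are mis-split (the sample data uses one-letter symbols only)
name_regex <- elements |> 
  str_flatten(collapse = "|") %>%
  str_c("(", ., ")(?!\\d)")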

# add implicit quantity "1" (e.g. "C6H12NP" becomes "C6H12N1P1")
x <- x |> 
  str_replace_all(name_regex, "\\11")

# define regex that captures both element name and quantity
regex <- str_c(elements, "(?<", elements, ">\\d*)") |> 
  str_flatten(collapse = "|")

# define helper function to collapse rows (one for each match)
collapse_rows <- function(x) {
  if (all(is.na(x))) return(0)
  x |> discard(is.na) |> as.numeric()
}

# define helper function to convert search results to tibble
match_to_tibble <- function(m) {
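  # drop first column (complete match); drop = FALSE keeps a matrix
  # even when the formula produces only a single match row
  m <- m[, -1, drop = FALSE]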
  
  # convert to tibble and collapse rows (one for each capturing group)
  m |> 
    as_tibble() |> 
    summarize(across(everything(), collapse_rows))
}

# extract quantities
x |> 
  str_match_all(regex) |> 
  map(match_to_tibble) |> 
  bind_rows()
#> # A tibble: 2 × 116
#>       H    He    Li    Be     B     C     N     O     F    Ne    Na    Mg    Al
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    12     0     0     0     0     6     2     1     0     0     0     0     0
#> 2    12     0     0     0     0     6     1     0     0     0     0     0     0
#> # ℹ 103 more variables: Si <dbl>, P <dbl>, S <dbl>, Cl <dbl>, Ar <dbl>,
#> #   K <dbl>, Ca <dbl>, Sc <dbl>, Ti <dbl>, V <dbl>, Cr <dbl>, Mn <dbl>,
#> #   Fe <dbl>, Co <dbl>, Ni <dbl>, Cu <dbl>, Zn <dbl>, Ga <dbl>, Ge <dbl>,
#> #   As <dbl>, Se <dbl>, Br <dbl>, Kr <dbl>, Rb <dbl>, Sr <dbl>, Y <dbl>,
#> #   Zr <dbl>, Nb <dbl>, Mo <dbl>, Tc <dbl>, Ru <dbl>, Rh <dbl>, Pd <dbl>,
#> #   Ag <dbl>, Cd <dbl>, In <dbl>, Sn <dbl>, Sb <dbl>, Te <dbl>, I <dbl>,
#> #   Xe <dbl>, Cs <dbl>, Ba <dbl>, La <dbl>, Ce <dbl>, Pr <dbl>, Nd <dbl>, …

Created on 2023-11-02 with reprex v2.0.2
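
If only a handful of elements matter, a lighter base R variant might be enough. The following is a minimal sketch, not the approach above: `count_elements` is a hypothetical helper name, and it assumes every formula is a flat sequence of element symbols (one capital letter, optionally one lowercase letter) each optionally followed by digits, with no brackets, charges, or hydrates. Elements without a number count as 1, absent elements come out as 0, and repeated elements are summed.

```r
# Hypothetical helper (illustration only): count atoms per element of interest.
count_elements <- function(formulas, elements = c("C", "H", "O", "N", "P", "S")) {
  # one token per element: capital letter, optional lowercase letter, optional digits
  tokens <- regmatches(formulas, gregexpr("[A-Z][a-z]?[0-9]*", formulas))

  counts <- vapply(tokens, function(tok) {
    symbol <- sub("[0-9]*$", "", tok)    # element symbol, e.g. "H" from "H12"
    n <- sub("^[A-Za-z]+", "", tok)      # digits, or "" when the count is implicit
    n[n == ""] <- "1"                    # no number means one atom
    n <- as.numeric(n)
    # sum per requested element; an absent element sums to 0
    vapply(elements, function(e) sum(n[symbol == e]), numeric(1))
  }, numeric(length(elements)))

  # one row per formula, one column per element
  counts <- matrix(counts, nrow = length(formulas), ncol = length(elements),
                   byrow = TRUE, dimnames = list(NULL, elements))
  data.frame(formula = formulas, counts, check.names = FALSE)
}

count_elements(c("C6H12ON2PS", "C6H12NP"))
```

For the two sample formulas this should give the counts from the table in the question (6, 12, 1, 2, 1, 1 and 6, 12, 0, 1, 1, 0). Elements that appear in a formula but are not listed in `elements` are silently ignored, so the `elements` argument needs to cover everything you care about.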
