Normalizing a many-to-many table

Question

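I have an input data.frame, combined, which contains a many-to-many relationship that I want to normalize into two tables. The input table holds the mixture compositions for several different samples and locations.

I want to derive two tables from it: one (goal_1) containing the location, sampleID and MixtureID; the second (goal_2) should contain the actual mixture compositions.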

library(tidyverse) 

goal_1 = tribble(
  ~sampleID, ~location, ~MixtureID,
  1, "A", 1,
  2, "B", 2,
  3, "C", 3,
  4, "A", 4,
  5, "A", 1,
  6, "B", 2,
  7, "B", 2
)

goal_2 = tribble(
  ~MixtureID, ~element, ~conc_pct,
  1, "He", 0,
  1, "H", 10,
  1, "C", 0,
  1, "O", 0,
  1, "N", 0,
  1, "Ca", 90,
  1, "Cs", 0,
  2, "Si", 0,
  2, "S", 100,
  2, "V", 0,
  3, "Nb", 100,
  3, "Fe", 0,
  4, "C", 20,
  4, "H", 10,
  4, "S", 70
)

combined = left_join(goal_1, goal_2, by = "MixtureID", relationship = "many-to-many") |>
  select(-MixtureID) 

Essentially, I want to reverse the combined = left_join(...) operation.

I can partially produce the goal_1 table:

goal_1a = distinct(combined, sampleID, location)

But I am stuck on how to derive the goal_2 table from the combined table.

r database-normalization
1 Answer
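  1. If this is a many-to-many relationship, shouldn't this technically be a full_join rather than a left_join? (In this case both appear to produce the same data, since every MixtureID in goal_2 also occurs in goal_1.)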
combined <- full_join( goal_1, goal_2, 
    by = "MixtureID", relationship = "many-to-many" )
  2. Working from the existing left_join data set, to normalize your combined data set I suggest trying something like this:
( table_1 <- combined |> distinct( sampleID, location ))
( table_2 <- combined |> distinct( sampleID, element, conc_pct ))

You can then join the two and get the same 28 rows back:

table_1 |> 
    inner_join( table_2, by = 'sampleID')
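
Note that table_2 above is keyed by sampleID, so samples that share a mixture still repeat its composition. If a shared MixtureID is wanted, as in goal_1 and goal_2, one possible approach is to build a text "signature" of each sample's full composition and give samples with identical signatures the same ID. The following is only a sketch under some assumptions: each sample has exactly one location, two mixtures count as identical when their element/concentration pairs match exactly, and the signature column name is made up for illustration. The resulting IDs are arbitrary (numbered in order of first appearance); the relationship argument needs dplyr 1.1.0 or later.

```r
library(dplyr)
library(tibble)

# Rebuild the question's input so the sketch runs on its own.
goal_1 <- tribble(
  ~sampleID, ~location, ~MixtureID,
  1, "A", 1,  2, "B", 2,  3, "C", 3,  4, "A", 4,
  5, "A", 1,  6, "B", 2,  7, "B", 2
)
goal_2 <- tribble(
  ~MixtureID, ~element, ~conc_pct,
  1, "He", 0,  1, "H", 10,  1, "C", 0,  1, "O", 0,
  1, "N", 0,   1, "Ca", 90, 1, "Cs", 0,
  2, "Si", 0,  2, "S", 100, 2, "V", 0,
  3, "Nb", 100, 3, "Fe", 0,
  4, "C", 20,  4, "H", 10,  4, "S", 70
)
combined <- left_join(goal_1, goal_2, by = "MixtureID",
                      relationship = "many-to-many") |>
  select(-MixtureID)

# One row per sample, with a text signature of its whole composition.
# Sorting by element first makes the signature independent of row order.
signatures <- combined |>
  arrange(sampleID, element) |>
  group_by(sampleID, location) |>
  summarise(
    signature = paste(element, conc_pct, sep = "=", collapse = ";"),
    .groups = "drop"
  )

# Samples with identical signatures receive the same MixtureID.
table_1 <- signatures |>
  mutate(MixtureID = match(signature, unique(signature))) |>
  select(sampleID, location, MixtureID)

# Composition table: one row per MixtureID and element.
table_2 <- combined |>
  inner_join(table_1, by = c("sampleID", "location")) |>
  distinct(MixtureID, element, conc_pct) |>
  arrange(MixtureID)

# Joining the two normalized tables gives back as many rows as combined.
rebuilt <- inner_join(table_1, table_2, by = "MixtureID",
                      relationship = "many-to-many")
```

On the question's data this yields seven rows in table_1 with MixtureID 1, 2, 3, 4, 1, 2, 2 and 15 rows in table_2, and rebuilt has the same 28 rows as combined. Comparing pasted strings is fragile for non-integer concentrations, so rounding conc_pct before building the signature may be advisable.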