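I am starting from the example reported here: https://stackoverflow.com/questions/19664313/how-to-have-query-return-samples-of-row-values-as-columns
I usually have no trouble finding a solution in other posts, but not in this case.
I have the following table: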
| Name | Gene |
|A | gene1|
|A | gene2|
|A | gene3|
|B | gene1|
|B | gene2|
|C | gene1|
|C | gene3|
How can I reshape my data frame into this:
|Name | Gene1 | Gene2 | Gene3 |
|A |gene1 | gene2 | gene3 |
|B |gene1 | gene2 | ----- |
|C |gene1 | ----- | gene3 |
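Getting the raw data for each sample would also be useful.
Thanks very much in advance.
I tried applying the code from "How to have query return samples of row values as columns?", but without success.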
You can use the PIVOT clause.
SELECT
    name,
    (CASE WHEN gene1 IS NULL THEN '-----' ELSE gene1 END) AS gene1,
    (CASE WHEN gene2 IS NULL THEN '-----' ELSE gene2 END) AS gene2,
    (CASE WHEN gene3 IS NULL THEN '-----' ELSE gene3 END) AS gene3
FROM table_name
PIVOT (
    FIRST(gene) AS gene
    FOR (gene) IN ('gene1' AS gene1, 'gene2' AS gene2, 'gene3' AS gene3)
);
+----+-----+-----+-----+
|name|gene1|gene2|gene3|
+----+-----+-----+-----+
|A |gene1|gene2|gene3|
|B |gene1|gene2|-----|
|C |gene1|-----|gene3|
+----+-----+-----+-----+
Or, with the DataFrame API in Scala:
scala> df.printSchema
root
|-- name: string (nullable = true)
|-- gene: string (nullable = true)
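import org.apache.spark.sql.functions.{first, when}

df
  .groupBy("name")  // one output row per name
  .pivot("gene")    // one output column per distinct gene value
  .agg(
    // keep the gene where the (name, gene) pair exists, otherwise a placeholder
    when(first("gene").isNotNull, first("gene"))
      .otherwise("-----")
  )
  .show(false)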
+----+-----+-----+-----+
|name|gene1|gene2|gene3|
+----+-----+-----+-----+
|A |gene1|gene2|gene3|
|B |gene1|gene2|-----|
|C |gene1|-----|gene3|
+----+-----+-----+-----+
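To see what the pivot is doing without a Spark session, here is a minimal plain-Scala sketch of the same reshaping on ordinary collections. It is only an illustration under assumptions: the names `PivotSketch`, `rows`, `pivot`, and `placeholder` are made up for this example and do not come from Spark or from the code above.

```scala
// Plain-Scala sketch of the pivot logic (no Spark needed).
// All names here are illustrative, not part of any Spark API.
object PivotSketch {
  val placeholder = "-----"

  // (name, gene) pairs taken from the table in the question
  val rows: Seq[(String, String)] = Seq(
    "A" -> "gene1", "A" -> "gene2", "A" -> "gene3",
    "B" -> "gene1", "B" -> "gene2",
    "C" -> "gene1", "C" -> "gene3"
  )

  // Returns the sorted distinct genes (the new columns) and one row per name.
  def pivot(data: Seq[(String, String)]): (Seq[String], Seq[(String, Seq[String])]) = {
    val columns = data.map(_._2).distinct.sorted
    val byName = data.groupBy(_._1) // original rows, grouped per name
    val table = byName.keys.toSeq.sorted.map { name =>
      val genes = byName(name).map(_._2).toSet
      // fill a cell with the gene if this name has it, else the placeholder
      name -> columns.map(c => if (genes.contains(c)) c else placeholder)
    }
    (columns, table)
  }

  def main(args: Array[String]): Unit = {
    val (columns, table) = pivot(rows)
    println(("name" +: columns).mkString("|"))
    table.foreach { case (name, cells) => println((name +: cells).mkString("|")) }
  }
}
```

The grouping step mirrors `groupBy("name")`, the distinct sorted gene list mirrors the columns produced by `pivot("gene")`, and the placeholder fill mirrors the `when(...).otherwise("-----")` part. The intermediate `byName` map inside `pivot` also holds the original rows for each name, which is roughly the "raw data for each sample" the question asks about.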