我有一个字符串序列
val listOfString : Seq[String] = Seq("a","b","c")
我怎样才能进行这样的转换呢?
def addColumn(example: Seq[String]): DataFrame => DataFrame {
some code which returns a transform which add these String as column to dataframe
}
input
+-------
| id
+-------
| 1
+-------
output
+-------+-------+----+-------
| id | a | b | c
+-------+-------+----+-------
| 1 | 0 | 0 | 0
+-------+-------+----+-------
我只想把它变成
你可以使用 transform
数据集的方法,并将其与单一的 select
声明。
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit
def addColumns(extraCols: Seq[String])(df: DataFrame): DataFrame = {
val selectCols = df.columns.map{col(_)} ++ extraCols.map{c => lit(0).as(c)}
df.select(selectCols :_*)
}
// usage example
val yourExtraColumns : Seq[String] = Seq("a","b","c")
df.transform(addColumns(yourExtraColumns))
资源
https:/towardsdatascience.comdataframe-transform-spark-function-composition-eb8ec296c108。
https:/mungingdata.comapache-sparkchaining-custom-dataframe-transformations。
使用 .toDF()
并通过你 listOfString.
Example:
//sample dataframe
df.show()
//+---+---+---+
//| _1| _2| _3|
//+---+---+---+
//| 0| 0| 0|
//+---+---+---+
df.toDF(listOfString:_*).show()
//+---+---+---+
//| a| b| c|
//+---+---+---+
//| 0| 0| 0|
//+---+---+---+
UPDATE:
使用 foldLeft
添加 专栏 对现有 数据框 与价值。
val df=Seq(("1")).toDF("id")
val listOfString : Seq[String] = Seq("a","b","c")
val new_df=listOfString.foldLeft(df){(df,colName) => df.withColumn(colName,lit("0"))}
//+---+---+---+---+
//| id| a| b| c|
//+---+---+---+---+
//| 1| 0| 0| 0|
//+---+---+---+---+
//or creating a function
import org.apache.spark.sql.DataFrame
def addColumns(extraCols: Seq[String],df: DataFrame): DataFrame = {
val new_df=extraCols.foldLeft(df){(df,colName) => df.withColumn(colName,lit("0"))}
return new_df
}
addColumns(listOfString,df).show()
//+---+---+---+---+
//| id| a| b| c|
//+---+---+---+---+
//| 1| 0| 0| 0|
//+---+---+---+---+