How do I iterate over the rows of a Scala DataFrame and store the column names in a variable that can be used for operations inside a for loop?

Question (votes: 0, answers: 2)

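I need to understand how to iterate over a Scala DataFrame with a for loop and perform some operations inside the loop. I can iterate with the code below, but I cannot do anything else inside it, such as storing a column value in a variable or calling another function. Can you help me store the column values in variables?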

import spark.implicits._
import org.apache.spark.sql._
case class cls_Employee(name:String, sector:String, age:Int);
val df = Seq(cls_Employee("Andy","aaa", 20), cls_Employee("Berta","bbb", 30), cls_Employee("Joe","ccc", 40)).toDF()
df.as[cls_Employee].take(df.count.toInt).foreach(t => 
{t.name}
)
scala apache-spark
2 Answers
1
vote

Do you mean something like this?

import spark.implicits._
import org.apache.spark.sql._

case class Employee(name:String, sector:String, age:Int)
val df = Seq(Employee("Andy","aaa", 20), Employee("Berta","bbb", 30), Employee("Joe","ccc", 40)).toDF()

for (row <- df.as[Employee].collect()) {
  val name = row.name
  println(name)
}

Be careful: collect() pulls every row to the driver, so with a large dataset you may run out of memory on the driver.
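If the data might not fit in driver memory, one possible alternative is `toLocalIterator()`, which brings rows to the driver one partition at a time instead of all at once. The following is a minimal sketch, not a drop-in replacement: the object name `LocalIteratorSketch`, the helper `describe`, and the local `SparkSession` setup are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object LocalIteratorSketch {
  case class Employee(name: String, sector: String, age: Int)

  // Hypothetical helper, standing in for "another function" called inside the loop.
  def describe(name: String, age: Int): String = s"$name is $age"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for the example; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-iterator-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Employee("Andy", "aaa", 20),
      Employee("Berta", "bbb", 30),
      Employee("Joe", "ccc", 40)
    ).toDS()

    // toLocalIterator() returns a java.util.Iterator, so a plain while loop is used here.
    val it = ds.toLocalIterator()
    while (it.hasNext) {
      val row = it.next()
      val name = row.name // column value stored in a local variable
      val age = row.age
      println(describe(name, age))
    }

    spark.stop()
  }
}
```

Only one partition has to fit in driver memory at a time, at the cost of running the partitions one after another.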


0
votes

Take a look at the following code:

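import spark.implicits._  // needed for toDF, as[...] and the String encoder used by map

case class CLSEmployee(name: String, sector: String, age: Int)
val df = Seq(
    ("Andy", "aaa", 20),
    ("Berta", "bbb", 30),
    ("Joe", "ccc", 40)
).toDF("name", "sector", "age")
// Extract data from the DataFrame and store it in a local variable (Array[String]).
// collect() returns the same rows as take(df.count.toInt) without the extra count job.
val names = df.as[CLSEmployee].collect().map(_.name)
// An example function to apply to each value.
def lower(name: String): String = name.toLowerCase
// Call the function on every row. map is lazy and returns a Dataset[String], so keep the result.
val lowered = df.as[CLSEmployee].map(row => lower(row.name))
lowered.show()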
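
Since the question title mentions column names, here is a rough sketch of looping over untyped rows and reading each value by its column name with `df.columns` and `Row.getAs`. The object name `ColumnNameSketch`, the helper `formatCell`, and the local `SparkSession` are assumptions for illustration only. It still uses collect(), so it only suits small data.

```scala
import org.apache.spark.sql.SparkSession

object ColumnNameSketch {
  // Hypothetical helper: any operation that needs both the column name and its value.
  def formatCell(column: String, value: Any): String = s"$column=$value"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for the example; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-name-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Andy", "aaa", 20),
      ("Berta", "bbb", 30),
      ("Joe", "ccc", 40)
    ).toDF("name", "sector", "age")

    val columns: Array[String] = df.columns // column names stored in a variable

    for (row <- df.collect(); column <- columns) {
      val value = row.getAs[Any](column) // look the value up by column name
      println(formatCell(column, value))
    }

    spark.stop()
  }
}
```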