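I need to know how to iterate over a Scala DataFrame with a for loop and perform some operations inside the loop. I can iterate with the code below, but I cannot do anything else inside it, such as storing a column value in a variable or calling another function. Could you help me store the column values in a variable?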
import spark.implicits._
import org.apache.spark.sql._
case class cls_Employee(name:String, sector:String, age:Int);
val df = Seq(cls_Employee("Andy","aaa", 20), cls_Employee("Berta","bbb", 30), cls_Employee("Joe","ccc", 40)).toDF()
df.as[cls_Employee].take(df.count.toInt).foreach(t =>
{t.name}
)
Do you mean something like this?
import spark.implicits._
import org.apache.spark.sql._
case class Employee(name:String, sector:String, age:Int)
val df = Seq(Employee("Andy","aaa", 20), Employee("Berta","bbb", 30), Employee("Joe","ccc", 40)).toDF()
for (row <- df.as[Employee].collect()) {
  val name = row.name
  println(name)
}
Be careful: collect() brings every row back to the driver, so with a large amount of data you may get an out-of-memory error on the driver.
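If the data is too large to collect in one go, one option is `toLocalIterator()`, which pulls rows to the driver one partition at a time instead of all at once. The following is only a minimal sketch under some assumptions: it runs as a standalone program with a local `SparkSession`, and the object name `LocalIteratorSketch` and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Defined at the top level so that Spark can derive an encoder for it.
case class Employee(name: String, sector: String, age: Int)

// Hypothetical wrapper object, used only to make the sketch runnable on its own.
object LocalIteratorSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-iterator-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Employee("Andy", "aaa", 20),
      Employee("Berta", "bbb", 30),
      Employee("Joe", "ccc", 40)
    ).toDS()

    // toLocalIterator() returns a java.util.Iterator[Employee];
    // the driver only needs to hold one partition at a time.
    val it = ds.toLocalIterator()
    while (it.hasNext) {
      val row = it.next()
      val name = row.name // column value stored in a local variable
      println(name)
    }

    spark.stop()
  }
}
```

This still runs the loop body on the driver, so it suits cases where each row needs driver-side handling. If the work can be expressed as a `map` or `foreach` on the Dataset, it stays distributed across the executors.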
Take a look at the following code:
import spark.implicits._ // needed for toDF and as[CLSEmployee]
case class CLSEmployee(name: String, sector: String, age: Int)
val df = Seq(
  ("Andy", "aaa", 20),
  ("Berta", "bbb", 30),
  ("Joe", "ccc", 40)
).toDF("name", "sector", "age")
// Extract the names from the DataFrame and store them in a local Array[String].
val names = df.as[CLSEmployee].collect().map(_.name)
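// An example function to apply to each name.
def lower(name: String): String = name.toLowerCase
// Call the function for every row. map is lazy and returns a Dataset[String];
// nothing runs until an action such as show() or collect() is called on the result.
df.as[CLSEmployee].map(row => lower(row.name))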
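Since `map` on a Dataset is a lazy transformation, the last line by itself does not produce any visible result. Here is a rough sketch of how the mapped values could be materialized, assuming a standalone program with a local `SparkSession`; the object name `CallFunctionSketch` and the variable names `lowered` and `local` are only illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Defined at the top level so that Spark can derive an encoder for it.
case class CLSEmployee(name: String, sector: String, age: Int)

// Hypothetical wrapper object, used only to make the sketch runnable on its own.
object CallFunctionSketch {
  // The function we want to call for each row.
  def lower(name: String): String = name.toLowerCase

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("call-function-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Andy", "aaa", 20),
      ("Berta", "bbb", 30),
      ("Joe", "ccc", 40)
    ).toDF("name", "sector", "age")

    // Transformation only: builds a Dataset[String], no job runs yet.
    val lowered = df.as[CLSEmployee].map(row => lower(row.name))

    // Actions: show() prints the values, collect() copies them to the driver.
    lowered.show()
    val local: Array[String] = lowered.collect()
    println(local.mkString(", "))

    spark.stop()
  }
}
```

The same caveat about `collect()` applies here: it is fine for a handful of rows, but on a large Dataset it is safer to keep working with `lowered` as a Dataset.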