Spark: how to set the number of columns in a dataset

Question

I have a file like this:

test057 - 192.168.1.12 - 00:11:22:33:44:57 - 2ZZ66-1 node 6 -  - test052 - 192.168.1.16 - 00:11:22:33:44:61 - 2ZZ66-1 Node2 -
test058 - 192.168.1.13 - 00:11:22:33:44:58 - 2ZZ66-1 node 5 -  - test053 - 192.168.1.17 - 00:11:22:33:44:62 - 2ZZ66-1 Node1 -
test_a001 - 192.168.100.10 - 1234.5678.0123 - AZZDEF -  -  -  -  -  -
test_b001 - 192.168.100.11 - 4321.1234.1234 - GHIJKL -  -  -  -  -  -

How can I split it into 4 columns, like this?

| name      | ip             | mac               | tag            |
|-----------|----------------|-------------------|----------------|
| test057   | 192.168.1.12   | 00:11:22:33:44:57 | 2ZZ66-1 node 6 |
| test052   | 192.168.1.16   | 00:11:22:33:44:61 | 2ZZ66-1 Node2  |
| test058   | 192.168.1.13   | 00:11:22:33:44:58 | 2ZZ66-1 node 5 |
| test053   | 192.168.1.17   | 00:11:22:33:44:62 | 2ZZ66-1 Node1  |
| test_a001 | 192.168.100.10 | 1234.5678.0123    | AZZDEF         |
| test_b001 | 192.168.100.11 | 4321.1234.1234    | GHIJKL         |
java apache-spark dataset
1 Answer

You can load the file, split each line on the " - " delimiter (a hyphen surrounded by spaces, so that hyphens inside tags such as "2ZZ66-1" are not split), and map the fields to a case class.

import spark.implicits._  // already in scope inside spark-shell

// Load the raw file and split each line on " - " (hyphen with surrounding
// spaces). Splitting on a bare "-" would break tags like "2ZZ66-1 node 6".
val ipFile = sc.textFile("file:///in_f/test/inpf.txt")
val ipSplit = ipFile.map(_.split(" - ").map(_.trim))

// Map the first four fields of each line onto a case class, then to a DataFrame.
case class IP(name: String, ip: String, mac: String, tag: String)
val ipDF = ipSplit.map(x => IP(x(0), x(1), x(2), x(3))).toDF()

ipDF.select($"name", $"ip", $"mac", $"tag").take(4).foreach(println)

When printed, the output looks like this:

[test057,192.168.1.12,00:11:22:33:44:57,2ZZ66-1 node 6]
[test058,192.168.1.13,00:11:22:33:44:58,2ZZ66-1 node 5]
[test_a001,192.168.100.10,1234.5678.0123,AZZDEF]
[test_b001,192.168.100.11,4321.1234.1234,GHIJKL]
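Note that the first two input lines each hold two records (a second name/ip/mac/tag group after an empty field), which is why your desired table has six rows while the code above only emits the first record of every line. Below is a minimal sketch of one way to capture both, under the assumption that, after splitting on " - " and dropping empty separator fields, every four consecutive fields form one record; the names `allRecords` and `allDF` are illustrative:

// Sketch: emit every name/ip/mac/tag group on a line, not just the first.
// Assumes the field layout seen in the sample data; reuses IP from above.
val allRecords = ipFile.flatMap { line =>
  val fields = line.split(" - ").map(_.stripSuffix("-").trim)
  fields.filter(_.nonEmpty)     // drop the empty separator fields
        .grouped(4)             // assume each record is 4 consecutive fields
        .filter(_.length == 4)  // ignore incomplete trailing groups
        .map(g => IP(g(0), g(1), g(2), g(3)))
}
val allDF = allRecords.toDF()
allDF.show(false)  // on the sample file this should list all six rows

The `stripSuffix("-")` guards against the lone trailing hyphen some lines end with; if your real data can contain tags that legitimately end in "-", you would need a stricter parse.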

