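I am creating an RDD from a text file and specifying the number of partitions (Spark 1.6). However, the RDD ends up with a different number of partitions than the one I asked for.
Case 1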
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 1)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[50] at textFile at <console>:27
scala> people.getNumPartitions
res36: Int = 1
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 2)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[52] at textFile at <console>:27
scala> people.getNumPartitions
res37: Int = 2
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 3)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[54] at textFile at <console>:27
scala> people.getNumPartitions
res38: Int = 3
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 4)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[56] at textFile at <console>:27
scala> people.getNumPartitions
res39: Int = 4
Case 2
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 0)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[72] at textFile at <console>:27
scala> people.getNumPartitions
res47: Int = 1
Case 3
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 5)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[58] at textFile at <console>:27
scala> people.getNumPartitions
res40: Int = 6
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 6)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[60] at textFile at <console>:27
scala> people.getNumPartitions
res41: Int = 7
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 7)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[62] at textFile at <console>:27
scala> people.getNumPartitions
res42: Int = 8
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 8)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[64] at textFile at <console>:27
scala> people.getNumPartitions
res43: Int = 9
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 10)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[68] at textFile at <console>:27
scala> people.getNumPartitions
res45: Int = 11
Case 4
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 9)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[66] at textFile at <console>:27
scala> people.getNumPartitions
res44: Int = 11
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 11)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[70] at textFile at <console>:27
scala> people.getNumPartitions
res46: Int = 13
The content of the file /home/pvikash/data/test.txt is:
This is a test file. Will be used for rdd partition
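As a hedged aside, the counts in the cases above look consistent with the split arithmetic in Hadoop's old-API `FileInputFormat.getSplits`, which `sc.textFile` goes through via `TextInputFormat` (the second argument is passed along as a minimum number of splits, not an exact count). The sketch below only models that arithmetic in plain Scala; it does not call Spark or Hadoop. The names `SplitCountSketch` and `numSplits` are hypothetical, and the 52-byte file size (the single line shown plus a trailing newline), the minimum split size of 1 byte and the 32 MB local block size are assumptions, not values I have verified on this machine.

```scala
// Sketch only: models the split-count arithmetic of Hadoop's old-API
// FileInputFormat.getSplits for a single file. Empty files are not modelled.
object SplitCountSketch {
  val SplitSlop = 1.1                 // a trailing chunk up to 10% larger than splitSize stays whole
  val MinSize   = 1L                  // assumed default minimum split size
  val BlockSize = 32L * 1024 * 1024   // assumed default block size of the local file system

  /** Number of splits for one file of `fileBytes` bytes when `requested` splits are asked for. */
  def numSplits(fileBytes: Long, requested: Int): Int = {
    // A request of 0 is treated like 1; the division is integer division.
    val goalSize  = fileBytes / (if (requested == 0) 1 else requested)
    val splitSize = math.max(MinSize, math.min(goalSize, BlockSize))

    var remaining = fileBytes
    var count     = 0
    // Cut full-size splits while the remainder is more than 1.1x the split size.
    while (remaining.toDouble / splitSize > SplitSlop) {
      count += 1
      remaining -= splitSize
    }
    // Whatever is left over becomes one last split.
    if (remaining != 0) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    val fileBytes = 52L // assumption: length of test.txt in bytes
    for (n <- 0 to 11)
      println(s"requested=$n -> partitions=${numSplits(fileBytes, n)}")
  }
}
```

Under the 52-byte assumption this yields the same counts as the transcripts: the integer division makes the per-split size smaller than the exact share, so the leftover bytes produce one or two extra splits once the requested count goes past 4.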
Based on the cases above, I have a few questions.
Any explanation would be appreciated.