Generating a 2-hop projection from a graph adjacency-list representation in PySpark

Question (votes: 1, answers: 1)

I have built an adjacency list as [key, value] pairs in a PySpark RDD from the following edge input file:

7 10
7 8
7 4
8 9
8 5
9 5
9 10
10 6
4 5
5 6
4 6
1 4
1 3
2 3
2 6
3 4
3 6

rdd1 = sc.textFile("/cc/data/data_cc.txt")
# Emit every edge in both directions, join neighbours per key as a string, then split back into a list.
rdd2 = (rdd1.map(lambda value: value.split())
            .flatMap(lambda value: [[value[0], value[1]], [value[1], value[0]]])
            .reduceByKey(lambda x, y: x + "," + y)
            .map(lambda x: [x[0], x[1].split(",")]))
rdd2.collect()
# Output:
[['4', ['6', '1', '3', '7', '5']], ['1', ['4', '3']], ['10', ['7', '9', '6']], ['8', ['7', '9', '5']], ['9', ['8', '5', '10']], ['7', ['10', '8', '4']], ['5', ['8', '9', '4', '6']], ['6', ['10', '5', '4', '2', '3']], ['3', ['1', '2', '4', '6']], ['2', ['3', '6']]]
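
For checking the RDD result on a small graph, the same adjacency list can be built in plain Python without Spark. This is only a sketch: `build_adjacency` and the inline `edge_lines` sample are illustrative names that do not appear in the original code, and it assumes whitespace-separated node pairs, with blank or malformed lines skipped.

```python
from collections import defaultdict


def build_adjacency(edge_lines):
    """Build an undirected adjacency list {node: [neighbours]} from 'src dst' lines."""
    adjacency = defaultdict(list)
    for line in edge_lines:
        parts = line.split()
        if len(parts) != 2:  # assumption: skip blank or malformed lines
            continue
        src, dst = parts
        # Record the edge in both directions, as the flatMap step does.
        adjacency[src].append(dst)
        adjacency[dst].append(src)
    return dict(adjacency)


# First few edges of the sample file, with a blank line thrown in.
edge_lines = ["7 10", "7 8", "7 4", "", "8 9"]
adjacency = build_adjacency(edge_lines)
print(adjacency["7"])  # ['10', '8', '4']
```

Unlike the comma-join-and-split approach, this keeps neighbours in a list throughout, so it would also work for node labels that contain commas.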

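Now I want to convert these key-value pairs into a 2-hop projection.

For example, for key = '4' the value is the list ['6', '1', '3', '7', '5']. To get its 2-hop projection, I have to replace '6' with ['10', '5', '4', '2', '3'], '1' with ['4', '3'], and so on. The 2-hop projection then looks like this: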

['4', [['10', '5', '4', '2', '3'], ['4', '3'], ['1', '2', '4', '6'], ['10', '8', '4'], ['8', '9', '4', '6']]]

I need to do the same for every key-value pair.
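
The transformation described above can be written down as a small plain-Python reference, which may help pin down the expected result before distributing it. It is a sketch only: `two_hop_projection` is a hypothetical helper name, and the `adjacency` dict simply restates the `rdd2.collect()` output shown earlier as a dictionary.

```python
def two_hop_projection(adjacency):
    """Replace every neighbour in each adjacency list with that neighbour's own list."""
    return {
        node: [list(adjacency.get(nbr, [])) for nbr in neighbours]
        for node, neighbours in adjacency.items()
    }


# The adjacency list from the question, as a dict.
adjacency = {
    "4": ["6", "1", "3", "7", "5"],
    "1": ["4", "3"],
    "10": ["7", "9", "6"],
    "8": ["7", "9", "5"],
    "9": ["8", "5", "10"],
    "7": ["10", "8", "4"],
    "5": ["8", "9", "4", "6"],
    "6": ["10", "5", "4", "2", "3"],
    "3": ["1", "2", "4", "6"],
    "2": ["3", "6"],
}

projection = two_hop_projection(adjacency)
print(projection["4"])
# [['10', '5', '4', '2', '3'], ['4', '3'], ['1', '2', '4', '6'], ['10', '8', '4'], ['8', '9', '4', '6']]
```

This only works when the whole adjacency list fits in memory on one machine; for a graph that does not, the lookup `adjacency.get(nbr)` has to become a join between RDDs.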

python pyspark mapreduce
1 answer (0 votes)
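# ((node, neighbour), neighbours of node) for every edge
rdd3 = rdd2.flatMap(lambda x: [[(x[0], x[1][k]), x[1]] for k in range(len(x[1]))])
# ((neighbour, node), neighbours of node): the same edge keyed from the other end
rdd4 = rdd2.flatMap(lambda x: [[(x[1][k], x[0]), x[1]] for k in range(len(x[1]))])
# Join on the edge key: ((a, b), (neighbours of a, neighbours of b))
rdd5 = rdd3.join(rdd4)
# Per node a, concatenate [b, neighbours of b] over all neighbours b
rdd6 = rdd5.map(lambda x: [x[0][0], [x[0][1], x[1][1]]]).reduceByKey(lambda x, y: x + y)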

These four lines do the job. I get the following output:

[('1', ['4', ['7', '5', '6', '1', '3'], '3', ['1', '2', '4', '6']]), ('10', ['9', ['8', '5', '10'], '7', ['10', '8', '4'], '6', ['5', '4', '2', '3', '10']]), ('2', ['3', ['1', '2', '4', '6'], '6', ['5', '4', '2', '3', '10']]), ('3', ['1', ['4', '3'], '2', ['3', '6'], '6', ['5', '4', '2', '3', '10'], '4', ['7', '5', '6', '1', '3']]), ('4', ['1', ['4', '3'], '5', ['6', '8', '9', '4'], '6', ['5', '4', '2', '3', '10'], '3', ['1', '2', '4', '6'], '7', ['10', '8', '4']]), ('5', ['4', ['7', '5', '6', '1', '3'], '6', ['5', '4', '2', '3', '10'], '8', ['7', '9', '5'], '9', ['8', '5', '10']]), ('6', ['5', ['6', '8', '9', '4'], '2', ['3', '6'], '3', ['1', '2', '4', '6'], '4', ['7', '5', '6', '1', '3'], '10', ['7', '9', '6']]), ('7', ['8', ['7', '9', '5'], '10', ['7', '9', '6'], '4', ['7', '5', '6', '1', '3']]), ('8', ['7', ['10', '8', '4'], '5', ['6', '8', '9', '4'], '9', ['8', '5', '10']]), ('9', ['10', ['7', '9', '6'], '5', ['6', '8', '9', '4'], '8', ['7', '9', '5']])]
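
Note that each value in this output is a flat list that alternates a neighbour ID with that neighbour's adjacency list, whereas the question asked for a list of lists only. A possible post-processing step is sketched below. `split_flat_value` is a hypothetical helper, and it assumes the value always has the alternating layout shown above.

```python
def split_flat_value(flat):
    """Turn [n1, nbrs1, n2, nbrs2, ...] into [(n1, nbrs1), (n2, nbrs2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected alternating neighbour / neighbour-list entries")
    # Even positions hold neighbour IDs, odd positions hold their adjacency lists.
    return list(zip(flat[0::2], flat[1::2]))


# One row copied from the output above.
row = ("1", ["4", ["7", "5", "6", "1", "3"], "3", ["1", "2", "4", "6"]])

pairs = split_flat_value(row[1])
nested = [nbrs for _, nbrs in pairs]  # the list-of-lists form from the question
print(nested)  # [['7', '5', '6', '1', '3'], ['1', '2', '4', '6']]
```

On the RDD itself this could presumably be applied with something like `rdd6.mapValues(lambda v: [nbrs for _, nbrs in split_flat_value(v)])`, though that line has not been run against the original data.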