How can I merge dataframes A and B to get dataframe C?
DF A:
X      Y
0-10   10-25
10-20  25-75
20-30  75-150
DF B:
Binned  Name  Value
0-10    X     20
10-20   X     100
20-30   X     200
10-25   Y     90
25-75   Y     25
75-150  Y     90
DF C:
X      X_Val  Y       Y_Val
0-10   20     10-25   90
10-20  100    25-75   25
20-30  200    75-150  90
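For reference, the two input frames can be reproduced with the following minimal setup (the variable names `df_a` and `df_b` are my own choice, not from the question):

```python
import pandas as pd

# DF A: each column holds the bin labels for one name
df_a = pd.DataFrame({'X': ['0-10', '10-20', '20-30'],
                     'Y': ['10-25', '25-75', '75-150']})

# DF B: long format, one (bin, name, value) triple per row
df_b = pd.DataFrame({'Binned': ['0-10', '10-20', '20-30', '10-25', '25-75', '75-150'],
                     'Name':   ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                     'Value':  [20, 100, 200, 90, 25, 90]})
```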
This should work:
import pandas as pd

# pivot B to make columns X & Y
df_b = df_b.pivot_table(values=['Value'], index=['Binned'], columns=['Name']).reset_index()
df_b.columns = ['Binned', 'X', 'Y']
# merge X & Y cols sequentially
df_c = pd.merge(df_a, df_b[['Binned', 'X']], how='left', left_on=['X'], right_on=['Binned'], suffixes=('', '_Val'))
df_c = pd.merge(df_c, df_b[['Binned', 'Y']], how='left', left_on=['Y'], right_on=['Binned'], suffixes=('', '_Val'))
df_c = df_c[['X', 'X_Val', 'Y', 'Y_Val']]
# X X_Val Y Y_Val
# 0 0-10 20.0 10-25 90.0
# 1 10-20 100.0 25-75 25.0
# 2 20-30 200.0 75-150 90.0
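Since each (Binned, Name) pair occurs exactly once in B, plain `pivot` also works and skips `pivot_table`'s aggregation step. A sketch, where the `df_b` literal simply mirrors the question's data:

```python
import pandas as pd

df_b = pd.DataFrame({'Binned': ['0-10', '10-20', '20-30', '10-25', '25-75', '75-150'],
                     'Name':   ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                     'Value':  [20, 100, 200, 90, 25, 90]})

# one row per bin, one column per Name; bins that belong to the
# other name come out as NaN in that column
wide = df_b.pivot(index='Binned', columns='Name', values='Value').reset_index()
```

`pivot` raises if a (index, columns) pair is duplicated, which is a useful sanity check here, whereas `pivot_table` would silently aggregate.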
I think you need:
#reshape dfA for inner merge with dfB
df1 = dfA.melt(var_name='Name', value_name='Binned')
df = dfB.merge(df1)
#reshape for multiple columns by groups
df = (df.set_index([df.groupby('Name').cumcount(), 'Name'])
.unstack()
.sort_index(axis=1, level=1)
.rename(columns={'Binned':'','Value':'_Val'})
.swaplevel(0,1,axis=1))
df.columns = df.columns.map(''.join)
print(df)
X X_Val Y Y_Val
0 0-10 20 10-25 90
1 10-20 100 25-75 25
2 20-30 200 75-150 90
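An alternative sketch that avoids reshaping altogether: build one lookup Series per name and `map` the bin labels onto values. This assumes each bin label is unique within a name, as in the question's data (the `df_a`/`df_b` literals below just reproduce it):

```python
import pandas as pd

df_a = pd.DataFrame({'X': ['0-10', '10-20', '20-30'],
                     'Y': ['10-25', '25-75', '75-150']})
df_b = pd.DataFrame({'Binned': ['0-10', '10-20', '20-30', '10-25', '25-75', '75-150'],
                     'Name':   ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                     'Value':  [20, 100, 200, 90, 25, 90]})

df_c = df_a.copy()
for name in ['X', 'Y']:
    # Series mapping bin label -> value for this name only
    lookup = df_b.loc[df_b['Name'] == name].set_index('Binned')['Value']
    df_c[f'{name}_Val'] = df_c[name].map(lookup)
df_c = df_c[['X', 'X_Val', 'Y', 'Y_Val']]
```

Unmatched bin labels would simply come out as NaN, which makes missing entries in B easy to spot.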
Write a SQL query for each name (`spark.sqlContext` is not callable; the SparkSession method is `spark.sql`):
df1 = spark.sql("select * from DF_B where Name = 'X'")
df2 = spark.sql("select * from DF_B where Name = 'Y'")
Create a row-id column for each dataframe:
from pyspark.sql.functions import monotonically_increasing_id

df1_id = df1.withColumn("id", monotonically_increasing_id())
df2_id = df2.withColumn("id", monotonically_increasing_id())
Now we can join df1_id and df2_id on that id. Note that monotonically_increasing_id produces values that are unique and increasing but not consecutive, so the ids of two separately built dataframes only line up when their partitioning matches; a row_number window is safer if exact row alignment matters.
df1_id.join(df2_id, "id").show()