Pyspark加入不需要5个位置参数?

问题描述 投票:0回答:1

我正在Pyspark的5个列上实现LEFT JOIN。但它正在抛出错误,如下所示

TypeError:join()获取2到4个位置参数,但给出了5个

代码实施:

Tgt_df_time_in_zone_detail = Tgt_df_view_time_in_zone_detail_dtaas.join(Tgt_df_individual_in_shift_tiz
,Tgt_df_view_time_in_zone_detail_dtaas.id_individual == Tgt_df_individual_in_shift_tiz.id_individual, 
(Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start >= Tgt_df_individual_in_shift_tiz.swipein)
 &   (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start <= Tgt_df_individual_in_shift_tiz.swipeout)
 & (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end >= Tgt_df_individual_in_shift_tiz.swipein) 
&(Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end <= Tgt_df_individual_in_shift_tiz.swipeout)
, "left_outer") 

为什么Pyspark不加入5列?那么最好的方法是什么呢?

pyspark left-join databricks azure-databricks
1个回答
1
投票

猜猜,你错过了第一和第二个条件。试试这个,如果有效的话。

Tgt_df_time_in_zone_detail = Tgt_df_view_time_in_zone_detail_dtaas.join(Tgt_df_individual_in_shift_tiz,
(Tgt_df_view_time_in_zone_detail_dtaas.id_individual == Tgt_df_individual_in_shift_tiz.id_individual)
& (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start >= Tgt_df_individual_in_shift_tiz.swipein)
& (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start <= Tgt_df_individual_in_shift_tiz.swipeout)
& (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end >= Tgt_df_individual_in_shift_tiz.swipein) 
& (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end <= Tgt_df_individual_in_shift_tiz.swipeout)
, "left_outer")
© www.soinside.com 2019 - 2024. All rights reserved.