PySpark DataFrame: how to pass a string value to rlike row by row from one of the DataFrame's columns

Question (votes: 0, answers: 1)

In a PySpark DataFrame, how can the rlike function be given its pattern row by row from one of the DataFrame's own columns?

Running the following produces an error:

df.withColumn("match_str", df.text1.rlike(df.match)).show(truncate=False)

    Py4JError: An error occurred while calling o2165.rlike. Trace:
    py4j.Py4JException: Method rlike([class org.apache.spark.sql.Column]) does not exist

Is there any workaround or solution?

df = spark.createDataFrame([
    (1, 'test1 test1_0|test1 test0', 'This is a test1 test1_0'),
    (2, 'test2 test2_0|test1 test0', None),
    (3, 'Nan', 'Nan'),
    (4, 'test4 test4_0|test1 test0', 'This is a test4 test4_0'),
], ['id', 'match', 'text1'])



+---+-------------------------+-----------------------+
|id |match                    |text1                  |
+---+-------------------------+-----------------------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|
|2  |test2 test2_0|test1 test0|null                   |
|3  |Nan                      |Nan                    |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|
+---+-------------------------+-----------------------+

root
 |-- id: long (nullable = true)
 |-- match: string (nullable = true)
 |-- text1: string (nullable = true)


df.withColumn("match_str", df.text1.rlike(df.select(df.match).head()["match"])).show(truncate=False)

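**Note: df.select(df.match).head()["match"] passes only the value of match from the first row, so in this case the pattern "test1 test1_0|test1 test0" is matched against every row. I want to pass the rlike pattern row by row, like this:**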

**

  1. id '1' should match 'test1 test1_0|test1 test0' against "This is a test1 test1_0"
  2. id '2' should match 'test2 test2_0|test1 test0' against None

**

+---+-------------------------+-----------------------+---------+
|id |match                    |text1                  |match_str|
+---+-------------------------+-----------------------+---------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|true     |
|2  |test2 test2_0|test1 test0|null                   |null     |
|3  |Nan                      |Nan                    |false    |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|false    |
+---+-------------------------+-----------------------+---------+

**Expected result:**

+---+-------------------------+-----------------------+---------+
|id |match                    |text1                  |match_str|
+---+-------------------------+-----------------------+---------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|true     |
|2  |test2 test2_0|test1 test0|null                   |false    |
|3  |Nan                      |Nan                    |true     |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|true     |
+---+-------------------------+-----------------------+---------+
apache-spark apache-spark-sql pyspark-sql apache-spark-2.0 pyspark-dataframes
1 Answer
0
votes

Unfortunately, the pyspark.sql.Column.rlike() method accepts only a text pattern, not another column, as the pattern. You can still adapt it to your needs by using UDFs.
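
The UDF route mentioned above could look roughly like the sketch below. The names `regex_match`, `add_match_column_udf` and `add_match_column_expr` are hypothetical helpers, not part of PySpark. The sketch assumes that a null text or null pattern should yield false, as in the expected result table, and that the patterns behave the same under Python's `re` module as under the Java regex engine that rlike uses, which holds for simple alternations like the ones in the question but not for every pattern. The second helper is an alternative that relies on Spark SQL's RLIKE operator accepting a column expression as its right-hand side; treat that as an assumption to verify on your Spark version.

```python
import re


def regex_match(text, pattern):
    """Return True if `pattern` matches anywhere in `text`.

    A None on either side yields False (an assumption taken from the
    expected result table, where the null text1 row is false).
    """
    if text is None or pattern is None:
        return False
    return re.search(pattern, text) is not None


def add_match_column_udf(df):
    """Add `match_str` by applying `regex_match` to each row through a UDF."""
    # Imported here so that regex_match can be used and tested without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    match_udf = F.udf(regex_match, BooleanType())
    return df.withColumn("match_str", match_udf(F.col("text1"), F.col("match")))


def add_match_column_expr(df):
    """Alternative without a UDF, using a SQL expression.

    Assumes the SQL RLIKE operator accepts a column as the pattern;
    coalesce turns the null produced by a null text1 into false.
    """
    from pyspark.sql import functions as F

    return df.withColumn(
        "match_str", F.expr("coalesce(`text1` rlike `match`, false)")
    )
```

With the df from the question, `add_match_column_udf(df).show(truncate=False)` is intended to reproduce the expected result table. A Python UDF moves every row through the Python worker, so the expression-based variant, where it works, is likely to be faster on large data.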
