How to pass string values row by row from one DataFrame column to the `rlike` function in a PySpark DataFrame
Running the following produces an error:
df.withColumn("match_str", df.text1.rlike(df.match)).show(truncate=False)
Py4JError: An error occurred while calling o2165.rlike. Trace:
py4j.Py4JException: Method rlike([class org.apache.spark.sql.Column]) does not exist
Is there any workaround or solution?
df = spark.createDataFrame([
(1, 'test1 test1_0|test1 test0', 'This is a test1 test1_0'),
(2, 'test2 test2_0|test1 test0', None),
(3, 'Nan', 'Nan'),
(4, 'test4 test4_0|test1 test0', 'This is a test4 test4_0'),
], ['id', 'match', 'text1'])
+---+-------------------------+-----------------------+
|id |match                    |text1                  |
+---+-------------------------+-----------------------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|
|2  |test2 test2_0|test1 test0|null                   |
|3  |Nan                      |Nan                    |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|
+---+-------------------------+-----------------------+
root
|-- id: long (nullable = true)
|-- match: string (nullable = true)
|-- text1: string (nullable = true)
df.withColumn("match_str", df.text1.rlike(df.select(df.match).head()["match"])).show(truncate=False)
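**Note: `df.select(df.match).head()["match"]` passes only the value from the first row of `match`, so in this case the pattern `test1 test1_0|test1 test0` is applied to every row, as the output below shows. I want to pass the `rlike` pattern row by row instead.**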
+---+-------------------------+-----------------------+---------+
|id |match                    |text1                  |match_str|
+---+-------------------------+-----------------------+---------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|true     |
|2  |test2 test2_0|test1 test0|null                   |null     |
|3  |Nan                      |Nan                    |false    |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|false    |
+---+-------------------------+-----------------------+---------+
df.withColumn("match_str", df.text1.rlike(df.match)).show(truncate=False)
Py4JError: An error occurred while calling o2165.rlike. Trace:
py4j.Py4JException: Method rlike([class org.apache.spark.sql.Column]) does not exist
**Expected result:**
+---+-------------------------+-----------------------+---------+
|id |match                    |text1                  |match_str|
+---+-------------------------+-----------------------+---------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|true     |
|2  |test2 test2_0|test1 test0|null                   |false    |
|3  |Nan                      |Nan                    |true     |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|true     |
+---+-------------------------+-----------------------+---------+
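Unfortunately, the `pyspark.sql.Column.rlike()` method only accepts a literal string pattern, not another column as the pattern. You can work around this for your needs with a UDF, however.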
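A minimal sketch of the UDF approach, under a few assumptions: the names `regex_match` and `with_match_column` are hypothetical helpers invented here, the matching uses Python's `re.search` (so patterns follow Python regex syntax rather than the Java regex that `rlike` uses), and a null text or pattern is mapped to `False` to agree with the expected result in the question.

```python
import re
from typing import Optional


def regex_match(text: Optional[str], pattern: Optional[str]) -> bool:
    """Return True if `pattern` is found anywhere in `text`.

    Assumption: a null text or pattern yields False, as in the expected
    output of the question (row 2). Uses Python's `re`, not Java regex.
    """
    if text is None or pattern is None:
        return False
    return re.search(pattern, text) is not None


def with_match_column(df, text_col="text1", pattern_col="match", out_col="match_str"):
    """Add a boolean column by applying regex_match row by row (hypothetical helper)."""
    # Imported here so regex_match stays usable without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    # Wrap the plain Python function as a UDF returning a boolean.
    match_udf = F.udf(regex_match, BooleanType())
    # Each row's own pattern is passed alongside that row's text.
    return df.withColumn(out_col, match_udf(F.col(text_col), F.col(pattern_col)))
```

With the `df` from the question, `with_match_column(df).show(truncate=False)` should apply each row's own `match` value to that row's `text1`. A Python UDF is slower than built-in expressions; an alternative that may avoid it is a SQL expression such as `F.expr("text1 rlike match")`, which, as far as I know, accepts a column on the right-hand side but would return null rather than false for null inputs.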