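PySpark: replace characters in a DataFrame column and convert to float

Question Votes: 0 Answers: 2

Any ideas on how to do this in PySpark?

I have the following salaries in the "Salary" column. I am trying to remove the $.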

df = df.withColumn('clean_salary', regexp_replace(col("Salary"), '$', ''))
df.show()

As you can see, it did nothing. Any idea why?

Thanks

+---+----------+----------+------+---------------+--------------------+---------+----------+-----------+------------+
| id|first_name| last_name|gender|           City|           Job Title|   Salary|  Latitude|  Longitude|clean_salary|
+---+----------+----------+------+---------------+--------------------+---------+----------+-----------+------------+
|  1|   Melinde| Shilburne|Female|      Nowa Ruda| Assistant Professor|$57438.18|50.5774075| 16.4967184|   $57438.18|
|  2|  Kimberly|Von Welden|Female|         Bulgan|       Programmer II|$62846.60|48.8231572|103.5218199|   $62846.60|
|  3|    Alvera|  Di Boldi|Female|           null|                null|$57576.52|39.9947462|116.3397725|   $57576.52|
|  4|   Shannon| O'Griffin|  Male|  Divnomorskoye|Budget/Accounting...|$61489.23|44.5047212| 38.1300171|   $61489.23|
|  5|  Sherwood|   Macieja|  Male|      Mytishchi|            VP Sales|$63863.09|      null| 37.6489954|   $63863.09|
|  6|     Maris|      Folk|Female|Kinsealy-Drinan|      Civil Engineer|$30101.16|53.4266145| -6.1644997|   $30101.16|
|  7|     Masha|    Divers|Female|         Dachun|                null|$25090.87| 24.879416| 118.930111|   $25090.87|
|  8|   Goddart|     Flear|  Male|      Trélissac|Desktop Support T...|$46116.36|45.1905186|  0.7423124|   $46116.36|
|  9|      Roth|O'Cannavan|  Male|         Heitan|VP Product Manage...|$73697.10| 32.027934| 106.657113|   $73697.10|
apache-spark pyspark pyspark-sql
2 Answers
0
votes
Try the regexp_replace code below:

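from pyspark.sql.functions import col, regexp_replace

# Inside a character class, $ is a literal dollar sign rather than the end-of-line anchor.
updatedDF = df.withColumn('clean_salary', regexp_replace(col("Salary"), r"[$]", ""))
updatedDF.show()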
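For context on why the call in the question left the column unchanged: in a regular expression, an unescaped `$` is an anchor that matches the empty position at the end of the input, so replacing it with an empty string changes nothing. The sketch below illustrates this with Python's built-in `re` module rather than Spark itself. That is an assumption made for easy reproduction: Spark's `regexp_replace` uses Java regular expressions, where `$` is likewise an anchor unless it is escaped or placed in a character class. The sample value is taken from the table in the question.

```python
import re

salary = "$57438.18"

# Unescaped `$` is an end-of-input anchor: it matches the empty position
# after the last character, so no visible character is replaced.
unchanged = re.sub("$", "", salary)

# Escaped, or placed inside a character class, `$` is a literal dollar sign.
escaped = re.sub(r"\$", "", salary)
in_class = re.sub("[$]", "", salary)

print(unchanged)       # $57438.18
print(escaped)         # 57438.18
print(in_class)        # 57438.18
print(float(escaped))  # 57438.18
```

Once the dollar sign is gone, the remaining string parses cleanly as a number, which is the second half of what the title asks for.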


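0
votes
Instead of a regex, it would be easier to just drop the first character (unless the values in the salary column are not that simple):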

>>> df = sc.parallelize([('$123',),('$873',)]).toDF(['salary'])
>>> df.show()
+------+
|salary|
+------+
|  $123|
|  $873|
+------+
>>> df.select(df.salary.substr(2,100).alias('salary')).show()
+------+
|salary|
+------+
|   123|
|   873|
+------+
