I have an Athena table with a string column that looks like this:
+-------------------+
| employee_size     |
+-------------------+
| GREATER THAN 2000 |
+-------------------+
| 500 - 999         |
+-------------------+
| 28.00             |
+-------------------+
| unknown           |
+-------------------+
| 563               |
+-------------------+
I want to convert each column value to an integer where possible; otherwise it should be null. The desired output would look like this:
+---------------+
| employee_size |
+---------------+
|               |
+---------------+
|               |
+---------------+
| 28            |
+---------------+
|               |
+---------------+
| 563           |
+---------------+
This is the closest I have come with the queries I tried:
SELECT
    CASE
        WHEN employee_size LIKE '% %' THEN NULL
        WHEN employee_size LIKE '%-%' THEN NULL
        WHEN regexp_like(employee_size,'([A-Za-z]') THEN NULL
        WHEN employee_size LIKE '%.%' THEN CAST(employee_size AS decimal)
        ELSE CAST(employee_size AS integer)
    END AS employee_size
FROM
    "table_name";
But this code results in an error:
INVALID_FUNCTION_ARGUMENT: end pattern with unmatched parenthesis
If you have run into something similar, please suggest a solution.
Edit: I forgot to mention that if there are decimal values such as 28.00 or 5.64, everything after the decimal point should be ignored, leaving only 28 or 5.
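The error comes from the pattern '([A-Za-z]', which opens a parenthesis that is never closed. Rather than patching it, you can try the following logic, which keeps the leading digits only when the whole value is a plain integer or decimal number: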
SELECT
    CASE WHEN REGEXP_LIKE(employee_size, '^[0-9]+(\.[0-9]+)?$')
         THEN REGEXP_EXTRACT(employee_size, '^[0-9]+') END AS employee_size
FROM yourTable;
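Note that REGEXP_EXTRACT returns a string, so the result above is still text. If you need a real integer column, one option is to wrap the extracted digits in a CAST. Below is a minimal sketch, assuming Athena's Presto/Trino engine. The inline VALUES list and the alias t are hypothetical stand-ins for your table, filled with the sample rows from the question so the query can be run on its own:

```sql
-- Sketch only: the VALUES list stands in for the real table (hypothetical alias t).
SELECT
    employee_size AS raw_value,
    CASE
        -- Accept only values made entirely of digits, optionally followed by
        -- a decimal point and more digits (for example 563 or 28.00).
        WHEN REGEXP_LIKE(employee_size, '^[0-9]+(\.[0-9]+)?$')
        -- Keep the digits before the decimal point and convert them to integer.
        THEN CAST(REGEXP_EXTRACT(employee_size, '^[0-9]+') AS integer)
        -- No ELSE branch: every other value falls through to NULL.
    END AS employee_size
FROM (
    VALUES 'GREATER THAN 2000', '500 - 999', '28.00', 'unknown', '563'
) AS t (employee_size);
```

For the five sample rows this gives NULL, NULL, 28, NULL and 563. Because the digits before the decimal point are taken as text before the cast, a value such as 5.64 becomes 5 instead of being rounded.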