I have the following query in Spark on Databricks -
SELECT bu.tenant_id,
       bu.service_location_id,
       bu.account_id,
       bu.commodity_type,
       bu.commodity_usage,
       bu.commodity_units,
       bu.charges,
       bu.billed_usage_start,
       bu.billed_usage_end,
       CASE
         WHEN al.industry_code_type IS NOT NULL
              AND al.industry_code IS NOT NULL
              AND al.industry_code_type = 'sic'
         THEN (SELECT sics_name
               FROM dev_silver.trend_calculator_poc.mapping_business_type_codes_results
               WHERE sics_code = CAST(al.industry_code AS BIGINT)
               LIMIT 1)
       END AS business_type,
       bu.created AS created_at,
       bu.updated AS updated_at
FROM dev_silver.trend_calculator_poc.billed_usage_temp bu,
     dev_silver.trend_calculator_poc.account_location_temp al
WHERE al.tenant_id = bu.tenant_id
  AND al.service_location_id = bu.service_location_id
  AND al.account_id = bu.account_id;
The query fails with the error

Key not found: industry_code#15252

If I remove the WHERE condition from the subquery in the THEN clause, the error goes away. The column industry_code exists in the table account_location_temp and is spelled correctly. The problem somehow seems to be related to accessing the outer table alias al from inside the subquery, but I can't figure out why. The join also returns rows, so it is definitely not a case of the subquery finding no results. Am I referencing the outer table alias correctly inside the subquery?
Please suggest how to fix this.
You are not using a table reference for al inside the subquery. The subquery has to look like this:

SELECT sics_name
FROM dev_silver.trend_calculator_poc.mapping_business_type_codes_results,
     dev_silver.trend_calculator_poc.account_location_temp al
WHERE sics_code = CAST(al.industry_code AS BIGINT)
LIMIT 1

This is because the subquery is a brand-new query, so you have to specify every table you want to use inside it.
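For completeness, here is a sketch of the full statement with the subquery rewritten as suggested above (same table and column names as in the question; the alias m for the mapping table is added only for readability):

```sql
SELECT bu.tenant_id,
       bu.service_location_id,
       bu.account_id,
       bu.commodity_type,
       bu.commodity_usage,
       bu.commodity_units,
       bu.charges,
       bu.billed_usage_start,
       bu.billed_usage_end,
       CASE
         WHEN al.industry_code_type IS NOT NULL
              AND al.industry_code IS NOT NULL
              AND al.industry_code_type = 'sic'
         -- Subquery now declares its own FROM tables, so it resolves on its own
         THEN (SELECT m.sics_name
               FROM dev_silver.trend_calculator_poc.mapping_business_type_codes_results m,
                    dev_silver.trend_calculator_poc.account_location_temp al
               WHERE m.sics_code = CAST(al.industry_code AS BIGINT)
               LIMIT 1)
       END AS business_type,
       bu.created AS created_at,
       bu.updated AS updated_at
FROM dev_silver.trend_calculator_poc.billed_usage_temp bu,
     dev_silver.trend_calculator_poc.account_location_temp al
WHERE al.tenant_id = bu.tenant_id
  AND al.service_location_id = bu.service_location_id
  AND al.account_id = bu.account_id;
```

One caveat worth noting: because account_location_temp is re-declared inside the subquery, the inner al shadows the outer one and the subquery is no longer correlated with the outer row, so LIMIT 1 picks one arbitrary match over the whole table. If the lookup needs to happen per outer row, joining the mapping table in the outer FROM clause instead may be worth considering.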