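I built a CTE and use "SELECT DISTINCT" in the outer query that reads from it. This query fails with an error. However, when I drop DISTINCT and use a plain SELECT, the query runs successfully. I'm trying to figure out why it fails with SELECT DISTINCT. Why can't I use SELECT DISTINCT in this case? The query is generated automatically by an R package I'm debugging.
SQL query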
with tab as (
SELECT coh.cohort_definition_id, pr.person_id,
'ADT' as codeset_tag, pr.procedure_date as drug_exposure_start_date,
coh.cohort_start_date,
coh.cohort_end_date
FROM truven_ccmr_claims_actual_omop.PROCEDURE_OCCURRENCE pr
JOIN sandbox_truven.PIONEER2023_US_MarketScan_stg coh
ON pr.person_id = coh.subject_id
WHERE procedure_concept_id in (
4012324, 4304921, 4073141, 4071936, 4073142,
4073143, 2103796, 2109975, 2109976,
4512827, 4314682, 4286887, 4341536, 4145907)
LIMIT 10
)
SELECT distinct *
FROM tab
WHERE cohort_end_date >= drug_exposure_start_date
AND cohort_start_date <= drug_exposure_start_date limit 10;
Error
An error occurred when executing the SQL command:
with tab as (SELECT coh.cohort_definition_id, pr.person_id,
'ADT' as codeset_tag, pr.procedure_date as drug_exposure_start_date,
coh...
[Amazon](500310) Invalid operation: failed to find conversion function from
"unknown" to text; [SQL State=XX000, DB Errorcode=500310]
1 statement failed.
As an immediate fix, here is a version that should work:
SELECT DISTINCT
cohort_definition_id,
person_id,
codeset_tag,
drug_exposure_start_date,
cohort_start_date,
cohort_end_date
FROM tab
WHERE cohort_end_date >= drug_exposure_start_date AND
cohort_start_date <= drug_exposure_start_date
-- ORDER BY <one or more columns>
LIMIT 10;
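Redshift does not seem to like SELECT * mixed with DISTINCT. In any case, as a best practice you should explicitly list all the columns whose combination should be distinct. Also note that using LIMIT without ORDER BY is fairly meaningless: the database is free to return any matching rows, so which rows you get can change from one run to the next.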
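The error message itself talks about a conversion from "unknown" to text, which points to a second possible cause. Redshift derives from an old PostgreSQL release, where an untyped string literal such as 'ADT' can keep the placeholder type unknown inside a subquery or CTE, and DISTINCT then has to compare values of that type. The sketch below assumes that is what is happening here. It casts the literal explicitly inside the CTE and fills in the ORDER BY placeholder with person_id and drug_exposure_start_date. That sort key is a hypothetical choice made only for illustration, and the query has not been run against these tables.

```sql
WITH tab AS (
    SELECT coh.cohort_definition_id,
           pr.person_id,
           -- Assumption: an explicit cast gives the literal a concrete type
           -- instead of "unknown", so DISTINCT can compare its values.
           CAST('ADT' AS VARCHAR(3)) AS codeset_tag,
           pr.procedure_date AS drug_exposure_start_date,
           coh.cohort_start_date,
           coh.cohort_end_date
    FROM truven_ccmr_claims_actual_omop.PROCEDURE_OCCURRENCE pr
    JOIN sandbox_truven.PIONEER2023_US_MarketScan_stg coh
      ON pr.person_id = coh.subject_id
    WHERE procedure_concept_id IN (
        4012324, 4304921, 4073141, 4071936, 4073142,
        4073143, 2103796, 2109975, 2109976,
        4512827, 4314682, 4286887, 4341536, 4145907)
    -- Kept from the generated query; without ORDER BY this inner LIMIT
    -- takes an arbitrary 10 rows.
    LIMIT 10
)
SELECT DISTINCT
    cohort_definition_id,
    person_id,
    codeset_tag,
    drug_exposure_start_date,
    cohort_start_date,
    cohort_end_date
FROM tab
WHERE cohort_end_date >= drug_exposure_start_date
  AND cohort_start_date <= drug_exposure_start_date
-- Hypothetical sort key; add columns until it is unique if the
-- result must be fully repeatable.
ORDER BY person_id, drug_exposure_start_date
LIMIT 10;
```

Because the R package generates the SQL, a cast like this would have to be added where the package builds the literal, not in a hand-edited copy of the query.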