Using proc sql to insert a new column with iterative values into each unique row of a table

Question

Is there a technique for inserting a new column with iterative values into each unique row of a table?

Example:

TABLE HAVE

ID  name
1   John
2   Matt
3   Pete

Now, I have a counter = 3, and I would like to add every counter value from 1 to 3 to each unique ID in table HAVE.

TABLE WANT

ID name count
1  John 1
1  John 2
1  John 3
2  Matt 1
2  Matt 2
2  Matt 3
3  Pete 1
3  Pete 2
3  Pete 3

I can do this with a data step, using by together with first.var:

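%let counter = 3;  /* number of copies per ID */

data want;
  set have;
  by ID;  /* have must be sorted by ID */
  /* for the first row of each ID, write one copy per counter value */
  if first.ID then do count = 1 to &counter;
    output;
  end;
run;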

My main concern is run time: the data step processes the data set sequentially and may take a while to run. Is there a way to do this with proc sql instead?

Answer 1

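This is not particularly easy to do in proc sql with built-in functionality. One solution works if you have a tally (numbers) table of some sort. Then you can do: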

select t.id, t.name, n.n as count
from t join
     numbers n
     on n.n <= &counter;
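
If no numbers table exists yet, a short data step can build one. The following is a minimal sketch, assuming the input table is named have (as in the question), the helper table is named numbers with a single column n (as in the query above), and the counter is held in a macro variable named counter (a name chosen here for illustration):

```sas
%let counter = 3;  /* assumed macro variable holding the number of copies */

/* build a helper table with one row per value 1..&counter */
data numbers;
  do n = 1 to &counter;
    output;
  end;
run;

proc sql;
  /* pair every row of have with every row of numbers */
  create table want as
  select t.id, t.name, n.n as count
  from have as t
  cross join numbers as n
  order by t.id, n.n;
quit;
```

Because numbers contains only the values 1 through &counter, no join condition is needed here; a plain cross join returns the same rows as the filtered join above.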

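In fact, if your IDs are sequential with no gaps (as in your example) and the table has at least as many rows as the counter value, you can use a self-join: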

select t.id, t.name, n.id as count
from t join
     t n
     on n.id <= &counter;

If you know the specific values in advance, you can build a union all query:

select id, name, 1 as count from t
union all
select id, name, 2 as count from t
union all
select id, name, 3 as count from t;
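
For larger counters, writing each branch by hand gets tedious. One possible way to generate the same query is a macro loop. This is only a sketch: the macro name repeat_rows and its parameter n are hypothetical, and the input table is assumed to be named have, as in the question.

```sas
/* hypothetical helper macro: emits one SELECT per value 1..n, joined by UNION ALL */
%macro repeat_rows(n);
  proc sql;
    create table want as
    %do i = 1 %to &n;
      %if &i > 1 %then union all;
      select id, name, &i as count from have
    %end;
    order by 1, 3;  /* sort by id, then by count (column positions) */
  quit;
%mend repeat_rows;

%repeat_rows(3)
```

The generated SQL has the same shape as the hand-written query above, so it scans have once per counter value. For large counters, a numbers table is likely the better choice.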

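Modern SQL now has constructs that simplify this (for example, window functions and recursive CTEs). However, these are not directly available in proc sql.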

Answer 2

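The result set is a cross join (Cartesian product): if there are N rows and all of them are distinct, it will contain N² rows.

Example:

SASHELP.CLASS has 19 distinct rows, and each row will appear 19 times (the original plus 18 duplicates), giving 19 ** 2, or 361, rows.

A helper query creates a helper table containing only the count values (I call them index):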

data class;
  set sashelp.class;
run;

proc sql; 
* monotonic() trusted by Richard for this create/select only ;
* commented out for fear of mono (pun intended);
* create table indexes as 
  select index from
  ( select distinct *, monotonic() as index from class);

  * one mark per distinct row;
  create table distinct_marks(keep=mark) as
  select distinct *, 1 as mark from class;
quit;

* create table of traditionally computed monotonic indexes;
data indexes(keep=index);
  set distinct_marks;
  index + 1;
run;

proc sql;
  create table want as
  select 
    self.*, 
    each.index 
  from 
    class as self 
  cross join 
    indexes as each
  ;
quit;

Compare the above with your original approach:

proc sql noprint;
  select count (*) into :count trimmed
  from 
  ( select distinct * from class );
quit;

data want;
  set class;
  do _n_ = 1 to &count;
    output;
  end;
run;

Whichever approach you choose, cross joins get big quickly and can cause a lot of time-consuming disk I/O.
