Looking for a more efficient way to restructure data based on dates and a grouping variable in SAS


Original data:

subject medgrp  stdt        endt
1       A       7/1/2014    7/31/2014
1       A       7/29/2014   8/30/2014
1       B       7/1/2014    8/15/2014
1       C       8/1/2014    9/1/2014
2       A       4/15/2014   5/15/2014
2       A       5/10/2014   6/10/2014
2       A       6/5/2014    6/15/2014
2       A       7/1/2014    8/1/2014
3       A       6/5/2014    6/15/2014
3       A       6/16/2014   8/1/2014

Restructured data:

subject med_pattern stdt_new    endt_new
1       A*B         7/1/2014    7/31/2014
1       A*B*C       8/1/2014    8/15/2014
1       A*C         8/16/2014   8/30/2014
1       C           8/31/2014   9/1/2014
2       A           4/15/2014   6/15/2014
2       A           7/1/2014    8/1/2014
3       A           6/5/2014    8/1/2014

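I was able to convert the original data into the restructured data by outputting one row for every date from stdt to endt for each record, keeping a single row per date for each subject/medgrp, and then rebuilding the date periods and creating the variable med_pattern.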
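The approach described above could look roughly like the sketch below. It is a reconstruction under assumptions, not the code actually used: it assumes the original data sit in a dataset named have with stdt and endt stored as SAS date values, and the dataset names days, patterns, periods and want_naive, as well as the variables gap and period, are made up for illustration.

```sas
/* Step 1: one row per subject/medgrp/date covered by a record */
data days;
  set have;
  do date = stdt to endt;
    output;
  end;
  format date mmddyy10.;
  keep subject medgrp date;
run;

/* Step 2: keep a single row per subject/date/medgrp */
proc sort data=days nodupkey;
  by subject date medgrp;
run;

/* Step 3: build the A*B*C pattern for each subject/date */
data patterns;
  length med_pattern $20;
  do until (last.date);
    set days;
    by subject date;
    med_pattern = catx('*', med_pattern, medgrp);
  end;
  keep subject date med_pattern;
run;

/* Step 4: start a new period when the pattern changes or a date is skipped */
data periods;
  set patterns;
  by subject med_pattern notsorted;
  gap = dif(date);   /* days since the previous row */
  if first.med_pattern or gap > 1 then period + 1;
  drop gap;
run;

/* Step 5: collapse each period to its first and last date */
proc sql;
  create table want_naive as
  select subject,
         med_pattern,
         min(date) as stdt_new format=mmddyy10.,
         max(date) as endt_new format=mmddyy10.
  from periods
  group by subject, period, med_pattern
  order by subject, stdt_new;
quit;
```

Every record is expanded to one row per day and the expanded data are then sorted, so the intermediate datasets grow with the total number of days covered rather than with the number of input records.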

However, this approach takes a long time to run, especially on large data (> 3 million records).

Any suggestions for making it more efficient would be greatly appreciated!

date sas retain
1 answer

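Processing one subject at a time, you can use a date-keyed multidata hash to track which medgrp values are active on each date in the ranges defined by stdt and endt. Iterating over the hash then lets you compute the medgrps crossings value.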

data have;
  input subject medgrp $ stdt :mmddyy10. endt :mmddyy10.;
  format stdt endt mmddyy10.;
datalines;
1       A       7/1/2014    7/31/2014
1       A       7/29/2014   8/30/2014
1       B       7/1/2014    8/15/2014
1       A       7/15/2014   7/15/2014
1       C       8/1/2014    9/1/2014
2       A       4/15/2014   5/15/2014
2       A       5/10/2014   6/10/2014
2       A       6/5/2014    6/15/2014
2       A       7/1/2014    8/1/2014
3       A       6/5/2014    6/15/2014
3       A       6/16/2014   8/1/2014
;

data crossings_by_date / view=crossings_by_date;
  if 0 then set have; * prep PDV;

  if _n_ = 1 then do;    * create the hash objects once, not once per subject;
    declare hash dg(multidata:'yes', ordered:'a');         %* 1st hash for subject dates;
    dg.defineKey('date');
    dg.defineData('date', 'medgrp');
    dg.defineDone();
    call missing (date); format date adate cdate mmddyy10.;

    declare hash crossing(ordered:'a');                    %* 2nd hash for deduping a list of medgrps ;
    crossing.defineKey('medgrp');
    crossing.defineData('medgrp');
    crossing.defineDone();

    declare hiter dgi('dg');
    declare hiter xi('crossing');
  end;

  dg.clear();

  do _n_ = 1 by 1 until (last.subject);  * process subjects one by one;
    set have;
    by subject;
    do date = stdt to endt; * load multidata hash with medgrp over date range;
      dg.add();
    end;
  end;

  * examine each date in which subject had activity; 
  adate = .;
  cdate = -1e9;
  do _i_ = 1 by 1 while (dgi.next() = 0);
    if date eq adate 
      then continue;          * hiter over multi-data will return each node;
      else adate = date;      * track activity date;

    * load hash to dedupe tracking of medgrp on date;
    crossing.clear();
    do _j_ = 1 by 1 while (dg.do_over() = 0);   * separate counter from the outer loop;
      crossing.replace();
    end;

    * compute crossing representation on date, A*B*... by traversing 2nd hash;
    xi.first();      length cross $20;
    cross = medgrp;
    do while(0 = xi.next());
      cross = catx('*',cross,medgrp);
    end;

    if date - cdate > 1 then cluster + 1;    %* track cluster based on date continuities;
    cdate = date;

    output;  * <------------ view OUTPUT;
  end;

  keep subject date cross cluster;
run;

* 2nd data step processes view (1st data step);
* determine when date continuity ends or medgrp changes;

data want;
  length subject 8 medgrps $20;
  format stdt endt mmddyy10.;

  do _n_ = 1 by 1 until (last.medgrps);
    set crossings_by_date (rename=(cross=medgrps));
    by cluster medgrps notsorted;

    if stdt = . then 
      stdt = date;
  end;

  endt = date;

  keep subject medgrps stdt endt; 
run;
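
One way to check the result is to type the restructured table from the question into a dataset and compare it with want. The sketch below assumes a made-up dataset name, expected, and uses the variable names of want (medgrps, stdt, endt) rather than the names shown in the question (med_pattern, stdt_new, endt_new).

```sas
/* Expected periods, copied from the question's restructured data */
data expected;
  input subject medgrps :$20. stdt :mmddyy10. endt :mmddyy10.;
  format stdt endt mmddyy10.;
datalines;
1 A*B   7/1/2014  7/31/2014
1 A*B*C 8/1/2014  8/15/2014
1 A*C   8/16/2014 8/30/2014
1 C     8/31/2014 9/1/2014
2 A     4/15/2014 6/15/2014
2 A     7/1/2014  8/1/2014
3 A     6/5/2014  8/1/2014
;

/* Row-by-row comparison; any differences are listed in the output */
proc compare base=expected compare=want;
run;
```

PROC COMPARE matches observations by position here, so the check relies on want being written in subject and date order, which follows from the order in which the view emits rows.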