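This is an extension of an earlier question (Drop observations once condition is met by multiple variables).
I have the following data and tried to use one of the existing answered questions to solve my problem, but I cannot get the result I want. Here is what my data looks like.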
Have:
    id   Date      Evt_Type  Flag  Amt1  Amt2
    101  2/2/2019  Fee             5
    101  2/3/2019  REF1      Y           5
    101  2/4/2019  Fee             10
    101  2/6/2019  REF2      Y           10
    101  2/7/2019  Fee             4
    101  2/8/2019  REF1
    102  2/2/2019  Fee             25
    102  2/2/2019  REF1      N     25
    103  2/3/2019  Fee             10
    103  2/4/2019  REF1      Y           10
    103  2/5/2019  Fee             10
Want:
    id   Date      Evt_Type  Flag  Amt1  Amt2
    101  2/2/2019  Fee             5
    101  2/3/2019  REF1      Y           5
    101  2/4/2019  Fee             10
    101  2/6/2019  REF2      Y           10
    101  2/7/2019  Fee             4
    101  2/8/2019  REF1
    102  2/2/2019  Fee             25
    102  2/2/2019  REF1      N     25
    103  2/3/2019  Fee             10
    103  2/4/2019  REF1      Y           10
I tried the following:
    data want;
      _max_n_with_Y = 1e12;
      do _n_ = 1 by 1 until (last.id);
        set have;
        by id;
        if flag='Y' then _max_n_with_Y = _n_;
      end;
      do _n_ = 1 to _n_;
        set have;
        if _n_ <= _max_n_with_Y then OUTPUT;
      end;
      drop _:;
    run;
Any help is appreciated.
Thanks
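The important "landmark" is the row with flag='Y'.
The extra condition for outputting rows after the landmark complicates the state machine being coded to track (or compute) the row number (_n_) of the last row to output in each group.
The state of being on a Y row is easy to detect. An unconditional call to LAG can be used to check the state of the rows after the Y. The SAS IF statement does not short-circuit its condition, so as long as LAG is not placed in a dependent THEN clause, the LAG queue will suit the task.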
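To illustrate why the placement of LAG matters, here is a minimal sketch on toy data. The data set names `demo` and `lag_demo` and the variables `x`, `prev_always` and `prev_cond` are made up for illustration and are not part of the question. LAG returns the value stored the last time that particular call executed, which is the previous row only if the call runs on every row.

```sas
/* Hypothetical toy data, not from the question */
data demo;
  input x $;
  datalines;
A
B
C
D
;
run;

data lag_demo;
  set demo;
  length prev_always prev_cond $8;

  /* Unconditional call: this LAG queue is updated on every row,
     so it returns the value of x from the previous row */
  prev_always = lag(x);

  /* Call inside a dependent THEN clause: this LAG queue is updated
     only on even rows, so it returns the value of x from the last
     even row, not from the previous row */
  if mod(_n_, 2) = 0 then prev_cond = lag(x);
run;

proc print data=lag_demo;
run;
```

On row 4, `prev_always` should be C (the previous row), while `prev_cond` should be B (the value stored when that call last ran, on row 2).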
Example:
    data have;
      attrib
        id       format=4.
        date     informat=mmddyy10. format=mmddyy10.
        evt_type length=$4
        flag     length=$1
        amt1 amt2 format=4.
      ;
      input
        id Date Evt_Type Flag Amt1 Amt2;
      datalines;
    101 2/2/2019 Fee  . 5  .
    101 2/3/2019 REF1 Y .  5
    101 2/4/2019 Fee  . 10 .
    101 2/6/2019 REF2 Y .  10
    101 2/7/2019 Fee  . 4  .
    101 2/8/2019 REF1 . .  .
    102 2/2/2019 Fee  . 25 .
    102 2/2/2019 REF1 N 25 .
    103 2/3/2019 Fee  . 10 .
    103 2/4/2019 REF1 Y .  10
    103 2/5/2019 Fee  . 10 .
    ;
    run;
    data want;
      _y_n = 1e12;
      do _n_ = 1 by 1 until (last.id);
        set have;
        by id;
        if flag='Y' then _y_n = _n_;
        /* rule: post Y output of two rows should only occur once, and at the rows
         * immediately succeeding the Y row
         */
        if _n_ = _y_n + 2                /* is this row 2 after a Y */
           and lag(evt_type) = 'Fee'     /* is first row after Y Fee */
           and evt_type =: 'REF'         /* is second row after Y REF# */
        then
          _upto_n = _n_;
      end;
      _upto_n = max (_upto_n, _y_n);
      do _n_ = 1 to _n_;
        set have;
        if _n_ <= _upto_n then OUTPUT;
      end;
      drop _:;
    run;
Note, regarding:
    if _n_ = _y_n + 2                /* is this row 2 after a Y */
       and lag(evt_type) = 'Fee'     /* is first row after Y Fee */
       and evt_type =: 'REF'         /* is second row after Y REF# */
    then
      _upto_n = _n_;
For the second row after a Y row:
    LAG2(<var>) is the <var> value from the Y row
    LAG (<var>) is the <var> value from the Y row+1
    <var>       is the <var> value from the Y row+2, which is the current row
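Because the logic relies on LAG running on every row, one way to make that explicit is to call LAG in its own assignment statement and test the stored value in the IF. The following is a sketch of that variant, assuming the `have` data set created in the example above. The names `want2` and `_prev_evt` are illustrative and not from the original answer.

```sas
data want2;
  _y_n = 1e12;                          /* sentinel: no Y row seen in this group */

  /* First pass over the group: find the last row number to output */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;

    /* Executed on every row, so it always holds the previous row's evt_type */
    _prev_evt = lag(evt_type);

    if flag = 'Y' then _y_n = _n_;      /* remember the latest Y row */

    if _n_ = _y_n + 2                   /* second row after the latest Y */
       and _prev_evt = 'Fee'            /* first row after the Y was a Fee */
       and evt_type =: 'REF'            /* this row is a REF# */
    then
      _upto_n = _n_;
  end;

  /* Fall back to the Y row (or the sentinel) when no Fee/REF pair followed it */
  _upto_n = max(_upto_n, _y_n);

  /* Second pass over the same group: output rows up to the computed limit */
  do _n_ = 1 to _n_;
    set have;
    if _n_ <= _upto_n then output;
  end;

  drop _:;
run;
```

With the sample data this should keep all six rows for id 101, both rows for id 102, and the first two rows for id 103, which matches the wanted output.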