Extract all 16-digit account numbers from a text field and create a field for each account number

Question

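I have a dataset with a text field that holds long free-form text values. I need to identify and extract every 16-digit account number from that text field and create a column from each extracted account number.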

The data I have

    data have;
        infile datalines truncover;
        input acct_num txt_field $100.;
        datalines;
    3435436     Payment issue reported 3456123789065322 to 0909876789432123 dated 9 mar 2024
    7789976     Data declined and assigned to 7890512323454545
    ;
    run;

The data I want

    acct_num    txt_field                                                                      acct1               acct2
    3435436     Payment issue reported 3456123789065322 to 0909876789432123 dated 9 mar 2024   3456123789065322    0909876789432123
    7789976     Data declined and assigned to 7890512323454545                                 7890512323454545

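So far I have used the PRXPARSE and PRXMATCH functions, but they only seem to work when you know exactly what you are looking for in the text field. Here I am just looking for any 16-digit numeric value.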

sql sas sas-macro
1 Answer

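You're on the right track with regular expressions. Use call prxnext() to iterate over every instance of a 16-digit account number. The regular expression \b\d{16}\b will find them: \d{16} matches exactly 16 digits, and the \b word boundaries on either side keep it from matching part of a longer run of digits.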
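To see what the word boundaries do, here is a small sketch that tests the same pattern with PRXMATCH. The sample strings are invented for illustration and are not from the question's data.

```sas
/* Sketch: check which strings contain a standalone 16-digit number.
   The sample strings below are made up for illustration. */
data _null_;
    length s $40;
    exprid = prxparse('/\b\d{16}\b/');
    do s = 'ref 3456123789065322 ok',
           'ref 34561237890653221 long',
           'ref 345612378906532 short',
           'id:0909876789432123.';
        pos = prxmatch(exprid, s);   /* 0 means no match */
        put pos= s=;
    end;
run;
```

The first and last strings should report a match (at positions 5 and 4), while the 17-digit and 15-digit runs should report 0. Note that \b treats letters and underscores as word characters too, so a number glued to a letter, such as ID3456123789065322, would not match.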

    data want;
        set have;

        length acct_num_16 $16;
        retain exprid;

        /* Compile the regular expression once and reuse its ID on every row */
        if _N_ = 1 then exprid = prxparse('/\b\d{16}\b/');

        /* Scan all of the text. start must be a variable, not a constant:
           CALL PRXNEXT advances it past each match it finds. */
        start = 1;
        stop  = length(txt_field);

        /* Find the first value */
        call prxnext(exprid, start, stop, txt_field, pos, len);

        /* Keep scanning until no more account numbers are found */
        do while (pos > 0);
            acct_num_16 = substr(txt_field, pos, len);
            output;
            call prxnext(exprid, start, stop, txt_field, pos, len);
        end;

        keep acct_num txt_field acct_num_16;
    run;

Result, with one row per account number found:

    acct_num    txt_field                         acct_num_16
    3435436     Payment issue reported ...        3456123789065322
    3435436     Payment issue reported ...        0909876789432123
    7789976     Data declined and assigned ...    7890512323454545
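The question asks for one row per record with acct1, acct2, ... columns rather than one row per account number. One possible way to get there is to reshape the long result with PROC TRANSPOSE and its PREFIX= option. This is a sketch, assuming the long dataset is named want as in the step above; the output name want_wide is hypothetical.

```sas
/* Sketch: reshape the long result into acct1, acct2, ... columns.
   Assumes WORK.WANT from the step above; want_wide is a made-up name. */
proc sort data=want;
    by acct_num txt_field;
run;

proc transpose data=want out=want_wide(drop=_name_) prefix=acct;
    by acct_num txt_field;
    var acct_num_16;
run;
```

Two caveats with this approach: rows whose text contains no 16-digit number never reach want, so they will be missing from want_wide as well, and records that share the same acct_num and txt_field are pooled into a single output row.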