Extract multiple strings from a multi-line text file - larger test file


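How can I extract multiple strings from a multi-line text file using the following rules? The search strings are "String server", "pac", and "String method". Within each enclosing "{}" block, each of them may appear once or not at all. When a search string matches, extract the value inside the double quotes, without the "()". The value for "String server" or "pac" is reported only once, not repeated. In the output, it appears before the value of "String method". For example, given the sample text file named in: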

public AResponse retrieveA(ARequest req){
    String server = "AAA";
    String method =  "retrieveA()";
    log.info(method,
            server,
            req);
    return res;
}

public BResponse retrieveB(BRequest req){
    String method =  "retrieveB()";
    BBB pac = new BBB();
    log.info(method,
            pac,
            req);
    return res;
}

public CResponse retrieveC(CRequest req) {
    String server = "CCC";
    log.info(server,
            req);
    return res;
}

public DResponse retrieveD(DRequest req) {
    String method = "retrieveD()";
    log.info(method,req);
    return res;
}

public EResponse retrieveE(ERequest req){
    EEE pac = new EEE();
    String method =  "retrieveE()";
    String server = "EEE";
    log.info(method,
            server,
            pac,
            req);
    return res;
}

public FResponse callretrieveF(FRequest req) throws InvalidDataException {
        String server = "FFFFF";
        //retrieveF
        String method =  "retrieveF()";
        try {
            log.info(method,
                     server,
                     req);

            FFFFF pac = new FFFFF();
        }
}

/**
 * callgetG
* getG
*/
public GResponse callgetG(GRequest req) throws InvalidDataException {
        //getG
        String method =  "getG()";
        String server = "GGGGGG";
        try {
            try {
                GGGGGG pac = new GGGGGG();
                log.info(method,
                     server,
                     req);
            }
        }
}

/**
 * getH
*/
    public HResponse getH(HRequest req) 
                                throws InvalidDataException {

        //getH
        String method =  "getH()";
        String server = "HHHHHHH";
        String calledMethod =  "getH2()";

        ARequest aReq = new ARequest(req.getH(),
                                     req.getR());
        ProgramAccountInformationResponse resp = null;
        try {
            log.info(LogMessages.msgInfoMethodStartPrivate(method,
                                                           server,
                                                           calledMethod,
                                                           req));
            return resp;
        }catch(InvalidDataException ide){
            log.error(method);
            throw ide;
        }
    }

}}}}}

Expected output:

AAA retrieveA
BBB retrieveB
CCC 
retrieveD
EEE retrieveE
FFFFF retrieveF
GGGGGG getG
HHHHHHH getH
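
The rules and expected output above can be sketched in Python, purely as an illustration and not as part of the original question. The sketch assumes that every line beginning with `public` starts a new method, that the first "String server" or "pac" value found wins, and that "String method" must match exactly (so `calledMethod` is ignored). The names `extract`, `SAMPLE`, and the regular expressions are hypothetical.

```python
import re

# Assumed patterns for the three search strings; the names are illustrative.
SERVER_RE = re.compile(r'String\s+server\s*=\s*"([^"]*)"')
METHOD_RE = re.compile(r'String\s+method\s*=\s*"([^"()]*)')  # stops before "()"
PAC_RE = re.compile(r'\bpac\s*=\s*new\s+(\w+)')
HEADER_RE = re.compile(r'^\s*public\b')  # assumption: marks a new method


def extract(text):
    """Return one "server-or-pac method" string per public method."""
    blocks = []
    current = None
    for line in text.splitlines():
        if HEADER_RE.match(line):
            current = {"first": None, "method": None}
            blocks.append(current)
            continue
        if current is None:
            continue  # text before the first method header
        for pattern in (SERVER_RE, PAC_RE):
            m = pattern.search(line)
            if m and current["first"] is None:
                current["first"] = m.group(1)  # keep only the first value
        m = METHOD_RE.search(line)
        if m and current["method"] is None:
            current["method"] = m.group(1)
    rows = []
    for block in blocks:
        values = [v for v in (block["first"], block["method"]) if v]
        if values:
            rows.append(" ".join(values))
    return rows


# A shortened version of the sample input, including the two-line header.
SAMPLE = '''
public AResponse retrieveA(ARequest req){
    String server = "AAA";
    String method =  "retrieveA()";
    log.info(method, server, req);
}

public BResponse retrieveB(BRequest req){
    String method =  "retrieveB()";
    BBB pac = new BBB();
}

public CResponse retrieveC(CRequest req) {
    String server = "CCC";
}

    public HResponse getH(HRequest req)
                                throws InvalidDataException {
        String method =  "getH()";
        String server = "HHHHHHH";
        String calledMethod =  "getH2()";
    }
'''

for row in extract(SAMPLE):
    print(row)
```

Because the sketch keys on the `public` line rather than on counting braces, nested `try {` blocks do not start a new output row.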

I tried the following solution, taken from "Extract multiple strings from a multi-line text file":

awk -v OFS='\t' -F= '
/\{[[:blank:]]*$/ {++n}
NF==2 && /String | pac/ {
   gsub(/^[[:blank:]]*("|new +)|[()";]+$/, "", $2)
   if ($1 ~ / (server|pac)/)
      col1[n] = $2
   else if ($1 ~ / method/)
      col2[n] = $2
}
END {
   for (i=1; i<=n; ++i)
      print col1[i], col2[i]
}' in
1 answer

A minor change to the TXR solution handles this:

$ txr extract2.txr longer-data
AAA retrieveA
BBB retrieveB
CCC
retrieveD
EEE retrieveE
FFFFF retrieveF
GGGGGG getG
HHHHHHH getH

Code:

@(repeat)
@(freeform 2)
@/ */public@nil{
@  (gather :vars ((server nil) (meth nil) (pac nil)))
 String server = "@server";
 String method = "@meth()";
 @pac pac = new @pac();
@  (until)
@/ */}
@  (end)
@  (do
     (put-line
       (cond
         ((and server meth) `@server @meth`)
         ((and meth pac) `@pac @meth`)
         (server)
         (meth))))
@(end)

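A key detail is @(freeform 2), which instructs TXR to treat the next two lines as a single line (with an embedded \n character) and then match against them. Any unmatched material is split back into lines and pushed back into the input. This easily handles a public ... function header that is split across two lines. The patterns allow some optional leading spaces, because public does not always appear in the first column, and neither does the closing brace.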
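
To make the two-line matching idea concrete, here is a rough Python analogue. It is only a sketch of the concept described above, not of how TXR works internally: the helper `header_span` and the regular expression `HEADER` are hypothetical, and the regex is an assumed stand-in for the header pattern in the TXR code (optional spaces, `public`, then anything up to the first `{`).

```python
import re

# Assumed stand-in for the header pattern: optional leading spaces,
# "public", then any characters (newlines included) up to the first "{".
HEADER = re.compile(r" *public[^{]*\{")


def header_span(lines, i):
    """Return how many lines a method header starting at lines[i] occupies.

    The next two lines are joined with an embedded newline and matched as
    one string. The result is 0 when there is no header, otherwise 1 or 2.
    Lines beyond the reported span are left for the caller to process.
    """
    window = "\n".join(lines[i:i + 2])
    m = HEADER.match(window)
    if m is None:
        return 0
    return m.group(0).count("\n") + 1


one_line = ["public AResponse retrieveA(ARequest req){",
            '    String server = "AAA";']
two_lines = ["    public HResponse getH(HRequest req) ",
             "                                throws InvalidDataException {",
             ""]

print(header_span(one_line, 0))
print(header_span(two_lines, 0))
```

The caller would advance by the returned span and continue scanning from there, which plays the role of pushing unmatched lines back into the input.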
