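How can I extract multiple strings from a multi-line text file using these rules? The search strings are "String server", " pac" and "String method". Within an enclosing "{}", each one appears at most once and may not appear at all. When a search string matches, extract its value from inside the "", without the "()". Only one of "String server" or " pac" supplies the first value, so it is never duplicated, and that value must be printed before the value of "String method". For example, given the sample text file in: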
public AResponse retrieveA(ARequest req){
String server = "AAA";
String method = "retrieveA()";
log.info(method,
server,
req);
return res;
}
public BResponse retrieveB(BRequest req){
String method = "retrieveB()";
BBB pac = new BBB();
log.info(method,
pac,
req);
return res;
}
public CResponse retrieveC(CRequest req) {
String server = "CCC";
log.info(server,
req);
return res;
}
public DResponse retrieveD(DRequest req) {
String method = "retrieveD()";
log.info(method,req);
return res;
}
public EResponse retrieveE(ERequest req){
EEE pac = new EEE();
String method = "retrieveE()";
String server = "EEE";
log.info(method,
server,
pac,
req);
return res;
}
public FResponse callretrieveF(FRequest req) throws InvalidDataException {
String server = "FFFFF";
//retrieveF
String method = "retrieveF()";
try {
log.info(method,
server,
req);
FFFFF pac = new FFFFF();
}
}
/**
* callgetG
* getG
*/
public GResponse callgetG(GRequest req) throws InvalidDataException {
//getG
String method = "getG()";
String server = "GGGGGG";
try {
try {
GGGGGG pac = new GGGGGG();
log.info(method,
server,
req);
}
}
}
/**
* getH
*/
public HResponse getH(HRequest req)
throws InvalidDataException {
//getH
String method = "getH()";
String server = "HHHHHHH";
String calledMethod = "getH2()";
ARequest aReq = new ARequest(req.getH(),
req.getR());
ProgramAccountInformationResponse resp = null;
try {
log.info(LogMessages.msgInfoMethodStartPrivate(method,
server,
calledMethod,
req));
return resp;
}catch(InvalidDataException ide){
log.error(method);
throw ide;
}
}
}}}}}
Expected output:
AAA retrieveA
BBB retrieveB
CCC
retrieveD
EEE retrieveE
FFFFF retrieveF
GGGGGG getG
HHHHHHH getH
I tried the following solution, from "Extract multiple strings from a multi-line text file":
awk -v OFS='\t' -F= '
/\{[[:blank:]]*$/ {++n}
NF==2 && /String | pac/ {
gsub(/^[[:blank:]]*("|new +)|[()";]+$/, "", $2)
if ($1 ~ / (server|pac)/)
col1[n] = $2
else if ($1 ~ / method/)
col2[n] = $2
}
END {
for (i=1; i<=n; ++i)
print col1[i], col2[i]
}' in
A small change to the TXR solution handles this:
$ txr extract2.txr longer-data
AAA retrieveA
BBB retrieveB
CCC
retrieveD
EEE retrieveE
FFFFF retrieveF
GGGGGG getG
HHHHHHH getH
Code:
@(repeat)
@(freeform 2)
@/ */public@nil{
@ (gather :vars ((server nil) (meth nil) (pac nil)))
String server = "@server";
String method = "@meth()";
@pac pac = new @pac();
@ (until)
@/ */}
@ (end)
@ (do
(put-line
(cond
((and server meth) `@server @meth`)
((and meth pac) `@pac @meth`)
(server)
(meth))))
@(end)
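One key detail is @(freeform 2), which tells TXR to treat the next two lines as a single line (with an embedded \n character) and then match against it. Any material that is not matched is split back into lines and pushed back into the input. This easily handles a public ... function header that is split across two lines. We also allow some optional leading whitespace, because public does not occur in the first column, and neither does the closing brace.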
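For readers who do not have TXR installed, the same block-by-block idea can be sketched in Python. This is only a rough equivalent, not a translation of the TXR code: it assumes every function starts on a line beginning with public, so a header split over two lines needs no special handling, and it prefers server over pac when both are present, as the expected output does for retrieveE. The names extract, SERVER, METHOD, PAC and HEADER are made up for this illustration.

```python
import re

# Patterns for the three search strings (names are illustrative only).
SERVER = re.compile(r'String\s+server\s*=\s*"([^"]*)"')
# Stops at the first '(' so "retrieveA()" yields "retrieveA";
# "String calledMethod" does not match because of the literal "method".
METHOD = re.compile(r'String\s+method\s*=\s*"([^"()]*)')
PAC = re.compile(r'(\w+)\s+pac\s*=\s*new\s+\w+\s*\(')
# Assumption: each function block begins on a line starting with "public".
HEADER = re.compile(r'^\s*public\b')


def extract(text):
    """Return one output line per `public` block that has a match."""
    blocks = []
    for line in text.splitlines():
        if HEADER.match(line):
            blocks.append([])          # start a new block at each header
        if blocks:
            blocks[-1].append(line)    # lines before the first header are ignored

    rows = []
    for block in blocks:
        body = "\n".join(block)
        server = SERVER.search(body)
        method = METHOD.search(body)
        pac = PAC.search(body)
        first = server or pac          # server wins when both are present
        fields = [m.group(1) for m in (first, method) if m]
        if fields:                     # skip blocks with nothing to report
            rows.append(" ".join(fields))
    return rows
```

With the sample file saved as in, print("\n".join(extract(open("in").read()))) should print one line per function, with the server or pac value first and the method name second. Unlike the TXR version, this does not track braces at all, so it would misbehave on input where a nested block contains its own public line.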