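I am parsing text data from a PDF page. I was able to apply a regex (taken from one of the Stack Overflow solutions) and get only the text I am looking for. The problem is that when I append each parsed string to a StringBuilder on every iteration, the intermediate data shows up in the console when I print it with System.out.println, but the final value (the entire text data of that particular page) is not displayed. I have tried the following:
1) Went to Window > Preferences > Run/Debug > Console and unchecked "Limit console output".
2) Tried writing the same string to a text file.
3) Checked the length of the string, which shows 3746 (so there is a lot of data in it).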
package com.PDFReaderApp2;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.locks.ReentrantLock;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Third-party imports, assumed to be Apache Commons IO and Apache PDFBox based on the calls used below.
import org.apache.commons.io.IOUtils;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;

public class ReadPDFFile2 {

    // The document field and the setFileInstance(...) method are not shown in the original post.

    private static ReentrantLock counterLock = new ReentrantLock(true);
    private static Pattern PARAGRAPH = Pattern.compile("\\s*^\\s*$\\s*", Pattern.MULTILINE);
    private static Pattern MULTISPACE = Pattern.compile("\\s+");
    private static boolean flag = false;
    private static BufferedWriter writer = null;

    // Collects every parenthesised fragment "( ... )" of the content stream into one builder.
    public static StringBuilder processString2(String args) {
        StringBuilder builder = new StringBuilder();
        String x = args;
        x = compactLines(x);
        Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
        while (m.find()) {
            x = m.group(1);
            builder = builder.append(x);
            x = builder.toString();
            // x= x.replaceAll(System.lineSeparator(), " ");
            System.out.println("-->> " + x);
        }
        System.out.println("is x empty?: " + x.length());
        x = builder.toString();
        return builder;
    }

    // Collapses runs of whitespace inside each paragraph and joins the paragraphs with newlines.
    public static String compactLines(String source) {
        return Stream.of(PARAGRAPH.split(source)).map(para -> MULTISPACE.matcher(para).replaceAll(" "))
                .collect(Collectors.joining("\n"));
    }

    public void readThisPage(int pageNum) throws IOException {
        writer = new BufferedWriter(new FileWriter("C:\\Users\\bhard\\Desktop\\output.txt", true));
        InputStream inputStream = document.getPages().get(pageNum).getContents();
        String text = new String();
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8.name());
        System.out.println(text);
        StringBuilder mofo = processString2(text);
        text = mofo.toString();
        System.out.println("is text empty?: " + text.length());
        writer.write(text);
        System.out.println("text of " + pageNum + "\n " + mofo);
    }

    public static void main(String[] args) throws InvalidPasswordException, IOException {
        ReadPDFFile2 pdf2 = new ReadPDFFile2();
        pdf2.setFileInstance("E:\\E Books\\Novels and Story books\\Adler-Mortimer-How-To-Read-A-Book.pdf",
                "Adler-Mortimer-How-To-Read-A-Book.pdf");
        pdf2.readThisPage(3);
    }
}
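Call close() or flush() on the writer and the output should show up in the file. A BufferedWriter holds written text in memory (8192 characters by default), so a 3746-character string stays in the buffer and never reaches the file unless the writer is flushed or closed. Make sure you release your resources.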
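As a rough illustration of the advice above, here is a minimal sketch that uses try-with-resources so the writer is closed, and therefore flushed, even if write() throws. The class name PageTextWriter, the method appendText, and the temporary file are hypothetical and are not part of the code in the question; Files.newBufferedWriter is used here in place of new BufferedWriter(new FileWriter(...)) only to keep the example short.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, for illustration only.
class PageTextWriter {

    // Appends text to the target file. The try-with-resources statement
    // closes the writer on exit, and close() flushes the buffer first.
    static void appendText(Path target, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the output.txt path from the question.
        Path target = Files.createTempFile("output", ".txt");
        appendText(target, "text of page 3");
        // Read the file back to confirm the text actually reached the disk.
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

In readThisPage, the same idea would mean declaring the writer inside a try-with-resources block rather than keeping it in a static field, or at least calling writer.flush() after writer.write(text).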