How to get file summary information using Java/Apache POI

Question (Votes: 0, Answers: 3)

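I am trying to get the summary information from a file using Java, but I can't find anything that works. I tried using

org.apache.poi.hpsf.*

I need the author, subject, comments, keywords, and title.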

        File rep = new File("C:\\Cry_ReportERP006.rpt");

        /* Read the document into a POI filesystem. */
        final POIFSFileSystem poifs = new POIFSFileSystem(new FileInputStream(rep));
        final DirectoryEntry dir = poifs.getRoot();
        DocumentEntry dsiEntry = null;
        try
        {
            dsiEntry = (DocumentEntry) dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);
        }
        catch (FileNotFoundException ex)
        {
            /*
             * A missing document summary information stream is not an error
             * and therefore silently ignored here.
             */
        }

        /*
         * If there is a document summary information stream, read it from
         * the POI filesystem.
         */
        if (dsiEntry != null)
        {
            final DocumentInputStream dis = new DocumentInputStream(dsiEntry);
            final PropertySet ps = new PropertySet(dis);
            final DocumentSummaryInformation dsi = new DocumentSummaryInformation(ps);
            final SummaryInformation si = new SummaryInformation(ps);


            /* Execute the get... methods. */
            System.out.println(si.getAuthor());
        }

Tags: java, apache-poi, summary
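3 Answers
2 votes

As explained in the POI overview at http://poi.apache.org/overview.html, POI provides different parsers for different file types. The following example extracts the author/creator from an Office 2003 file: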

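import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hpsf.HPSFException;
import org.apache.poi.hpsf.PropertySet;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.DocumentInputStream;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public static String parseOLE2FileAuthor(File file) {
    String author = null;
    // try-with-resources closes the stream and the filesystem even if parsing fails
    try (InputStream stream = new FileInputStream(file);
         POIFSFileSystem poifs = new POIFSFileSystem(stream)) {
        DirectoryEntry dir = poifs.getRoot();
        // getEntry throws FileNotFoundException (an IOException) if the stream is missing
        DocumentEntry siEntry = (DocumentEntry) dir.getEntry(SummaryInformation.DEFAULT_STREAM_NAME);
        try (DocumentInputStream dis = new DocumentInputStream(siEntry)) {
            PropertySet ps = new PropertySet(dis);
            SummaryInformation si = new SummaryInformation(ps);
            author = si.getAuthor();
        }
    } catch (IOException | HPSFException ex) {
        // HPSFException is the common superclass of the checked HPSF exceptions
        ex.printStackTrace();
    }
    return author;
}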
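The question asks for the author, subject, comments, keywords, and title, all of which are stored in the SummaryInformation stream ("\005SummaryInformation") rather than in the DocumentSummaryInformation stream. The following is a minimal sketch of reading all five in one pass. It assumes a reasonably recent POI release (4.x or later) on the classpath; the class name Ole2SummaryReader, the method readSummary, and the map keys are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.hpsf.HPSFException;
import org.apache.poi.hpsf.PropertySet;
import org.apache.poi.hpsf.PropertySetFactory;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper: collects the five requested fields from any OLE2 file.
class Ole2SummaryReader {

    static Map<String, String> readSummary(File file) throws IOException, HPSFException {
        Map<String, String> fields = new LinkedHashMap<>();
        try (InputStream in = new FileInputStream(file);
             POIFSFileSystem poifs = new POIFSFileSystem(in)) {
            DirectoryEntry root = poifs.getRoot();
            // A file without a summary stream yields an empty map instead of an exception.
            if (!root.hasEntry(SummaryInformation.DEFAULT_STREAM_NAME)) {
                return fields;
            }
            PropertySet ps = PropertySetFactory.create(root, SummaryInformation.DEFAULT_STREAM_NAME);
            if (ps instanceof SummaryInformation) {
                SummaryInformation si = (SummaryInformation) ps;
                // Each getter returns null when the property is not set in the file.
                fields.put("author", si.getAuthor());
                fields.put("subject", si.getSubject());
                fields.put("comments", si.getComments());
                fields.put("keywords", si.getKeywords());
                fields.put("title", si.getTitle());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: Ole2SummaryReader <file>");
            return;
        }
        readSummary(new File(args[0])).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Because this reads the raw OLE2 container and not a specific Office format, it should apply to any OLE2 compound file that carries a summary stream, not only to .doc, .xls, or .ppt.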

For DOCX, PPTX, and XLSX, POI has dedicated classes.

Example for a .docx file:

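import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
// In POI 3.x this class lives in org.apache.poi.POIXMLProperties instead
import org.apache.poi.ooxml.POIXMLProperties.CoreProperties;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public static String parseDOCX(File file) {
    String author = null;
    // try-with-resources closes the stream and the document even if parsing fails
    try (InputStream stream = new FileInputStream(file);
         XWPFDocument docx = new XWPFDocument(stream)) {
        CoreProperties props = docx.getProperties().getCoreProperties();
        author = props.getCreator();
    } catch (IOException ex) {
        // FileNotFoundException is a subclass of IOException
        ex.printStackTrace();
    }
    return author;
}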

For PPTX, use XMLSlideShow, and for XLSX use XSSFWorkbook, instead of XWPFDocument.
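As a rough illustration of the same pattern for spreadsheets and presentations, the sketch below reads the core properties through XSSFWorkbook and XMLSlideShow. It assumes POI 4.x or later with the poi-ooxml artifact on the classpath (in 3.x, POIXMLProperties lives in org.apache.poi instead). The class name OoxmlCorePropertiesReader and the map keys are hypothetical. In OOXML core properties, the Office "comments" field corresponds to the description property and the author to the creator property.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.ooxml.POIXMLProperties;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical helper: reads core properties from .xlsx and .pptx files.
class OoxmlCorePropertiesReader {

    static Map<String, String> readXlsx(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             XSSFWorkbook workbook = new XSSFWorkbook(in)) {
            return toMap(workbook.getProperties().getCoreProperties());
        }
    }

    static Map<String, String> readPptx(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             XMLSlideShow slides = new XMLSlideShow(in)) {
            return toMap(slides.getProperties().getCoreProperties());
        }
    }

    // Copies the five fields of interest; unset properties come back as null.
    private static Map<String, String> toMap(POIXMLProperties.CoreProperties core) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("author", core.getCreator());
        fields.put("subject", core.getSubject());
        fields.put("comments", core.getDescription());
        fields.put("keywords", core.getKeywords());
        fields.put("title", core.getTitle());
        return fields;
    }
}
```

Both document classes extend POIXMLDocument, which is why the getProperties().getCoreProperties() call chain is identical to the one used for XWPFDocument.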


1 vote

Please find sample code here: Apache POI How-To

In short, you can implement a listener

MyPOIFSReaderListener

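    SummaryInformation si = (SummaryInformation)
             PropertySetFactory.create(event.getStream());
    String title = si.getTitle();
    // getLastAuthor() is the user who last saved the file; use getAuthor() for the original author
    String lastAuthor = si.getLastAuthor();
    // ...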

and register it as:

    POIFSReader r = new POIFSReader();
    r.registerListener(new MyPOIFSReaderListener(),
                   "\005SummaryInformation");
    r.read(new FileInputStream(filename));
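Put together, the two fragments above might look like the following sketch. The class name SummaryListener and its read helper are hypothetical; the POI types and calls (POIFSReader, POIFSReaderListener, POIFSReaderEvent, PropertySetFactory.create) are the ones used in the fragments. The listener is only invoked for the stream it was registered for, so a file without a summary stream leaves the result null.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hpsf.HPSFException;
import org.apache.poi.hpsf.PropertySet;
import org.apache.poi.hpsf.PropertySetFactory;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.eventfilesystem.POIFSReader;
import org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent;
import org.apache.poi.poifs.eventfilesystem.POIFSReaderListener;

// Hypothetical listener that remembers the summary information it is handed.
class SummaryListener implements POIFSReaderListener {

    private SummaryInformation summary;

    @Override
    public void processPOIFSReaderEvent(POIFSReaderEvent event) {
        try {
            PropertySet ps = PropertySetFactory.create(event.getStream());
            // Guard the cast: the factory returns a plain PropertySet for unknown types.
            if (ps instanceof SummaryInformation) {
                summary = (SummaryInformation) ps;
            }
        } catch (IOException | HPSFException ex) {
            // The callback cannot throw checked exceptions, so wrap them.
            throw new IllegalStateException("Could not read summary information", ex);
        }
    }

    SummaryInformation getSummary() {
        return summary;
    }

    // Registers the listener for the summary stream and reads the file once.
    static SummaryInformation read(String filename) throws IOException {
        SummaryListener listener = new SummaryListener();
        POIFSReader reader = new POIFSReader();
        reader.registerListener(listener, SummaryInformation.DEFAULT_STREAM_NAME);
        try (InputStream in = new FileInputStream(filename)) {
            reader.read(in);
        }
        return listener.getSummary();
    }
}
```

A caller would then use something like SummaryListener.read(filename) and check the result for null before calling getTitle() or getAuthor().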

0 votes

For Office 2003 files, you can use the classes that inherit from POIDocument. Here is an example for a .doc file:

String author;
// Close the input stream once the document has been loaded
try (FileInputStream in = new FileInputStream(file)) {
    HWPFDocument doc = new HWPFDocument(in);
    author = doc.getSummaryInformation().getAuthor();
}

Similarly, use HSLFSlideShowImpl for .ppt, HSSFWorkbook for .xls, and HDGFDiagram for .vsd.

The SummaryInformation class exposes many other pieces of file information.
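To give an idea of those other fields, here is a small sketch that prints a few of them. It uses HPSFPropertiesOnlyDocument, a POIDocument subclass in org.apache.poi.hpsf that exposes only the property streams and so does not depend on the concrete Office format. The class name Ole2ExtraProperties, the describe method, and the choice of fields are assumptions made for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.HPSFPropertiesOnlyDocument;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper: lists a few additional properties of an OLE2 file.
class Ole2ExtraProperties {

    static String describe(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             POIFSFileSystem poifs = new POIFSFileSystem(in)) {
            HPSFPropertiesOnlyDocument doc = new HPSFPropertiesOnlyDocument(poifs);
            StringBuilder out = new StringBuilder();

            // Either property set may be absent, in which case the getter returns null.
            SummaryInformation si = doc.getSummaryInformation();
            if (si != null) {
                out.append("lastAuthor=").append(si.getLastAuthor()).append('\n');
                out.append("application=").append(si.getApplicationName()).append('\n');
                out.append("created=").append(si.getCreateDateTime()).append('\n');
                out.append("lastSaved=").append(si.getLastSaveDateTime()).append('\n');
            }

            DocumentSummaryInformation dsi = doc.getDocumentSummaryInformation();
            if (dsi != null) {
                out.append("company=").append(dsi.getCompany()).append('\n');
                out.append("category=").append(dsi.getCategory()).append('\n');
            }
            return out.toString();
        }
    }
}
```

Fields such as company and category belong to DocumentSummaryInformation, whereas author, title, subject, keywords, and comments belong to SummaryInformation.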

For Office 2007 or later files, see @Dragos Catalin Trieanu's answer.
