How to capture tags nested under another tag in an XML file using Python

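        myroot = xmltree.getroot()

        # Walk every <record> element and each element nested inside it.
        for i in myroot.iter('record'):
            for j in i.iter():
                if j.tag not in ignoretags:
                    # Collect tag names only while the header is still pending.
                    if file_header_print:
                        headstr = "{}:{}".format(headstr, j.tag)
                    datastr = "{}:{}".format(datastr, j.text)

            # The header is logged once, built from the first record processed.
            if file_header_print:
                file_header_print = False
                logger.info(headstr)
            logger.info(datastr)

logger.info("Audit Report capture process end")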

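The code above, taken from my main program, captures the header tags in the XML file and prints them in my log report. A sample log report is shown below.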

2023-03-10 14:36:34,211 : INFO : audreport : Client:SeqNo:Filename:organization:case_id:invoice_number:supplier_number:supplier_name:posting_date:currency_code:gross_amount:tax_amount:net_amount:order_number:invoice_source:invoice_capture_date:document_type:data_capture_provider_code:data_capture_provider_reference:document_capture_provide_code:document_capture_provider_ref:data_capture_issue:from_email:to_email:box_number:data_captured:data_container:pdf_file_name
2023-03-10 14:36:34,227 : INFO : audreport : AD:8:ADI_08f08153-e4d9-4c06-97f9-5894153121fa_2340050.xml:1010:2340050:4122553691:0000165881:CINTAS CORPORATION:2022-06-15:USD:30.37:0.00:30.37:45830626:Email:2022-12-08:INVOICE:00001:19577:00002:19577:None:[email protected]:None:None:None:None:ADI_08f08153-e4d9-4c06-97f9-5894153121fa_2340050.pdf
2023-03-10 14:36:34,227 : INFO : audreport : AD:9:ADI_08f08153-e4d9-4c06-97f9-5894153121fa_2340065.xml:1014:2340065:1350132720:0000125498:HUBER SUHNER INC:2022-09-14:USD:159.20:0.00:159.20:45963685:Email:2022-12-08:INVOICE:00001:19592:00002:19592:None:[email protected]:None:None:None:None:ADI_08f08153-e4d9-4c06-97f9-5894153121fa_2340065.pdf
2023-03-10 14:36:34,227 : INFO : audreport : AD:10:ADI_08f08153-e4d9-4c06-97f9-5894153121fa_2340067.xml:1014:2340067:1350135380:0000125498:HUBER SUHNER INC:2022-10-17:USD:455.82:0.00:455.82:45970392:Email:2022-12-08:INVOICE:00001:19594:00002:19594:None:[email protected]:None:None:None:None:ADI_08f08153-e4d9-4c06-97f9-5894153121fa_2340067.pdf
2023-03-10 14:36:34,227 : INFO : audreport : AD:11:rowpresent.xml:1020:2296920:129729:0000155186:HIGH PURITY PRODUCTS INC:2022-10-31:USD:12909.00:0.00:12909.00:45907133:Email:2022-11-01:INVOICE:00001:588529:00002:588529:None:[email protected]:None:None:None:None:ADI_875a41cf-079e-4ca8-a1cc-9d945c711bf0_2296920.pdf:1:45907133:65-6610:None:None:482.0000:EA:3.00:1446.00:2:45907133:65-6513:None:None:68.0000:EA:1.00:68.00:3:45907133:65-6507:None:None:64.0000:EA:6.00:384.00:4:45907133:65-6506:None:None:73.0000:EA:6.00:438.00:5:45907133:65-6509M:None:None:64.0000:EA:27.00:1728.00:6:45907133:65-6611:None:None:472.0000:EA:3.00:1416.00:7:45907133:65-6512:None:None:64.0000:EA:27.00:1728.00:8:45907133:65-6575:None:None:92.0000:EA:3.00:276.00:9:45907133:65-6502:None:None:184.0000:EA:1.00:184.00:10:45907133:65-6607:None:None:266.0000:EA:1.00:266.00:11:45907133:65-6508:None:None:78.0000:EA:27.00:2106.00:12:45907133:65-6508T:None:None:2734.0000:EA:1.00:2734.00:13:45907133:FREIGHT CHARGES:None:None:135.0000:EA:1.00:135.00
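One possible reading of this log, assuming the header flag is cleared after the first record as in the snippet above: the header is built from whichever record is processed first, and the first three files here contain no row values, so the row tags never reach the header. A minimal sketch that reproduces this with two made-up records follows; the XML strings, the contents of IGNORE_TAGS, and the build_lines helper are hypothetical and not part of the original program.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records: the first has no <row>, the second has one.
XML_DOCS = [
    "<objects><object><record><case_id>1</case_id><rows/></record></object></objects>",
    "<objects><object><record><case_id>2</case_id>"
    "<rows><row><row_number>1</row_number></row></rows>"
    "</record></object></objects>",
]
IGNORE_TAGS = {"record", "rows", "row"}  # assumed contents of ignoretags


def build_lines(docs):
    """Mimic the loop from the question: the header comes from the first record only."""
    lines = []
    header_pending = True
    for doc in docs:
        root = ET.fromstring(doc)
        for record in root.iter("record"):
            tags, values = [], []
            for elem in record.iter():
                if elem.tag not in IGNORE_TAGS:
                    tags.append(elem.tag)
                    values.append(str(elem.text))
            if header_pending:
                header_pending = False
                lines.append(":".join(tags))
            lines.append(":".join(values))
    return lines


for line in build_lines(XML_DOCS):
    print(line)
```

The header line contains only case_id, while the second data line carries an extra row_number value with no matching header column, which is the same pattern as in the log above.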

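The first line of the log above lists the tags that have values, and the lines below it hold the values from each XML file. My sample XML file is given below, where you can see the tags together with their values.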

organization, case_id, invoice_number and so on are the tags that appear in the header at the top of the log above.

<?xml version="1.0" encoding="UTF-8"?>
<objects>
   <object>
      <record>
            <organization>1020</organization>
            <case_id>2296919</case_id>
            <invoice_number>1007173604</invoice_number>
            <supplier_number>0000155186</supplier_number>
            <supplier_name>HIGH PURITY PRODUCTS INC</supplier_name>
            <posting_date>2022-10-28</posting_date>
            <currency_code>USD</currency_code>
            <gross_amount>21325.00</gross_amount>
            <tax_amount>0.00</tax_amount>
            <net_amount>21325.00</net_amount>
            <order_number>45907133</order_number>
            <invoice_source>Email</invoice_source>
            <invoice_capture_date>2022-11-01</invoice_capture_date>
            <document_type>INVOICE</document_type>
            <data_capture_provider_code>00001</data_capture_provider_code>
            <data_capture_provider_reference>588528</data_capture_provider_reference>
            <document_capture_provide_code>00002</document_capture_provide_code>
            <document_capture_provider_ref>588528</document_capture_provider_ref>
            <data_capture_issue/>
            <from_email>[email protected]</from_email>
            <to_email/>
            <box_number/>
            <data_captured/>
            <data_container/>
            <pdf_file_name>ADI_875a41cf-079e-4ca8-a1cc-9d945c711bf0_2296919.pdf</pdf_file_name>
            <rows>
                <row>
                    <row_number>1</row_number>
                    <order_number>45907133</order_number>
                    <product_code>65-6554</product_code>
                    <contract_number></contract_number>
                    <bill_of_lading></bill_of_lading>
                    <unit_price>316.0000</unit_price>
                    <unit>EA</unit>
                    <quantity>27.00</quantity>
                    <amount>8532.00</amount>
                </row>
                <row>
                    <row_number>13</row_number>
                    <order_number>45907133</order_number>
                    <product_code>FREIGHT CHARGES</product_code>
                    <contract_number></contract_number>
                    <bill_of_lading></bill_of_lading>
                    <unit_price>135.0000</unit_price>
                    <unit>EA</unit>
                    <quantity>1.00</quantity>
                    <amount>135.00</amount>
                </row>              
            </rows>
        </record>
    </object>
</objects>
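To make the nesting in this file concrete, here is a small sketch that separates the tags directly under record from the tags under each row, using the limited XPath support in xml.etree.ElementTree. The SAMPLE string is a trimmed, hypothetical version of the file above, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical version of the sample file above.
SAMPLE = """<objects><object><record>
  <organization>1020</organization>
  <case_id>2296919</case_id>
  <rows>
    <row><row_number>1</row_number><product_code>65-6554</product_code></row>
    <row><row_number>13</row_number><product_code>FREIGHT CHARGES</product_code></row>
  </rows>
</record></object></objects>"""

record = ET.fromstring(SAMPLE).find("./object/record")

# Direct children of <record>, excluding the <rows> container itself.
record_tags = [child.tag for child in record if child.tag != "rows"]

# Tags of the children of each <row>, one list per row.
row_tags = [[field.tag for field in row] for row in record.findall("./rows/row")]

print(record_tags)
print(row_tags)
```

Iterating over an element yields only its direct children, whereas iter() also descends into rows and row, which is why the loop in the question picks up the row values as well.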

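My problem is that only the tags directly inside the record tag are captured in the header of my log. As you can see, the XML file also contains row tags. The tags inside them, such as row_number, order_number, product_code and so on, are not captured in the header of the log file, although their values are captured. I want to modify the code at the top of this question so that the tags inside the row tags are also captured in the header in the log file.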

Expected result:

2023-03-10 14:36:34,211 : INFO : audreport : Client:SeqNo:Filename:organization:case_id:invoice_number:supplier_number:supplier_name:posting_date:currency_code:gross_amount:tax_amount:net_amount:order_number:invoice_source:invoice_capture_date:document_type:data_capture_provider_code:data_capture_provider_reference:document_capture_provide_code:document_capture_provider_ref:data_capture_issue:from_email:to_email:box_number:data_captured:data_container:pdf_file_name:row_number:order_number:product_code:contract_number:bill_of_lading:unit_price:unit:quantity:amount
2023-03-10 14:36:34,227 : INFO : audreport : AD:11:rowpresent.xml:1020:2296920:129729:0000155186:HIGH PURITY PRODUCTS INC:2022-10-31:USD:12909.00:0.00:12909.00:45907133:Email:2022-11-01:INVOICE:00001:588529:00002:588529:None:[email protected]:None:None:None:None:ADI_875a41cf-079e-4ca8-a1cc-9d945c711bf0_2296920.pdf:1:45907133:65-6610:None:None:482.0000:EA:3.00:1446.00:2:45907133:65-6513:None:None:68.0000:EA:1.00:68.00:3:45907133:65-6507:None:None:64.0000:EA:6.00:384.00:4:45907133:65-6506:None:None:73.0000:EA:6.00:438.00:5:45907133:65-6509M:None:None:64.0000:EA:27.00:1728.00:6:45907133:65-6611:None:None:472.0000:EA:3.00:1416.00:7:45907133:65-6512:None:None:64.0000:EA:27.00:1728.00:8:45907133:65-6575:None:None:92.0000:EA:3.00:276.00:9:45907133:65-6502:None:None:184.0000:EA:1.00:184.00:10:45907133:65-6607:None:None:266.0000:EA:1.00:266.00:11:45907133:65-6508:None:None:78.0000:EA:27.00:2106.00:12:45907133:65-6508T:None:None:2734.0000:EA:1.00:2734.00:13:45907133:FREIGHT CHARGES:None:None:135.0000:EA:1.00:135.00

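Capturing the values inside the row tags is not a problem. The only problem is that the header entries for the row tags are not printed. How can I fix this?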
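One way this might be approached, offered as a sketch rather than a definitive fix: build the header from every record instead of only the first, and log it again whenever it differs from the last header logged. The function names header_and_data and log_records, the IGNORE_TAGS set, and the demo XML are assumptions for illustration; the Client:SeqNo:Filename prefix from the original log is left out for brevity.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("audreport")
IGNORE_TAGS = {"record", "rows", "row"}  # assumed contents of ignoretags


def header_and_data(record):
    """Return (header, data) strings built from every non-ignored element."""
    tags, values = [], []
    for elem in record.iter():
        if elem.tag not in IGNORE_TAGS:
            tags.append(elem.tag)
            values.append(str(elem.text))
    return ":".join(tags), ":".join(values)


def log_records(root, log=logger.info):
    """Log each record, re-emitting the header whenever its shape changes."""
    last_header = None
    for record in root.iter("record"):
        header, data = header_and_data(record)
        if header != last_header:
            log(header)
            last_header = header
        log(data)


# Hypothetical demo input: one record without rows, two records with one row.
DEMO = (
    "<objects><object>"
    "<record><case_id>1</case_id></record>"
    "<record><case_id>2</case_id>"
    "<rows><row><row_number>1</row_number></row></rows></record>"
    "<record><case_id>3</case_id>"
    "<rows><row><row_number>7</row_number></row></rows></record>"
    "</object></objects>"
)

lines = []
log_records(ET.fromstring(DEMO), log=lines.append)
for line in lines:
    print(line)
```

With this design, a record that has several row elements produces a header in which the row tags repeat once per row, so the header always has as many columns as the data line beneath it. The trade-off is that the log may contain more than one header line.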

Note: by "header" in the log I mean:

2023-03-10 14:36:34,211 : INFO : audreport : Client:SeqNo:Filename:organization:case_id:invoice_number:supplier_number:supplier_name:posting_date:currency_code:gross_amount:tax_amount:net_amount:order_number:invoice_source:invoice_capture_date:document_type:data_capture_provider_code:data_capture_provider_reference:document_capture_provide_code:document_capture_provider_ref:data_capture_issue:from_email:to_email:box_number:data_captured:data_container:pdf_file_name
python xml logging tags