Compare XML ignoring order of child elements

Question (votes: 16, answers: 11)

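Does anyone know of a tool that will compare two XML documents? Belay that mocking... there's more. I need something that will make sure each node in file 1 is also in file 2, regardless of order. I thought XML Spy would do it with the "Ignore Order of Child Nodes" option, but it didn't. The following would be considered the same: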

<Node>
    <Child name="Alpha"/>
    <Child name="Beta"/>
    <Child name="Charlie"/>
</Node>

<Node>
    <Child name="Beta"/>
    <Child name="Charlie"/>
    <Child name="Alpha"/>
</Node>
xml compare
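11 Answers
3 votes

You might want to google for "XML diff tool", which will give you more than enough results. One of them is OxygenXml, a tool I use frequently. You can also try Microsoft's XML Diff and Patch Tool.

Good luck.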


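-1 votes

As a (very) quick and dirty approach, I've done this in a pinch:

  1. Open Excel.
  2. Paste file 1 into column A, one line per row. Name the range "FILE1".
  3. Paste file 2 into column B, one line per row. Name the range "FILE2".
  4. In C1, enter the formula: =IF(ISERROR(VLOOKUP(B1,FILE1,1,FALSE)),"DIFF","")
  5. In D1, enter the formula: =IF(ISERROR(VLOOKUP(A1,FILE2,1,FALSE)),"DIFF","")
  6. Fill columns C and D down to the bottom of the files.

This will highlight any rows that appear in one file but not the other. It's not tidy by any stretch, but sometimes you have to work with what you've got.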
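
The same line-level lookup can be sketched outside Excel. The following is a minimal Python illustration, not part of the original answer: the function name `line_diff` and the sample lines are hypothetical, and stripping surrounding whitespace is an added assumption that the Excel formulas do not make.

```python
def line_diff(lines_a, lines_b):
    """Return (only_in_a, only_in_b), ignoring line order.

    Assumption: leading/trailing whitespace and blank lines are ignored.
    """
    set_a = {line.strip() for line in lines_a if line.strip()}
    set_b = {line.strip() for line in lines_b if line.strip()}
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical sample data: one XML line per list entry.
file1 = ['<Node>', '    <Child name="Alpha"/>', '    <Child name="Beta"/>', '</Node>']
file2 = ['<Node>', '    <Child name="Beta"/>', '    <Child name="Gamma"/>', '</Node>']

only_1, only_2 = line_diff(file1, file2)
print(only_1)  # lines present only in file 1
print(only_2)  # lines present only in file 2
```

Like the spreadsheet version, this only works when each element sits on its own line and is formatted identically in both files; it knows nothing about XML structure.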

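8 votes

I wrote a simple Python tool for this called xmldiffs:

Compare two XML files, ignoring element and attribute order.

Usage: xmldiffs [OPTION] FILE1 FILE2

Any extra options are passed to the diff command.

Get it at https://github.com/joh/xmldiffs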
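
To illustrate what "ignoring element and attribute order" can mean in practice, here is a small Python sketch. It is not the implementation of xmldiffs; the helper names `canonical` and `same_ignoring_order` are made up for this example, and it assumes that whitespace-only text and tail text can be ignored.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Build an order-independent, comparable representation of an element.

    Assumptions: text is compared after stripping, tail text is ignored.
    """
    children = sorted(canonical(child) for child in elem)
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),  # attribute order does not matter
        (elem.text or "").strip(),
        tuple(children),                     # child order does not matter
    )


def same_ignoring_order(xml_a, xml_b):
    """True if both documents contain the same nodes, in any order."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


doc1 = '<Node><Child name="Alpha"/><Child name="Beta"/><Child name="Charlie"/></Node>'
doc2 = '<Node><Child name="Beta"/><Child name="Charlie"/><Child name="Alpha"/></Node>'
print(same_ignoring_order(doc1, doc2))
```

This only answers yes or no; a tool that reports where the documents differ needs more bookkeeping.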


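3 votes

With Beyond Compare you can use the XML Sort conversion in the File Formats settings. With this option, the XML children are sorted before the diff is computed.

A trial / portable version of Beyond Compare is available.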


2 votes

I would use XMLUnit for this, as it can cater for elements being in a different order.


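1 vote

I had a similar need this evening and couldn't find anything that fit my requirements.

My workaround was to sort the two XML files I wanted to diff, ordering the elements alphabetically by name. Once both files are in a consistent order, I can diff the two sorted files with a regular visual diff tool.

If this approach sounds useful to anyone else, I've shared the Python script I wrote to do the sorting at http://dalelane.co.uk/blog/?p=3225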
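
For readers who want a feel for the sort-then-diff idea without following the link, here is a rough Python sketch. It is not the script from the blog post: the names `sort_children` and `sorted_xml` are hypothetical, and using the attributes as a tie-breaker between same-named siblings is an assumption that goes beyond sorting by element name alone.

```python
import xml.etree.ElementTree as ET


def sort_children(elem):
    """Recursively reorder children by tag name, then by their attributes."""
    for child in elem:
        sort_children(child)
    # Slice assignment replaces the child list with the sorted one.
    elem[:] = sorted(elem, key=lambda c: (c.tag, sorted(c.attrib.items())))


def sorted_xml(xml_text):
    """Return the document serialized with all children in a consistent order."""
    root = ET.fromstring(xml_text)
    sort_children(root)
    return ET.tostring(root, encoding="unicode")


doc1 = '<Node><Child name="Alpha"/><Child name="Beta"/><Child name="Charlie"/></Node>'
doc2 = '<Node><Child name="Beta"/><Child name="Charlie"/><Child name="Alpha"/></Node>'
print(sorted_xml(doc1) == sorted_xml(doc2))
```

Writing the two sorted strings to files and opening them in any diff tool then shows only the real differences.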


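0 votes

I recently gave a similar answer here (Open source command line tool for Linux to diff XML files ignoring element order), but I'll provide more detail...

If you write a program that walks the two trees together, you can customize the logic for identifying "matches" between the trees, and also for handling nodes that don't match. Here is an example in XSLT 2.0 (sorry it's so long):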

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"

                xmlns:set="http://exslt.org/sets"

                xmlns:primary="primary"
                xmlns:control="control"

                xmlns:util="util"

                exclude-result-prefixes="xsl xs set primary control">

    <!-- xml diff tool

         import this stylesheet from another and call the "compare" template with two args:

             primary: the root of the primary tree to submit to comparison
             control: the root of the control tree to compare against

         the two trees will be walked together. the primary tree will be walked in document order, matching elements
         and attributes from the control tree along the way, building a tree of common content, with appendages
         containing primary and control only content. that tree will then be used to generate the diff.

         the process of matching involves finding, for an element or attribute in the primary tree, the
         equivalent element or attribute in the control tree, *at the same level*, and *regardless of ordering*.

             matching logic is encoded as templates with mode="find-match", providing a hook to wire in specific
             matching logic for particular elements or attributes. for example, an element may "match" based on an
             @id attribute value, irrespective of element ordering; encode this in a mode="find-match" template.

             the treatment of diffs is encoded as templates with mode="primary-only" and "control-only", providing
             hooks for alternate behavior upon encountering differences.

          -->

    <xsl:output method="text"/>

    <xsl:strip-space elements="*"/>

    <xsl:param name="full" select="false()"/><!-- whether to render the full doc, as opposed to just diffs -->

    <xsl:template match="/">
        <xsl:call-template name="compare">
            <xsl:with-param name="primary" select="*/*[1]"/><!-- first child of root element, for example -->
            <xsl:with-param name="control" select="*/*[2]"/><!-- second child of root element, for example -->
        </xsl:call-template>
    </xsl:template>

    <!-- OVERRIDES: templates that can be overridden to provide targeted matching logic and diff treatment -->

    <!-- default find-match template for elements
         (by default, for "complex" elements, name has to match, for "simple" elements, name and value do)
         for context node (from "primary"), choose from among $candidates (from "control") which one matches
         (override with more specific match patterns to effect alternate behavior for targeted elements)
         -->
    <xsl:template match="*" mode="find-match" as="element()?">
        <xsl:param name="candidates" as="element()*"/>
        <xsl:choose>
            <xsl:when test="text() and count(node()) = 1"><!-- simple content -->
                <xsl:sequence select="$candidates[node-name(.) = node-name(current())][text() and count(node()) = 1][. = current()][1]"/>
            </xsl:when>
            <xsl:when test="not(node())"><!-- empty -->
                <xsl:sequence select="$candidates[node-name(.) = node-name(current())][not(node())][1]"/>
            </xsl:when>
            <xsl:otherwise><!-- presumably complex content -->
                <xsl:sequence select="$candidates[node-name(.) = node-name(current())][1]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- default find-match template for attributes
         (by default, name and value have to match)
         for context attr (from "primary"), choose from among $candidates (from "control") which one matches
         (override with more specific match patterns to effect alternate behavior for targeted attributes)
         -->
    <xsl:template match="@*" mode="find-match" as="attribute()?">
        <xsl:param name="candidates" as="attribute()*"/>
        <xsl:sequence select="$candidates[. = current()][node-name(.) = node-name(current())][1]"/>
    </xsl:template>

    <!-- default primary-only template (override with more specific match patterns to effect alternate behavior) -->
    <xsl:template match="@* | *" mode="primary-only">
        <xsl:apply-templates select="." mode="illegal-primary-only"/>
    </xsl:template>

    <!-- write out a primary-only diff -->
    <xsl:template match="@* | *" mode="illegal-primary-only">
        <primary:only>
            <xsl:copy-of select="."/>
        </primary:only>
    </xsl:template>

    <!-- default control-only template (override with more specific match patterns to effect alternate behavior) -->
    <xsl:template match="@* | *" mode="control-only">
        <xsl:apply-templates select="." mode="illegal-control-only"/>
    </xsl:template>

    <!-- write out a control-only diff -->
    <xsl:template match="@* | *" mode="illegal-control-only">
        <control:only>
            <xsl:copy-of select="."/>
        </control:only>
    </xsl:template>

    <!-- end OVERRIDES -->

    <!-- MACHINERY: for walking the primary and control trees together, finding matches and recursing -->

    <!-- compare "primary" and "control" trees (this is the root of comparison, so CALL THIS ONE !) -->
    <xsl:template name="compare">
        <xsl:param name="primary"/>
        <xsl:param name="control"/>

        <!-- write the xml diff into a variable -->
        <xsl:variable name="diff">
            <xsl:call-template name="match-children">
                <xsl:with-param name="primary" select="$primary"/>
                <xsl:with-param name="control" select="$control"/>
            </xsl:call-template>
        </xsl:variable>

        <!-- "print" the xml diff as textual output -->
        <xsl:apply-templates select="$diff" mode="print">
            <xsl:with-param name="render" select="$full"/>
        </xsl:apply-templates>

    </xsl:template>

    <!-- assume primary (context) element and control element match, so render the "common" element and recurse -->
    <xsl:template match="*" mode="common">
        <xsl:param name="control"/>

        <xsl:copy>
            <xsl:call-template name="match-attributes">
                <xsl:with-param name="primary" select="@*"/>
                <xsl:with-param name="control" select="$control/@*"/>
            </xsl:call-template>

            <xsl:choose>
                <xsl:when test="text() and count(node()) = 1">
                    <xsl:value-of select="."/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:call-template name="match-children">
                        <xsl:with-param name="primary" select="*"/>
                        <xsl:with-param name="control" select="$control/*"/>
                    </xsl:call-template>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:copy>

    </xsl:template>

    <!-- find matches between collections of attributes in primary vs control -->
    <xsl:template name="match-attributes">
        <xsl:param name="primary" as="attribute()*"/>
        <xsl:param name="control" as="attribute()*"/>
        <xsl:param name="primaryCollecting" as="attribute()*"/>

        <xsl:choose>
            <xsl:when test="$primary and $control">
                <xsl:variable name="this" select="$primary[1]"/>
                <xsl:variable name="match" as="attribute()?">
                    <xsl:apply-templates select="$this" mode="find-match">
                        <xsl:with-param name="candidates" select="$control"/>
                    </xsl:apply-templates>
                </xsl:variable>

                <xsl:choose>
                    <xsl:when test="$match">
                        <xsl:copy-of select="$this"/>
                        <xsl:call-template name="match-attributes">
                            <xsl:with-param name="primary" select="subsequence($primary, 2)"/>
                            <xsl:with-param name="control" select="remove($control, 1 + count(set:leading($control, $match)))"/>
                            <xsl:with-param name="primaryCollecting" select="$primaryCollecting"/>
                        </xsl:call-template>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:call-template name="match-attributes">
                            <xsl:with-param name="primary" select="subsequence($primary, 2)"/>
                            <xsl:with-param name="control" select="$control"/>
                            <xsl:with-param name="primaryCollecting" select="$primaryCollecting | $this"/>
                        </xsl:call-template>
                    </xsl:otherwise>
                </xsl:choose>

            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="$primaryCollecting | $primary">
                    <xsl:apply-templates select="$primaryCollecting | $primary" mode="primary-only"/>
                </xsl:if>
                <xsl:if test="$control">
                    <xsl:apply-templates select="$control" mode="control-only"/>
                </xsl:if>
            </xsl:otherwise>
        </xsl:choose>

    </xsl:template>

    <!-- find matches between collections of elements in primary vs control -->
    <xsl:template name="match-children">
        <xsl:param name="primary" as="node()*"/>
        <xsl:param name="control" as="element()*"/>

        <xsl:variable name="this" select="$primary[1]" as="node()?"/>

        <xsl:choose>
            <xsl:when test="$primary and $control">
                <xsl:variable name="match" as="element()?">
                    <xsl:apply-templates select="$this" mode="find-match">
                        <xsl:with-param name="candidates" select="$control"/>
                    </xsl:apply-templates>
                </xsl:variable>

                <xsl:choose>
                    <xsl:when test="$match">
                        <xsl:apply-templates select="$this" mode="common">
                            <xsl:with-param name="control" select="$match"/>
                        </xsl:apply-templates>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates select="$this" mode="primary-only"/>
                    </xsl:otherwise>
                </xsl:choose>
                <xsl:call-template name="match-children">
                    <xsl:with-param name="primary" select="subsequence($primary, 2)"/>
                    <xsl:with-param name="control" select="if (not($match)) then $control else remove($control, 1 + count(set:leading($control, $match)))"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:when test="$primary">
                <xsl:apply-templates select="$primary" mode="primary-only"/>
            </xsl:when>
            <xsl:when test="$control">
                <xsl:apply-templates select="$control" mode="control-only"/>
            </xsl:when>
        </xsl:choose>

    </xsl:template>

    <!-- end MACHINERY -->

    <!-- PRINTERS: print templates for writing out the diff -->

    <xsl:template match="*" mode="print">
        <xsl:param name="depth" select="-1"/>
        <xsl:param name="render" select="false()"/>
        <xsl:param name="lineLeader" select="' '"/>
        <xsl:param name="rest" as="element()*"/>

        <xsl:if test="$render or descendant::primary:* or descendant::control:*">

            <xsl:call-template name="whitespace">
                <xsl:with-param name="indent" select="$depth"/>
                <xsl:with-param name="leadChar" select="$lineLeader"/>
            </xsl:call-template>
            <xsl:text>&lt;</xsl:text>
            <xsl:value-of select="name(.)"/>

            <xsl:apply-templates select="@* | primary:*[@*] | control:*[@*]" mode="print">
                <xsl:with-param name="depth" select="$depth"/>
                <xsl:with-param name="render" select="$render"/>
                <xsl:with-param name="lineLeader" select="$lineLeader"/>
            </xsl:apply-templates>

            <xsl:choose>
                <xsl:when test="text() and count(node()) = 1"><!-- field element (just textual content) -->
                    <xsl:text>&gt;</xsl:text>
                    <xsl:value-of select="."/>
                    <xsl:text>&lt;/</xsl:text>
                    <xsl:value-of select="name(.)"/>
                    <xsl:text>&gt;</xsl:text>
                </xsl:when>
                <xsl:when test="count(node()) = 0"><!-- empty (self-closing) element -->
                    <xsl:text>/&gt;</xsl:text>
                </xsl:when>
                <xsl:otherwise><!-- complex content -->
                    <xsl:text>&gt;&#10;</xsl:text>
                    <xsl:apply-templates select="*[not(self::primary:* and @*) and not(self::control:* and @*)]" mode="print">
                        <xsl:with-param name="depth" select="$depth + 1"/>
                        <xsl:with-param name="render" select="$render"/>
                        <xsl:with-param name="lineLeader" select="$lineLeader"/>
                    </xsl:apply-templates>
                    <xsl:call-template name="whitespace">
                        <xsl:with-param name="indent" select="$depth"/>
                        <xsl:with-param name="leadChar" select="$lineLeader"/>
                    </xsl:call-template>
                    <xsl:text>&lt;/</xsl:text>
                    <xsl:value-of select="name(.)"/>
                    <xsl:text>&gt;</xsl:text>
                </xsl:otherwise>
            </xsl:choose>

            <xsl:text>&#10;</xsl:text>

        </xsl:if>

        <!-- write the rest of the elements, if any -->
        <xsl:apply-templates select="$rest" mode="print">
            <xsl:with-param name="depth" select="$depth"/>
            <xsl:with-param name="render" select="$render"/>
            <xsl:with-param name="lineLeader" select="$lineLeader"/>
            <xsl:with-param name="rest" select="()"/><!-- avoid implicit param pass to recursive call! -->
        </xsl:apply-templates>

    </xsl:template>

    <xsl:template match="@*" mode="print">
        <xsl:param name="depth" select="0"/>
        <xsl:param name="render" select="false()"/>
        <xsl:param name="lineLeader" select="' '"/>
        <xsl:param name="rest" as="attribute()*"/>

        <xsl:if test="$render">

            <xsl:text>&#10;</xsl:text>
            <xsl:call-template name="whitespace">
                <xsl:with-param name="indent" select="$depth + 3"/>
                <xsl:with-param name="leadChar" select="$lineLeader"/>
            </xsl:call-template>
            <xsl:value-of select="name(.)"/>
            <xsl:text>="</xsl:text>
            <xsl:value-of select="."/>
            <xsl:text>"</xsl:text>
        </xsl:if>

        <xsl:apply-templates select="$rest" mode="print">
            <xsl:with-param name="depth" select="$depth"/>
            <xsl:with-param name="render" select="$render"/>
            <xsl:with-param name="lineLeader" select="$lineLeader"/>
            <xsl:with-param name="rest" select="()"/><!-- avoid implicit param pass to recursive call! -->
        </xsl:apply-templates>

    </xsl:template>

    <xsl:template match="primary:* | control:*" mode="print">
        <xsl:param name="depth"/>

        <xsl:variable name="diffType" select="util:diff-type(.)"/>
        <xsl:variable name="primary" select="self::primary:*"/>
        <xsl:variable name="lineLeader" select="if ($primary) then '+' else '-'"/>

        <!-- only if this is the first in a sequence of control::* elements, since the rest are handled along with the first... -->
        <xsl:if test="util:diff-type(preceding-sibling::*[1]) != $diffType">
            <xsl:if test="@*">
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
            <xsl:call-template name="diffspace">
                <xsl:with-param name="indent" select="if (@*) then $depth + 3 else $depth"/>
                <xsl:with-param name="primary" select="$primary"/>
            </xsl:call-template>
            <b><i>&lt;!-- ... --&gt;</i></b><!-- something to identify diff sections in output -->
            <xsl:if test="node()">
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
            <xsl:variable name="rest" select="set:leading(following-sibling::*, following-sibling::*[util:diff-type(.) != $diffType])"/>
            <xsl:apply-templates select="@* | node()" mode="print">
                <xsl:with-param name="depth" select="$depth"/>
                <xsl:with-param name="render" select="true()"/>
                <xsl:with-param name="lineLeader" select="$lineLeader"/>
                <xsl:with-param name="rest" select="$rest/@* | $rest/*"/>
            </xsl:apply-templates>
        </xsl:if>
    </xsl:template>

    <xsl:template name="whitespace">
        <xsl:param name="indent" select="0" as="xs:integer"/>
        <xsl:param name="leadChar" select="' '"/>
        <xsl:choose>
            <xsl:when test="$indent > 0">
                <xsl:value-of select="$leadChar"/>
                <xsl:text> </xsl:text>
                <xsl:for-each select="0 to $indent - 1">
                    <xsl:text>  </xsl:text>
                </xsl:for-each>
            </xsl:when>
            <xsl:otherwise>
                <xsl:for-each select="0 to $indent">
                    <xsl:text>  </xsl:text>
                </xsl:for-each>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template name="diffspace">
        <xsl:param name="indent" select="0" as="xs:integer"/>
        <xsl:param name="primary" select="false()"/>
        <xsl:for-each select="0 to $indent">
            <xsl:choose>
                <xsl:when test="$primary">
                    <xsl:text>++</xsl:text>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:text>--</xsl:text>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </xsl:template>

    <!-- just an "enum" for deciding whether to group adjacent diffs -->
    <xsl:function name="util:diff-type" as="xs:integer">
        <xsl:param name="construct"/>
        <xsl:sequence select="if ($construct/self::primary:*[@*]) then 1 else
                              if ($construct/self::control:*[@*]) then 2 else
                              if ($construct/self::primary:*) then 3 else
                              if ($construct/self::control:*) then 4 else
                              if ($construct) then 5 else 0"/>
    </xsl:function>

    <!-- end PRINTERS -->

</xsl:stylesheet>

Consider this example input, based on yours:

<test>
    <Node>
        <Child name="Alpha"/>
        <Child name="Beta"/>
        <Child name="Charlie"/>
    </Node>
    <Node>
        <Child name="Beta"/>
        <Child name="Charlie"/>
        <Child name="Alpha"/>
    </Node>
</test>

With the stylesheet as is, this is the output when it is applied to the example:

<Node>
  <Child
++++++++<!-- ... -->
+       name="Alpha"
--------<!-- ... -->
-       name="Beta">
  </Child>
  <Child
++++++++<!-- ... -->
+       name="Beta"
--------<!-- ... -->
-       name="Charlie">
  </Child>
  <Child
++++++++<!-- ... -->
+       name="Charlie"
--------<!-- ... -->
-       name="Alpha">
  </Child>
</Node>

However, if you add this custom template:

<xsl:template match="Child" mode="find-match" as="element()?">
    <xsl:param name="candidates" as="element()*"/>
    <xsl:sequence select="$candidates[@name = current()/@name][1]"/>
</xsl:template>

which says to match a Child element based on its @name attribute, then you get no output (meaning there is no difference).
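
The idea of a pluggable matching rule can also be sketched in Python. This is a loose analogue of the stylesheet's find-match hook, not a port of it: the names `diff_children`, `default_key` and `by_name` are invented for this example, the two root elements are assumed to correspond to each other, and only children are compared.

```python
import xml.etree.ElementTree as ET


def default_key(elem):
    """Default match rule: same tag, same attributes, same stripped text."""
    return (elem.tag, tuple(sorted(elem.attrib.items())), (elem.text or "").strip())


def diff_children(primary, control, key=default_key, path=""):
    """Yield (side, path, key) for children with no counterpart at the same level."""
    here = f"{path}/{primary.tag}"
    unmatched = {}
    for c in control:
        unmatched.setdefault(key(c), []).append(c)
    for p in primary:
        candidates = unmatched.get(key(p))
        if candidates:
            # Matched regardless of position: consume the control node and recurse.
            yield from diff_children(p, candidates.pop(0), key, here)
        else:
            yield ("primary-only", here, key(p))
    for k, rest in unmatched.items():
        for _ in rest:
            yield ("control-only", here, k)


def by_name(elem):
    """Custom rule: match elements on tag plus their name attribute."""
    return (elem.tag, elem.get("name"))


a = ET.fromstring('<Node><Child name="Alpha"/><Child name="Beta"/><Child name="Charlie"/></Node>')
b = ET.fromstring('<Node><Child name="Beta"/><Child name="Charlie"/><Child name="Alpha"/></Node>')
c = ET.fromstring('<Node><Child name="Beta"/><Child name="Charlie"/><Child name="Delta"/></Node>')

print(list(diff_children(a, b, key=by_name)))
print(list(diff_children(a, c, key=by_name)))
```

With the `by_name` rule, the first comparison reports nothing, while the second reports Alpha as primary-only and Delta as control-only. Note that anything not included in the key (for example other attributes when using `by_name`) is not compared at all.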


0 votes

Here is a diff solution using SWI-Prolog:

:- use_module(library(xpath)).
load_trees(XmlRoot1, XmlRoot2) :-
    load_xml('./xml_source_1.xml', XmlRoot1, _),
    load_xml('./xml_source_2.xml', XmlRoot2, _).

find_differences(Reference, Root1, Root2) :-
    xpath(Root1, //'Child'(@name=Name), Node),
    not(xpath(Root2, //'Child'(@name=Name), Node)),
    writeln([Reference, Name, Node]).

diff :-
    load_trees(Root1, Root2),
    (find_differences('1', Root1, Root2) ; find_differences('2', Root2, Root1)).

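Prolog unifies the Name variable to match nodes from file 1 and file 2; the unification on the Node variable is what performs the "diff" detection.

Here is some sample output: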

% file 1 and file 2 have no differences 
?- diff.
false.

% "Alpha" was updated  in file 2
?- diff.
[1,Alpha,element(Child,[name=Alpha],[])]
[2,Alpha,element(Child,[name=Alpha,age=7],[])]
false.

0 votes

In C# you can do the following, and afterwards compare the sorted files with any diff tool.

public void Run()
{
    LoadSortAndSave(@".. first file ..");
    LoadSortAndSave(@".. second file ..");
}

public void LoadSortAndSave(String path)
{
    var xdoc = XDocument.Load(path);
    SortXml(xdoc.Root);
    File.WriteAllText(path + ".sorted", xdoc.ToString());
}

private void SortXml(XContainer parent)
{
    var elements = parent.Elements()
        .OrderBy(e => e.Name.LocalName)
        .ToArray();

    Array.ForEach(elements, e => e.Remove());

    foreach (var element in elements)
    {
        parent.Add(element);
        SortXml(element);
    }
}

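0 votes

I wrote a simple Java program to do this. It stores each of the two XMLs being compared in a HashMap, where the key is the element's XPath (including the element's text value) and the value is the number of occurrences of that element. It then compares the key sets and the values of the two HashMaps.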

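import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Creates a map of the elements that have a text value and no nested nodes.
 * The key of the map is the element's XPath joined with the element's text value;
 * the value is the number of occurrences of that element.
 *
 * @param xmlContent
 * @return
 * @throws ParserConfigurationException
 * @throws SAXException
 * @throws IOException
 */
private static Map<String, Long> getMapOfElementsOfXML(String xmlContent)
        throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc1 = db.parse(new ByteArrayInputStream(xmlContent.getBytes()));

    // "*" selects every element in the document, in document order
    NodeList entries = doc1.getElementsByTagName("*");
    Map<String, Long> mapElements = new HashMap<>();

    for (int i = 0; i < entries.getLength(); i++) {
        Element element = (Element) entries.item(i);

        // only consider elements whose single child node is their content
        if (element.getChildNodes().getLength() == 1 && element.getTextContent() != null) {
            final String elementWithXPathAndValue = getXPath(element.getParentNode())
                    + "/" + element.getParentNode().getNodeName()
                    + "/" + element.getTagName()
                    + "/" + element.getTextContent();

            // count the occurrences of this path/value pair, starting at 1
            mapElements.merge(elementWithXPathAndValue, 1L, Long::sum);
        }
    }
    return mapElements;
}

// Builds the path of all ancestors of the given node, root first
static String getXPath(Node node) {
    Node parent = node.getParentNode();
    if (parent == null) {
        return "";
    }
    return getXPath(parent) + "/" + parent.getNodeName();
}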

The complete program is here: https://comparetwoxmlsignoringstanzaordering.blogspot.com/2018/12/java-program-to-compare-two-xmls.html
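
The counting idea can be shown compactly in Python as well. This is only a sketch of the same technique, not a translation of the linked program: the function name `leaf_counts` and the sample documents are hypothetical, and, as in the description above, attributes are not taken into account.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def leaf_counts(xml_text):
    """Count leaf elements, keyed by their path plus text value."""
    counts = Counter()

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:      # text value and no nested elements
            counts[f"{here}={text}"] += 1
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return counts


# Hypothetical sample documents.
a = "<r><x>1</x><y>2</y><x>1</x></r>"
b = "<r><x>1</x><x>1</x><y>2</y></r>"
c = "<r><x>1</x><y>2</y></r>"

print(leaf_counts(a) == leaf_counts(b))  # same leaves, different order
print(leaf_counts(a) - leaf_counts(c))   # leaves that a has more of than c
```

Because only text-bearing leaves are counted, documents like the one in the question, whose elements carry attributes but no text, would need the attributes added to the key.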
