Python / ete3: finding the leaves most closely related to a specific species in a phylogenetic tree

Question  Votes: 1  Answers: 1

I am using the Python package ete3. I have the following tree:

((Species1_order1,(Species2_order2,Species3_order2)),Species4_order3,Species5_order5);

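I would like to find the leaves most closely related to a specific node in the tree (here, Species1_order1). In this example, the most closely related leaves are Species2_order2 / Species3_order2 and Species4_order3 / Species5_order5.

Code: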

import ete3

tree = ete3.Tree('((Species1_order1,(Species2_order2,Species3_order2)),'
                 'Species4_order3,Species5_order5);')

New example:

tree=ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')

The result I get is:

     A    B    C    D    E    F    G    H    I
A  0.0  2.0  3.0  4.0  6.0  6.0  6.0  8.0  8.0
B  2.0  0.0  3.0  4.0  6.0  6.0  6.0  8.0  8.0
C  3.0  3.0  0.0  3.0  5.0  5.0  5.0  7.0  7.0
D  4.0  4.0  3.0  0.0  4.0  4.0  4.0  6.0  6.0
E  6.0  6.0  5.0  4.0  0.0  2.0  4.0  6.0  6.0
F  6.0  6.0  5.0  4.0  2.0  0.0  4.0  6.0  6.0
G  6.0  6.0  5.0  4.0  4.0  4.0  0.0  4.0  4.0
H  8.0  8.0  7.0  6.0  6.0  6.0  4.0  0.0  2.0
I  8.0  8.0  7.0  6.0  6.0  6.0  4.0  2.0  0.0

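But, for example, E and F in the tree should be equally distant from A, B, C and D, yet in this matrix they appear to be closer to D.

A good result matrix should be: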

    A   B   C   D   E   F   G   H   I
A   0   1   2   3   4   4   5   6   6
B   1   0   2   3   4   4   5   6   6
C   2   2   0   3   4   4   5   6   6
D   3   3   3   0   4   4   5   6   6
E   4   4   4   4   0   1   5   6   6
F   4   4   4   4   1   0   5   6   6
G   5   5   5   5   5   5   0   6   6
H   6   6   6   6   6   6   6   0   1
I   6   6   6   6   6   6   6   1   0

Shouldn't it?
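One way to read the expected matrix above: each entry equals the height of the two leaves' most recent common ancestor, where height means the largest number of edges from that ancestor down to any leaf below it. This is an interpretation checked against the numbers, not something ete3 computes under that name. A minimal sketch, assuming nested tuples as a stand-in for the Newick tree and a hypothetical helper name `mrca_height_matrix`:

```python
def mrca_height_matrix(tree):
    """Return (leaf_names, dist) where dist[(a, b)] is the height of the
    most recent common ancestor of leaves a and b.

    `tree` is a nested tuple of leaf names (an illustrative stand-in
    for a Newick string, not an ete3 object).
    """
    dist = {}

    def visit(node):
        # Returns (leaf names below node, height of node).
        if isinstance(node, str):
            dist[(node, node)] = 0
            return [node], 0
        groups = []
        height = 0
        for child in node:
            leaves, child_height = visit(child)
            groups.append(leaves)
            height = max(height, child_height + 1)
        # Leaves coming from different children first meet at this node.
        for i, left in enumerate(groups):
            for right in groups[i + 1:]:
                for a in left:
                    for b in right:
                        dist[(a, b)] = dist[(b, a)] = height
        return [leaf for group in groups for leaf in group], height

    leaf_names, _ = visit(tree)
    return leaf_names, dist


# Same topology as '((((((A,B),C),D),(E,F)),G),(H,I));'
tree = (((((("A", "B"), "C"), "D"), ("E", "F")), "G"), ("H", "I"))
names, dist = mrca_height_matrix(tree)
print([dist[("A", other)] for other in names])
# → [0, 1, 2, 3, 4, 4, 5, 6, 6]
```

The matrix ete3-style path counting produces (the first one above) instead sums the edges along the path between two leaves, which is why the two matrices disagree.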

python phylogeny ete3
1 Answer
1 vote

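As discussed in the comments, ete3 provides a method called Tree.get_closest_leaf, but its output is not what one would expect (and I'm not sure what the value represents here):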

>>> t=ete3.Tree('((Species1_order1,(Species2_order2,Species3_order2)),Species4_order3,Species5_order5);')
>>> t.get_closest_leaf('Species2_order2')
(Tree node 'Species4_order3' (0x115b2f29), 0.0)

Instead, you can get the node distances like this:

import ete3
import numpy as np
import pandas as pd

def make_matrix(tree):
    def get_root_path(node):
        # The node itself followed by all of its ancestors up to the root.
        root_path = [node]
        if node.up:
            root_path.extend(get_root_path(node.up))
        return root_path
    leaves = tree.get_leaves()
    leaf_ct = len(leaves)
    # Leaf names are used as keys, so they must be unique.
    paths = {node.name: set(get_root_path(node)) for node in leaves}
    col_lbls = [leaf.name for leaf in leaves]
    # pd.np and DataFrame.iteritems() were removed in pandas 2.0;
    # use numpy directly and DataFrame.items() instead.
    dist_matrix = np.zeros((leaf_ct, leaf_ct))
    df = pd.DataFrame(dist_matrix, index=col_lbls, columns=col_lbls)
    for node1_name, col in df.items():
        for node2_name in col.keys():
            # Nodes on exactly one of the two root paths lie on the
            # path between the two leaves; sum their branch lengths.
            path = paths[node2_name].symmetric_difference(paths[node1_name])
            dist = sum(node.dist for node in path)
            df.at[node1_name, node2_name] = dist
            df.at[node2_name, node1_name] = dist
    return df

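Note: this is a suboptimal solution for a number of reasons, but the question wasn't asking for the most efficient one. For more on phylogenetic distance-matrix methods, see this link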

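This solution also uses pandas, which is overkill, since it is really only there for the convenience of row/column labels. It would not be hard to remove the pandas dependency and use native lists instead.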
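Dropping pandas could look roughly like the following. This is a minimal sketch rather than a tested replacement: it uses a dict of dicts instead of lists to keep the labels, it assumes the tree object exposes `get_leaves()`, `.up`, `.dist` and `.name` the way ete3 nodes do, and `make_matrix_plain` is a hypothetical name.

```python
def make_matrix_plain(tree):
    """Pairwise leaf distances as a dict of dicts, without pandas.

    Assumes ete3-style nodes: tree.get_leaves(), node.up (None at the
    root), node.dist (branch length to the parent) and node.name.
    """
    def root_path(node):
        # Collect the node and all of its ancestors up to the root.
        path = set()
        while node is not None:
            path.add(node)
            node = node.up
        return path

    leaves = tree.get_leaves()
    names = [leaf.name for leaf in leaves]
    paths = {leaf.name: root_path(leaf) for leaf in leaves}
    # Nodes on exactly one of the two root paths form the path between
    # the two leaves; summing their branch lengths gives the distance.
    return {
        a: {b: sum(node.dist for node in paths[a] ^ paths[b]) for b in names}
        for a in names
    }

# Usage with ete3 (assumed to be installed):
#   import ete3
#   tree = ete3.Tree('((A,(B,C)),D,E);')
#   matrix = make_matrix_plain(tree)
#   matrix['A']['B']   # distance between leaves A and B
```

As in the pandas version, leaf names are used as keys, so they must be unique.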

Here is the output:

>>> tree=ete3.Tree('((Species1_order1, (Species2_order2, Species3_order2)), Species4_order3, Species5_order5);')
>>> make_matrix(tree)
                 Species1_order1  Species2_order2  Species3_order2  Species4_order3  Species5_order5
Species1_order1              0.0              3.0              3.0              3.0              3.0
Species2_order2              3.0              0.0              2.0              4.0              4.0
Species3_order2              3.0              2.0              0.0              4.0              4.0
Species4_order3              3.0              4.0              4.0              0.0              2.0
Species5_order5              3.0              4.0              4.0              2.0              0.0
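To get from this matrix back to the original question (which leaves are closest to a given one), you can take the smallest off-diagonal value in that leaf's row and collect every leaf that ties for it. A minimal sketch, with `closest_leaves` as a hypothetical helper name; it is written against a plain dict of dicts, and should also work on the DataFrame above since `df[target].items()` yields label/value pairs:

```python
def closest_leaves(distances, target):
    """Return (names, distance) for the leaves nearest to `target`.

    `distances` maps leaf name -> {leaf name -> distance}.
    Ties are all reported, sorted by name.
    """
    row = distances[target]
    others = {name: dist for name, dist in row.items() if name != target}
    best = min(others.values())
    nearest = sorted(name for name, dist in others.items() if dist == best)
    return nearest, best


# Two rows copied from the matrix printed above.
distances = {
    "Species1_order1": {
        "Species1_order1": 0.0, "Species2_order2": 3.0,
        "Species3_order2": 3.0, "Species4_order3": 3.0,
        "Species5_order5": 3.0,
    },
    "Species2_order2": {
        "Species1_order1": 3.0, "Species2_order2": 0.0,
        "Species3_order2": 2.0, "Species4_order3": 4.0,
        "Species5_order5": 4.0,
    },
}

print(closest_leaves(distances, "Species1_order1"))
# → (['Species2_order2', 'Species3_order2', 'Species4_order3', 'Species5_order5'], 3.0)
print(closest_leaves(distances, "Species2_order2"))
# → (['Species3_order2'], 2.0)
```

The exact `==` comparison is fine for unit branch lengths like these; with real-valued branch lengths a small tolerance may be safer.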

As for the posted update, I don't see any error. It appears to give the correct result. Here is the tree as rendered by ete3:

Here is the matrix column corresponding to Interest_sequence:

>>> m['Interest_sequence']
Rhopalosiphum_maidis__Hemiptera            4.0
Drosophila_novamexicana__Hemiptera         5.0
Drosophila_arizonae__Hemiptera             6.0
Drosophila_navojoa__Hemiptera              6.0
Interest_sequence                          0.0
Heliothis_virescens_droso_3a__nan          5.0
Mythimna_separata_droso__nan               6.0
Heliothis_virescens_droso_3i__nan          6.0
Scaptodrosophila_lebanonensis__Diptera     5.0
Mythimna_unipuncta_droso_A__nan            6.0
Xestia_c-nigrum_droso__nan                 8.0
Helicoverpa_armigera_droso__nan            8.0
Mocis_latipes_droso__nan                   7.0
Drosophila_busckii__Diptera                4.0
Drosophila_bipectinata__Diptera            5.0
Drosophila_mojavensis__Diptera             7.0
Drosophila_yakuba__Diptera                 7.0
Drosophila_hydei__Diptera                  7.0
Drosophila_serrata__Diptera                8.0
Drosophila_takahashii__Diptera             9.0
Drosophila_eugracilis__Diptera            11.0
Drosophila_ficusphila__Diptera            11.0
Drosophila_erecta__Diptera                12.0
Drosophila_melanogaster__Diptera          13.0
Sequence_A_nan__nan                       14.0
Drosophila_sechellia__Diptera             15.0
Drosophila_simulans__Diptera              15.0
Drosophila_suzukii__Diptera               12.0
Drosophila_biarmipes__Diptera             12.0
Name: Interest_sequence, dtype: float64