Is there a way to use PhyML with Smart Model Selection in Biopython?


I am currently looking for a way to use PhyML together with Smart Model Selection (SMS) from Biopython.

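According to the paper describing it (SMS: Smart Model Selection in PhyML: https://academic.oup.com/mbe/article/34/9/2422/3788860), there is a command-line interface for model selection, but I cannot find it anywhere. Biopython has a wrapper, imported with from Bio.Phylo.Applications import PhymlCommandline, that can run PhyML from a Python script.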

Is there a way to integrate PhymlCommandline and Smart Model Selection in Python?

python biopython
1 Answer

You can download the SMS source code from the Docs and Materials section of the ATGC Montpellier bioinformatics platform website.

The code is a combination of several shell scripts that use R and PhyML in the background.

So download the zip file, unzip it, and make sure R is installed.

$ unzip sms-1.8.1.zip
[...]
$ cd sms-1.8.1
sms-1.8.1 $ R --version
R version 3.5.1 (2018-07-02) -- "Feather Spray"

The code relies on PhyML to compute likelihood scores. The PhyML sources are bundled in the SMS archive. Packages required for compilation:

  • For Debian: 'build-essential'

    sudo apt-get install build-essential

  • For SuSE: 'devel_basis'

    sudo zypper install --type pattern devel_basis

  • For Red Hat: 'C Development Tools and Libraries'

    sudo yum -y -v groupinstall "C Development Tools and Libraries"

Note that spaces in directory names may cause problems. Now we need to build the PhyML sources.

sms-1.8.1 $ make all
[...]

Then make the shell script executable.

sms-1.8.1 $ chmod +x ./sms.sh
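
Before calling SMS from Python, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of SMS itself: the function name check_sms_setup is hypothetical, and it assumes that R is reachable on PATH under the name R and that the script is called sms.sh inside the unpacked directory, as in the commands above.

```python
import os
import shutil
from pathlib import Path


def check_sms_setup(sms_dir):
    """Return a list of human-readable problems; an empty list means the setup looks usable."""
    problems = []

    # SMS uses R in the background, so R has to be on PATH.
    if shutil.which("R") is None:
        problems.append("R was not found on PATH")

    # The entry point is the sms.sh script in the unpacked directory.
    script = Path(sms_dir) / "sms.sh"
    if not script.is_file():
        problems.append(f"{script} does not exist")
    elif not os.access(script, os.X_OK):
        problems.append(f"{script} is not executable (run chmod +x)")

    return problems
```

Calling check_sms_setup("sms-1.8.1") and printing the returned list gives a quick diagnosis. The check does not verify that make all succeeded.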

You can now run SMS with the following command structure:

 ./sms.sh -i [input-msa] -d [data-type]

where:

  • [input-msa] is your input alignment in PHYLIP format
  • [data-type] is either 'aa' for amino acid data or 'nt' for DNA

Optional parameters are:

 -o : Path to the SMS output directory
 -c : Criterion to use 'aic' or 'bic'
 -u : Input tree in Newick format
 -t : Add this option to infer a PhyML tree with selected model
 -s : Type of tree improvement 'NNI' or 'SPR'
 -r : Number of random starting trees
 -b : Branch support (aLRT or bootstrap replicates)
 -h : Prints help
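
Since SMS is a shell script rather than a Python library, one way to drive it from Python is through the standard subprocess module. The sketch below is an illustration under assumptions, not an official Biopython or SMS API: the function names build_sms_command and run_sms are hypothetical, only the flags listed above are used, and it assumes the script is run from inside the SMS directory and prints its report to standard output.

```python
import subprocess


def build_sms_command(alignment, data_type, criterion=None, output_dir=None, infer_tree=False):
    """Build the argument list for ./sms.sh from the options documented above."""
    if data_type not in ("nt", "aa"):
        raise ValueError("data_type must be 'nt' or 'aa'")
    cmd = ["./sms.sh", "-i", str(alignment), "-d", data_type]
    if criterion is not None:
        if criterion not in ("aic", "bic"):
            raise ValueError("criterion must be 'aic' or 'bic'")
        cmd += ["-c", criterion]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if infer_tree:
        # -t asks SMS to also infer a PhyML tree with the selected model.
        cmd.append("-t")
    return cmd


def run_sms(sms_dir, alignment, data_type, criterion=None, output_dir=None, infer_tree=False):
    """Run SMS inside sms_dir and return whatever it printed to standard output."""
    cmd = build_sms_command(alignment, data_type, criterion, output_dir, infer_tree)
    # cwd=sms_dir mirrors running ./sms.sh from within the unpacked directory;
    # relative alignment paths are therefore interpreted relative to sms_dir.
    completed = subprocess.run(cmd, cwd=sms_dir, capture_output=True, text=True, check=True)
    return completed.stdout
```

For instance, run_sms("sms-1.8.1", "primatesNT.phy", "nt", criterion="bic") would issue the same command as the example that follows. Passing the arguments as a list avoids shell quoting problems.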

For example, using the primatesNT.phy example file from The Phylogenetic Handbook:

sms-1.8.1 $ ./sms.sh -i primatesNT.phy -d nt -c bic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     Starting SMS v1.8.1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Input alignment    : primatesNT.phy
Data type          : DNA
Number of taxa     : 21
Number of sites    : 1500
Number of branches : 39
Criterion          : BIC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 1 : Set a fixed topology
        BIC=13172.69687
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 2 : Select the best decoration
        BIC=13165.48751 decoration : '+G'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 3 : Select the best matrix
        BIC=13155.36733 matrix : 'HKY85'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 4 : Select the best final decoration
        BIC=13155.36733 decoration : '+G'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Selected model                          : HKY85 +G
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Substitution model                      : HKY85
Equilibrium frequencies                 : ML optimized
Transition / transversion ratio         : estimated
Proportion of invariable sites          : fixed (0.0)
Number of substitution rate categories  : 4
Gamma shape parameter                   : estimated (0.587)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suggested citations:
SMS
 Vincent Lefort, Jean-Emmanuel Longueville, Olivier Gascuel.
 "SMS: Smart Model Selection in PhyML."
 Molecular Biology and Evolution, msx149, 2017.
PhyML
 S. Guindon, JF. Dufayard, V. Lefort, M. Anisimova, W. Hordijk, O. Gascuel
 "New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0."
 Systematic Biology. 2010. 59(3):307-321.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
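
To feed the result back into a PhyML run from Python, the "Selected model" line of the report above can be parsed and turned into PhyML options. This is a rough sketch based only on the output format shown here: the function names are hypothetical, the executable name phyml is an assumption (SMS builds its own PhyML binary), and only the '+G' and '+I' decorations are handled. The same choices could instead be passed to Biopython's PhymlCommandline wrapper.

```python
import re


def parse_selected_model(report):
    """Return (matrix, decorations) from an SMS report, e.g. ('HKY85', ['G'])."""
    match = re.search(
        r"^Selected model[ \t]*:[ \t]*(\S+)[ \t]*(.*)$", report, flags=re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Selected model' line found in the SMS report")
    matrix = match.group(1)
    # Decorations appear as '+G', '+I', ...; keep only the letters.
    decorations = re.findall(r"\+(\w+)", match.group(2))
    return matrix, decorations


def phyml_args(alignment, data_type, matrix, decorations):
    """Translate the selected model into PhyML command-line arguments."""
    args = ["phyml", "-i", alignment, "-d", data_type, "-m", matrix]
    if "G" in decorations:
        # Gamma-distributed rates: 4 categories, shape parameter estimated.
        args += ["-c", "4", "-a", "e"]
    else:
        args += ["-c", "1"]
    # Proportion of invariable sites: estimated with +I, otherwise fixed to 0.
    args += ["-v", "e" if "I" in decorations else "0"]
    return args
```

With the report above, parse_selected_model yields the matrix HKY85 with the gamma decoration. The resulting argument list can then be handed to subprocess.run.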