How do I get multiple attributes from an NCBI BioSample ESummary using Bio.Entrez (Python)?


I am trying to collect multiple attributes from an NCBI BioSample ESummary.

from Bio import Entrez

handle = Entrez.esummary(db='biosample', id='6451159')
recs = Entrez.read(handle)
attributes = recs['DocumentSummarySet']['DocumentSummary'][0]['SampleData']

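Printing attributes gives the output in the attached screenshot, with little structure and no line breaks between items. How can I get each attribute item individually, or several attribute items at once?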

result screen shot

attributes biopython
1 Answer

Your attributes variable holds a string of XML data, so you need an XML library to parse it. Here I use ElementTree:

>>> import xml.etree.ElementTree as ET

>>> root = ET.fromstring(attributes)
>>> root
<Element 'BioSample' at 0x7f8309ec54a8>

>>> root.attrib
{'access': 'public',
 'accession': 'SAMN06451159',
 'id': '6451159',
 'last_update': '2017-08-15T13:14:37.194',
 'publication_date': '2017-02-27T00:00:00.000',
 'submission_date': '2017-02-27T11:04:22.920'}
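
Since root.attrib behaves like a dictionary, single values can be read with root.get(). The following is a minimal sketch rather than part of the original answer: sample_xml is a hand-written stand-in for the attributes string that contains only the root element shown above, and the timestamp parsing assumes the values keep the ISO format seen here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hand-written stand-in for the `attributes` string (root element only).
sample_xml = (
    '<BioSample access="public" accession="SAMN06451159" id="6451159" '
    'last_update="2017-08-15T13:14:37.194" '
    'publication_date="2017-02-27T00:00:00.000" '
    'submission_date="2017-02-27T11:04:22.920"/>'
)

root = ET.fromstring(sample_xml)

# .get() returns None (or a supplied default) when the attribute is missing,
# whereas root.attrib['name'] would raise KeyError.
accession = root.get('accession')
missing = root.get('not_present')

# Assumption: the timestamps stay in ISO 8601 form with millisecond precision.
last_update = datetime.fromisoformat(root.get('last_update'))

print(accession)
print(last_update.date())
```

Using .get() instead of indexing root.attrib keeps the code from failing on records that lack one of these fields.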

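>>> # root.getiterator() was removed in Python 3.9; root.iter() is the replacement
>>> for elem in root.iter():
...     if elem.tag:
...         print('tag: ', elem.tag)
...     if elem.text is not None and elem.text.strip():
...         print('text: ', elem.text)
...     if elem.attrib:
...         print('attrib: ', elem.attrib)
...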
tag:  BioSample
attrib:  {'access': 'public', 'publication_date': '2017-02-27T00:00:00.000', 'last_update': '2017-08-15T13:14:37.194', 'submission_date': '2017-02-27T11:04:22.920', 'id': '6451159', 'accession': 'SAMN06451159'}
tag:  Ids
tag:  Id
text:  SAMN06451159
attrib:  {'db': 'BioSample', 'is_primary': '1'}
tag:  Id
text:  Bacillales bacterium UBA786
attrib:  {'db_label': 'Sample name'}
tag:  Id
text:  SRS2035273
attrib:  {'db': 'SRA'}
tag:  Description
tag:  Title
text:  Uncultivated Bacillales bacterium UBA786 genome recovered from SRX834653
tag:  Organism
attrib:  {'taxonomy_id': '1950363', 'taxonomy_name': 'Bacillales bacterium UBA786'}
tag:  OrganismName
text:  Bacillales bacterium UBA786
tag:  Comment
tag:  Paragraph
text:  Uncultivated genome recovered from an assembly of the SRX834653 metagenome.
tag:  Owner
tag:  Name
text:  University of Queensland
tag:  Contacts
tag:  Contact
attrib:  {'email': '[email protected]'}
tag:  Name
tag:  First
text:  Donovan
tag:  Last
text:  Parks
tag:  Models
tag:  Model
text:  Microbe, viral or environmental
tag:  Package
text:  Microbe.1.0
attrib:  {'display_name': 'Microbe; version 1.0'}
tag:  Attributes
tag:  Attribute
text:  CLC de novo assembler
attrib:  {'attribute_name': 'assembly_method'}
tag:  Attribute
text:  4.4.1
attrib:  {'attribute_name': 'assembly_method_version'}
tag:  Attribute
text:  not applicable
attrib:  {'attribute_name': 'collection_date', 'harmonized_name': 'collection_date', 'display_name': 'collection date'}
tag:  Attribute
text:  96.63%
attrib:  {'attribute_name': 'completeness_estimated'}
tag:  Attribute
text:  0.00%
attrib:  {'attribute_name': 'contamination_estimated'}
tag:  Attribute
text:  true
attrib:  {'attribute_name': 'environmental_sample'}
tag:  Attribute
text:  23.09
attrib:  {'attribute_name': 'genome_coverage'}
tag:  Attribute
text:  Kenya
attrib:  {'attribute_name': 'geo_loc_name', 'harmonized_name': 'geo_loc_name', 'display_name': 'geographic location'}
tag:  Attribute
text:  UBA786
attrib:  {'attribute_name': 'isolate', 'harmonized_name': 'isolate', 'display_name': 'isolate'}
tag:  Attribute
text:  feces
attrib:  {'attribute_name': 'isolation-source', 'harmonized_name': 'isolation_source', 'display_name': 'isolation source'}
tag:  Attribute
text:  BWA (BWA-MEM)
attrib:  {'attribute_name': 'mapping_method'}
tag:  Attribute
text:  0.7.12-r1039
attrib:  {'attribute_name': 'mapping_method_version'}
tag:  Attribute
text:  gut metagenome
attrib:  {'attribute_name': 'metagenome_source'}
tag:  Attribute
text:  true
attrib:  {'attribute_name': 'metagenomic'}
tag:  Attribute
text:  CheckM
attrib:  {'attribute_name': 'quality_assessment_method'}
tag:  Attribute
text:  1.0.6
attrib:  {'attribute_name': 'quality_assessment_method_version'}
tag:  Attribute
text:  metagenomic assembly
attrib:  {'attribute_name': 'sample_type', 'harmonized_name': 'sample_type', 'display_name': 'sample type'}
tag:  Attribute
text:  Genome binned from sequencing reads available in SRX834653 metagenome
attrib:  {'attribute_name': 'subsrc_note', 'harmonized_name': 'subsrc_note', 'display_name': 'subsource note'}
tag:  Attribute
text:  This BioSample is a metagenomic assembly obtained from the gut metagenome reads: SRR1747052.
attrib:  {'attribute_name': 'value'}
tag:  Links
tag:  Link
text:  348753
attrib:  {'type': 'entrez', 'target': 'bioproject', 'label': 'PRJNA348753'}
tag:  Status
attrib:  {'status': 'live', 'when': '2017-02-27T11:04:22.923'}
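
To get each attribute item, or several at once, the Attribute elements listed above can be collected into a dictionary keyed by attribute_name. This is a minimal sketch under stated assumptions: sample_xml is a trimmed, hand-written stand-in for the attributes string, attribute_dict is a hypothetical helper name, and 'host' is included only to show what happens when a sample lacks a requested name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written stand-in for the `attributes` string.
sample_xml = """<BioSample accession="SAMN06451159" id="6451159">
  <Attributes>
    <Attribute attribute_name="assembly_method">CLC de novo assembler</Attribute>
    <Attribute attribute_name="geo_loc_name" harmonized_name="geo_loc_name" display_name="geographic location">Kenya</Attribute>
    <Attribute attribute_name="isolation-source" harmonized_name="isolation_source" display_name="isolation source">feces</Attribute>
  </Attributes>
</BioSample>"""


def attribute_dict(xml_text):
    """Map each Attribute's attribute_name to its text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # iter('Attribute') finds matching elements at any depth below the root.
    return {
        elem.get('attribute_name'): (elem.text or '').strip()
        for elem in root.iter('Attribute')
    }


attrs = attribute_dict(sample_xml)

# One attribute at a time ...
print(attrs['geo_loc_name'])

# ... or several at once; .get() yields None for names the sample lacks.
wanted = ['assembly_method', 'isolation-source', 'host']
selected = {name: attrs.get(name) for name in wanted}
print(selected)
```

With the real data, attribute_dict(attributes) would take the SampleData string from the question directly. Keying on harmonized_name instead may be preferable when comparing samples, but not every Attribute in the output above carries that field.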