Generating 2D images of molecules from PubChem FTP data

Question

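Rather than scraping the PubChem website, I would prefer to be nice and generate the images locally from the data on the PubChem FTP site:

ftp://ftp.ncbi.nih.gov/pubchem/specifications/

The only problem is that I am limited to OSX and Linux, and I can't seem to find a way to programmatically generate the 2D images shown on their site. See this example:

https://pubchem.ncbi.nlm.nih.gov/compound/6#section=Top

Under the heading "2D Structure" there is this image:

https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=6&t=l

That is what I would like to generate.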

Answer 1

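If you want something that works out of the box, I would suggest molconvert from ChemAxon's Marvin (https://www.chemaxon.com/products/marvin), which is free for academic use. It is easy to use from the command line and supports a large number of input and output formats. For your example it would be: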

molconvert "png" -s "C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl" -o cdnb.png

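This writes the rendered structure to cdnb.png.

It also lets you set parameters such as width, height, quality, background color, and so on.

However, if you are a programmer, I would definitely recommend RDKit. The following code generates images for a pair of compounds given as SMILES.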

from rdkit import Chem
from rdkit.Chem import Draw

# (SMILES, output file stem) pairs
ms_smis = [["C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl", "cdnb"],
           ["C1=CC(=CC(=C1)N)C(=O)N", "3aminobenzamide"]]

# Parse each SMILES string into an RDKit molecule
ms = [[Chem.MolFromSmiles(x[0]), x[1]] for x in ms_smis]

# Write one 800x800 SVG per molecule
for m in ms:
    Draw.MolToFile(m[0], m[1] + ".svg", size=(800, 800))

This writes the images cdnb.svg and 3aminobenzamide.svg.

Answer 2

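I also emailed the PubChem people, and they got back to me very quickly with this reply:

The only bulk access we have to images is through the download service: https://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi. You can request up to 50,000 images at a time.

That is better than I expected, but still not amazing, because it requires downloading something that I should in theory be able to generate locally. So I am leaving this question open until some kind soul writes an open source library that does the same thing.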

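Edit:

I figured I could save people some time if they are doing the same thing I am. I have created a Ruby gem built on Mechanize to automate downloading the images. Please be kind to their servers and download only what you need.

https://github.com/zachaysan/pubchem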

gem install pubchem

Answer 3

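One open source option is the Indigo Toolkit, which has precompiled packages for Linux, Windows, and MacOS as well as language bindings for Python, Java, .NET, and C. I chose the 1.4.0 beta.

I had a similar interest in converting SMILES to 2D structures, so I adapted my Python to address your question and to capture timing information. It uses the CID-SMILES.gz file downloaded from the PubChem FTP site (Compound/Extras). The script below is an implementation of a local SMILES-to-2D-structure converter: it reads a range of rows from the PubChem CID-SMILES file of isomeric SMILES (which contains over 102 million compound records) and converts each SMILES string to a PNG image of the 2D structure.

I ran three tests of 1000 conversions each, at file row offsets of 0, 100,000, and 10,000,000. They took 35, 50, and 60 seconds respectively on my Windows 10 laptop (Intel i7-7500U CPU, 2.70GHz) with a solid-state drive, running Python 3.7.4. The 3000 files totaled 100 MB.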

from indigo import Indigo
from indigo.renderer import IndigoRenderer
import subprocess
import datetime

def timerstart():
    # start timer and print time, return start time
    start = datetime.datetime.now()
    print("Start time =", start)

    return start

def timerstop(start):
    # end timer and print time and elapsed time, return elapsed time
    endtime = datetime.datetime.now()
    elapsed = endtime - start
    print("End time =", endtime)
    print("Elapsed time =", elapsed)

    return elapsed

numrecs = 1000
recoffset = 0 # 10000000    # record offset
starttime = timerstart()

indigo = Indigo()
renderer = IndigoRenderer(indigo)

# set render options
indigo.setOption("render-atom-color-property", "color")
indigo.setOption("render-coloring", True)
indigo.setOption("render-comment-position", "bottom")
indigo.setOption("render-comment-offset", "20")
indigo.setOption("render-background-color", 1.0, 1.0, 1.0)
indigo.setOption("render-output-format", "png")

# set data path (including data file) and output file path
datapath = r'../Download/CID-SMILES'
pngpath = r'./2D/'

# read subset of rows from data file
mycmd = "head -" + str(recoffset+numrecs) + " " + datapath + " | tail -" + str(numrecs) 
print(mycmd)
(out, err) = subprocess.Popen(mycmd, stdout=subprocess.PIPE, shell=True).communicate()

lines = str(out.decode("utf-8")).split("\n")
count = 0
for line in lines:
    if not line.strip():
        continue                  # skip blank lines, e.g. after the final newline
    try:
        cols = line.split("\t")   # split on tab
        key = cols[0]             # cid in cols[0]
        smiles = cols[1]          # smiles in cols[1]
        mol = indigo.loadMolecule(smiles)
        s = "CID=" + key
        indigo.setOption("render-comment", s)
        #indigo.setOption("render-image-size", 200, 250)
        #indigo.setOption("render-image-size", 400, 500)
        renderer.renderToFile(mol, pngpath + key + ".png")
        count += 1
    except Exception as exc:
        # report the failing row and continue with the next one
        print("Error processing line after", count, ":", line, "-", exc)

elapsedtime = timerstop(starttime)
print("Converted", str(count), "SMILES to PNG")
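
The script above shells out to head and tail, which needs a decompressed copy of the file and a Unix-like shell. As a possible alternative, the sketch below reads the same window of rows directly from the gzipped download using only the Python standard library. The function name read_cid_smiles and its parameters are made up for illustration and are not part of Indigo or PubChem. It assumes each row holds a CID and a SMILES string separated by a tab, as in the script above.

```python
import gzip
from itertools import islice


def read_cid_smiles(path, offset=0, count=1000):
    """Yield (cid, smiles) pairs from up to `count` rows starting at row `offset`.

    Assumes a gzip-compressed, tab-separated file with the CID in the
    first column and the SMILES string in the second.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        # islice skips the first `offset` rows without keeping them in memory
        for line in islice(handle, offset, offset + count):
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2 or not fields[1]:
                continue  # skip blank or malformed rows
            yield fields[0], fields[1]
```

Each pair could then be fed to indigo.loadMolecule and renderer.renderToFile in place of the lines list, for example with for key, smiles in read_cid_smiles("../Download/CID-SMILES.gz", recoffset, numrecs). Reaching a large offset still means decompressing and scanning every preceding row, so this is not expected to be faster than head and tail, only more portable.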
