Does CRAN (or any of its related services) have an API?


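I am interested in retrieving machine-readable meta-information about R packages.

For example, when I visit CRAN, I can see a short description of a package before downloading it: https://cran.r-project.org/web/packages/MASS/

I could not find any way to retrieve output other than HTML from the CRAN servers. I would like to avoid parsing HTML and instead retrieve the package meta-information in a more convenient format (e.g. JSON).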

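As far as I can tell, every R package ships a YAML-like (?) description file, called DESCRIPTION, inside its source package. So far, however, I have only found this file inside the tar archives, which means I would have to download a package before I could read its description.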

Here is an example of the DESCRIPTION file from the MASS package:

Package: MASS
Priority: recommended
Version: 7.3-55
Date: 2022-01-12
Revision: $Rev: 3559 $
Depends: R (>= 3.3.0), grDevices, graphics, stats, utils
Imports: methods
Suggests: lattice, nlme, nnet, survival
Authors@R: c(person("Brian", "Ripley", role = c("aut", "cre", "cph"),
                    email = "[email protected]"),
         person("Bill", "Venables", role = "ctb"),
         person(c("Douglas", "M."), "Bates", role = "ctb"),
         person("Kurt", "Hornik", role = "trl",
                     comment = "partial port ca 1998"),
         person("Albrecht", "Gebhardt", role = "trl",
                     comment = "partial port ca 1998"),
         person("David", "Firth", role = "ctb"))
Description: Functions and datasets to support Venables and Ripley,
  "Modern Applied Statistics with S" (4th edition, 2002).
Title: Support Functions and Datasets for Venables and Ripley's MASS
LazyData: yes
ByteCompile: yes
License: GPL-2 | GPL-3
URL: http://www.stats.ox.ac.uk/pub/MASS4/
Contact: <[email protected]>
NeedsCompilation: yes
Packaged: 2022-01-13 05:06:37 UTC; ripley
Author: Brian Ripley [aut, cre, cph],
  Bill Venables [ctb],
  Douglas M. Bates [ctb],
  Kurt Hornik [trl] (partial port ca 1998),
  Albrecht Gebhardt [trl] (partial port ca 1998),
  David Firth [ctb]
Maintainer: Brian Ripley <[email protected]>
Repository: CRAN
Date/Publication: 2022-01-13 08:05:04 UTC

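Any suggestions on how to get this directly in a convenient, machine-readable form?

I tried looking this up, but search engines have not turned up anything useful so far.

Edit/clarification: I am looking for a solution that does not depend on R, i.e. a web API that is independent of the framework or language used to retrieve the metadata.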

r cran
3 answers
4
votes

Does tools::CRAN_package_db() have all the information you want? (See here for some discussion.)

> dd <- tools::CRAN_package_db()
> names(dd)
 [1] "Package"                 "Version"                
 [3] "Priority"                "Depends"                
 [5] "Imports"                 "LinkingTo"              
 [7] "Suggests"                "Enhances"               
 [9] "License"                 "License_is_FOSS"        
[11] "License_restricts_use"   "OS_type"                
[13] "Archs"                   "MD5sum"                 
[15] "NeedsCompilation"        "Additional_repositories"
[17] "Author"                  "Authors@R"              
[19] "Biarch"                  "BugReports"             
[21] "BuildKeepEmpty"          "BuildManual"            
[23] "BuildResaveData"         "BuildVignettes"         
[25] "Built"                   "ByteCompile"            
[27] "Classification/ACM"      "Classification/ACM-2012"
[29] "Classification/JEL"      "Classification/MSC"     
[31] "Classification/MSC-2010" "Collate"                
[33] "Collate.unix"            "Collate.windows"        
[35] "Contact"                 "Copyright"              
[37] "Date"                    "Description"            
[39] "Encoding"                "KeepSource"             
[41] "Language"                "LazyData"               
[43] "LazyDataCompression"     "LazyLoad"               
[45] "MailingList"             "Maintainer"             
[47] "Note"                    "Packaged"               
[49] "RdMacros"                "StagedInstall"          
[51] "SysDataCompression"      "SystemRequirements"     
[53] "Title"                   "Type"                   
[55] "URL"                     "UseLTO"                 
[57] "VignetteBuilder"         "ZipData"                
[59] "Published"               "Path"                   
[61] "X-CRAN-Comment"          "Reverse depends"        
[63] "Reverse imports"         "Reverse linking to"     
[65] "Reverse suggests"        "Reverse enhances"       

I'll add that although the first step does require R, you can easily generate a JSON file and store it locally for other machines to use:

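library(jsonlite)

# Fetch the CRAN package metadata as a data frame, serialize it to JSON,
# and write it to a local file. The native pipe |> requires R >= 4.1.
(tools::CRAN_package_db()
   |> jsonlite::toJSON()
   |> writeLines("R_packages.json")
)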

(This produces a file of roughly 30 MB with no line breaks, but I think it should still be usable...)
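Once the file exists, other tools can read it without R. Here is a minimal Python sketch, assuming the export is a JSON array with one object per package (the row-wise layout that jsonlite uses for data frames by default, as far as I know). The helper names index_records, load_package_index and get_field are made up for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def index_records(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index a list of package records by their "Package" field."""
    # Assumption: missing (NA) fields are simply absent from a record,
    # so every lookup below has to tolerate missing keys.
    return {rec["Package"]: rec for rec in records if "Package" in rec}


def load_package_index(path: str) -> Dict[str, Dict[str, Any]]:
    """Read the exported JSON file and index it by package name."""
    with open(path, encoding="utf-8") as fh:
        return index_records(json.load(fh))


def get_field(index: Dict[str, Dict[str, Any]], package: str, field: str) -> Optional[Any]:
    """Return one metadata field, or None if the package or field is absent."""
    return index.get(package, {}).get(field)
```

For example, get_field(load_package_index("R_packages.json"), "MASS", "Version") would look up the version recorded at export time.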


2
votes

One acceptable solution is the METACRAN API, available here: https://crandb.r-pkg.org/
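A rough Python sketch of querying that service follows. It rests on two assumptions that are not stated in the answer: that appending a package name (and optionally a version) to the base URL selects a record, and that the response is JSON. The helper names crandb_url and fetch_package are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

CRANDB_BASE = "https://crandb.r-pkg.org"  # base URL taken from the answer


def crandb_url(package: str, version: Optional[str] = None) -> str:
    """Build the URL for a package record (the path layout is an assumption)."""
    parts = [urllib.parse.quote(package, safe="")]
    if version is not None:
        parts.append(urllib.parse.quote(version, safe=""))
    return CRANDB_BASE + "/" + "/".join(parts)


def fetch_package(package: str, version: Optional[str] = None,
                  timeout: float = 10.0) -> Dict[str, Any]:
    """Download one package record and decode it as JSON."""
    with urllib.request.urlopen(crandb_url(package, version), timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_package("MASS") performs a network request, so real use would want error handling for timeouts and HTTP errors.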


1
vote

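You can download https://cloud.r-project.org/src/contrib/PACKAGES.gz (or even the uncompressed form, https://cloud.r-project.org/src/contrib/PACKAGES). It contains information about all currently available packages in DCF format, using some of the fields from the DESCRIPTION file plus a few others.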

You don't have to use cloud.r-project.org; any CRAN mirror will do.
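DCF is simple enough to parse without R. The Python sketch below assumes the common layout: "Field: value" lines, continuation lines that start with whitespace, and blank lines between records. It is not a complete DCF implementation, and the function names parse_dcf and fetch_packages_index are hypothetical.

```python
import gzip
import urllib.request
from typing import Dict, List


def parse_dcf(text: str) -> List[Dict[str, str]]:
    """Parse DCF text into a list of records (one dict per package)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t":
            # A continuation line extends the previous field's value.
            if last_key is not None:
                current[last_key] = (current[last_key] + " " + line.strip()).strip()
        else:
            key, sep, value = line.partition(":")
            if sep:
                last_key = key.strip()
                current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


def fetch_packages_index(mirror: str = "https://cloud.r-project.org",
                         timeout: float = 30.0) -> List[Dict[str, str]]:
    """Download PACKAGES.gz from a CRAN mirror and parse it."""
    url = mirror.rstrip("/") + "/src/contrib/PACKAGES.gz"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = gzip.decompress(resp.read()).decode("utf-8", errors="replace")
    return parse_dcf(text)
```

The same parse_dcf function should also handle a single DESCRIPTION file such as the MASS example in the question, since it uses the same field layout. Continuation lines are joined with single spaces, so the original indentation of multi-line fields is not preserved.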
