How do I correctly parse JSON in R?


I am trying to convert Twitter data into a data frame, but I keep getting:

Error in parse_con(txt, bigint_as_char) : 
  parse error: after array element, I expect ',' or ']'
          etweeted":false,"lang":"es"} {"created_at":"Fri Feb 01 05:54
                     (right here) ------^

I tried using stream_in instead, but the secondary columns do not work properly.

My code:

try2 <- fromJSON("/Users/malana/Downloads/tweets.20130201_055232.10000lines", flatten=TRUE)

The JSON file looks like this:


{"created_at":"Fri Feb 01 05:54:47 +0000 2013","id":297221438333652992,"id_str":"297221438333652992","text":"RT @iEnterate: Por lo general, aquellos que reprimen la ira tienen tendencia a ser violentos despu\u00e9s de beber.","source":"\u003ca href=\"http:\/\/blackberry.com\/twitter\" rel=\"nofollow\"\u003eTwitter for BlackBerry\u00ae\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1055846485,"id_str":"1055846485","name":"Arturo Cameras Lara","screen_name":"CamerasLara","location":"Tapachula Chiapas","url":null,"description":"Gran visi\u00f3n para el futuro, empezar desde hoy el sistema empresarial, para llegar a lo grande, con instinto pol\u00edtico en futuro. PRI. ","protected":false,"followers_count":41,"friends_count":72,"listed_count":0,"created_at":"Wed Jan 02 18:56:40 +0000 2013","favourites_count":14,"utc_offset":null,"time_zone":null,"geo_enabled":true,"verified":false,"statuses_count":1872,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3172272497\/1e624e13eb43b3cc47049b494c757d98_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3172272497\/1e624e13eb43b3cc47049b494c757d98_normal.jpeg","profile_banner_url":"https:\/\/si0.twimg.com\/profile_banners\/1055846485\/1357160274","profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null}
,"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Thu Jan 31 17:01:13 +0000 2013","id":297026763945549824,"id_str":"297026763945549824","text":"Por lo general, aquellos que reprimen la ira tienen tendencia a ser violentos despu\u00e9s de beber.","source":"\u003ca href=\"http:\/\/www.hootsuite.com\" rel=\"nofollow\"\u003eHootSuite\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":385621677,"id_str":"385621677","name":"\u00a1Ent\u00e9rate!","screen_name":"iEnterate","location":"[email protected]","url":null,"description":"\u00a1Ent\u00e9rate de lo m\u00e1s actual! Ciencia,historia, tecnolog\u00eda,salud, psicolog\u00eda, citas, C u r i o s i d a d e s, Deportes. \u00a1Disfruta!","protected":false,"followers_count":253729,"friends_count":46,"listed_count":557,"created_at":"Wed Oct 05 20:33:20 +0000 
2011","favourites_count":1595,"utc_offset":-36000,"time_zone":"Hawaii","geo_enabled":false,"verified":false,"statuses_count":17030,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"EDF4F7","profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/753934440\/26f50f5fc7cc2f774bdba1bb7c3c0602.jpeg","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/753934440\/26f50f5fc7cc2f774bdba1bb7c3c0602.jpeg","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/2712142543\/81d9ec6c938b23cfb98cda090a281aa9_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/2712142543\/81d9ec6c938b23cfb98cda090a281aa9_normal.jpeg","profile_banner_url":"https:\/\/si0.twimg.com\/profile_banners\/385621677\/1357540686","profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":146,"entities":{"hashtags":[],"urls":[],"user_mentions":[]},"favorited":false,"retweeted":true,"lang":"es"},"retweet_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"iEnterate","name":"\u00a1Ent\u00e9rate!","id":385621677,"id_str":"385621677","indices":[3,13]}]},"favorited":false,"retweeted":false,"lang":"es"}
{"created_at":"Fri Feb 01 05:54:47 +0000 2013","id":297221438337867776,"id_str":"297221438337867776","text":"RT @nolavoy: Asi que armate uno armate un Hern\u00e1n \u266a","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":207797664,"id_str":"207797664","name":"Bazinga","screen_name":"ferchug22","location":"","url":null,"description":"No llores por quien no te ama , ama a quien llora por ti \u266a. fan de holasoygerman Pastii , Bionico , NTVG\ue022","protected":false,"followers_count":28,"friends_count":238,"listed_count":0,"created_at":"Tue Oct 26 01:46:23 +0000 2010","favourites_count":4,"utc_offset":-10800,"time_zone":"Buenos Aires","geo_enabled":false,"verified":false,"statuses_count":261,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1825125711\/las_pastillas_normal.jpg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1825125711\/las_pastillas_normal.jpg","profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Fri Feb 01 03:22:45 +0000 2013","id":297183175455690753,"id_str":"297183175455690753","text":"Asi que armate uno armate un Hern\u00e1n 
\u266a","source":"web","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":842643985,"id_str":"842643985","name":"No La Voy","screen_name":"nolavoy","location":"C\u00f3rdoba","url":"http:\/\/nolavoy.blogspot.com.ar","description":"El futbol es una joda, nuestro humor es cosa seria. Humildes y con muchos huevos. Campeones Tweets Awards 2012 terna Humor en 140 caracteres.","protected":false,"followers_count":25778,"friends_count":182,"listed_count":45,"created_at":"Mon Sep 24 00:13:39 +0000 2012","favourites_count":4929,"utc_offset":-10800,"time_zone":"Brasilia","geo_enabled":true,"verified":false,"statuses_count":16997,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"030E14","profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/720721348\/b41f8d6d71cb21f42aaaa7c8ef6e6add.jpeg","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/720721348\/b41f8d6d71cb21f42aaaa7c8ef6e6add.jpeg","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/2886958628\/6c32bef218d4cd7f0032909a8f53cd94_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/2886958628\/6c32bef218d4cd7f0032909a8f53cd94_normal.jpeg","profile_banner_url":"https:\/\/si0.twimg.com\/profile_banners\/842643985\/1351510749","profile_link_color":"3E08F2","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":35,"entities":{"hashtags":[],"urls":[],"user_mentions":[]},"favorited":false,"retweeted":true,"lang":"es"},"retweet_count":0,"entities":{"
hashtags":[],"urls":[],"user_mentions":[{"screen_name":"nolavoy","name":"No La Voy","id":842643985,"id_str":"842643985","indices":[3,11]}]},"favorited":false,"retweeted":false,"lang":"es"}
{"created_at":"Fri Feb 01 05:54:47 +0000 2013","id":297221438312701953,"id_str":"297221438312701953","text":"#SometimesYouHaveTo Move on with life","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":724874196,"id_str":"724874196","name":"Morgan\u270c\u2764","screen_name":"MorganAuderer","location":"","url":null,"description":"If nothing goes right, go left. \n\n\n\n\n\n\n\n\nIowa","protected":false,"followers_count":212,"friends_count":205,"listed_count":0,"created_at":"Mon Jul 30 00:02:45 +0000 2012","favourites_count":716,"utc_offset":null,"time_zone":null,"geo_enabled":false,"verified":false,"statuses_count":1298,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"0099B9","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme4\/bg.gif","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme4\/bg.gif","profile_background_tile":false,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3162469229\/752f4a8a2b3a97125ceb9acdc15e4c42_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3162469229\/752f4a8a2b3a97125ceb9acdc15e4c42_normal.jpeg","profile_banner_url":"https:\/\/si0.twimg.com\/profile_banners\/724874196\/1359354683","profile_link_color":"0099B9","profile_sidebar_border_color":"5ED4DC","profile_sidebar_fill_color":"95E8EC","profile_text_color":"3C3940","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"entities":{"hashtags":[{"text":"SometimesYouHaveTo","indices":[0,19]}],"urls":[],"user_mentions":[]},"
favorited":false,"retweeted":false,"lang":"en"}
{"created_at":"Fri Feb 01 05:54:47 +0000 2013","id":297221438308507649,"id_str":"297221438308507649","text":"@nacchanx2_n \u3084\u3070\u304b\u3063\u305f\u3067\u7b11","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":297188576653869057,"in_reply_to_status_id_str":"297188576653869057","in_reply_to_user_id":1006505846,"in_reply_to_user_id_str":"1006505846","in_reply_to_screen_name":"nacchanx2_n","user":{"id":982494234,"id_str":"982494234","name":"\u3055\u304d\u3093\u3061\u3087","screen_name":"sakincho_s2","location":"","url":null,"description":null,"protected":false,"followers_count":50,"friends_count":47,"listed_count":0,"created_at":"Sat Dec 01 13:04:03 +0000 2012","favourites_count":6,"utc_offset":null,"time_zone":null,"geo_enabled":false,"verified":false,"statuses_count":141,"lang":"ja","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3055034572\/b4a412b50cccc012194bb91e6ec932e0_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3055034572\/b4a412b50cccc012194bb91e6ec932e0_normal.jpeg","profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"nacchanx2_n","name":"\u306a\u3063\u3061\u3083\u3093","id":1006505846,"id_str":"1006505846","indi
ces":[0,12]}]},"favorited":false,"retweeted":false,"lang":"ja"}

Edit: Following another search result, I added brackets to the beginning and end of the JSON file. Without them I get this error instead:

Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
          etweeted":false,"lang":"es"} {"created_at":"Fri Feb 01 05:54
                     (right here) ------^
Tags: r, json, twitter

1 Answer

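Your file as a whole is not valid JSON. However, each line of it is. Essentially, the approach is to read the file line by line, parse the JSON on each line, and then bind the results together.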
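To make the difference concrete, here is a minimal sketch on two made-up records (the `ndjson` vector is a hypothetical stand-in for the tweet file, not the real data). It shows the whole text failing to parse, each line parsing on its own, and, as a different route from the one this answer takes, the lines being joined with commas inside brackets so that they form one valid JSON array. Adding brackets alone is not enough, which is why the first error in the question complains about a missing ','.

```r
library(jsonlite)

# Two made-up records standing in for the tweet file: one JSON object per line.
ndjson <- c(
  '{"id_str":"1","lang":"es","user":{"screen_name":"a"}}',
  '{"id_str":"2","lang":"en","user":{"screen_name":"b"}}'
)

# Parsed as a single document, the second object is unexpected trailing content.
whole <- tryCatch(
  fromJSON(paste(ndjson, collapse = "\n")),
  error = function(e) e
)
inherits(whole, "error")

# Parsed one line at a time, every record is valid JSON.
records <- lapply(ndjson, fromJSON)
records[[2]]$user$screen_name

# Joining the lines with commas inside [ ] yields one valid JSON array.
tweets <- fromJSON(paste0("[", paste(ndjson, collapse = ","), "]"), flatten = TRUE)
names(tweets)
```

The array route is convenient for small files, but it builds one large string in memory, so reading line by line tends to scale better.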

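However, because some fields can contain multiple values, binding the parsed lines directly creates duplicated rows. We can avoid this by using tibble::tibble_row(), which is guaranteed to create a one-row tibble.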
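The duplication is easy to reproduce on a toy record. The sketch below uses a hypothetical `rec` list (not taken from the tweet file) in which one field holds two values: converting it directly recycles the single-valued field into two rows, while collapsing the multi-valued field first lets tibble_row() build exactly one row.

```r
library(tibble)

# A made-up parsed record in which one field holds two values.
rec <- list(id_str = "1", indices = c(3L, 13L))

# as_tibble() recycles the length-1 field, so one record becomes two rows.
dup <- as_tibble(rec)
nrow(dup)

# Collapse the multi-valued field into one string, then build a single row.
rec$indices <- paste(rec$indices, collapse = ";")
one_row <- tibble_row(!!!rec)
nrow(one_row)
```

Calling tibble_row() on the uncollapsed record would raise an error instead of silently duplicating rows, which is the safety it adds here.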

library(rlang) # for the !!! (splice) operator
txt <- readLines("/Users/malana/Downloads/tweets.20130201_055232.10000lines")

lapply(txt, \(line) {
    dat <- rlist::list.flatten(jsonlite::fromJSON(line))
    # Collapse multi-valued fields into one ";"-separated string, then wrap each in a list
    list_cols <- names(dat)[lengths(dat) > 1]
    dat[list_cols] <- lapply(dat[list_cols], paste, collapse = ";")
    dat[list_cols] <- lapply(dat[list_cols], list)
    tibble::tibble_row(!!!dat)
}) |>
    dplyr::bind_rows()

# There are some more columns not shown here
# A tibble: 4 × 101
#   created_at                    id id_str text  source truncated user.id user.id_str user.name user.screen_name user.location user.description user.protected user.followers_count
#   <chr>                      <dbl> <chr>  <chr> <chr>  <lgl>       <int> <chr>       <chr>     <chr>            <chr>         <chr>            <lgl>                         <int>
# 1 Fri Feb 01 05:54:47 +00… 2.97e17 29722… RT @… "<a h… FALSE      1.06e9 1055846485  Arturo C… CamerasLara      "Tapachula C… "Gran visión pa… FALSE                            41
# 2 Fri Feb 01 05:54:47 +00… 2.97e17 29722… RT @… "<a h… FALSE      2.08e8 207797664   Bazinga   ferchug22        ""            "No llores por … FALSE                            28
# 3 Fri Feb 01 05:54:47 +00… 2.97e17 29722… #Som… "<a h… FALSE      7.25e8 724874196   Morgan✌❤  MorganAuderer    ""            "If nothing goe… FALSE                           212
# 4 Fri Feb 01 05:54:47 +00… 2.97e17 29722… @nac… "<a h… FALSE      9.82e8 982494234   さきんち… sakincho_s2      ""             NA              FALSE                            50
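Since the question mentions trying stream_in, here is a hedged sketch of that route as an alternative to the line-by-line loop. It assumes the trouble with the nested columns is that jsonlite returns nested objects as data-frame columns, which jsonlite::flatten() can expand. The `ndjson` records are made up, and the textConnection stands in for a file() connection to the real path.

```r
library(jsonlite)

# Made-up records with a nested "user" object, one JSON object per line.
ndjson <- c(
  '{"id_str":"1","lang":"es","user":{"screen_name":"a","followers_count":41}}',
  '{"id_str":"2","lang":"en","user":{"screen_name":"b","followers_count":28}}'
)

# stream_in() reads one JSON record per line from a connection.
con <- textConnection(ndjson)
tweets <- stream_in(con, verbose = FALSE)
close(con)

# Nested objects arrive as data-frame columns; flatten() expands them
# into ordinary columns named like "user.screen_name".
flat <- flatten(tweets)
names(flat)
```

Array-valued fields such as hashtags would still come back as list columns with this approach, so the collapsing step from the answer may still be needed for those.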