我在R中有一个数据帧,其中有一些行,如下所示:
c("LouDobbs", "gen_jackkeane") || RT @LouDobbs: #AmericaFirst- @gen_jackkeane: The Taliban for 9 months have told their fighters to kill as many people as you can, to includ…
以上是2列的示例,其中第1列(我正在使用分隔符||)具有多个用户名,第2列具有tweet文本。我希望该行应复制为2(用户数量),并且对于每个在tweet文本中列出了1个以上用户的数据框中的所有此类行,每个用户都可以单独放置在列1中。
structure(list(user = list("Dandhy_Laksono", c("LouDobbs", "gen_jackkeane"
), "DeepStateExpose", "AndruewJamess", "jrossman12", "BiLLRaY2019",
"DeepStateExpose", "Dandhy_Laksono", "DeepStateExpose", "DeepStateExpose"),
full_text = c("RT @Dandhy_Laksono: Sebagian pendukung Jokowi ini mengalami bagaimana fitnah \"komunis dan PKI\" digunakan selama pemilu.\n\nSekarang mereka me…",
"RT @LouDobbs: #AmericaFirst- @gen_jackkeane: The Taliban for 9 months have told their fighters to kill as many people as you can, to includ…",
"RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…",
"RT @AndruewJamess: @BillOReilly @KamalaHarris is wrong. @realDonaldTrump has accomplished a lot. He set a record for incoherent toilet twe…",
"RT @jrossman12: @SaraCarterDC Pakistan won't allow that as you already know. Your husband and the other U.S. troops have been forced to fig…",
"RT @BiLLRaY2019: JOKOWI TIDAK MEMBUNUH KPK..!\nMarkibong…\"Selamat tinggal Taliban di dalam KPK. Kalian kalah lagi, kalah lagi..!\"\n\n#JumatBer…",
"RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…",
"RT @Dandhy_Laksono: Sebagian pendukung Jokowi ini mengalami bagaimana fitnah \"komunis dan PKI\" digunakan selama pemilu.\n\nSekarang mereka me…",
"RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…",
"RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…"
)), row.names = c(NA, 10L), class = "data.frame")
我们可以使用lengths
来获取list
列中每个元素的长度。它应该足够快,因为lengths
很快
l1 <- lengths(df$user)
out <- data.frame(user = unlist(df$user), n = rep(l1, l1),
text = rep(df$full_text, l1))