Split text and build a table


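After running a questionnaire, I received the answers. One of the questions was: how often do you use these languages at work? The answers are formatted like this: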

"A - Spanish 60 \r\nB - Both of them 10 \r\n C - English 30"
"B - Both of them 50 \r\n C - English 50"
"A - Spanish 30 \r\nC - English 70"

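As you can see, each answer is made up of up to three parts, each prefixed with A, B or C (Spanish, both of them, or English). However, the three parts are not always all present, and what I want to obtain is the following table: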

Spanish | Both of them | English
   60          10           30
    0          50           50
   30           0           70

I split the answers with strsplit(x, "\r\n"), but I don't know how to continue.

r text split
1 Answer

Here is how I would approach this:

# Pseudocode
# 1. Initialise a result matrix filled with zeros:
#      one row per response,
#      three columns: Spanish, Both of them, English.
# 2. For each response, split it into lines with strsplit(response, "\r\n")
#    and trim the surrounding whitespace from each line.
#      2.1. Split each line with strsplit(line, " "), then take the first
#           part as the option (A, B or C) and the last part as the value.
#           2.1.1. Write the value into the cell that matches the option.
# 3. Print the result matrix.

Here is the sample code:

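# Sample input, copied from the question
responses <- c(
  "A - Spanish 60 \r\nB - Both of them 10 \r\n C - English 30",
  "B - Both of them 50 \r\n C - English 50",
  "A - Spanish 30 \r\nC - English 70"
)

# init: one row per response, zeros for options that were not answered
result <- matrix(0, nrow=length(responses), ncol=3)
colnames(result) <- c("Spanish", "Both of them", "English")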

for (i in seq_along(responses)) {
  response <- responses[i]
  lines <- strsplit(response, "\r\n")[[1]]
  
  for (line in lines) {
    line <- gsub("^\\s+|\\s+$", "", line)  # Remove extra spaces
    parts <- strsplit(line, " ")[[1]]
    option <- parts[1]
    value <- as.numeric(parts[length(parts)])
    
    if (option == "A") {
      result[i, "Spanish"] <- value
    } else if (option == "B") {
      result[i, "Both of them"] <- value
    } else if (option == "C") {
      result[i, "English"] <- value
    }
  }
}

print(result)
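
As an alternative to the nested loop, the same parsing can be sketched with a regular expression and a small helper function. This is only a sketch under a few assumptions: the helper name parse_response and the lookup vector cols are made up for illustration, every line is assumed to start with A, B or C and end with a whole number, and options that are missing from a response are left at 0.

```r
# Sample input, copied from the question.
responses <- c(
  "A - Spanish 60 \r\nB - Both of them 10 \r\n C - English 30",
  "B - Both of them 50 \r\n C - English 50",
  "A - Spanish 30 \r\nC - English 70"
)

# Hypothetical lookup from option letter to column name.
cols <- c(A = "Spanish", B = "Both of them", C = "English")

# Hypothetical helper: turn one response string into a named numeric vector.
parse_response <- function(response) {
  # Start with 0 for every option, so missing options stay 0.
  out <- c(Spanish = 0, "Both of them" = 0, English = 0)
  # Split on CR LF and strip leading/trailing whitespace from each line.
  lines <- trimws(strsplit(response, "\r\n", fixed = TRUE)[[1]])
  # Capture the option letter and the number at the end of each line.
  hits <- regmatches(lines, regexec("^([ABC]) *-.* ([0-9]+)$", lines))
  for (hit in hits) {
    # A successful match has 3 elements: full match, letter, number.
    if (length(hit) == 3) {
      out[cols[[hit[2]]]] <- as.numeric(hit[3])
    }
  }
  out
}

# Stack the per-response vectors into one matrix, one row per response.
result <- do.call(rbind, lapply(responses, parse_response))
print(result)
# Rows, in column order Spanish / Both of them / English:
#   60 10 30
#    0 50 50
#   30  0 70
```

Lines that do not match the pattern are skipped silently, so it may be worth checking the input for unexpected formats before relying on the zeros.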

Demo here
