R code to extract field values from a JSON log file

Question

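I have a file containing 50,000 records from a log collection. For each record I need to pull out the values that follow "State": and "Code":. I tried regular expressions but could not get them to work. Instead, I tried the command below to see whether I could get just one of the values, but it simply times out.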

# this never completes
sub(".*?Code(.*?);.*", "\\1", logfile)

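I have no experience with this kind of work, so I appreciate your help! This is how the log file is formatted (the records are actually JSON). My goal is to return the following values (it is fine if the State and Code labels cannot be included):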

(State: Red, Code: null) (State: Blue, Code: no receipt)

Below is the exact syntax of the logfile, with 2 records:

 "
    2020-05-12 00:07:00.9681200, z123-asddfas,"
    ========== mode for SKU ==========
    ========== Records found ==========
    No records found
    ========== DRecords found ==========
    No drecords found
    "
    2020-05-12 00:08:46.5076411,qwer98-asdha,"
    ========== mode for SKU ==========
    ========== records found ==========
    {
        "State":  "Red",
        "Code":  null
    }
    ========== DRecords found ==========
    No drecords found
    "
    2020-05-12 00:10:02.6607640,qweaso-34324-asda,"
    ========== mode for SKU ==========
    ========== records found ==========
    {
        "State":  "Blue",
        "Code":  "no receipt"
    }

Answer 1

Read in your text

logIn <-  read_lines('"
    2020-05-12 00:07:00.9681200, z123-asddfas,"
========== mode for SKU ==========
  ========== Records found ==========
  No records found
========== DRecords found ==========
  No drecords found
"
    2020-05-12 00:08:46.5076411,qwer98-asdha,"
========== mode for SKU ==========
  ========== records found ==========
  {
    "State":  "Red",
    "Code":  null
  }
========== DRecords found ==========
  No drecords found
"
    2020-05-12 00:10:02.6607640,qweaso-34324-asda,"
========== mode for SKU ==========
  ========== records found ==========
  {
    "State":  "Blue",
    "Code":  "no receipt"
  }')

Put it into a form that can be wrangled, then clean and filter

library(tidyverse)  # provides read_lines(), so load it before reading the text
tibble(lines = logIn) %>% 
     # Keep only the lines with 'state' or 'code'
  filter(str_detect(lines, "(?ix) ( state | code )")) %>% 
     # Clean out all the whitespace and punct, except the ':'
  mutate(lines = str_replace_all(lines, '["\\s,]', '')) %>% 
     # Use separate to divide into two new columns
  separate(lines, c("ATTR", "VALUE"), sep = ":")

What do we get?

# A tibble: 4 x 2
  ATTR  VALUE    
  <chr> <chr>    
1 State Red      
2 Code  null     
3 State Blue     
4 Code  noreceipt
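
Note that stripping all whitespace also turns "no receipt" into "noreceipt". If the spaces inside values matter, one possible alternative is to capture the key and the value with a single pattern instead of deleting characters. This is only a sketch: `sample_lines`, `pattern`, and `fields` are names made up for illustration, and it assumes every State/Code pair sits on its own line, as in the sample log.

```r
library(stringr)

# Three sample lines in the same shape as the log above (hypothetical input)
sample_lines <- c(
  '        "State":  "Blue",',
  '        "Code":  "no receipt"',
  '    No records found'
)

# Group 1 is the key, group 2 is the value.
# The quotes around the value and the trailing comma are optional,
# so an unquoted null is captured as the string "null".
pattern <- regex('"(state|code)"\\s*:\\s*"?([^",]*?)"?,?\\s*$',
                 ignore_case = TRUE)

m <- str_match(sample_lines, pattern)

# Keep only the lines that matched, and only the two capture groups
fields <- m[!is.na(m[, 1]), 2:3, drop = FALSE]
colnames(fields) <- c("ATTR", "VALUE")
fields
```

The result is a character matrix with one row per matching line; it can be passed to `as_tibble()` if the rest of the pipeline expects a tibble.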

##################### As requested

tibble(lines = logIn) %>% 
  # Keep only the lines with 'state' or 'code'
  filter(str_detect(lines, "(?ix) ( state | code )")) %>% 
    # This ID will come in useful
  rowid_to_column("ID") %>% 
  # Clean out all the whitespace and punct, except the ':'
  mutate(lines = str_replace_all(lines, '["\\s,]', ''),
         # Give each State and Code the same ID.
         ID = floor((ID + 1) / 2)) %>% 
  # Use separate to divide into two new columns
  separate(lines, c("ATTR", "VALUE"), sep = ":") %>% 
    # spread takes it from long form to wide form
  spread(key = ATTR, value = VALUE) %>% 
  select(ID, State, Code)

# A tibble: 2 x 3
     ID State Code     
  <dbl> <chr> <chr>    
1     1 Red   null     
2     2 Blue  noreceipt
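
To make the ID trick easier to follow, here is a small standalone sketch of the pairing arithmetic and the long-to-wide step. It uses `pivot_wider()`, which the tidyr documentation recommends over the superseded `spread()`. The `long` and `wide` names and the hard-coded rows are just for illustration, and the sketch assumes State and Code always arrive in consecutive pairs.

```r
library(tibble)
library(tidyr)

# Long-form rows as produced by the separate() step (hard-coded here)
long <- tibble(
  ATTR  = c("State", "Code", "State", "Code"),
  VALUE = c("Red", "null", "Blue", "noreceipt")
)

# Rows 1 and 2 get ID 1, rows 3 and 4 get ID 2:
# floor((1 + 1) / 2) = 1, floor((2 + 1) / 2) = 1,
# floor((3 + 1) / 2) = 2, floor((4 + 1) / 2) = 2
long$ID <- floor((seq_len(nrow(long)) + 1) / 2)

# One row per ID, one column per distinct ATTR value
wide <- pivot_wider(long, names_from = ATTR, values_from = VALUE)
wide
```

If a record were ever missing one of the two fields, the pairing by row position would drift, so it may be worth checking that the filtered line count is even before relying on it.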
