Assigning data to a list when it fails certain checks in an R function


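I am using some test data to create a function in R that loops through a data.table and checks whether each column meets specific conditions. The function should group columns by known column names, then loop over those columns, and any value it finds that does not meet that column's condition should be added to "invalid_rows_list". Currently my code has some extra print statements to help with debugging: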

cat(sprintf("Invalid data found in column '%s':\n", col))
        print(data[invalid_rows, .(Row = .I, Value = column_data[invalid_rows])])

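With these print statements and the test data, I know what the output should be, and the function is not flagging things correctly. Based on my data, it should flag "programyear" in row 10 because it has 5 digits (20201) instead of 4; it should flag the "asian" column in row 9 because it has an A instead of 0, 1, or 9; and it should flag the "hawaiian" column in row 2 because it has a 2 instead of 0, 1, or 9. The print statements show that it is flagging these, but it is also flagging columns with "fundsexpended" in the column name as invalid, even though they contain only numeric values, which is the only requirement.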

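The function should add the rows containing invalid data to a list and print that list as a table at the end, but even though the print statements show that invalid data was found, the list prints NULL.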

Here is the function I currently have:

detox_validation <- function(data) {
  setDT(data)
  setnames(data, tolower(names(data)))
  
  # Define columns by type and requirements
  colnames <- names(data)
  agencystaff_cols <- colnames[grepl("agencystaff", colnames)]
  agencypurchase_cols <- colnames[grepl("agencypurchase", colnames)]
  compserviceprovider_cols <- colnames[grepl("compserviceprovider(?!type)", colnames, perl = TRUE)]
  purchaseprovider_cols <- colnames[grepl("purchaseprovider", colnames)]
  numeric_cols <- colnames[grepl("expend|wage|hoursworked|age_", colnames)]
  date_cols <- grep("date|eligibilityext|compdisenrollmsg", names(data), value = TRUE)
  programyear_cols <- grep("programyear", colnames, value = TRUE)
  sex_cols <- grep("sex", colnames, value = TRUE, ignore.case = TRUE)
  demographic_cols <- grep("^(amerindian|asian|black|hawaiian|white|
                           hispanic|veteran|disability|adult|adulted|
                           dislocatedworker|jobcorps|wpempservice|youth|
                           longtermunemp|exhausttanf|fostercareyouth|
                           homelessorrunaway|exoffenderstatus|lowincomestatus|
                           englishlearner|basicskillsdeficient|culturalbarriers|
                           singleparent|dishomemaker)$",
                           colnames, value = TRUE, ignore.case = TRUE)
  
  # Exclude columns explicitly related to funds expended from certain checks
  funds_expended_cols <- colnames[grepl("fundsexpended", colnames)]
  # Ensure we do not validate these columns with incorrect conditions
  numeric_cols <- setdiff(numeric_cols, funds_expended_cols)
  
  # Initialize a list to store indices of invalid rows
  invalid_rows_list <- list()
  
  # Function to check and record invalid entries
  validate_and_list <- function(col_names, condition, error_message) {
    for (col in col_names) {
      if (!col %in% names(data)) {
        cat(sprintf("Warning: Column '%s' does not exist in the data.\n", col))
        next
      }
      column_data <- data[[col]]
      invalid_rows <- which(!condition(column_data) & !is.na(column_data))
      if (length(invalid_rows) > 0) {
        cat(sprintf("Invalid data found in column '%s':\n", col))
        print(data[invalid_rows, .(Row = .I, Value = column_data[invalid_rows])])
        invalid_rows_list[[length(invalid_rows_list) + 1]] <- data.table(
          Column = col,
          Row_Index = invalid_rows,
          Value = column_data[invalid_rows],
          Message = error_message
        )
      }
    }
  }
  
  # Apply checks
  validate_and_list(numeric_cols, function(x) is.numeric(x),
                    "Must be numeric and not NA")
  validate_and_list(date_cols, function(x) !is.na(ymd(x, quiet = TRUE)),
                    "Invalid date format or value. Expected YYYY-MM-DD or NA.")
  validate_and_list(programyear_cols, function(x) nchar(as.character(x)) ==
                      4 & grepl("^\\d{4}$", x),
                    "Program year must be exactly four digits.")
  validate_and_list(demographic_cols, function(x) x %in% c(0, 1, 9),
                    "Demographic values must be numeric and either 0, 1, or 9")
  validate_and_list(sex_cols, function(x) x %in% c(1, 2),
                    "Sex values must be numeric and either 1 or 2")
  
  
  # Combine all invalid entries into a single data.table
  if (length(invalid_rows_list) > 0) {
    all_invalid_rows <- rbindlist(invalid_rows_list)
    return(all_invalid_rows)
  } else {
    return(NULL)
  }
}

Here is the data I am using to check the function:

> dput(check_val)
structure(list(v1 = 1:10, programyear = c(2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 20201), agencycode = c(6L, 35L, 
52L, 42L, 48L, 48L, 48L, 91L, 91L, 91L), applicationdate = structure(c(18444, 
18449, 16743, 18548, 14551, 17403, 12241, 14886, 15216, 15805
), class = "Date"), sex = c(1, 2, 1, 1, 1, 1, 1, 2, 2, 1), amerindian = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), asian = c("0", "0", "0", "0", "0", 
"0", "0", "0", "A", "0"), black = c(1, 0, 1, 1, 1, 0, 0, 0, 0, 
0), hawaiian = c(0, 2, 0, 0, 0, 0, 0, 0, 0, 0), white = c(0, 
1, 0, 0, 1, 1, 1, 1, 1, 1), hispanic = c(0, 0, 0, 0, 0, 1, 0, 
0, 0, 0), veteran = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), referral = c(19L, 
29L, 1L, 9L, 16L, 29L, 19L, 16L, 16L, 19L), student = c(3L, 0L, 
0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L), eligibilitydate = structure(c(18467, 
18507, NA, NA, 14573, 17423, 12262, 14886, 15217, 15805), class = "Date"), 
    eligibilityext = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), class = "Date"), oosplacementdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), oosexitdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), disability = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1), primdisability = c(17L, 19L, 
    18L, 0L, 2L, 1L, 2L, 1L, 1L, 1L), primdisabilitycause = c(34L, 
    2L, 13L, 0L, 30L, 30L, 30L, 10L, 0L, 13L), seconddisability = c(0L, 
    19L, NA, 0L, 8L, 0L, 13L, 0L, 0L, 0L), seconddisabilitycause = c(0L, 
    18L, NA, 0L, 0L, 0L, 36L, 0L, 0L, 0L), disabilitysigcode = c(1L, 
    1L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 1L), twestartdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), tweenddate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), ipesupportedempgoal = c(0L, 
    0L, 1L, NA, 0L, 0L, 1L, 0L, 0L, 0L), ipeempstatus = c(8L, 
    NA, 10L, NA, 7L, 8L, 10L, 8L, 8L, 8L), ipeprimaryocc = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), ipehourlywage = c(0, 
    NA, 0, 0, 0, 0, 0, 0, 0, 0), ipeweeklyhoursworked = c(0, 
    NA, 0, NA, 0, 0, 0, 0, 0, 0), adult = c(0, 9, 0, 9, 0, 0, 
    0, 0, 0, 0), adulted = c(0, 9, 0, 9, 9, 0, 9, 0, 0, 0), dislocatedworker = c(0, 
    9, 0, 9, 0, 0, 0, 0, 0, 0), jobcorps = c(0, 9, 0, 9, 9, 0, 
    0, 0, 0, 0), vocrehab = c(1L, NA, 0L, NA, 1L, 1L, 1L, 0L, 
    1L, 0L), wpempservice = c(0, 9, 0, 9, 9, 9, 9, 0, 0, 0), 
    youth = c(0, 9, 0, 9, 0, 0, 0, 0, 0, 0), youthbuild = c("", 
    "NULL", "", "", "", "", "", "", "", ""), longtermunemp = c(0, 
    9, 0, 9, 0, 0, 0, 1, 1, 0), exhausttanf = c(9, 9, 0, 9, 0, 
    9, 0, 0, 0, 0), fostercareyouth = c(0, 9, 0, 9, 0, 0, 0, 
    0, 0, 0), homelessorrunaway = c(0, 9, 0, 9, 0, 0, 0, 0, 0, 
    0), exoffenderstatus = c(0, 9, 0, 9, 9, 0, 9, 0, 0, 0), lowincomestatus = c(1, 
    9, 0, 9, 0, 1, 0, 0, 0, 1), englishlearner = c(0, 9, 0, 9, 
    0, 0, 0, 0, 0, 0), basicskillsdeficient = c(0, 9, 0, 9, 0, 
    0, 0, 0, 0, 0), culturalbarriers = c(0, 9, 0, 9, 9, 9, 9, 
    0, 0, 9), singleparent = c(9, 9, 0, 9, 9, 1, 9, 0, 0, 0), 
    dishomemaker = c(0, 9, 0, 9, 0, 0, 0, 0, 0, 0), migrantfarmworker = c(0L, 
    NA, 0L, NA, 0L, 0L, 0L, 0L, 0L, 0L), statedisstudentagerange = c("16;22", 
    "14;21", "14;21", "", "14;22", "14;22", "14;22", "14;21", 
    "14;21", "14;21"), schoolgradecompleted = c(12L, NA, 12L, 
    NA, 0L, 0L, 12L, 12L, 9L, 9L), insecondaryed = c(0, 0, 0, 
    0, 0, 0, 0, 0, 1, 1), specialedcertcompdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), secschooldiplomadate = structure(c(NA, 
    NA, 11855, NA, 17682, NA, 15095, 17344, 18068, NA), class = "Date"), 
    geddate = structure(c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
    ), class = "Date"), enrolledinpostseced = c(NA, NA, 0L, NA, 
    0L, 0L, 1L, 1L, 1L, 0L), credprogramenrolldate = structure(c(NA, 
    NA, NA, NA, NA, NA, NA, 17416, 18506, NA), class = "Date"), 
    completedsomepostseced = c(9, 9, 0, 9, 0, 0, 0, 1, 0, 0), 
    associatedegreedate = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), class = "Date"), bachelordegreedate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), mastersdegreedate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), degreeabovemastersdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), vtlicensedate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), vtcertificatedate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), otherlicorcertdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), petsstartdate = structure(c(18428, 
    NA, NA, NA, 17707, NA, NA, 16025, NA, 15895), class = "Date"), 
    jecvragencystaff = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), jecvragencypurchase = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jecpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jecvrservicepurchaseexpenditure = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), wblevragencystaff = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), wblevragencypurchase = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), wblepurchaseprovidertype = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), wblevrservicepurchaseexpenditure = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ceovragencystaff = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), ceovragencypurchase = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), ceopurchaseprovidertype = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), ceovrservicepurchaseexpenditure = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), wrtvragencystaff = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), wrtvragencypurchase = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), wrtpurchaseprovidertype = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), wrtvrservicepurchaseexpenditure = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), isavragencystaff = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), isavragencypurchase = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), isapurchaseprovidertype = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), isavrservicepurchaseexpenditure = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), vrservicestartdate = structure(c(NA, 
    NA, 16835, NA, 14670, 17458, 12262, 16414, 16463, 15895), class = "Date"), 
    careerservicedate = structure(c(18715, 18489, NA, NA, 18271, 
    18411, 17774, 18660, NA, NA), class = "Date"), gcutvragencypurchase = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), gcutpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), gcutvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), gcutcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), gcutcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), fycutvragencypurchase = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), fycutpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), fycutvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), fycutcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), fycutcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), jcctvragencypurchase = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jcctpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jcctvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jcctcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jcctcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), ovtvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ovtvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), ovtpurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), ovtvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ovtcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ovtcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), ojtvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ojtvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), ojtpurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), ojtvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ojtcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ojtcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), ratvragencypurchase = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ratpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ratvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ratcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), ratcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), barltvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), barltvragencypurchase = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), barltpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), barltvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), barltcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), barltcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), jrtvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jrtvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), jrtpurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), jrtvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jrtcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jrtcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), drstvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), drstvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), drstpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), drstvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), drstcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), drstcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), mtvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), mtvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), mtpurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), mtvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), mtcompserviceprovider = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), mtcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), rsetvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), rsetvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rsetpurchaseprovidertype = c(FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
    ), rsetvrtitleifundsexpended = c(0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0), rsetcompserviceprovider = c(FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), rsetcompserviceprovidertype = c("NULL", 
    "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", 
    "NULL"), ctvragencystaff = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ), ctvragencypurchase = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    ctpurchaseprovidertype = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    ctvrtitleifundsexpended = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    ctcompserviceprovider = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    ctcompserviceprovidertype = c("NULL", "NULL", "NULL", "NULL", 
    "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"), assvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), assvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), asspurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), assvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), asscompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), asscompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), dtivragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), dtivragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), dtipurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), dtivrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), dticompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), dticompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), vrcgvragencystaff = c(0, 
    0, 1, 0, 0, 0, 0, 0, 0, 1), vrcgvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), vrcgpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), vrcgvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), vrcgcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), vrcgcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), jsavragencystaff = c(0, 
    0, 1, 0, 0, 0, 0, 0, 0, 0), jsavragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), jsapurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), jsavrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jsacompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jsacompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), jpavragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jpavragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), jpapurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), jpavrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jpacompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), jpacompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), stjsvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), stjsvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), stjspurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), stjsvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), stjscompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), stjscompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), sesvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), sesvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), sespurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), sesvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), sessetitlevifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), sescompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), sescompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), irsvragencystaff = c(0, 
    0, 1, 0, 0, 0, 0, 0, 0, 0), irsvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), irspurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), irsvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), irscompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), irscompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), bcvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), bcvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), bcpurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), bcvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), bccompserviceprovider = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), bccompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), cesvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), cesvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), cespurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), cesvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), cessetitlevifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), cescompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), cescompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), esvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), esvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), espurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), esvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), essetitlevifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), tranvragencystaff = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), tranvragencypurchase = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), tranpurchaseprovidertype = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), tranvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), trancompserviceprovider = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), trancompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), mntvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), mntvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), mntpurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), mntvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), mntcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), mntcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), rtvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), rtvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rtpurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rtvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rtcompserviceprovider = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rtcompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), pasvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), pasvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), paspurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), pasvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), pascompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), pascompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), tasvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), tasvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), taspurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), tasvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), tascompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), tascompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), rsvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), rsvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rspurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rsvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rscompserviceprovider = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), rscompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), isvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), isvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), ispurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), isvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), iscompserviceprovider = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), iscompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), osvragencystaff = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), osvragencypurchase = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), ospurchaseprovidertype = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), osvrtitleifundsexpended = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), oscompserviceprovider = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), oscompserviceprovidertype = c("", 
    "NULL", "", "", "", "", "", "", "", ""), edfuncleveldate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), secondarydate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), postsectransreportcarddate = structure(c(NA, 
    NA, NA, NA, NA, NA, NA, 18295, NA, NA), class = "Date"), 
    trainingmilestonedate = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), class = "Date"), skillgainskillsprogdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), eoprimoccupationstartdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), exitdate = structure(c(18738, 
    18535, 18501, 18563, 18704, 18681, 18505, 18590, 18648, 18528
    ), class = "Date"), exittype = c(4L, 3L, 4L, 0L, 4L, 4L, 
    4L, 4L, 4L, 4L), exitreason = c(18L, 19L, 18L, 19L, 17L, 
    17L, 17L, 2L, 18L, 17L), exitempoutcome = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), exitprimoccupation = c("", "NULL", 
    "", "", "", "", "", "", "", ""), exithourlywage = c(0, NA, 
    0, 0, 0, 0, 0, 0, 0, 0), exitweeklyhoursworked = c(0, NA, 
    0, 0, 0, 0, 0, 0, 0, 0), pecredprogramenrolldate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), pecredattainmentdate = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), pecredentialtype = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), appmonthlypubsup = c("0", 
    "0", "_0", "", "0", "2", "0", "0", "0", "2"), appmedinscov = c("7", 
    "1", "_0", "", "0", "1", "0", "7", "7", "1"), exitmonthlypubsup = c("0", 
    "0", "_0", "", "0", "2", "2", "0", "0", "2"), exitmedinscov = c("7", 
    "1", "_0", "0", "0", "1", "0", "7", "7", "1"), ipeinitialdate = structure(c(18473, 
    NA, 16835, NA, 14656, 17458, 12262, 16414, 16463, 15895), class = "Date"), 
    ipeextensiondate = structure(c(NA, 18515, NA, NA, NA, NA, 
    NA, NA, NA, NA), class = "Date"), enrolledinsecequiv = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), compdisenrollmsg = structure(c(NA, 
    NA, NA, NA, NA, NA, NA, 18586, NA, NA), class = "Date"), 
    wblevvragencystaff = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), wblevvragencypurchase = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), wblevpurchaseprovidertype = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), wblevvrtitleifundsexpended = c(0, 
    0, 0, 0, 0, 111111, 0, 0, 0, -111111), wblevcompserviceprovider = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), wblevcompserviceprovidertype = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), age_app = c(18, 0, 0, 
    6, 9, 9, 9, 10, 10, 10), age_ipe = c(18, NA, 0, NA, 10, 10, 
    9, 15, 14, 10), age_preets = c(18, NA, NA, NA, 18, NA, NA, 
    14, NA, 10), age_exit = c(19, 0, 4, 6, 21, 13, 26, 21, 20, 
    17), age_vrservice = c(NA, NA, 0, NA, 10, 10, 9, 15, 14, 
    10)), row.names = c(NA, -10L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000021fcf475940>)
r data.table
1 Answer

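I suspect the problem is related to variables living in the wrong environment. Try rewriting the validate_and_list function so that it explicitly returns something, rather than relying on it silently changing the value of invalid_rows_list, which is only defined outside that function.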
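The suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: find_invalid and the toy table are hypothetical names that do not appear in the question, missing columns are skipped silently instead of printing a warning, and Value is converted to character so results from columns of different types can be stacked.

```r
library(data.table)

# Hypothetical helper (not from the question): returns a data.table of
# invalid entries instead of modifying a variable defined elsewhere.
find_invalid <- function(data, col_names, condition, error_message) {
  # Assumption: columns missing from the data are skipped without a warning
  found <- lapply(intersect(col_names, names(data)), function(col) {
    column_data <- data[[col]]
    invalid_rows <- which(!condition(column_data) & !is.na(column_data))
    if (length(invalid_rows) == 0) return(NULL)
    data.table(
      Column = col,
      Row_Index = invalid_rows,
      Value = as.character(column_data[invalid_rows]),
      Message = error_message
    )
  })
  # rbindlist() skips the NULL entries produced by columns with no problems
  rbindlist(found)
}

# Toy data loosely modelled on the 'asian' and 'hawaiian' columns
toy <- data.table(
  asian = c("0", "A", "1"),
  hawaiian = c(0, 2, 9)
)

result <- find_invalid(
  toy, c("asian", "hawaiian"),
  function(x) x %in% c(0, 1, 9),
  "Demographic values must be numeric and either 0, 1, or 9"
)
print(result)
```

Inside detox_validation, the results of several such calls could then be combined with rbindlist(list(...)), so nothing depends on assignments that cross function boundaries.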

@AndrewGustar is right.

You need the <<- operator:

  # # Function to check and record invalid entries
  # validate_and_list <- function(col_names, condition, error_message) {
  #   for (col in col_names) {
  #     if (!col %in% names(data)) {
  #       cat(sprintf("Warning: Column '%s' does not exist in the data.\n", col))
  #       next
  #     }
  #     column_data <- data[[col]]
  #     invalid_rows <- which(!condition(column_data) & !is.na(column_data))
  #     if (length(invalid_rows) > 0) {
  #       cat(sprintf("Invalid data found in column '%s':\n", col))
  #       print(data[invalid_rows, .(Row = .I, Value = column_data[invalid_rows])])
          invalid_rows_list[[length(invalid_rows_list) + 1]] <<- data.table( # <--- HERE
  #         Column = col,
  #         Row_Index = invalid_rows,
  #         Value = column_data[invalid_rows],
  #         Message = error_message
  #       )
  #     }
  #   }
  # }

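You see, invalid_rows_list is defined one environment above the one the for loop runs in: the loop runs inside validate_and_list, while the list lives in the enclosing detox_validation environment. A plain <- inside validate_and_list creates a local copy that is discarded when the function returns, whereas <<- assigns to the existing variable in the enclosing environment.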
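The difference can be seen in isolation with a small base R sketch. The names outer_local, outer_super, results and add are made up for this illustration and are not part of the question's code.

```r
# With `<-`, add() modifies a local copy of `results` that is thrown away
outer_local <- function() {
  results <- list()
  add <- function(x) {
    results[[length(results) + 1]] <- x
  }
  add("a")
  add("b")
  length(results)
}

# With `<<-`, add() modifies the `results` defined in the enclosing function
outer_super <- function() {
  results <- list()
  add <- function(x) {
    results[[length(results) + 1]] <<- x
  }
  add("a")
  add("b")
  length(results)
}

outer_local()  # 0: the enclosing list was never changed
outer_super()  # 2: both elements were appended
```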

Output after this change:

output <- detox_validation(check_val)

> output
                         Column Row_Index   Value                                                  Message
                         <char>     <int>  <char>                                                   <char>
  1:                programyear        10   20201                Program year must be exactly four digits.
  2:                      asian         9       A Demographic values must be numeric and either 0, 1, or 9
  3:                   hawaiian         2       2 Demographic values must be numeric and either 0, 1, or 9
  4:  gcutvrtitleifundsexpended         1       0             Sex values must be numeric and either 1 or 2
  5:  gcutvrtitleifundsexpended         2       0             Sex values must be numeric and either 1 or 2
 ---                                                                                                      
349: wblevvrtitleifundsexpended         6  111111             Sex values must be numeric and either 1 or 2
350: wblevvrtitleifundsexpended         7       0             Sex values must be numeric and either 1 or 2
351: wblevvrtitleifundsexpended         8       0             Sex values must be numeric and either 1 or 2
352: wblevvrtitleifundsexpended         9       0             Sex values must be numeric and either 1 or 2
353: wblevvrtitleifundsexpended        10 -111111             Sex values must be numeric and either 1 or 2
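
The answer does not address why the fundsexpended columns show up with the sex message. One plausible explanation is that the unanchored pattern "sex" also matches the substring inside "fundsexpended", so those columns end up in sex_cols. The sketch below checks this on three column names taken from the question; the anchored pattern "^sex$" is only an assumption about which columns are meant, and a different pattern may be needed if other sex-related column names exist.

```r
cols <- c("sex", "gcutvrtitleifundsexpended", "wblevvrtitleifundsexpended")

# Unanchored: matches any name containing "sex", including "fundSEXpended"
unanchored <- grep("sex", cols, value = TRUE, ignore.case = TRUE)

# Anchored (assumption: only a column named exactly "sex" should be checked)
anchored <- grep("^sex$", cols, value = TRUE, ignore.case = TRUE)

print(unanchored)
print(anchored)
```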