Convert numeric columns to integer if possible, otherwise keep them numeric


Background

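I'm using read_csv() (from {readr}) to import and clean a dataset (head included below), and noticed that various columns that should probably be integers are kept by readr as numeric (double) columns.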

Question

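What is an elegant way to convert numeric columns to integer columns where possible (i.e. where no rounding or loss of precision would occur), and to leave them as numeric otherwise?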

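For the example dataset, this would mean converting the columns "Execution", "Highest Education Level", "TDCJ Number", "Age at Execution" and "Weight".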

It doesn't need to be a tidyverse solution; base R or any package is fine too.

Research done:

I tried searching Google and Stack Overflow without any luck. Honestly, I'm surprised this hasn't been asked before!

read.csv() does convert such columns to integer, but it doesn't automatically convert dates the way read_csv() does.
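A tiny illustration of that difference, using a hypothetical two-column file (the column names and values are made up to mirror the dataset's head, and the file is written to a temp path so it runs offline). Base R's read.csv() guesses whole numbers as integer, but ISO dates come back as plain character and need a separate as.Date() step:

```r
# Hypothetical mini CSV written to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("Execution,ExecutionDate",
             "553,2018-07-17",
             "552,2018-06-27"), tmp)

# stringsAsFactors = FALSE keeps text as character on R < 4.0 as well
base_df <- read.csv(tmp, stringsAsFactors = FALSE)
str(base_df)

# Execution arrives as integer, but the date column is still character,
# so it has to be converted by hand
base_df$ExecutionDate <- as.Date(base_df$ExecutionDate)
```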

Data:

# the entire dataset can be downloaded at https://archive.org/download/tx_deathrow_full/tx_deathrow_full.csv if you wish. The head is below

library(tidyverse)

tx_deathrow <- structure(list(Execution = c(553, 552, 551, 550, 549, 548), `Date of Birth` = structure(c(5014, 
-6701, 4110, 6302, 3737, -5266), class = "Date"), `Date of Offence` = structure(c(12743, 
3433, 12389, 13975, 13039, 11444), class = "Date"), `Highest Education Level` = c(9, 
12, 10, 11, 12, 12), `Last Name` = c("Young", "Bible", "Castillo", 
"Davila", "Rodriguez III", "Battaglia"), `First Name` = c("Christopher Anthony", 
"Danny Paul", "Juan Edward", "Erick Daniel", "Rosendo", "John David"
), `TDCJ
Number` = c(999508, 999455, 999502, 999545, 999534, 
999412), `Age at Execution` = c(34, 66, 37, 31, 38, 62), `Date Received` = structure(c(13238, 
12250, 13053, 14302, 14013, 11808), class = "Date"), `Execution Date` = structure(c(17729, 
17709, 17667, 17646, 17617, 17563), class = "Date"), Race = c("Black", 
"White", "Hispanic", "Black", "Hispanic", "White"), County = c("Bexar", 
"Harris", "Bexar", "Tarrant", "Lubbock", "Dallas"), `Eye Color` = c("Brown", 
"Blue", "Brown", "Brown", "Brown", "Green"), Weight = c(216, 
194, 180, 161, 198, 188), Height = c("6' 1\"", "5' 7\"", "5' 11\"", 
"5' 11\"", "5' 8\"", "6' 0\""), `Native County` = c("Bexar", 
"Brazoria", "Bexar", "Tarrant", "Wichita", "Dallas"), `Native State` = c("Texas", 
"Texas", "Texas", "Texas", "Texas", "Texas"), `Last Statement` = c("l want to make sure the Patel family knows I love them like they love me. Make sure the kids in the world know I'm being executed and those kids I've been mentoring keep this fight going. I'm good Warden.", 
NA, "To everyone that has been there for me you know who you are. Love y'all. See y'all on the other side.That's it.", 
"Yes, I would like to say nephew it burns huh. You know I might have lost the fight but I'm still a soldier. I still love you all. To my supporters and family y'all hold it down. Ten Toes down right. That's all.", 
"First I would like to say I have been here since September 2005.  I had the honor and privilege to know many prison guards and staff.  I want to thank all of them.  I would like for everyone to write the people on death row as they are all good men and I am very happy I got to know them.  All of their lives are worth knowing about.\n\nSecondly on February 14th the medical examiner and the chief nurse were engaged in numerous false illegal acts.  They tried to cover up that thousands were wrongfully convicted by Matt Powell, district attorney.  This needs to be brought to justice.\n\nI call upon the FBI to investigate Matt Powell and the Lubbock County Medical Examiner.  Lastly, I was born and raised Catholic and it was not lost upon me that this is Holy Week and last Sunday was Palm Sunday.  Yesterday was my birthday.  Today is the day I join my God and father.  The state may have my body but not my soul.\n\nIn order to save my brothers on death row I call upon Pope Francis and all the people of the world.\n\nLastly, I want everyone to boycott every single business  in the state of Texas until all the businesses are pressed to stop the death penalty.\n\nWith that Lord I commend my spirit.\n\nWarden I am ready to join my father.", 
"No, Well, Hi Mary Jean. See y'all later. Go ahead please.")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
r dplyr type-conversion data-cleaning readr
1 Answer

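Although some {readr} functions such as parse_guess() and type_convert() were already suggested in the comments, they only work properly on columns of type character, which is why they can't fix columns that should be integer rather than double.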
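If the data is already in memory as doubles, a base R alternative is to check each column directly: convert it only when every non-missing value is finite, whole, and within R's integer range, so nothing is rounded. The sketch below is a hypothetical helper (to_int_if_possible is not part of any package) and assumes plain numeric columns; it relies on is.numeric() returning FALSE for Date, POSIXct, difftime and factor columns so those are left untouched. Note that as.integer() drops attributes, so classed doubles other than those would lose their class.

```r
# Hypothetical helper: convert a double vector to integer only if that is lossless
to_int_if_possible <- function(x) {
  # Skip non-numeric columns (is.numeric() is FALSE for Date/POSIXct/factor)
  # and columns that are already integer
  if (!is.numeric(x) || is.integer(x)) return(x)
  vals <- x[!is.na(x)]
  # Lossless only if all values are finite, whole, and fit in a 32-bit integer
  lossless <- all(is.finite(vals) &
                  vals == trunc(vals) &
                  abs(vals) <= .Machine$integer.max)
  if (lossless) as.integer(x) else x
}

# Small made-up data frame mirroring a few of the dataset's column types
df <- data.frame(Weight = c(216, 194),
                 Score  = c(9.5, 8),
                 Born   = as.Date(c("1983-09-24", "1951-08-28")))

# df[] <- lapply(...) keeps the data frame (or tibble) structure
df[] <- lapply(df, to_int_if_possible)
str(df)
```

Applied to the question's data, tx_deathrow[] <- lapply(tx_deathrow, to_int_if_possible) would be the equivalent call.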

One trick is to read all columns in as character using the argument col_types = cols(.default = "c"). We can then pipe the result into type_convert() with guess_integer set to TRUE.

I'm not sure why {readr} uses a different internal process to guess column types when reading a CSV; it doesn't make much sense to me.

library(readr)
library(dplyr)

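# Read every column as character, so that type_convert() does all the guessing,
# then let it guess integers in addition to doubles, dates, etc.
tx_deathrow <- read_csv('https://archive.org/download/tx_deathrow_full/tx_deathrow_full.csv',
                        col_types = cols(.default = "c")) %>% 
  type_convert(guess_integer = TRUE)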
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   Execution = col_integer(),
#>   `Date of Birth` = col_date(format = ""),
#>   `Date of Offence` = col_date(format = ""),
#>   `Highest Education Level` = col_double(),
#>   `Last Name` = col_character(),
#>   `First Name` = col_character(),
#>   `TDCJ
#> Number` = col_integer(),
#>   `Age at Execution` = col_integer(),
#>   `Date Received` = col_date(format = ""),
#>   `Execution Date` = col_date(format = ""),
#>   Race = col_character(),
#>   County = col_character(),
#>   `Eye Color` = col_character(),
#>   Weight = col_integer(),
#>   Height = col_character(),
#>   `Native County` = col_character(),
#>   `Native State` = col_character(),
#>   `Last Statement` = col_character()
#> )

tx_deathrow %>% 
  glimpse()
#> Rows: 553
#> Columns: 18
#> $ Execution                 <int> 553, 552, 551, 550, 549, 548, 547, 546, 545,…
#> $ `Date of Birth`           <date> 1983-09-24, 1951-08-28, 1981-04-03, 1987-04…
#> $ `Date of Offence`         <date> 2004-11-21, 1979-05-27, 2003-12-03, 2008-04…
#> $ `Highest Education Level` <dbl> 9, 12, 10, 11, 12, 12, 12, 12, 11, 8, 10, 9,…
#> $ `Last Name`               <chr> "Young", "Bible", "Castillo", "Davila", "Rod…
#> $ `First Name`              <chr> "Christopher Anthony", "Danny Paul", "Juan E…
#> $ `TDCJ\nNumber`            <int> 999508, 999455, 999502, 999545, 999534, 9994…
#> $ `Age at Execution`        <int> 34, 66, 37, 31, 38, 62, 64, 55, 47, 38, 46, …
#> $ `Date Received`           <date> 2006-03-31, 2003-07-17, 2005-09-27, 2009-02…
#> $ `Execution Date`          <date> 2018-07-17, 2018-06-27, 2018-05-16, 2018-04…
#> $ Race                      <chr> "Black", "White", "Hispanic", "Black", "Hisp…
#> $ County                    <chr> "Bexar", "Harris", "Bexar", "Tarrant", "Lubb…
#> $ `Eye Color`               <chr> "Brown", "Blue", "Brown", "Brown", "Brown", …
#> $ Weight                    <int> 216, 194, 180, 161, 198, 188, 179, 198, 204,…
#> $ Height                    <chr> "6' 1\"", "5' 7\"", "5' 11\"", "5' 11\"", "5…
#> $ `Native County`           <chr> "Bexar", "Brazoria", "Bexar", "Tarrant", "Wi…
#> $ `Native State`            <chr> "Texas", "Texas", "Texas", "Texas", "Texas",…
#> $ `Last Statement`          <chr> "l want to make sure the Patel family knows …

Created on 2024-03-16 with reprex v2.0.2
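For a quick offline check of the same trick, here is a minimal sketch on a small hypothetical CSV (made-up columns written to a temp file instead of the full download). It compares readr's default guessing with the read-as-character-then-type_convert() approach; only the latter should turn whole-number columns into integer while still leaving decimals as double and parsing dates:

```r
library(readr)

# Hypothetical mini CSV standing in for the full dataset
tmp <- tempfile(fileext = ".csv")
writeLines(c("Execution,Weight,Score,Born",
             "553,216,9.5,1983-09-24",
             "552,194,8,1951-08-28"), tmp)

# Default guessing: whole numbers come back as double
default_df <- suppressMessages(read_csv(tmp))

# The trick: read everything as character, then re-guess with guess_integer = TRUE
trick_df <- suppressMessages(
  type_convert(read_csv(tmp, col_types = cols(.default = "c")),
               guess_integer = TRUE)
)

str(default_df)
str(trick_df)
```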
