Can someone help me with the following problem?
I am trying to parse the column "date2" in mdy (month-day-year) format.
  date1      date2      date2b    date3     date4      date5
  <chr>      <chr>      <chr>     <chr>     <chr>      <chr>
1 5/13/2013  22/9/2012  22-Sep-12 40958     2012/28/08 2010-25-11
2 5/4/2013   26/2/2012  26-Feb-12 41271     2012/15/4  2010-24-02
3 5/15/2013  29/11/2012 29-Nov-12 40942     2/3/2012   2010-28-3
4 4/17/2013  3/2/2012   3-Feb-12  40954     2/15/2012  6/19/2010
5 12/20/2013 3/20/2012  20-Mar-12 40944     2012/14/11 5/11/2010
6 26/02/2013 29/11/2012 15-Aug-03 22/9/2012 2012/05/06 26/02/2013
dput(data)
structure(list(date1 = c("5/13/2013", "5/4/2013", "5/15/2013",
"4/17/2013", "12/20/2013", "26/02/2013", "4/17/2013"), date2 = c("22/9/2012",
"26/2/2012", "29/11/2012", "3/2/2012", "3/20/2012", "29/11/2012",
"2/8/2012"), date2b = c("22-Sep-12", "26-Feb-12", "29-Nov-12",
"3-Feb-12", "20-Mar-12", "15-Aug-03", "12/17/2010"), date3 = c("40958",
"41271", "40942", "40954", "40944", "22/9/2012", "14.05.2013"
), date4 = c("2012/28/08", "2012/15/4", "2/3/2012", "2/15/2012",
"2012/14/11", "2012/05/06", "14.05.2012"), date5 = c("2010-25-11",
"2010-24-02", "2010-28-3", "6/19/2010", "5/11/2010", "26/02/2013",
"18/11/2010")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-7L))
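My understanding is that R should first try to parse each value as mdy (see the code below) and then, if that is not possible because the first number is greater than 12, parse it as dmy.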
> parse_date_time(data$date2, orders = c("mdy", "dmy"))
[1] "2012-09-22 UTC" "2012-02-26 UTC" "2012-11-29 UTC" "2012-02-03 UTC" "2012-03-20 UTC"
[6] "2012-11-29 UTC" "2012-08-02 UTC"
dput(parse_date_time(data$date2, orders = c("mdy", "dmy")))
structure(c(1348272000, 1330214400, 1354147200, 1328227200, 1332201600,
1354147200, 1343865600), class = c("POSIXct", "POSIXt"), tzone = "UTC")
Therefore, the values 3/2/2012 and 2/8/2012 should be parsed as March 2nd, 2012 and February 8th, 2012. However, the result is February 3rd, 2012 and August 2nd, 2012.
What am I doing wrong?
library(readxl)
library(lubridate)
# data <- read_excel("./data/Data.xlsx")
data <- structure(list(date1 = c("5/13/2013", "5/4/2013", "5/15/2013",
"4/17/2013", "12/20/2013", "26/02/2013", "4/17/2013"),
date2 = c("22/9/2012", "26/2/2012", "29/11/2012",
"3/2/2012", "3/20/2012", "29/11/2012", "2/8/2012"),
date2b = c("22-Sep-12", "26-Feb-12", "29-Nov-12", "3-Feb-12",
"20-Mar-12", "15-Aug-03", "12/17/2010"),
date3 = c("40958", "41271", "40942", "40954",
"40944", "22/9/2012", "14.05.2013"),
date4 = c("2012/28/08", "2012/15/4", "2/3/2012", "2/15/2012",
"2012/14/11", "2012/05/06", "14.05.2012"),
date5 = c("2010-25-11", "2010-24-02", "2010-28-3", "6/19/2010",
"5/11/2010", "26/02/2013","18/11/2010")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,-7L))
# Print the first 6 rows of the data
head(data)
parse_date_time(data$date2, orders = c("mdy", "dmy"))
[1] "2012-09-22 UTC" "2012-02-26 UTC" "2012-11-29 UTC" "2012-02-03 UTC" "2012-03-20 UTC"
[6] "2012-11-29 UTC" "2012-08-02 UTC"
Thank you very much in advance!!
I tried using the function: parse_date_time(data$date2, orders = c('mdy', 'dmy')). Unfortunately, it did not work.
The joys of date formatting...
Dates like 3/2/2012, where both the day and the month are less than 13, give you and lubridate a tricky problem.
parse_date_time(data$date2, orders = c("mdy", "dmy"))
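The function does not know your preference unless it fails to match the first format. In this case nothing fails, because 3/2/2012 is a valid date under either order, so it is interpreted as February 3rd, 2012.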
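One way to get the "mdy first, dmy only as a fallback" behaviour described in the question is to parse the column once per order and fill in the gaps element by element. This is only a sketch: the variable names (x, as_mdy, as_dmy, parsed) are made up for illustration, and it assumes that every value that can be read month-first really is month-first, which the data itself cannot confirm.

```r
library(lubridate)

# The date2 column from the question
x <- c("22/9/2012", "26/2/2012", "29/11/2012", "3/2/2012",
       "3/20/2012", "29/11/2012", "2/8/2012")

# Parse every value under each order separately; quiet = TRUE suppresses
# the "failed to parse" warning, and values that do not fit become NA
as_mdy <- mdy(x, quiet = TRUE)
as_dmy <- dmy(x, quiet = TRUE)

# Prefer the mdy reading and fall back to dmy only where mdy gave NA
parsed <- as_mdy
needs_fallback <- is.na(parsed)
parsed[needs_fallback] <- as_dmy[needs_fallback]

parsed
```

With this, 3/2/2012 comes out as 2012-03-02 and 2/8/2012 as 2012-02-08, while 22/9/2012 still falls back to 2012-09-22.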
My feeling is to agree with @IRTFM; the better approach is to convert the input data to a uniform format, YYYY-MM-DD.
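To illustrate why YYYY-MM-DD helps, here is a small sketch that starts from the two dates the question intended (March 2nd and February 8th, 2012), writes them out in that format, and reads them back with a single order. The names intended, iso and reparsed are hypothetical.

```r
library(lubridate)

# The two dates the asker intended, built from explicit components
intended <- make_date(year = 2012, month = c(3, 2), day = c(2, 8))

# Write them as YYYY-MM-DD strings
iso <- format(intended, "%Y-%m-%d")
iso

# In this format only one order applies, so ymd() reads them back unchanged
reparsed <- ymd(iso)
all(reparsed == intended)
```

Once a column is stored this way, there is no longer a second plausible reading for values such as 2012-03-02.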
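In short, you need to clean your input data... As it stands, you are looking at a lot of conditional logic with the data shown in the image. Hopefully the above points you in the right direction. ¯\_(ツ)_/¯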
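As a possible first cleaning step, the values that are genuinely ambiguous can be listed so they can be checked against the source. The sketch below flags a value when both the mdy and the dmy reading succeed and give different dates; the names x, as_mdy, as_dmy and ambiguous are made up for illustration.

```r
library(lubridate)

# The date2 column from the question
x <- c("22/9/2012", "26/2/2012", "29/11/2012", "3/2/2012",
       "3/20/2012", "29/11/2012", "2/8/2012")

# Parse under each order; values that do not fit an order become NA
as_mdy <- mdy(x, quiet = TRUE)
as_dmy <- dmy(x, quiet = TRUE)

# Ambiguous: both orders parse and they disagree about the date
ambiguous <- !is.na(as_mdy) & !is.na(as_dmy) & as_mdy != as_dmy

x[ambiguous]
```

For this column only 3/2/2012 and 2/8/2012 are flagged; every other value has a first or second number above 12 and so fits just one order.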