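I want to join two data frames of different lengths and time intervals on the common variable DateTime15, so that Hourly_Rainfall_mm from df2 is merged into df1 wherever DateTime15 matches, i.e. once per hour.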
df1<-structure(list(Tag = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Date = structure(c(1721174400,
1721174400, 1721174400, 1721174400, 1721174400, 1721174400, 1721174400,
1721174400, 1721174400, 1721174400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(46820, 47764, 48665, 49566,
50468, 51368, 52269, 53170, 54071, 54972), class = c("hms", "difftime"
), units = "secs"), Temp = c(17.9, 17.9, 17.9, 17.8, 17.8, 17.8,
17.8, 17.7, 17.7, 17.7), `Baro (mb)` = c(1016, 1016, 1016, 1016,
1016, 1016, 1016, 1016, 1016, 1016), pH = c(8.45, 8.42, 8.4,
8.38, 8.38, 8.37, 8.37, 8.36, 8.36, 8.37), pHmV = c(-68.3, -67.1,
-66.2, -65.2, -65, -64.8, -64.7, -64.4, -64.3, -64.5), `ORP (REDOX)` = c(225.8,
212.5, 221.1, 229.1, 234, 237.5, 239.6, 240.6, 242.3, 242.6),
DO_Sat = c(107.4, 107.4, 106.6, 106.1, 106.3, 106.4, 106.5,
106.4, 106.7, 106.6), DO_mgL = c(10.16, 10.16, 10.08, 10.06,
10.08, 10.09, 10.09, 10.11, 10.14, 10.13), Conductivity = c(563,
561, 562, 561, 563, 560, 564, 564, 564, 565), `RES (Ohms.cm)` = c(2053,
2061, 2057, 2066, 2057, 2070, 2053, 2057, 2057, 2053), `TDS (mg/L)` = c(365,
364, 365, 364, 365, 364, 366, 366, 366, 367), `SAL (ppt)` = c(0.24,
0.24, 0.24, 0.24, 0.24, 0.23, 0.24, 0.24, 0.24, 0.24), `SSG (st)` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), NH3 = c(0.01, 0.01, 0.01, 0.01,
0.01, 0.01, 0.01, 0.01, 0.01, 0.01), cDOM = c(8.9, 8.9, 9.1,
9, 7.9, 8.7, 9.2, 8.7, 8.6, 8.9), Ammonium = c(0.13, 0.13,
0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.12), DateTime = structure(c(1721217620,
1721218564, 1721219465, 1721220366, 1721221268, 1721222168,
1721223069, 1721223970, 1721224871, 1721225772), class = c("POSIXct",
"POSIXt"), tzone = ""), DateTime15 = structure(c(1721217600,
1721218500, 1721219400, 1721220300, 1721221200, 1721222100,
1721223000, 1721223900, 1721224800, 1721225700), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, 10L), class = "data.frame")
df2<-structure(list(DateTime = structure(c(1721221200.055, 1721224800.06,
1721228400.065, 1721232000.07, 1721235600.075, 1721239200.08,
1721242800.085, 1721246400.09, 1721250000.095, 1721253600.1), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Hourly_Rainfall_mm = c(0, 0, 0, 0, 0, 0.1, 0.1, 0.8,
1.4, 1.1), DateTime15 = structure(c(1721221200.055, 1721224800.06,
1721228400.065, 1721232000.07, 1721235600.075, 1721239200.08,
1721242800.085, 1721246400.09, 1721250000.095, 1721253600.1), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
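I have spent hours going through Stack Overflow questions looking for a solution and have tried the code from many answers, but I seem to be missing something, because none of them work for my data frames.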
JoinedDF<-left_join(df1, df2, by="DateTime15") # Returns all NA for Hourly_Rainfall_mm
JoinedDF<- merge(df1, df2, by.x = df1$DateTime15,
by.y= df2$DateTime15, all=TRUE) # gives error in fix.by(by.x, x) : 'by' must match numbers of columns
JoinedDF<-merge(df1, df2, by="DateTime15", all=TRUE) # inserts an additional row in the dataframe
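Any help would be greatly appreciated.
You should use how='left' to make sure that all rows from df1 are kept and that rows from df2 are matched on DateTime15.
Hourly_Rainfall_mm shows up as NaN because the DateTime15 values in df1 and df2 do not match exactly: df2 stores 1721221200.055 where df1 stores 1721221200, for example. Even a difference of a few milliseconds is enough for the join to find no match. So I rounded the DateTime15 values to the nearest hour.
[![Hourly_Rainfall_mm][1]][1]
Could you try this code?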
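To see the mismatch in isolation, here is a minimal sketch that compares one pair of keys. The two epoch values are taken from the question's dput output; the variable names t_df1 and t_df2 are only illustrative.

```python
import pandas as pd

# The same nominal hour as stored in df1 and in df2 (epoch seconds from the question)
t_df1 = pd.Timestamp(1721221200, unit="s", tz="UTC")
t_df2 = pd.Timestamp(1721221200.055, unit="s", tz="UTC")

# An equality join compares keys exactly, so these two never match
print(t_df1 == t_df2)

# After rounding to the hour, the stray milliseconds are gone and the keys agree
print(t_df1 == t_df2.round("h"))
```

The first comparison is what the join sees before rounding, the second is what it sees afterwards.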
import pandas as pd
# Define df1
df1 = pd.DataFrame({
'Tag': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': pd.to_datetime(['2024-07-17'] * 10, utc=True),  # 1721174400 in the question is 2024-07-17 00:00 UTC
'Time': pd.to_timedelta(['13:00:20', '13:16:04', '13:31:05', '13:46:06', '14:01:08', '14:16:08', '14:31:09', '14:46:10', '15:01:11', '15:16:12']),
'Temp': [17.9, 17.9, 17.9, 17.8, 17.8, 17.8, 17.8, 17.7, 17.7, 17.7],
'Baro (mb)': [1016] * 10,
'pH': [8.45, 8.42, 8.4, 8.38, 8.38, 8.37, 8.37, 8.36, 8.36, 8.37],
'pHmV': [-68.3, -67.1, -66.2, -65.2, -65, -64.8, -64.7, -64.4, -64.3, -64.5],
'ORP (REDOX)': [225.8, 212.5, 221.1, 229.1, 234, 237.5, 239.6, 240.6, 242.3, 242.6],
'DO_Sat': [107.4, 107.4, 106.6, 106.1, 106.3, 106.4, 106.5, 106.4, 106.7, 106.6],
'DO_mgL': [10.16, 10.16, 10.08, 10.06, 10.08, 10.09, 10.09, 10.11, 10.14, 10.13],
'Conductivity': [563, 561, 562, 561, 563, 560, 564, 564, 564, 565],
'RES (Ohms.cm)': [2053, 2061, 2057, 2066, 2057, 2070, 2053, 2057, 2057, 2053],
'TDS (mg/L)': [365, 364, 365, 364, 365, 364, 366, 366, 366, 367],
'SAL (ppt)': [0.24] * 10,
'SSG (st)': [0] * 10,
'NH3': [0.01] * 10,
'cDOM': [8.9, 8.9, 9.1, 9, 7.9, 8.7, 9.2, 8.7, 8.6, 8.9],
'Ammonium': [0.13] * 9 + [0.12],
# Timestamps below are the epoch seconds from the question's dput output, written out in UTC
'DateTime': pd.to_datetime(['2024-07-17 12:00:20', '2024-07-17 12:16:04', '2024-07-17 12:31:05', '2024-07-17 12:46:06', '2024-07-17 13:01:08', '2024-07-17 13:16:08', '2024-07-17 13:31:09', '2024-07-17 13:46:10', '2024-07-17 14:01:11', '2024-07-17 14:16:12'], utc=True),
'DateTime15': pd.to_datetime(['2024-07-17 12:00:00', '2024-07-17 12:15:00', '2024-07-17 12:30:00', '2024-07-17 12:45:00', '2024-07-17 13:00:00', '2024-07-17 13:15:00', '2024-07-17 13:30:00', '2024-07-17 13:45:00', '2024-07-17 14:00:00', '2024-07-17 14:15:00'], utc=True)
})
# Define df2
df2 = pd.DataFrame({
'DateTime': pd.to_datetime(['2024-07-17 13:00:00.055', '2024-07-17 14:00:00.060', '2024-07-17 15:00:00.065', '2024-07-17 16:00:00.070', '2024-07-17 17:00:00.075', '2024-07-17 18:00:00.080', '2024-07-17 19:00:00.085', '2024-07-17 20:00:00.090', '2024-07-17 21:00:00.095', '2024-07-17 22:00:00.100'], utc=True),
'Hourly_Rainfall_mm': [0, 0, 0, 0, 0, 0.1, 0.1, 0.8, 1.4, 1.1],
'DateTime15': pd.to_datetime(['2024-07-17 13:00:00.055', '2024-07-17 14:00:00.060', '2024-07-17 15:00:00.065', '2024-07-17 16:00:00.070', '2024-07-17 17:00:00.075', '2024-07-17 18:00:00.080', '2024-07-17 19:00:00.085', '2024-07-17 20:00:00.090', '2024-07-17 21:00:00.095', '2024-07-17 22:00:00.100'], utc=True)
})
# Round DateTime15 to the nearest hour to ensure exact matches
# Lowercase 'h' is used because the uppercase 'H' alias is deprecated in recent pandas releases
df1['DateTime15'] = df1['DateTime15'].dt.round('h')
df2['DateTime15'] = df2['DateTime15'].dt.round('h')
# Merge the dataframes
JoinedDF = pd.merge(df1, df2[['DateTime15', 'Hourly_Rainfall_mm']], on='DateTime15', how='left')
# Print the merged dataframe
print(JoinedDF)
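Note that rounding both keys to the hour also gives the 15-minute rows of df1 (for example xx:15 and xx:45) the rainfall of the nearest hour. If only the on-the-hour rows should receive a value, which is how I read "i.e. once per hour" in the question, one possible variation is to round only the hourly key and leave the 15-minute key untouched. The sketch below uses shortened stand-in frames named left and right with made-up rainfall values; those names and values are assumptions for illustration, not the question's data.

```python
import pandas as pd

# Hypothetical stand-in for df1: readings on exact quarter hours
left = pd.DataFrame({
    "DateTime15": pd.to_datetime(
        ["2024-07-17 12:45:00", "2024-07-17 13:00:00",
         "2024-07-17 13:15:00", "2024-07-17 14:00:00"],
        utc=True,
    ),
    "Temp": [17.8, 17.8, 17.8, 17.7],
})

# Hypothetical stand-in for df2: hourly values whose timestamps carry stray milliseconds
right = pd.DataFrame({
    "DateTime15": pd.to_datetime(
        ["2024-07-17 13:00:00.055", "2024-07-17 14:00:00.060"],
        utc=True,
    ),
    "Hourly_Rainfall_mm": [0.1, 0.8],  # illustrative values only
})

# Round only the hourly key; the quarter-hour key is already exact
right["DateTime15"] = right["DateTime15"].dt.round("h")

# validate="many_to_one" raises if the hourly key is duplicated after rounding,
# which would otherwise add extra rows to the result
joined = left.merge(right, on="DateTime15", how="left", validate="many_to_one")
print(joined)
```

With this variation the 12:45 and 13:15 rows keep NaN for Hourly_Rainfall_mm, and only the 13:00 and 14:00 rows pick up a value.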
[1]: https://i.sstatic.net/lQu0uee9.png