I have a dataframe like the one shown below:
OrdNo year
1 20059999
2 20070830
3 20070719
4 20030719
5 20039999
6 20070911
7 20050918
8 20070816
9 20069999
How can I replace the last 4 digits with 0101 wherever they are 9999 in a Pandas dataframe?
Thanks
Assuming your year column is of type str:
df["year"] = df["year"].str.replace(r"9999$", "0101", regex=True)
If it is a numeric type:
df["year"] = pd.to_numeric(df["year"].astype(str).str.replace(r"9999$", "0101", regex=True), errors="coerce")
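For reference, here is a minimal self-contained sketch of both variants, run on the sample data from the question. The names `years`, `df_str` and `df_num` are made up for illustration. `regex=True` is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample values from the question
years = [20059999, 20070830, 20070719, 20030719, 20039999,
         20070911, 20050918, 20070816, 20069999]

# Variant 1: the year column holds strings
df_str = pd.DataFrame({"OrdNo": range(1, 10), "year": [str(y) for y in years]})
# "$" anchors the match to the end of the string, so only a trailing 9999 is replaced
df_str["year"] = df_str["year"].str.replace(r"9999$", "0101", regex=True)

# Variant 2: the year column holds integers
df_num = pd.DataFrame({"OrdNo": range(1, 10), "year": years})
# Convert to str, replace, then convert back to a numeric dtype
df_num["year"] = pd.to_numeric(
    df_num["year"].astype(str).str.replace(r"9999$", "0101", regex=True),
    errors="coerce",
)

print(df_str.to_string(index=False))
print(df_num.to_string(index=False))
```

Rows whose year does not end in 9999 are left unchanged in both variants.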
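I wrote a script to show how to solve this. Note that this is a very verbose version; it could and should be condensed, but I have tried to make it as easy to follow as possible. Also, if you are a novice developer, a good way to practice is to work out the steps for solving the problem in your head (or write them down), then dig into the library's documentation to look for a good solution.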
import pandas as pd
# Creating dataframe
data = [[1, 20059999], [2, 20070830], [3, 20070719], [4, 20030719], [5, 20039999], [6, 20070911], [7, 20050918], [8, 20070816], [9, 20069999]]
df = pd.DataFrame(data, columns=['OrdNo', 'year'])
# Iterating through dataframe
for index, row in df.iterrows():
    # Here we take the columns from the row we are in right now
    OrdNo = row['OrdNo']
    year = row['year']
    # Taking last four digits from year int. We need to convert the year int to string to do this. -4: basically
    # tells the code to start at the end (-), move 4 characters back (4) and return everything from that point to the
    # end (:)
    lastfour = str(year)[-4:]
    # Check if last four digits are 9999 (as string, because lastfour is a string)
    if lastfour == "9999":
        # If true, replace the 9999 with 0101
        # First we take the year but remove the last four digits (the 9999)
        year = str(year)[:-4]
        # Then we add 0101 to the year
        newyear = year + "0101"
        # Now convert it back to int
        newyear = int(newyear)
        # And put it back in the dataframe
        # We use loc to find based on the OrdNo and then we replace the year column by our new value
        df.loc[df['OrdNo'] == OrdNo, 'year'] = newyear
# Let's print the result
print(df.to_string(index=False))
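As one possible way to condense the loop above, the same replacement can be sketched with a boolean mask and integer arithmetic instead of string slicing. This is a different technique from the script, not a rewrite of it, and it assumes the year column holds positive integers in the 8-digit form shown in the question. The variable name `mask` is made up for illustration.

```python
import pandas as pd

# Same sample dataframe as in the script above
data = [[1, 20059999], [2, 20070830], [3, 20070719], [4, 20030719], [5, 20039999],
        [6, 20070911], [7, 20050918], [8, 20070816], [9, 20069999]]
df = pd.DataFrame(data, columns=['OrdNo', 'year'])

# True for rows whose last four digits are 9999
mask = df['year'] % 10000 == 9999

# Turning a trailing 9999 into 0101 is the same as subtracting 9999 - 101 = 9898
df.loc[mask, 'year'] = df.loc[mask, 'year'] - 9898

print(df.to_string(index=False))
```

This avoids both the row-by-row iteration and the round trip through strings, at the cost of being less obvious to read than the step-by-step version.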