Python - Load zip codes into a DataFrame as strings?

Question

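I am using Pandas to load an Excel spreadsheet that contains zip codes (e.g. 32771). The zip codes are stored as 5-digit strings in the spreadsheet. When they are pulled into a DataFrame using the commands...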

xls = pd.ExcelFile("5-Digit-Zip-Codes.xlsx")
dfz = xls.parse('Zip Codes')

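...they are converted into numbers. So '00501' becomes 501.

So my questions are, how do I:

a. Load the DataFrame and keep the string type of the zip codes as stored in the Excel file?

b. Convert the numbers in the DataFrame into five-digit strings, e.g. 501 becomes '00501'?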

Answer 1

As a workaround, you could convert the ints to 0-padded strings of length 5 using Series.str.zfill:

df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)

Demo:

import pandas as pd
df = pd.DataFrame({'zipcode':['00501']})
df.to_excel('/tmp/out.xlsx')
xl = pd.ExcelFile('/tmp/out.xlsx')
df = xl.parse('Sheet1')
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
print(df)

yields

  zipcode
0   00501

Answer 2

You can avoid pandas' type inference with a custom converter, e.g. if 'zipcode' is the header of the column with zip codes:

dfz = xls.parse('Zip Codes', converters={'zipcode': lambda x:x})

This is arguably a bug, since the column was originally string-encoded; an issue has been opened about it.

Answer 3

str(my_zip).zfill(5)

or

print("{0:>05s}".format(str(my_zip)))

These are two of many ways.
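As a minimal sketch of the padding step on a single value: the helper name pad_zip and its width parameter are assumptions made for illustration, not part of the answer above. It also shows an integer format spec as one more alternative when the value is known to be an int.

```python
def pad_zip(value, width=5):
    """Return value as a string, left-padded with zeros to the given width."""
    return str(value).zfill(width)


my_zip = 501

# str.zfill works on the string form, so ints and strings are both accepted.
padded_with_zfill = pad_zip(my_zip)

# For values known to be ints, an integer format spec gives the same padding.
padded_with_format = format(my_zip, "05d")

print(padded_with_zfill, padded_with_format)
```

Values that already have five digits pass through unchanged, so the helper is safe to apply to a mixed column.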

Answer 4

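Previous answers have correctly suggested using zfill(5). However, if your zip codes are already in float datatype for some reason (I recently encountered data like this), you first need to convert them to int. Then you can use zfill(5).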

df = pd.DataFrame({'zipcode':[11.0, 11013.0]})

    zipcode
0   11.0
1   11013.0

df['zipcode'] = df['zipcode'].astype(int).astype(str).str.zfill(5)

    zipcode
0   00011
1   11013
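One caveat with the float case: astype(int) raises an error if the column contains missing values (NaN), which is a common reason a zip code column ends up as float in the first place. The following is a rough sketch of one way around that, assuming missing entries should stay missing; the helper name zip_to_str is hypothetical.

```python
import pandas as pd


def zip_to_str(value, width=5):
    """Format one zip code as a zero-padded string; leave missing values as None."""
    if pd.isna(value):
        return None
    # int() drops the ".0" of a float such as 11.0 before padding.
    return str(int(value)).zfill(width)


# Example data (made up): two float zip codes and one missing entry.
df = pd.DataFrame({"zipcode": [11.0, 11013.0, float("nan")]})
df["zipcode"] = df["zipcode"].map(zip_to_str)
print(df)
```

Applying the function element by element is slower than the vectorized astype chain, so it is probably only worth it when the column really does contain gaps.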

Answer 5

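The pandas.read_excel documentation says you can preserve the data exactly as stored in the Excel sheet by specifying dtype as object: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

So, something like this should work: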

xls = pd.read_excel("5-Digit-Zip-Codes.xlsx", dtype={'zip_code': object, 'other_col': str})

(Note: I'm not at my work computer right now, so I haven't been able to test this yet.)
