I'm using Python and pandas to push data to BigQuery. The script otherwise runs fine, but it fails with this error:
object of type <class 'str'> cannot be converted to int
File "/**/bq.py", line 71, in post
job = self.client.load_table_from_dataframe(df,
File "/**/tobq.py", line 99, in <module>
bq.post(data, target_table="tablemane")
pyarrow.lib.ArrowTypeError: object of type <class 'str'> cannot be converted to int
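I understand the error, but I can't pin down where it comes from: the df.dtypes of my data and the BQ schema look consistent.
For example (here I'm trying to push just one row).
My code: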
df = pd.DataFrame(data)
df = df.reset_index(drop=True)
df['crawl_date'] = pd.to_datetime(df['crawl_date']).dt.date
df['r_index'] = df['rank_index'].astype('float')
df['v_index'] = df['visibility_index'].astype('float')
df['s_var'] = df['serp_var'].astype('float')
df['kd'] = df['kd'].astype('float')
df['camp_id'] = df['camp_id'].astype('int64')
print(df.head())
print(df.isnull().sum())
table_id = f"{os.getenv('GCP_DATASET_NAME')}.{table_name}"
print(df.dtypes)
df.to_gbq(destination_table=table_id, table_schema=schema_path, project_id=os.getenv('GCP_PROJECT_NAME'), if_exists='append')
My data:
[{'crawl_date': '2021-03-22', 'domain': 'www.example.com', 'categ': 't1', 'position': 1, 'position_spread': 'TOP_5', 'position_change': 0, 'v_index': 100, 'r_index': 100, 'estimated_traffic': 101881, 'traffic_change': 0, 'max_traffic': 0, 'device': 'desktop', 'top_rank': 1, 's_var': 0, 'kwd': '****** pro', 'volume': 461000, 'kd': 0, 'camp_id': 2, 'camp_name': '******'}]
print(df.dtypes)
crawl_date object
domain object
categ object
position int64
position_spread object
position_change int64
v_index float64
r_index float64
estimated_traffic int64
traffic_change int64
max_traffic int64
device object
top_rank int64
s_var float64
kwd object
volume int64
kd float64
camp_id int64
camp_name object
Finally, my BQ schema:
[
{
"name": "crawl_date",
"mode": "NULLABLE",
"type": "DATE",
"description": null,
"fields": []
},
{
"name": "domain",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "categ",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "position",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "position_spread",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "position_change",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "v_index",
"mode": "NULLABLE",
"type": "FLOAT",
"description": null,
"fields": []
},
{
"name": "r_index",
"mode": "NULLABLE",
"type": "FLOAT",
"description": null,
"fields": []
},
{
"name": "estimated_traffic",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "traffic_change",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "max_traffic",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "device",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "top_rank",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "s_var",
"mode": "NULLABLE",
"type": "FLOAT",
"description": null,
"fields": []
},
{
"name": "kwd",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "volume",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "kd",
"mode": "NULLABLE",
"type": "FLOAT",
"description": null,
"fields": []
},
{
"name": "camp_id",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "camp_name",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
}
]
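Before sending the data to BigQuery, try converting crawl_date from an object column to a datetime column. In your code, .dt.date leaves crawl_date with dtype object (as your df.dtypes output shows), so drop the .dt.date part: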
import pandas as pd
df['crawl_date'] = pd.to_datetime(df['crawl_date'])
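If the conversion above doesn't resolve it, one way to narrow down which column is at fault is to compare the Python type of each value against the schema before uploading. Below is a minimal sketch using only the standard library. Note that `find_type_mismatches` and the `BQ_TO_PY` mapping are hypothetical helpers written for illustration, not part of pandas, pyarrow, or the BigQuery client, and the mapping only covers the four types that appear in the schema from the question. The sample schema and record are a shortened copy of the ones posted.

```python
import datetime

# Hypothetical mapping from BigQuery column types to acceptable Python types.
# Only the types used in the question's schema are covered.
BQ_TO_PY = {
    "STRING": (str,),
    "INTEGER": (int,),
    "FLOAT": (int, float),
    "DATE": (datetime.date,),
}


def find_type_mismatches(records, schema):
    """Return (row_index, column, bq_type, actual_type_name) for each bad value."""
    mismatches = []
    for i, row in enumerate(records):
        for field in schema:
            name, bq_type = field["name"], field["type"]
            value = row.get(name)
            if value is None:
                continue  # NULLABLE columns may be missing or None
            expected = BQ_TO_PY.get(bq_type)
            if expected is None:
                continue  # type not covered by this sketch
            # bool is a subclass of int, so flag it explicitly
            if isinstance(value, bool) or not isinstance(value, expected):
                mismatches.append((i, name, bq_type, type(value).__name__))
    return mismatches


# Shortened copy of the schema and the single row from the question.
schema = [
    {"name": "crawl_date", "type": "DATE"},
    {"name": "position", "type": "INTEGER"},
    {"name": "kd", "type": "FLOAT"},
]
records = [{"crawl_date": "2021-03-22", "position": 1, "kd": 0}]

for row_idx, column, bq_type, actual in find_type_mismatches(records, schema):
    print(f"row {row_idx}: column {column!r} expects {bq_type}, got {actual}")
```

Run it on the records as they look after your own conversions. Any column it reports is a candidate for the value pyarrow cannot convert; an empty result suggests the problem lies elsewhere, for example in how the columns are matched to the schema.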