Python: sending data to BigQuery fails with a field type mismatch

Question (votes: 0, answers: 1)

I am using Python and pandas to push data to BigQuery. The script otherwise runs fine, but I get this error:

object of type <class 'str'> cannot be converted to int
File "/**/bq.py", line 71, in post
job = self.client.load_table_from_dataframe(df,
File "/**/tobq.py", line 99, in <module>
bq.post(data, target_table="tablemane")
pyarrow.lib.ArrowTypeError: object of type <class 'str'> cannot be converted to int

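I understand the error, but I cannot locate its cause: the output of df.dtypes for my data and the BQ schema look consistent with each other. For example (here I am trying to push only a single row):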

My code:

    df = pd.DataFrame(data)
    df = df.reset_index(drop=True)
    df['crawl_date'] = pd.to_datetime(df['crawl_date']).dt.date
    df['r_index'] = df['rank_index'].astype('float')
    df['v_index'] = df['visibility_index'].astype('float')
    df['s_var'] = df['serp_var'].astype('float')
    df['kd'] = df['kd'].astype('float')
    df['camp_id'] = df['camp_id'].astype('int64')
    print(df.head())
    print(df.isnull().sum())

    table_id = f"{os.getenv('GCP_DATASET_NAME')}.{table_name}"

    print(df.dtypes)
    df.to_gbq(
        destination_table=table_id,
        table_schema=schema_path,
        project_id=os.getenv('GCP_PROJECT_NAME'),
        if_exists='append',
    )

My data:

[{'crawl_date': '2021-03-22', 'domain': 'www.example.com', 'categ': 't1', 'position': 1, 'position_spread': 'TOP_5', 'position_change': 0, 'v_index': 100, 'r_index': 100, 'estimated_traffic': 101881, 'traffic_change': 0, 'max_traffic': 0, 'device': 'desktop', 'top_rank': 1, 's_var': 0, 'kwd': '****** pro', 'volume': 461000, 'kd': 0, 'camp_id': 2, 'camp_name': '******'}]

print(df.dtypes)

crawl_date            object
domain                object
categ                 object
position               int64
position_spread       object
position_change        int64
v_index              float64
r_index              float64
estimated_traffic      int64
traffic_change         int64
max_traffic            int64
device                object
top_rank               int64
s_var                float64
kwd                   object
volume                 int64
kd                   float64
camp_id                int64
camp_name             object

Finally, my BQ schema:

[
  {
    "name": "crawl_date",
    "mode": "NULLABLE",
    "type": "DATE",
    "description": null,
    "fields": []
  },
  {
    "name": "domain",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  },
  {
    "name": "categ",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  },
  {
    "name": "position",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "position_spread",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  },
  {
    "name": "position_change",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "v_index",
    "mode": "NULLABLE",
    "type": "FLOAT",
    "description": null,
    "fields": []
  },
  {
    "name": "r_index",
    "mode": "NULLABLE",
    "type": "FLOAT",
    "description": null,
    "fields": []
  },
  {
    "name": "estimated_traffic",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "traffic_change",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "max_traffic",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "device",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  },
  {
    "name": "top_rank",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "s_var",
    "mode": "NULLABLE",
    "type": "FLOAT",
    "description": null,
    "fields": []
  },
  {
    "name": "kwd",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  },
  {
    "name": "volume",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "kd",
    "mode": "NULLABLE",
    "type": "FLOAT",
    "description": null,
    "fields": []
  },
  {
    "name": "camp_id",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "camp_name",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  }
]
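
Because df.dtypes reports only "object" for any column that holds Python objects, it cannot show whether a value is a str, a date, or something else. One way to narrow this down might be to compare each raw value against the schema before the DataFrame is built. The following is a minimal sketch under stated assumptions: the rows are a list of dictionaries like the sample above, the schema is the parsed JSON list above, and the names ACCEPTED_TYPES and find_type_mismatches as well as the type mapping itself are hypothetical, made up for illustration rather than taken from any BigQuery library.

```python
import datetime

# Assumption: a hand-written mapping from the BigQuery type names used in the
# schema to the Python types this sketch treats as compatible. It is not an
# official table from any library.
ACCEPTED_TYPES = {
    "STRING": (str,),
    "INTEGER": (int,),
    "FLOAT": (int, float),
    "DATE": (datetime.date,),
}


def find_type_mismatches(records, schema):
    """Return (row, field, bq_type, python_type) for each suspicious value."""
    mismatches = []
    for row_number, record in enumerate(records):
        for field in schema:
            name, bq_type = field["name"], field["type"]
            value = record.get(name)
            if value is None:
                continue  # missing or null values are skipped (fields are NULLABLE)
            accepted = ACCEPTED_TYPES.get(bq_type)
            if accepted is None:
                continue  # schema type not covered by this sketch
            # bool is a subclass of int, so it is flagged explicitly
            if isinstance(value, bool) or not isinstance(value, accepted):
                mismatches.append(
                    (row_number, name, bq_type, type(value).__name__)
                )
    return mismatches


# Hypothetical rows: the second one carries "volume" as a string.
sample_schema = [
    {"name": "volume", "type": "INTEGER"},
    {"name": "kd", "type": "FLOAT"},
]
sample_rows = [
    {"volume": 461000, "kd": 0},
    {"volume": "461000", "kd": 0},
]
for mismatch in find_type_mismatches(sample_rows, sample_schema):
    print(mismatch)
```

Run against the full data set rather than a single row, a check like this reports the row number and field name of every value whose Python type differs from what the schema suggests, which is the information the ArrowTypeError message leaves out.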
python pandas google-bigquery
1 answer
0 votes

Before sending the data to BigQuery, try converting crawl_date from object to a datetime object.

import pandas as pd

df['crawl_date'] = pd.to_datetime(df['crawl_date'])
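
Note that the question's code already calls pd.to_datetime(...).dt.date on this column, so the conversion alone may not resolve the error. If the rows are still plain dictionaries, as in the question's sample data, the same idea can be sketched without pandas by parsing the date strings before the DataFrame is created. This is a swapped-in, standard-library variant of the suggestion above, not the answerer's code; the helper name normalize_crawl_dates is hypothetical.

```python
import datetime


def normalize_crawl_dates(records, key="crawl_date"):
    """Return copies of the records with date strings parsed into datetime.date."""
    normalized = []
    for record in records:
        row = dict(record)  # shallow copy so the caller's rows stay untouched
        value = row.get(key)
        if isinstance(value, str):
            # Raises ValueError if the string is not a valid ISO date
            row[key] = datetime.date.fromisoformat(value)
        normalized.append(row)
    return normalized


rows = [{"crawl_date": "2021-03-22", "camp_id": 2}]
clean_rows = normalize_crawl_dates(rows)
print(type(clean_rows[0]["crawl_date"]).__name__)
```

Parsing eagerly like this makes a malformed date fail with a ValueError that points at the offending string, instead of surfacing later during the upload.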
