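I upload the attached JSON from my backend to a Google Cloud Storage bucket. Now I am trying to connect this JSON to a BigQuery table, but I get the following error. What changes do I need to make?
Error while reading table: XXXXX, error message: Failed to parse JSON: No object found when new array is started.; BeginArray returned false; Parser terminated before end of string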
[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"],["video_screen","click_on_screen","false","202011230213","1","4","0"],["video_screen","click_on_screen","false","202011230633","1","4","0"],["video_screen","click_on_screen","false","202011230709","1","4","0"],["video_screen","click_on_screen","false","202011230712","1","4","0"],["video_screen","click_on_screen","false","202011230723","1","4","0"],["video_screen","click_on_screen","false","202011230725","1","4","0"],["video_screen","click_on_screen","false","202011231739","1","4","0"],["category","select","MTV","202011232228","1","3","0"],["sign_in","enter","user_details","202011230108","2","3","0"],["sign_in","enter","user_details","202011230442","2","3","0"],["video","select","youtube","202011230108","1","3","0"],["video","select","youtube","202011230633","1","3","0"],["video_screen","click_on_screen","false","202011230458","1","3","0"],["video_screen","click_on_screen","false","202011230552","1","3","0"],["video_screen","click_on_screen","false","202011230612","1","3","0"],["video_screen","click_on_screen","false","202011231740","1","3","0"],["category","select","Disney Karaoke","202011232228","1","2","0"],["category","select","Duet","202011232228","1","2","0"],["category","select","Free","202011230726","1","2","0"],["category","select","Free","202011231830","2","2","0"],["category","select","Free","202011232228","1","2","0"],["category","select","Love","202011232228","1","2","0"],["category","select","New","202011232228","1","2","0"],["category","select","Pitch Perfect 2","202011232228","1","2","0"],["developer","click","hithub","202011230749","1","2","0"],["sign_in","enter","user_details","202011230134","1","2","0"],["sign_in","enter","user_details","202011230211","1","2","0"],["sign_in","enter","user_details","202011230219","1","2","0"]]
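BigQuery reads JSONL (newline-delimited JSON) files. The example is not in that format.
- JSONL uses \n as the delimiter between records. The example is all on one line, with records separated by commas.
- Each JSONL line is a JSON object that starts with { and ends with }. The example contains JSON arrays, which are not supported.
- A supported record names every field, for example: { "field1_name": "video_screen", "field2_name": "click_on_screen", "field3_name": false, "field4_name": 202011231958, "field5_name": 1, "field6_name": 43, "field7_name": 0}
- JSONL does not wrap the records in enclosing brackets []. The first line starts with {, not [{, and the last line ends with }, not }].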
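The points above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the source data carries no column names, so the `FIELD_NAMES` list and the `rows_to_jsonl` helper are hypothetical placeholders to replace with your real schema. Values are kept as the strings they already are in the source.

```python
import json

# Hypothetical column names (assumption): the original arrays are unnamed,
# so substitute the names from your own BigQuery schema.
FIELD_NAMES = [
    "field1_name", "field2_name", "field3_name", "field4_name",
    "field5_name", "field6_name", "field7_name",
]


def rows_to_jsonl(raw_json, field_names=FIELD_NAMES):
    """Turn a JSON array of arrays into newline-delimited JSON objects."""
    rows = json.loads(raw_json)
    lines = []
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} values, got {len(row)}"
            )
        # Pair each positional value with its field name, one object per line.
        lines.append(json.dumps(dict(zip(field_names, row))))
    return "\n".join(lines)


sample = (
    '[["video_screen","click_on_screen","false","202011231958","1","43","0"],'
    '["buy","error","2","202011231807","1","6","0"]]'
)
print(rows_to_jsonl(sample))
```

Each output line is a standalone object beginning with { and ending with }, with no surrounding [] and no commas between records.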
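Here is a Python solution, following Steven's answer, for converting a JSON file in GCS into one that BigQuery can import. bucket, gcp_prefix, source_file and modified_name are variables specific to your GCS project; gcp_prefix is not needed if the file is in the bucket root: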
import json
from io import BytesIO

from google.cloud import storage


def get_storage_client(bucket_name):
    # Despite its name, this returns a Bucket handle rather than the Client.
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    return bucket


def download_from_gcs(bucket_name, file_name):
    # Download the object into an in-memory buffer.
    file_io = BytesIO()
    bucket = get_storage_client(bucket_name)
    blob = bucket.blob(file_name)
    blob.download_to_file(file_io)
    return file_io


def upload_to_gcs(bucket_name, file_name, file_io):
    # file_io must be a file-like object; rewind=True seeks to its start first.
    bucket = get_storage_client(bucket_name)
    blob = bucket.blob(file_name)
    blob.upload_from_file(file_io, rewind=True)
    # Build the URI from bucket and object name; blob.id also carries the generation.
    blob_id = f'gs://{bucket.name}/{blob.name}'
    return blob_id


def generate_json_file(gcs_object):
    # Parse the top-level JSON array and write one element per line (JSONL).
    # Each element must itself be a JSON object for BigQuery to accept it.
    gcs_object.seek(0)
    decoded_json = json.loads(gcs_object.read().decode('utf-8'))
    content_string = [json.dumps(row) for row in decoded_json]
    json_content = '\n'.join(content_string)
    return json_content


# download_from_gcs already returns a BytesIO, so it needs no further wrapping.
file_object = download_from_gcs(bucket, source_file)
modified_json = generate_json_file(file_object)
# upload_from_file expects a file-like object, so wrap the encoded bytes.
binary_json = BytesIO(modified_json.encode('utf-8'))
blob_id = upload_to_gcs(bucket, f'{gcp_prefix}/{modified_name}', binary_json)
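Before uploading, it may help to confirm that the converted text really is one JSON object per line, since a file whose lines are still arrays would hit the same parse error. The following is a small local check under that assumption; `find_bad_jsonl_lines` is a hypothetical helper, not part of any Google library, and it skips blank lines rather than reporting them.

```python
import json


def find_bad_jsonl_lines(text):
    """Return (line_number, problem) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored, not reported
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"not valid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append(
                (number, f"expected a JSON object, got {type(value).__name__}")
            )
    return problems


# A line that is still an array is flagged; a proper object is not.
print(find_bad_jsonl_lines('{"field1_name": "buy"}\n["buy", "error"]'))
```

An empty result means every non-blank line parsed as an object, so calling it on modified_json before upload_to_gcs would flag array rows early.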