我已使用BQ文档中的示例通过此查询将BQ表读入熊猫数据框:
query_string = """
SELECT
CONCAT(
'https://stackoverflow.com/questions/',
CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
"""
dataframe = (
bqclient.query(query_string)
.result()
.to_dataframe(bqstorage_client=bqstorageclient)
)
print(dataframe.head())
url view_count
0 https://stackoverflow.com/questions/22879669 48540
1 https://stackoverflow.com/questions/13530967 45778
2 https://stackoverflow.com/questions/35159967 40458
3 https://stackoverflow.com/questions/10604135 39739
4 https://stackoverflow.com/questions/16609219 34479
但是,当我尝试使用任何其他非公开数据集时,出现以下错误:
google.api_core.exceptions.FailedPrecondition:400创建会话时出错:表的存储格式不受支持
我需要在表中进行一些设置,以便它可以与BQ Storage API一起使用吗?
此作品:
query_string = """SELECT funding_round_type, count(*) FROM `datadocs-py.datadocs.investments` GROUP BY funding_round_type order by 2 desc LIMIT 2"""
>>> bqclient.query(query_string).result().to_dataframe()
funding_round_type f0_
0 venture 104157
1 seed 43747
但是,当我将其设置为使用bqstorageclient
时,出现该错误:
>>> bqclient.query(query_string).result().to_dataframe(bqstorage_client=bqstorageclient)
Traceback (most recent call last):
File "/Users/david/Desktop/V/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 533, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "there was an error creating the session: the table has a storage format that is not supported"
debug_error_string = "{"created":"@1565047973.444089000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"there was an error creating the session: the table has a storage format that is not supported","grpc_status":9}"
>
我遇到了与2019年11月6日相同的问题,事实是您收到的错误是Read API的已知问题,因为它目前无法处理小于10MB的结果集。我碰到了这个问题,这为这个问题提供了一些启示:GitHub.com - GoogleCloudPlatform/spark-bigquery-connector - FAILED_PRECONDITION: there was an error creating the session: the table has a storage format that is not supported #46
我已经使用返回大于10MB结果集的查询对它进行了测试,并且对我要查询的数据集进行EU多区域定位对于我来说似乎还不错。
此外,您需要在您的环境中安装fastavro
才能使用此功能。