BigQuery Storage API: the table has a storage format that is not supported

Question (votes: 3, answers: 1)

Using the example from the BQ documentation, I read a BQ table into a pandas DataFrame with this query:

query_string = """
SELECT
CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)
print(dataframe.head())

                                            url  view_count
0  https://stackoverflow.com/questions/22879669       48540
1  https://stackoverflow.com/questions/13530967       45778
2  https://stackoverflow.com/questions/35159967       40458
3  https://stackoverflow.com/questions/10604135       39739
4  https://stackoverflow.com/questions/16609219       34479

However, when I try this with any other, non-public dataset, I get the following error:

google.api_core.exceptions.FailedPrecondition: 400 there was an error creating the session: the table has a storage format that is not supported

Is there some setting I need to apply to the table so that it works with the BQ Storage API?

This works:

>>> query_string = """SELECT funding_round_type, count(*) FROM `datadocs-py.datadocs.investments` GROUP BY funding_round_type order by 2 desc LIMIT 2"""
>>> bqclient.query(query_string).result().to_dataframe()

funding_round_type     f0_
0            venture  104157
1               seed   43747

However, when I set it to use bqstorageclient, I get the error:

>>> bqclient.query(query_string).result().to_dataframe(bqstorage_client=bqstorageclient)

Traceback (most recent call last):
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 533, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.FAILED_PRECONDITION
    details = "there was an error creating the session: the table has a storage format that is not supported"
    debug_error_string = "{"created":"@1565047973.444089000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"there was an error creating the session: the table has a storage format that is not supported","grpc_status":9}"
>
google-cloud-platform google-bigquery
1 Answer

1 vote

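I ran into the same problem as of November 6, 2019. The error you are seeing is a known issue with the Read API: it currently cannot handle result sets smaller than 10MB. I came across the following issue, which sheds some light on the problem: GitHub.com - GoogleCloudPlatform/spark-bigquery-connector - FAILED_PRECONDITION: there was an error creating the session: the table has a storage format that is not supported #46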

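I tested it with a query that returns a result set larger than 10MB, and it seems to work fine for me. The dataset I am querying is in the EU multi-region location.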

Additionally, you need fastavro installed in your environment to use this feature.
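
If small result sets are what trigger the error, one possible workaround is to try the Storage API first and fall back to the plain REST download when session creation fails. This is only a sketch, not something from the BQ documentation: the helper name query_to_dataframe is made up for illustration, and it assumes bqclient and bqstorageclient are created the same way as in the question. The local stand-in for FailedPrecondition is there only so the snippet can be read and run without the Google libraries installed.

```python
try:
    from google.api_core.exceptions import FailedPrecondition
except ImportError:
    # Assumption: stand-in so the sketch runs without google-api-core installed.
    class FailedPrecondition(Exception):
        pass


def query_to_dataframe(bqclient, query_string, bqstorage_client=None):
    """Hypothetical helper: prefer the Storage API, fall back to REST.

    bqclient is expected to behave like google.cloud.bigquery.Client and
    bqstorage_client like the Storage API client used in the question.
    """
    job = bqclient.query(query_string)

    if bqstorage_client is not None:
        try:
            # Fast path: download the result through the BQ Storage API.
            return job.result().to_dataframe(bqstorage_client=bqstorage_client)
        except FailedPrecondition:
            # e.g. "the table has a storage format that is not supported";
            # fall through to the slower REST-based download.
            pass

    # Slow path: the same call that works in the question.
    return job.result().to_dataframe()
```

Usage would then look like dataframe = query_to_dataframe(bqclient, query_string, bqstorage_client=bqstorageclient). Large results still go through the Storage API, and small ones quietly take the slower path instead of raising.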
