Limit the row sets defined by prefixes in Google Cloud Bigtable

Question

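I'm trying to do this in Python: I have multiple prefixes to query in Bigtable, but I only want the first result of each row set defined by a prefix. Essentially, I want to apply a limit of 1 to each row set rather than to the whole scan.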

Suppose you have records with the following row keys:

collection_1#item1#reversed_timestamp1
collection_1#item1#reversed_timestamp2
collection_1#item2#reversed_timestamp3
collection_1#item2#reversed_timestamp4

What if I want to retrieve the latest entry for both

collection_1#item1#

and

collection_1#item2#

at the same time?

The expected output should be the rows corresponding to:

collection_1#item1#reversed_timestamp1
collection_1#item2#reversed_timestamp3
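The expected output relies on the reversed timestamp making the newest row sort first under each prefix. A minimal sketch of how such a key component is commonly built, assuming millisecond timestamps and a fixed-width, zero-padded encoding so that string order matches numeric order; the `make_row_key` helper, `MAX_TS_MILLIS`, and the width of 13 digits are assumptions, not taken from the question.

```python
# Assumed upper bound for millisecond timestamps. Any constant larger than
# every timestamp you will store works, as long as it never changes.
MAX_TS_MILLIS = 10**13 - 1
WIDTH = 13  # fixed width so lexicographic order matches numeric order


def make_row_key(collection: str, item: str, ts_millis: int) -> str:
    """Build a hypothetical '<collection>#<item>#<reversed_ts>' row key.

    Newer timestamps give smaller reversed values, so within one
    '<collection>#<item>#' prefix the newest row sorts first.
    """
    if not 0 <= ts_millis <= MAX_TS_MILLIS:
        raise ValueError("timestamp out of range for the fixed-width encoding")
    reversed_ts = MAX_TS_MILLIS - ts_millis
    return f"{collection}#{item}#{reversed_ts:0{WIDTH}d}"


keys = sorted([
    make_row_key("collection_1", "item1", 1_700_000_000_000),  # older write
    make_row_key("collection_1", "item1", 1_700_000_500_000),  # newer write
])
print(keys[0])  # the newer item1 row sorts first
```

The zero padding matters: without it, a shorter reversed value such as "9" would sort after "10" as a string.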

Can this be done in Bigtable? Thanks!

Answer 1

Is collection_1#item1#reversed_timestamp1 the row key, or is reversed_timestamp1 actually a cell timestamp?

If it is not part of the row key, you can use a filter such as the cells-per-column limit (https://cloud.google.com/bigtable/docs/using-filters#cells-per-column-limit), for example:

from google.cloud.bigtable import row_filters
rows = table.read_rows(filter_=row_filters.CellsColumnLimitFilter(2))

or the cells-per-row limit (https://cloud.google.com/bigtable/docs/using-filters#cells-per-row-limit), for example:

rows = table.read_rows(filter_=row_filters.CellsRowLimitFilter(2))

depending on how your data is laid out.
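These cell-limit filters only help if the versions are stored as timestamped cells inside one row rather than as separate rows. The following is a local model, not client-library code, of what CellsColumnLimitFilter(1) keeps in that layout (the most recent N cells of each column); the row key `collection_1#item1`, the column name `cf:data`, and the `cells_column_limit` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model of one Bigtable row: each column maps to a
# list of (cell_timestamp_micros, value) versions. This is NOT the client
# library's data structure, only an illustration of the filter's semantics.
Row = Dict[str, List[Tuple[int, bytes]]]


def cells_column_limit(row: Row, n: int) -> Row:
    """Model CellsColumnLimitFilter(n): keep the n most recent cells per column."""
    return {
        column: sorted(cells, key=lambda cell: cell[0], reverse=True)[:n]
        for column, cells in row.items()
    }


# Layout where the timestamp is the cell timestamp, not part of the row key:
# row key "collection_1#item1" holding two versions of column "cf:data".
row = {"cf:data": [(1_000, b"old"), (2_000, b"new")]}
latest = cells_column_limit(row, 1)
print(latest)  # {'cf:data': [(2000, b'new')]}
```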

Answer 2

Assuming each timestamp is its own Bigtable row, you can pass a limit to the read_rows function:

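from google.cloud.bigtable.row_set import RowSet

# Assumes `table` is an existing google.cloud.bigtable Table instance.
prefix = "collection_1#item1#"
# Exclusive end key: the prefix with its last character incremented
end_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)
row_set = RowSet()
row_set.add_row_range_from_keys(prefix.encode("utf-8"), end_key.encode("utf-8"))
rows = table.read_rows(row_set=row_set, limit=1)
for row in rows:
    print(row.row_key)

prefix = "collection_1#item2#"
end_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)
row_set = RowSet()
row_set.add_row_range_from_keys(prefix.encode("utf-8"), end_key.encode("utf-8"))
rows = table.read_rows(row_set=row_set, limit=1)
for row in rows:
    print(row.row_key)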
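The end key in the snippet above is meant to be the smallest key greater than every key starting with the prefix. A minimal byte-level sketch of that computation (the `prefix_end_key` helper is hypothetical), which also handles a trailing 0xff byte by dropping it and incrementing the byte before it. An empty return value here stands for "no upper bound"; how to pass an open-ended range to the client is left out of this sketch.

```python
def prefix_end_key(prefix: bytes) -> bytes:
    """Return the smallest key greater than every key that starts with prefix.

    Trailing 0xff bytes cannot be incremented, so they are stripped first and
    the last remaining byte is incremented. An empty result means the range
    has no finite upper bound.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""  # prefix was empty or all 0xff bytes: scan to the end
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


print(prefix_end_key(b"collection_1#item1#"))  # b'collection_1#item1$'
```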

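In Bigtable you can't query item1 and item2 "at the same time" with a per-prefix limit: `limit` applies to the whole read, not to each range in the RowSet, so you need one read per prefix.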
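Because Bigtable returns rows in lexicographic row-key order, one limit-1 read per prefix yields the smallest key under each prefix, which with reversed timestamps is the newest row. The following local sketch models that over an in-memory sorted list of the question's keys; no Bigtable client is involved, and `first_row_with_prefix` / `first_row_per_prefix` are hypothetical helpers. Python compares `str` by code point, which matches Bigtable's byte order for ASCII keys like these.

```python
import bisect
from typing import Dict, List, Optional


def first_row_with_prefix(sorted_keys: List[str], prefix: str) -> Optional[str]:
    """Model a limit-1 read of the prefix range over lexicographically sorted keys."""
    i = bisect.bisect_left(sorted_keys, prefix)  # index of first key >= prefix
    if i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        return sorted_keys[i]
    return None


def first_row_per_prefix(sorted_keys: List[str], prefixes: List[str]) -> Dict[str, Optional[str]]:
    # One limit-1 "read" per prefix; a single read with limit=1 over all
    # ranges would stop after the first row overall.
    return {p: first_row_with_prefix(sorted_keys, p) for p in prefixes}


keys = sorted([
    "collection_1#item1#reversed_timestamp1",
    "collection_1#item1#reversed_timestamp2",
    "collection_1#item2#reversed_timestamp3",
    "collection_1#item2#reversed_timestamp4",
])
result = first_row_per_prefix(keys, ["collection_1#item1#", "collection_1#item2#"])
print(result)
```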

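Obviously we don't know your use case, but Bora is right: you should consider taking the timestamp out of the row key and reading it from the underlying cells instead (which would make something like

filter_=row_filters.CellsRowLimitFilter(1)

correct). However, if your row-key granularity really does require the timestamp, moving it out of the key could create hot keys.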
