How do I work with a column of this type?

Question   Votes: -1   Answers: 1

I don't know how to extract the relevant information from this SQL column type:

array<
 struct<
  day_of_week:string,
  start:bigint,
  duration:bigint,
  enabled:boolean,
  created_at:timestamp,
  deleted_at:timestamp
  >
>

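This column holds the daily opening hours of the restaurants in our database. Some restaurants have changed their daily operating hours, so I don't need some of the entries in the SQL table. All I need is the current opening hours of every restaurant.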

Here is a sample of the column I am trying to get the information from:

[
  {
    "day_of_week": "4",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-02-23T10:47:15.033+0000",
    "deleted_at": "2018-10-22T18:27:40.403+0000"
  },
  {
    "day_of_week": "7",
    "start": 64800000,
    "duration": 359,
    "enabled": true,
    "created_at": "2018-10-22T18:29:11.030+0000",
    "deleted_at": null
  },
  {
    "day_of_week": "5",
    "start": 64800000,
    "duration": 359,
    "enabled": true,
    "created_at": "2018-10-22T18:29:11.030+0000",
    "deleted_at": null
  },
  {
    "day_of_week": "6",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:40.397+0000",
    "deleted_at": "2018-10-22T18:27:42.074+0000"
  },
  {
    "day_of_week": "7",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:40.397+0000",
    "deleted_at": "2018-10-22T18:27:42.074+0000"
  },
  {
    "day_of_week": "1",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:42.069+0000",
    "deleted_at": "2018-10-22T18:29:11.035+0000"
  },
  {
    "day_of_week": "6",
    "start": 64800000,
    "duration": 359,
    "enabled": true,
    "created_at": "2018-10-22T18:29:11.030+0000",
    "deleted_at": null
  },
  {
    "day_of_week": "7",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:42.069+0000",
    "deleted_at": "2018-10-22T18:29:11.035+0000"
  },
  {
    "day_of_week": "2",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-02-23T10:47:15.033+0000",
    "deleted_at": "2018-10-22T18:27:40.403+0000"
  },

I am not interested in this entry, because it was deleted on 2018-10-22:

[{"day_of_week":"4","start":64800000,"duration":359,"enabled":false,
"created_at":"2018-02-23T10:47:15.033+0000","deleted_at":"2018-10-22T18:27:40.403+0000"}

But I am interested in this part of the column, because it shows the operating hours for day_of_week:7.

"day_of_week":"7","start":64800000,"duration":359,"enabled":true,
"created_at":"2018-10-22T18:29:11.030+0000","deleted_at":null

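I tried the following to get all the elements of the column, but it only returns the contents of what looks like the first cell and nothing more: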

LATERAL VIEW explode(shifts.`day_of_week`) exploded_table as day_of_week
LATERAL VIEW explode(shifts.`start`) exploded_table as start
LATERAL VIEW explode(shifts.`enabled`) exploded_table as enabled
LATERAL VIEW explode(shifts.`duration`) exploded_table as duration

Can someone help me with this?

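Also, I assume "start":64800000 refers to the opening time and "duration":359 to how long the restaurant stays open, but I can't interpret these numbers. I don't know whether "start":64800000 means 7 am, 8 am, or 9 am. Is "duration":359 7 hours, or 9 hours?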

Sorry for the long post, but I am new to SQL, and this is the only real resource I have for the things I can't figure out on my own.

Thanks in advance for any help you can provide.

sql apache-spark databricks
1 Answer
0
votes

TLDR:

For a DataFrame df with the schema

key:integer
data:array
  element:struct
    day_of_week:string
    start:decimal(38,0)
    duration:decimal(38,0)
    enabled:boolean
    created_at:string
    deleted_at:string

registered as the temporary table test (queried below as global_temp.test), the array can be exploded like this:

select key, a.ed.day_of_week,
  a.ed.start, a.ed.duration,
  a.ed.enabled, a.ed.created_at, a.ed.deleted_at
from (select key, explode(data) as ed from global_temp.test) a
where a.ed.deleted_at is null
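
To see what the query does without a cluster at hand, here is a rough plain-Python sketch of the same two steps: explode turns each array element into its own row, and the where clause keeps only the elements whose deleted_at is null. The three entries are copied from the sample in the question; the key value 1 and the helper name explode_rows are made up for illustration, and this is not how Spark implements it internally.

```python
import json

# Three entries copied from the sample array in the question.
shifts_json = """
[
  {"day_of_week": "4", "start": 64800000, "duration": 359, "enabled": false,
   "created_at": "2018-02-23T10:47:15.033+0000",
   "deleted_at": "2018-10-22T18:27:40.403+0000"},
  {"day_of_week": "7", "start": 64800000, "duration": 359, "enabled": true,
   "created_at": "2018-10-22T18:29:11.030+0000", "deleted_at": null},
  {"day_of_week": "5", "start": 64800000, "duration": 359, "enabled": true,
   "created_at": "2018-10-22T18:29:11.030+0000", "deleted_at": null}
]
"""

# One input row: a key plus the array column. The key value 1 is hypothetical.
rows = [{"key": 1, "data": json.loads(shifts_json)}]


def explode_rows(rows, column):
    """Yield one (key, element) pair per array element, mimicking explode(data)."""
    for row in rows:
        for element in row[column]:
            yield row["key"], element


# Counterpart of: where a.ed.deleted_at is null
current = [
    {"key": key, **ed}
    for key, ed in explode_rows(rows, "data")
    if ed["deleted_at"] is None
]

for r in current:
    print(r["key"], r["day_of_week"], r["start"], r["duration"], r["enabled"])
```

Of the three sample entries, only the ones for day_of_week 7 and 5 survive the filter, which matches what the question asks for.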

See: https://imgur.com/a/bFcoSz3
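
On the units of start and duration: the post does not say what they are, so the following is only a guess. If start is milliseconds after midnight and duration is minutes (both are assumptions, not something the schema confirms), the sample values work out to an 18:00 opening and a 23:59 closing, which at least looks plausible for a restaurant. The helper name fmt is made up for this sketch.

```python
from datetime import timedelta

# ASSUMPTION: start is milliseconds after midnight, duration is minutes.
# Neither unit is confirmed by the question; check with whoever owns the data.
START_MS = 64_800_000
DURATION_MIN = 359


def fmt(offset):
    """Format a timedelta measured from midnight as HH:MM."""
    total_minutes = int(offset.total_seconds()) // 60
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


opens_at = timedelta(milliseconds=START_MS)
closes_at = opens_at + timedelta(minutes=DURATION_MIN)

print(fmt(opens_at))   # 18:00
print(fmt(closes_at))  # 23:59
```

If the real units differ (for example duration in seconds), the closing time changes accordingly, so treat this as a sanity check rather than an answer.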
