将 JSON 嵌套到 Azure DataBricks 上的 Flat PySpark Dataframe

问题描述 投票:0回答:2

我有以下 PySpark 数据框:

StructType([
  StructField('data', ArrayType(StructType([
    StructField('awayPlayers', ArrayType(StructType([
      StructField('number', StringType(), True),
      StructField('optaId', StringType(), True),
      StructField('playerId', StringType(), True),
      StructField('speed', DoubleType(), True),
      StructField('xyz', ArrayType(DoubleType(), True), True)
    ]), True), True),
    StructField('ball', StructType([
      StructField('speed', DoubleType(), True),
      StructField('xyz', ArrayType(DoubleType(), True), True)
      ]), True),
    StructField('frameIdx', LongType(), True),
    StructField('gameClock', StringType(), True),
    StructField('homePlayers', ArrayType(StructType([
      StructField('number', StringType(), True),
      StructField('optaId', StringType(), True),
      StructField('playerId', StringType(), True),
      StructField('speed', DoubleType(), True),
      StructField('xyz', ArrayType(DoubleType(), True), True)
    ]), True), True),
    StructField('period', LongType(), True),
    StructField('wallClock', LongType(), True)
  ]), True), True),
  StructField('feedName', StringType(), True),
  StructField('gameId', StringType(), True),
  StructField('league', StringType(), True),
  StructField('messageId', StringType(), True)
])

我从 JSON 文件中读入。我想将其展平,其中 homePlayers 和awayPlayers 位于单独的行上。

我尝试过以下方法:

# Flatten the DataFrame
flattened_df = df.selectExpr("league", "gameId", "feedName", "messageId", "data.*")
flattened_df = flattened_df.select("league", "gameId", "feedName", "messageId", explode("data.awayPlayers").alias("awayPlayers"), "data.ball", "data.frameIdx", "data.gameClock", explode("data.homePlayers").alias("homePlayers"))

但我收到以下错误:


AnalysisException: Can only star expand struct data types. Attribute: `ArrayBuffer(data)`; line 1 pos 0
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-1660785452814683> in <cell line: 5>()
      3 
      4 # Flatten the DataFrame
----> 5 flattened_df = df.selectExpr("league", "gameId", "feedName", "messageId", "data.*")
      6 flattened_df = flattened_df.select("league", "gameId", "feedName", "messageId", explode("data.awayPlayers").alias("awayPlayers"), "data.ball", "data.frameIdx", "data.gameClock", explode("data.homePlayers").alias("homePlayers"))

/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---> 48                 res = func(*args, **kwargs)
     49                 logger.log_success(
     50                     module_name, class_name, function_name, time.perf_counter() - start, signature

/databricks/spark/python/pyspark/sql/dataframe.py in selectExpr(self, *expr)
   2132         if len(expr) == 1 and isinstance(expr[0], list):
   2133             expr = expr[0]  # type: ignore[assignment]
-> 2134         jdf = self._jdf.selectExpr(self._jseq(expr))
   2135         return DataFrame(jdf, self.sparkSession)

这是示例数据:

{"league": "test_league", "gameId": "anonymized_6fde632c-42a1-43bd-89a2-a87ad1aa06d6", "feedName": "test_feed", "messageId": "1173:2791", "data": [{"awayPlayers": [{"number": "anonymized_2d6248a3-8fff-4b75-8cbc-835e2dda8383", "optaId": "anonymized_16e8a41f-70eb-4444-914c-6172a95c0c5b", "playerId": "anonymized_9ae7f0ba-21f5-42b8-b1d5-e74f6cb4cb15", "speed": 1.2, "xyz": [-16.08, -20.19, 1.75]}, {"number": "anonymized_fd3bb284-e563-4b5b-9e81-cc5588f99ab4", "optaId": "anonymized_0ef53d56-d386-4a76-b7ec-195f29bb2d28", "playerId": "anonymized_8db39e06-0036-4008-ba83-d2db04509724", "speed": 3.24, "xyz": [-18.16, 16.28, 1.45]}, {"number": "anonymized_ca9dbf77-2c76-46e3-ba00-33044782918b", "optaId": "anonymized_50858f97-f88f-4e23-821e-5e35e23991d3", "playerId": "anonymized_50b2ba68-22ac-4f21-8901-8ed612409a75", "speed": 0.25, "xyz": [-10.49, -10.08, 1.7]}, {"number": "anonymized_07f1e120-1edb-4768-8a15-79e49aa2a5bc", "optaId": "anonymized_140a2128-eba0-439e-89af-c32bc0c432b6", "playerId": "anonymized_3539b581-23d6-4b68-b95b-845c20b578af", "speed": 1.02, "xyz": [-3.94, -9.07, 1.85]}, {"number": "anonymized_be3033ad-77dd-4383-807e-dc2edc3f01ee", "optaId": "anonymized_1ec1360b-9cb0-4b8e-bd62-14091c202ec5", "playerId": "anonymized_beaf7898-fd69-46bd-882e-e85fe30beec1", "speed": 0.68, "xyz": [-48.64, 32.13, 1.65]}, {"number": "anonymized_5018c77e-c8aa-456f-97d4-e3a90ab70da6", "optaId": "anonymized_8efd7d1e-a78e-447b-a977-57d68deb052e", "playerId": "anonymized_25d5e859-f4f2-458d-a8fa-b699f990ca11", "speed": 2.05, "xyz": [-6.43, -28.27, 1.7]}, {"number": "anonymized_b9dd2608-7a8b-4db0-822f-2b9ebdb5316e", "optaId": "anonymized_c18007e1-48d9-4f72-acc3-e087b63c615b", "playerId": "anonymized_38c031b2-7117-494f-9c9e-c44f613d3e3b", "speed": 0.54, "xyz": [-30.96, -17.14, 1.65]}, {"number": "anonymized_9bfe010f-e483-47f3-83fb-c5175cb98797", "optaId": "anonymized_805493d2-6436-451d-966e-7aa965cad68a", "playerId": "anonymized_fa830a4c-b0a2-4d6e-a1fa-6dde65448399", "speed": 0.13, "xyz": [-51.99, -10.29, 1.8]}, {"number": "anonymized_c6a29068-604b-4e7c-8d73-99076bdf9626", "optaId": "anonymized_1bd5c04a-edff-4417-afe5-38886dc45cc4", "playerId": "anonymized_75351b20-8adb-4208-a7ed-6015d73ddc51", "speed": 1.02, "xyz": [-30.56, -7.14, 1.7]}], "ball": {"speed": 0.59, "xyz": [-0.28, 2.82, 1.45]}, "frameIdx": 17795, "gameClock": null, "homePlayers": [{"number": "anonymized_b47a4aed-3c85-4b89-b378-5af4e05a6ae9", "optaId": "anonymized_14647784-948e-4422-aedd-24ef6e347db3", "playerId": "anonymized_432d6020-b627-4274-936a-45bb92f84f83", "speed": 2.61, "xyz": [33.6, -28.58, 1.65]}, {"number": "anonymized_4405effc-23f6-4842-b027-074ea02de2a0", "optaId": "anonymized_e7cd425b-89b3-4c53-b576-8ebb726e27bd", "playerId": "anonymized_1a9aba2b-bfb2-44f5-8553-3850e8ef0339", "speed": 0.28, "xyz": [-37.01, -21.97, 1.75]}, {"number": "anonymized_54fcfdad-a999-4bd0-a24e-c64e14b7566b", "optaId": "anonymized_5c13e4ef-4fe5-4edc-ae82-771c444a6af6", "playerId": "anonymized_8153f216-39f3-47b8-a2f5-f710e514edf1", "speed": 1.45, "xyz": [23.14, -16.59, 1.65]}, {"number": "anonymized_4e82f70e-dcf0-4d39-83ad-2d4c5ca9cd1e", "optaId": "anonymized_1c722dc5-ed4c-4070-b059-895711a4571d", "playerId": "anonymized_cfb66f19-adfe-4a63-974f-b529efaaffa5", "speed": 0.38, "xyz": [-2.92, -2.01, 1.8]}, {"number": "anonymized_e67b684e-8ef5-451f-b2cb-7c0e237843ae", "optaId": "anonymized_b9448bf5-0075-41e1-960a-b2b9725226b8", "playerId": "anonymized_0da34648-aac6-4989-ab9f-adb4a41b76f3", "speed": 0.68, "xyz": [-18.87, -2.46, 1.75]}, {"number": "anonymized_5c2d481e-499b-4046-b51c-ccbf3dae756c", "optaId": "anonymized_3834a35c-8a48-4267-8240-96146ec01af4", "playerId": "anonymized_f69cc0b5-f827-4e53-b9c6-5e3b6f2ef4c1", "speed": 1.05, "xyz": [-0.28, -23.09, 1.7]}, {"number": "anonymized_ecb9e36d-84d2-408f-8872-3d4569cb69f2", "optaId": "anonymized_7e81fe0c-1f21-4cc6-a16a-30689b89f6ba", "playerId": "anonymized_5176abd5-57dd-41f8-a0d3-c488dd570cfa", "speed": 0.63, "xyz": [-0.28, 2.82, 1.45]}, {"number": "anonymized_03c455f9-413f-4e2f-9fdd-721c9b5ed32b", "optaId": "anonymized_8fb569db-0ed2-4dfb-93fe-af8ee2fa320c", "playerId": "anonymized_f326f1c8-4344-4e3b-916b-4998c21494af", "speed": 0.57, "xyz": [-31.78, -0.03, 1.7]}, {"number": "anonymized_f9117c1c-7a46-4f7f-8f1e-9e3ce603cd9c", "optaId": "anonymized_510be692-5639-4654-be3c-e17450c8fe91", "playerId": "anonymized_2f32466d-5413-4d7c-a00b-6153de918b7c", "speed": 0.38, "xyz": [1.45, 1.3, 1.45]}, {"number": "anonymized_67541e01-820b-4c19-b21c-7f2e2242bfc2", "optaId": "anonymized_d7263fe3-b932-45cf-a7de-a0746e15cf7d", "playerId": "anonymized_b0c6aefe-50b2-47d6-aff9-dd67d40c8e1f", "speed": 1.02, "xyz": [26.9, -16.13, 1.55]}], "period": 1, "wallClock": 1661884065800}, {"awayPlayers": [{"number": "anonymized_78ba32fe-761e-4907-9e8f-e20bc560f667", "optaId": "anonymized_32016818-53fb-4da6-a8b3-e610ccda9f6a", "playerId": "anonymized_5b623ea0-167d-46f7-94d6-50ce4cd69433", "speed": 1.31, "xyz": [-16.13, -20.24, 1.8]}, {"number": "anonymized_22a3f10d-c700-4348-8ac8-53cfd59d036c", "optaId": "anonymized_e5ebbd6b-a3bf-4938-aa42-470147630f0d", "playerId": "anonymized_cd5b7a0c-8488-4843-a8ec-b20f43d72d15", "speed": 3.24, "xyz": [-18.21, 16.08, 1.45]}, {"number": "anonymized_538e3e90-150d-472a-a4f5-b774efd859c8", "optaId": "anonymized_6d01cedc-b813-422b-a672-627335240575", "playerId": "anonymized_61f85ce1-90cb-4dc9-b0f9-f0aad5f31c06", "speed": 0.46, "xyz": [-10.49, -10.08, 1.7]}, {"number": "anonymized_ffea5330-76d0-40eb-a268-33935a0e3cbe", "optaId": "anonymized_88e7009d-8ce1-4251-8484-f9e55ed09723", "playerId": "anonymized_d72a40b2-2db9-4a5c-ba1e-d4e8cad0d2c2", "speed": 0.89, "xyz": [-3.89, -9.07, 1.85]}, {"number": "anonymized_59c2af65-13c4-4ab4-8396-b2aa445af202", "optaId": "anonymized_e7ca3c27-4b84-413e-ab26-bc3f0690d3a7", "playerId": "anonymized_e58efe23-5050-41b1-a865-590850361b5b", "speed": 1.02, "xyz": [-48.74, 31.98, 1.65]}, {"number": "anonymized_aea06229-d46a-4394-ade3-7a3e52f98532", "optaId": "anonymized_1055432c-5638-4a7c-888e-b279069277bb", "playerId": "anonymized_989afe02-eec3-4f97-a941-3ffe241e9b33", "speed": 2.25, "xyz": [-6.32, -28.27, 1.7]}, {"number": "anonymized_daaa3e93-d2fc-414f-bcdd-bf9b3f12c8a3", "optaId": "anonymized_e2235aad-b427-4573-b261-64d92a05dcbe", "playerId": "anonymized_8e4fed56-189e-4679-88a1-a0f8ae02d098", "speed": 0.74, "xyz": [-31.01, -17.09, 1.65]}, {"number": "anonymized_c58d95be-ecfa-41f5-b905-a59b26adbbc6", "optaId": "anonymized_4a473a09-b966-4a80-9894-a7aa16c13d53", "playerId": "anonymized_584f3c98-bb29-4359-8f77-f5e919500b5e", "speed": 0.46, "xyz": [-52.1, -10.24, 1.85]}, {"number": "anonymized_48f3afea-8118-4eed-b0c7-256db54ab23a", "optaId": "anonymized_fc5844f7-5e3b-4d71-aca1-6c34e01275c1", "playerId": "anonymized_19b7af9b-e338-4444-9de7-f450c8553050", "speed": 1.09, "xyz": [-30.61, -7.14, 1.75]}], "ball": {"speed": 0.56, "xyz": [-0.28, 2.87, 1.45]}, "frameIdx": 17796, "gameClock": null, "homePlayers": [{"number": "anonymized_582e52a4-b2a9-43c3-9cf4-c04f187839db", "optaId": "anonymized_6804ddd0-c4a0-45f7-947b-f13981ac4625", "playerId": "anonymized_849099f5-fe35-49dd-8ca5-20c41b137ce1", "speed": 2.62, "xyz": [33.45, -28.52, 1.6]}, {"number": "anonymized_9f8ae4ed-d1f8-49c9-a817-118a9ca2afdd", "optaId": "anonymized_1783cd14-0c6f-4dc1-88af-e524203466c2", "playerId": "anonymized_0816d5a3-bdb5-4994-919d-46010a5b7f6e", "speed": 0.25, "xyz": [-37.01, -21.97, 1.75]}, {"number": "anonymized_e8fab359-b5ff-44c7-a841-29ccf8915b7d", "optaId": "anonymized_73b7fa6d-a182-401e-8483-3e3da95908c3", "playerId": "anonymized_50cae6af-285f-4b76-9d2e-b087b97c2ee3", "speed": 1.57, "xyz": [23.09, -16.48, 1.65]}, {"number": "anonymized_ece4056b-2f66-4c56-bf37-a16409996911", "optaId": "anonymized_9f5a9bb4-5e4b-493f-b8bc-ba92939ee84b", "playerId": "anonymized_07325772-6cf7-4a63-adcf-ce58d3acb58b", "speed": 0.38, "xyz": [-2.97, -2.01, 1.8]}, {"number": "anonymized_359e738c-c895-4ca1-8614-0238cb7a46c3", "optaId": "anonymized_8b596614-a9e8-4dea-8559-e2929091868b", "playerId": "anonymized_967dab8c-8f9c-49d1-8c4e-5c9a9fde9a26", "speed": 0.68, "xyz": [-18.92, -2.46, 1.7]}, {"number": "anonymized_a8dda121-c2ac-49f8-8cb2-9a8e10233f30", "optaId": "anonymized_b5b5aeb5-4bf3-49ec-b377-886212d6a6e3", "playerId": "anonymized_200f7d72-e460-44ab-ad49-fe6d6db63a68", "speed": 0.89, "xyz": [-0.28, -23.14, 1.7]}, {"number": "anonymized_5b2dc4e9-6b6d-4c31-817c-d09edc234bca", "optaId": "anonymized_52e48a54-a85a-4bab-8b30-d48b8402e395", "playerId": "anonymized_08707ad3-6db5-4ac3-bcc9-10e9cd0d021d", "speed": 0.68, "xyz": [-0.28, 2.87, 1.45]}, {"number": "anonymized_c56e42da-c0f7-43bc-97a2-ee7caad10206", "optaId": "anonymized_7e252fb7-b500-4c79-91ea-b9946a355fca", "playerId": "anonymized_26b12f58-9adc-4478-a8fb-15396fbb1c6d", "speed": 0.0, "xyz": [-31.78, -0.03, 1.7]}, {"number": "anonymized_54160b84-804b-4beb-af77-b0bea492fa8b", "optaId": "anonymized_5836038e-f3de-4921-8191-998967e49f75", "playerId": "anonymized_e6ba9650-f269-4ebc-bd4a-1151393d8954", "speed": 0.68, "xyz": [1.5, 1.3, 1.45]}, {"number": "anonymized_f97a5e88-1465-46a2-bb8f-3463972eb044", "optaId": "anonymized_3477e772-dea3-4597-8669-7e82e2643041", "playerId": "anonymized_52ccf1b8-1b09-4615-b04e-6e43661e7f33", "speed": 1.3, "xyz": [27.0, -16.08, 1.45]}], "period": 1, "wallClock": 1661884065840}, {"awayPlayers": [{"number": "anonymized_897d3b7d-7646-4637-8492-d1d2d56abd57", "optaId": "anonymized_14968781-8da6-4bc8-832e-f69cb16873b3", "playerId": "anonymized_5a07453e-6e6e-4b1b-ac63-3ebbec668857", "speed": 1.2, "xyz": [-16.13, -20.24, 1.8]}, {"number": "anonymized_59f9aad6-9662-4089-b8bf-4b79c8577eaa", "optaId": "anonymized_95e3a647-5f6b-43ed-b406-b1378f9df30a", "playerId": "anonymized_285b93f9-2343-440f-94cf-2d9a09797a13", "speed": 3.24, "xyz": [-18.21, 16.03, 1.45]}, {"number": "anonymized_bd54d72d-82ad-4448-8684-bb0b96266067", "optaId": "anonymized_5d467a87-d12b-4cfa-992e-a0ecf462f27e", "playerId": "anonymized_468d66b6-7241-43f7-91b5-8a0a2e36ec1b", "speed": 0.4, "xyz": [-10.44, -10.08, 1.7]}, {"number": "anonymized_02f4d47d-ada7-497b-b6de-d29ab383e818", "optaId": "anonymized_a8a65053-5748-4486-b5ba-3338b3fca853", "playerId": "anonymized_02e47bc0-b0ea-4bf4-97ad-23fa3ddb837e", "speed": 0.63, "xyz": [-3.89, -9.07, 1.85]}, {"number": "anonymized_7f6c807a-cfb2-40df-b8d6-1f0fc45ca6e6", "optaId": "anonymized_ecd14a38-8bda-40ba-aab1-fa6c6809116f", "playerId": "anonymized_9dbb9f19-5aae-4368-bdba-b89ab59d79f1", "speed": 0.4, "xyz": [-48.74, 31.98, 1.65]}, {"number": "anonymized_cca830ae-5db5-4bbe-a111-f2d9a296ef06", "optaId": "anonymized_75bf4118-3707-4be6-b49e-5d4c72212363", "playerId": "anonymized_7b5b4199-c4c6-4b21-9c05-3d412874bfc6", "speed": 2.17, "xyz": [-6.27, -28.32, 1.7]}, {"number": "anonymized_2eb4c517-b091-4193-a973-e761f4270a27", "optaId": "anonymized_ef799a09-8f2d-4233-b832-0f4e09a1ba52", "playerId": "anonymized_7dd6febd-f6d8-4eba-8bf1-54180131677d", "speed": 0.68, "xyz": [-31.01, -17.09, 1.65]}, {"number": "anonymized_03dbdfe8-b705-4ecc-a03b-ce6306e013c8", "optaId": "anonymized_594e18c4-ee9c-437d-b556-504fb879fd05", "playerId": "anonymized_2498a5cd-f010-454f-a296-ec407cbd1d3e", "speed": 0.57, "xyz": [-51.94, -10.29, 1.8]}, {"number": "anonymized_5e861e6e-89f3-4aa2-a395-0d85688c705b", "optaId": "anonymized_4f37d09d-9a05-4fd2-a6f8-e87b0f149b01", "playerId": "anonymized_92d0d39b-4aec-4d52-9130-fafb9f47dfa7", "speed": 1.09, "xyz": [-30.66, -7.09, 1.8]}], "ball": {"speed": 0.5, "xyz": [-0.28, 2.87, 1.45]}, "frameIdx": 17797, "gameClock": null, "homePlayers": [{"number": "anonymized_292d798b-2d56-4b97-81e4-800e2e1f7e46", "optaId": "anonymized_e020e124-4b6e-47c4-a9e1-3cff46c0a652", "playerId": "anonymized_96ecd19c-d44c-4428-ba82-7ae6e11dae37", "speed": 2.5, "xyz": [33.4, -28.47, 1.6]}, {"number": "anonymized_f81a686a-f38c-4d60-923c-fee842a2233a", "optaId": "anonymized_74c9295a-ac48-404b-b6e8-a32f4a2909e4", "playerId": "anonymized_09b824b1-728e-412c-8fb7-c840ab4c0f9d", "speed": 0.4, "xyz": [-37.01, -21.97, 1.75]}, {"number": "anonymized_a10846c1-1057-408a-bc5f-1f89f10f4d8a", "optaId": "anonymized_b3a57296-cde1-4cd2-9452-48e6dfdb1c98", "playerId": "anonymized_59783785-8d99-496e-87cf-ac22813e1fc7", "speed": 1.37, "xyz": [23.09, -16.48, 1.65]}, {"number": "anonymized_947508b2-8629-4a39-9f96-d9af021f3b06", "optaId": "anonymized_e051bc2f-82a7-453c-93b7-221a942588de", "playerId": "anonymized_b7231725-c231-4de7-a13a-c8438ebedcc9", "speed": 0.25, "xyz": [-2.97, -2.01, 1.8]}, {"number": "anonymized_233555be-0e8d-4376-a721-67d6ccfc842d", "optaId": "anonymized_2cf5e182-93f3-4c94-8734-98228604c8f9", "playerId": "anonymized_d6195f4b-80e3-4734-8183-568a6d2c2d50", "speed": 0.74, "xyz": [-18.92, -2.46, 1.7]}, {"number": "anonymized_8390bcd3-7f11-4e1e-88bc-32328bb722ba", "optaId": "anonymized_da23ea34-1bbb-4804-9461-e1283343ed7c", "playerId": "anonymized_ab220578-93f9-4949-bfba-5c9b047e3014", "speed": 0.89, "xyz": [-0.28, -23.14, 1.7]}, {"number": "anonymized_2b08c8b6-84fd-4769-b336-206b4746d343", "optaId": "anonymized_31de4f35-3570-494b-8b38-e923eb1cbfbe", "playerId": "anonymized_af58bd88-950a-4a23-8b93-bb979410a97f", "speed": 0.74, "xyz": [-0.28, 2.87, 1.45]}, {"number": "anonymized_9f620d41-83c1-459e-9402-f9328abe0cca", "optaId": "anonymized_dd9ab094-9839-49c8-9bd6-eb6ef66d22de", "playerId": "anonymized_5dcf5cd7-1bc4-40ed-81a2-1508437321cd", "speed": 0.25, "xyz": [-31.78, -0.03, 1.7]}, {"number": "anonymized_15f98871-153b-40ba-9c43-1e44f81e901d", "optaId": "anonymized_6ee7c1d7-7747-48fc-b5b4-752d0e50f322", "playerId": "anonymized_a67bcb27-1597-4228-ab71-ee6ea7f08e6c", "speed": 0.74, "xyz": [1.5, 1.3, 1.45]}, {"number": "anonymized_89364ffa-a200-4e48-9312-dfd5ad26840e", "optaId": "anonymized_5df940c8-2b5a-4991-8f47-a7df3240dba4", "playerId": "anonymized_95feac72-70bd-4cc4-939f-f41b5660f1b4", "speed": 1.3, "xyz": [27.0, -16.13, 1.5]}], "period": 1, "wallClock": 1661884065880}, {"awayPlayers": [{"number": "anonymized_e6a7c30f-2c30-4f96-aeaf-8cf3bb40c171", "optaId": "anonymized_136ae237-5cc7-4b65-96d8-7167c3548786", "playerId": "anonymized_dc9ed713-9c23-45d3-96cf-8efc1410f6e9", "speed": 1.2, "xyz": [-16.18, -20.35, 1.8]}, {"number": "anonymized_7a8c1e41-6786-4a29-ba13-154c7131bddc", "optaId": "anonymized_b8c8ec62-6f65-4b2d-8b1f-747752a4c574", "playerId": "anonymized_3074e0b4-fd2b-4266-89ed-6e5b75d0deaa", "speed": 3.2, "xyz": [-18.26, 15.82, 1.45]}, {"number": "anonymized_dfbbe0e9-51d2-4208-b4e3-a9bfcf9bff56", "optaId": "anonymized_70fe5983-5f2c-4c47-a99f-c417ea5230c3", "playerId": "anonymized_1b3dfc3d-08c4-42e2-9e07-4bd23d91a442", "speed": 0.36, "xyz": [-10.44, -10.13, 1.7]}, {"number": "anonymized_dc34c4ce-1154-4aa6-ba7e-e7693f5769bd", "optaId": "anonymized_c9f05f3a-4c7e-4c24-9619-65fd08be036c", "playerId": "anonymized_49db3178-5882-4fb0-9e77-6ac6854ebfb5", "speed": 0.63, "xyz": [-3.84, -9.07, 1.85]}, {"number": "anonymized_690231cb-0421-4c10-86f3-610fb3df9f4a", "optaId": "anonymized_3ad288d0-42a5-49f2-b98f-bef8cd1f39af", "playerId": "anonymized_40d18e2b-f44d-4430-ab90-e284cff0bcd9", "speed": 0.54, "xyz": [-48.69, 32.03, 1.65]}, {"number": "anonymized_5883388e-5b76-449f-87a2-6fe516d99871", "optaId": "anonymized_ab8d4dbd-69ab-497e-91dc-7a893d165c0f", "playerId": "anonymized_e1ef4e6a-cffe-4f7f-a548-fbade41d5ea0", "speed": 2.1, "xyz": [-6.12, -28.37, 1.7]}, {"number": "anonymized_f5050aa4-b9b3-46b0-a801-fe032684fba9", "optaId": "anonymized_8d7bb546-ecad-435a-b4fd-d47182525676", "playerId": "anonymized_4902b459-ffaa-4c1d-a7ac-eb7fc0a2139c", "speed": 0.38, "xyz": [-31.06, -17.09, 1.65]}, {"number": "anonymized_4c4873cf-9021-4122-a938-fc33fd438792", "optaId": "anonymized_f80f2b8c-143f-469a-a115-bf5327ca966a", "playerId": "anonymized_7e6c8d9b-b5c7-4f43-9ec5-2c2b8eb39eef", "speed": 0.74, "xyz": [-52.1, -10.24, 1.85]}, {"number": "anonymized_1caffc0e-1b20-4cf9-b97e-309447d9a190", "optaId": "anonymized_6a9e01c9-08a8-4bda-be57-3408619126f7", "playerId": "anonymized_75a36bd2-2486-47de-b891-776c28089587", "speed": 1.25, "xyz": [-30.71, -7.09, 1.8]}], "ball": {"speed": 0.48, "xyz": [-0.33, 2.92, 1.45]}, "frameIdx": 17798, "gameClock": null, "homePlayers": [{"number": "anonymized_66d20fa2-a4b4-402c-9f14-e250574b3c6b", "optaId": "anonymized_3bc8b0f6-19c0-4b42-b8f2-51ba5fc74dcc", "playerId": "anonymized_ea5f0901-3676-461e-a9f5-6255f866ba42", "speed": 2.29, "xyz": [33.3, -28.42, 1.65]}, {"number": "anonymized_745a8810-6180-407c-9801-aacd4633eca7", "optaId": "anonymized_10870d6a-2286-46f7-9082-b56797d42b17", "playerId": "anonymized_09ed5ff8-0a0b-444d-b517-6c1c7ec842cb", "speed": 0.68, "xyz": [-37.06, -21.92, 1.75]}, {"number": "anonymized_a8235961-40fb-4e26-856a-62fde3c44ec0", "optaId": "anonymized_cc30acb3-2aa7-4bbc-9997-cd2e8ed4ddca", "playerId": "anonymized_c7276a83-bf41-47f4-8af0-557238f803c5", "speed": 1.33, "xyz": [23.09, -16.38, 1.65]}, {"number": "anonymized_21ebfe48-0f20-47ae-a4c7-2c1c94d1b93e", "optaId": "anonymized_2f1cdb3e-e749-4ad7-b2e2-69ca9a6ac8bf", "playerId": "anonymized_4d4ddbe8-0e43-48f8-85cb-bf064e90c858", "speed": 0.25, "xyz": [-2.97, -2.01, 1.8]}, {"number": "anonymized_e79c29cd-53f3-4d9a-9274-0269c3578976", "optaId": "anonymized_fa8da474-e052-4c9a-bce4-92286b39b8e8", "playerId": "anonymized_5d710baf-5728-418c-9e18-942173a8c767", "speed": 1.09, "xyz": [-18.97, -2.41, 1.7]}, {"number": "anonymized_1e1b9191-2869-48cd-83c0-c118f73a19fd", "optaId": "anonymized_5cc4dcc0-3815-4681-9285-db56a2889138", "playerId": "anonymized_36896850-30c9-4eb9-9c51-b85ac1b07283", "speed": 1.02, "xyz": [-0.28, -23.19, 1.75]}, {"number": "anonymized_8fbbc8e9-b2af-4d53-ac9f-0f0e3e23685b", "optaId": "anonymized_770814ab-ac92-4dcb-a366-d70be090b3e4", "playerId": "anonymized_1e719602-299d-42b4-a4fb-3a0c05b8b2af", "speed": 0.54, "xyz": [-0.33, 2.92, 1.45]}, {"number": "anonymized_3d1931b0-27d4-4e41-8bdf-b853ed969a02", "optaId": "anonymized_77cdb24a-13bb-4429-8db1-bdd8baa036c5", "playerId": "anonymized_552af864-1cce-4d99-91a1-84b053fbf112", "speed": 0.38, "xyz": [-31.78, -0.03, 1.7]}, {"number": "anonymized_3040a97b-4166-4bd9-b11c-bdcb56f2896c", "optaId": "anonymized_229229e7-d708-4234-b6a1-46454b78c7cf", "playerId": "anonymized_bc1a6030-a8ce-43ae-91b7-ce2987d4e2a6", "speed": 0.74, "xyz": [1.55, 1.24, 1.45]}, {"number": "anonymized_2194758b-9098-4b97-87c7-be8f68d9ff24", "optaId": "anonymized_8ca9d671-f4c2-45b0-a93c-821d8d34a8ca", "playerId": "anonymized_8746361c-cdb9-4cd0-bf72-eace14bf0e8d", "speed": 1.28, "xyz": [27.1, -16.08, 1.5]}], "period": 1, "wallClock": 1661884065920}, {"awayPlayers": [{"number": "anonymized_978cd54c-21be-472a-92ee-af791ab1b41e", "optaId": "anonymized_79d6d826-ff23-4050-9183-bcf7dcb2cf0d", "playerId": "anonymized_783bee57-6d94-466f-b3de-dfb0211d8c9d", "speed": 1.31, "xyz": [-16.18, -20.35, 1.8]}, {"number": "anonymized_fffddea9-af0a-4ced-9f54-24991f4f5cf9", "optaId": "anonymized_e6c34670-6374-418e-ae02-cf59c04945b5", "playerId": "anonymized_d3c8ee78-e446-4614-9b9e-b9d1d6be1d14", "speed": 3.22, "xyz": [-18.26, 15.77, 1.45]}, {"number": "anonymized_653ef82c-f0a8-4ae1-8ce3-a34a9e70e90a", "optaId": "anonymized_ac2f01fa-c279-48eb-be25-018015cbcd1d", "playerId": "anonymized_e3828183-65f2-497a-88fc-55b5f8420da4", "speed": 0.36, "xyz": [-10.44, -10.08, 1.7]}, {"number": "anonymized_d64e8e49-6732-4986-940f-39a04bc4e3af", "optaId": "anonymized_810e2196-5a79-4c65-bbb3-a1f2bbaf7e71", "playerId": "anonymized_c3ed955e-7397-487d-b5f4-cc546cb6b588", "speed": 0.63, "xyz": [-3.84, -9.07, 1.85]}, {"number": "anonymized_f9f70ab5-d6f3-4e99-a452-9ba6f660bee9", "optaId": "anonymized_e7b2ebd3-6899-414e-b7a5-319edf856206", "playerId": "anonymized_663520c7-5202-4463-a99f-4e7c940b818c", "speed": 0.57, "xyz": [-48.69, 32.03, 1.65]}, {"number": "anonymized_13dd5861-3de5-4955-b2fd-e5431b9ef3ad", "optaId": "anonymized_fb283d33-2310-412f-9a4f-9e1ea1948b79", "playerId": "anonymized_0d123ce8-bb33-4841-b325-50b81eedd799", "speed": 1.89, "xyz": [-6.12, -28.37, 1.65]}, {"number": "anonymized_99eca7f7-bd4e-4bb8-82c4-60f87dfc3ad3", "optaId": "anonymized_ca50f786-aa73-4ee6-a5b9-5ea5f2e1f0e0", "playerId": "anonymized_f8511a80-aa01-48a1-9e58-0af12cf20463", "speed": 0.25, "xyz": [-31.06, -17.09, 1.65]}, {"number": "anonymized_760e88c6-0ff2-44c8-9845-20e4b20ab6d2", "optaId": "anonymized_11619401-1da4-45aa-8901-36cc516f2fa9", "playerId": "anonymized_cfebe8c1-a8f6-48a9-b642-3d7f5f88e845", "speed": 1.31, "xyz": [-52.1, -10.24, 1.85]}, {"number": "anonymized_d3039ade-a858-4348-9a3b-c4589d1f9e23", "optaId": "anonymized_dabbd161-6c76-41c1-9be6-f001a66aa62a", "playerId": "anonymized_fb3cebf7-963c-482b-a6dd-99e780ffcaa0", "speed": 1.09, "xyz": [-30.71, -7.09, 1.8]}], "ball": {"speed": 0.46, "xyz": [-0.33, 2.92, 1.45]}, "frameIdx": 17799, "gameClock": null, "homePlayers": [{"number": "anonymized_4516ef6f-7cd6-4941-b014-53e9b8e1ee09", "optaId": "anonymized_f2f03c4f-1aac-4cb2-8328-3ae3c76dd6f0", "playerId": "anonymized_76f13d75-9412-4065-bbe8-e7800bc92bb9", "speed": 2.44, "xyz": [33.25, -28.37, 1.65]}, {"number": "anonymized_0f785f19-c02b-4752-9580-28e8603c251d", "optaId": "anonymized_3743a1d1-008b-4390-9738-8ae26d6d1dec", "playerId": "anonymized_62991fd1-8c6c-4ed5-beeb-121117060ba9", "speed": 0.68, "xyz": [-37.01, -21.92, 1.75]}, {"number": "anonymized_9bf61ab7-36d8-445c-ae43-1a5d0c3674d3", "optaId": "anonymized_0b9f8565-3b1d-4498-8e60-d0a66265f418", "playerId": "anonymized_2a47d600-5886-45b9-a617-02e684bbb6bb", "speed": 1.33, "xyz": [23.04, -16.38, 1.65]}, {"number": "anonymized_7688cde4-b8fb-4790-9b77-06fa5daf8e9e", "optaId": "anonymized_c34bb987-3cf5-432e-9faf-c46f38e07d1f", "playerId": "anonymized_4dc2fecc-039c-401b-a994-118326b71d0c", "speed": 0.38, "xyz": [-2.97, -2.01, 1.8]}, {"number": "anonymized_f77435b2-cad8-4212-ac9e-30f43138ad3b", "optaId": "anonymized_bb33dea9-30c9-4557-ab22-b529d50d8c0f", "playerId": "anonymized_8c2b8874-e248-47d1-aa5d-fe843d6b25e9", "speed": 1.2, "xyz": [-18.97, -2.41, 1.7]}, {"number": "anonymized_c023594c-64d1-41b9-b914-b25f242b17ed", "optaId": "anonymized_36ef2e3f-8798-4bbd-8a7d-4150c07c8c3d", "playerId": "anonymized_bb5e89af-a64f-484d-902f-bd86d2594ba3", "speed": 1.02, "xyz": [-0.28, -23.24, 1.75]}, {"number": "anonymized_a542cecd-e41b-4a56-9b85-7e43bae25799", "optaId": "anonymized_c774b619-9241-4717-ac67-d419c48710a3", "playerId": "anonymized_8343e671-ae75-4276-b950-5242853b2c96", "speed": 0.36, "xyz": [-0.33, 2.92, 1.45]}, {"number": "anonymized_9c48f226-e727-41f6-b528-8b3f54caecc0", "optaId": "anonymized_127040ee-ade2-4621-94da-997dccba1c8b", "playerId": "anonymized_763102e3-f31d-47c1-87bb-fe3d4416fea8", "speed": 0.38, "xyz": [-31.78, -0.08, 1.7]}, {"number": "anonymized_152bd1de-a9f5-4f03-bfaf-92b5f8c98eb4", "optaId": "anonymized_c5ee62b4-c184-4244-baf4-a1e28908767a", "playerId": "anonymized_197ab9bf-bc01-420d-8b2e-49aced4d7a1f", "speed": 0.68, "xyz": [1.55, 1.24, 1.45]}, {"number": "anonymized_4bb9f503-a889-4dc9-b291-5574bd4b0ed5", "optaId": "anonymized_e44e98ba-ee63-4e31-a8d2-853428369c96", "playerId": "anonymized_f97f3227-c709-453d-b4b7-88f091271485", "speed": 1.55, "xyz": [27.1, -16.08, 1.55]}], "period": 1, "wallClock": 1661884065960}]}

请帮忙

python dataframe apache-spark pyspark apache-spark-sql
2个回答
2
投票

正如另一个答案已经说过的那样,您遇到的问题是因为您试图以只能

select
结构的方式来
select
数组。

首先,您需要

explode
该数组。然后,您可以去
explode
awayPlayers
homePlayers
数组,将它们放在单独的行上。

对于你的情况,你可以这样做:

from pyspark.sql.functions import explode, col

df = spark.read.json("./bigJson.json")

flattened_df = df \
    .select("league", "gameId", "feedName", "messageId", explode("data").alias("data")) \
    .select("league", "gameId", "feedName", "messageId", col("data.awayPlayers").alias("awayPlayers"), "data.ball", "data.frameIdx", "data.gameClock", col("data.homePlayers").alias("homePlayers")) \
    .withColumn("awayPlayers", explode("awayPlayers")) \
    .withColumn("homePlayers", explode("homePlayers"))

flattened_df.printSchema()
root
 |-- league: string (nullable = true)
 |-- gameId: string (nullable = true)
 |-- feedName: string (nullable = true)
 |-- messageId: string (nullable = true)
 |-- awayPlayers: struct (nullable = true)
 |    |-- number: string (nullable = true)
 |    |-- optaId: string (nullable = true)
 |    |-- playerId: string (nullable = true)
 |    |-- speed: double (nullable = true)
 |    |-- xyz: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |-- ball: struct (nullable = true)
 |    |-- speed: double (nullable = true)
 |    |-- xyz: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |-- frameIdx: long (nullable = true)
 |-- gameClock: string (nullable = true)
 |-- homePlayers: struct (nullable = true)
 |    |-- number: string (nullable = true)
 |    |-- optaId: string (nullable = true)
 |    |-- playerId: string (nullable = true)
 |    |-- speed: double (nullable = true)
 |    |-- xyz: array (nullable = true)
 |    |    |-- element: double (containsNull = true)

1
投票

您收到的错误来自您选择中的

data.*

您的原始行包含一个

data
数组,其中包含两个嵌套的
awayPlayers
homePlayers
数组。

您使用

explode
嵌套数组的方法是正确的,但在您需要分解
data
之前。您代码中的
data.*
应该用于取消嵌套
structs
data
是一个数组。

© www.soinside.com 2019 - 2024. All rights reserved.