Using Python's pandas to process AWS DynamoDB data

Question

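I'm fetching data from a DynamoDB table using boto3 with Python 2.7, and I want to use pandas to group and sort the data.

Unfortunately, the DynamoDB data format is a bit tricky. It looks like this: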

data = [{
      u'permaname': {
        u'S': u'facebook'
      },
      u'uuid': {
        u'S': u'4b873085-c995-4ce4-9325-cfc70fcd4040'
      },
      u'tags': {
        u'L': []
      },
      u'type': {
        u'S': u'xxxxxx'
      },
      u'createdOn': {
        u'N': u'1502099627'
      },
      u'source': {
        u'S': u'xxxxxxx'
      },
      u'data': {
        u'NULL': True
      },
      u'crawler': {
        u'S': u'xxxxxxx'
      }
    }, {
      u'permaname': {
        u'S': u'facebook'
      },
      u'uuid': {
        u'S': u'25381aef-a7db-4b79-b599-89fd060fcf73'
      },
      u'tags': {
        u'L': []
      },
      u'type': {
        u'S': u'xxxxxxx'
      },
      u'createdOn': {
        u'N': u'1502096901'
      },
      u'source': {
        u'S': u'xxxxxxx'
      },
      u'data': {
        u'NULL': True
      },
      u'crawler': {
        u'S': u'xxxxxxx'
      }
    }]

To group and sort the data I need to create a pandas object, and I can't figure out how to do that.

This is what I'm trying:

obj = pandas.DataFrame(data)
print list(obj.sort_values(['createdOn'],ascending=False).groupby('source'))

If I print obj like this:

print list(obj)

I get:

[u'crawler', u'createdOn', u'data', u'permaname', u'source', u'tags', u'type', u'uuid']

Does anyone know how to create a DataFrame object from DynamoDB data?

Answer 1

To convert DynamoDB JSON to regular JSON, you can use this library:

https://github.com/Alonreznik/dynamodb-json
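
For readers who want to see what such a conversion involves, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `unwrap` and `unwrap_item` are hypothetical, and the sketch assumes only the `S`, `N`, `NULL`, `BOOL`, `L` and `M` type tags appear in the data.

```python
def unwrap(value):
    """Turn one DynamoDB-typed value, e.g. {'S': 'facebook'}, into a plain Python value."""
    # Each typed value is a dict with exactly one key: the type tag.
    (type_tag, raw), = value.items()
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        # DynamoDB transmits numbers as strings.
        return int(raw) if raw.lstrip("-").isdigit() else float(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unwrap(element) for element in raw]
    if type_tag == "M":
        return {key: unwrap(element) for key, element in raw.items()}
    raise ValueError("unsupported DynamoDB type tag: %r" % type_tag)


def unwrap_item(item):
    """Unwrap every attribute of one DynamoDB item (hypothetical helper)."""
    return {key: unwrap(value) for key, value in item.items()}


# A shortened item taken from the question's sample data.
item = {
    "permaname": {"S": "facebook"},
    "createdOn": {"N": "1502099627"},
    "tags": {"L": []},
    "data": {"NULL": True},
}
row = unwrap_item(item)
print(row)
```

A list of rows unwrapped this way is a list of flat dicts, which is the shape `pandas.DataFrame` expects when each dict should become one row.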

Answer 2

I'll try to answer using Python 3.

data = [{
       'permaname': {
         'S':  'facebook'
      },
       'uuid': {
         'S':  '4b873085-c995-4ce4-9325-cfc70fcd4040'
      },
       'tags': {
         'L': []
      },
       'type': {
         'S':  'xxxxxx'
      },
       'createdOn': {
         'N':  '1502099627'
      },
       'source': {
         'S':  'xxxxxxx'
      },
       'data': {
         'NULL': True
      },
       'crawler': {
         'S':  'xxxxxxx'
      }
    }, {
       'permaname': {
         'S':  'facebook'
      },
       'uuid': {
         'S':  '25381aef-a7db-4b79-b599-89fd060fcf73'
      },
       'tags': {
         'L': []
      },
       'type': {
         'S':  'xxxxxxx'
      },
       'createdOn': {
         'N':  '1502096901'
      },
       'source': {
         'S':  'xxxxxxx'
      },
       'data': {
         'NULL': True
      },
       'crawler': {
         'S':  'xxxxxxx'
      }
    }]

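As mentioned above, use dynamodb_json.

import pandas as pd
from dynamodb_json import json_util as json

# json.loads unwraps the DynamoDB type descriptors ('S', 'N', 'L', ...)
obj = pd.DataFrame(json.loads(data))
obj

Output: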

   crawler   createdOn  data permaname   source tags     type                                  uuid
0  xxxxxxx  1502099627  None  facebook  xxxxxxx   []   xxxxxx  4b873085-c995-4ce4-9325-cfc70fcd4040
1  xxxxxxx  1502096901  None  facebook  xxxxxxx   []  xxxxxxx  25381aef-a7db-4b79-b599-89fd060fcf73

Group by (I use max() to aggregate the results):

obj.sort_values(['createdOn'],ascending=False).groupby('source').max()

Output:

         crawler   createdOn  data permaname tags     type                                  uuid
source
xxxxxxx  xxxxxxx  1502099627   NaN  facebook   []  xxxxxxx  4b873085-c995-4ce4-9325-cfc70fcd4040
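
Note that max() aggregates each column independently, so the row above mixes records: the type value xxxxxxx belongs to the second item, while the uuid and createdOn belong to the first. If the goal is the newest complete record per source, one possible approach is to sort and then keep the first row of each group. The sketch below assumes the data has already been converted to plain values and, for brevity, uses only the scalar columns from the sample.

```python
import pandas as pd

# Already-unwrapped rows, restricted to scalar columns from the sample data.
rows = [
    {"source": "xxxxxxx", "type": "xxxxxx", "createdOn": 1502099627,
     "uuid": "4b873085-c995-4ce4-9325-cfc70fcd4040"},
    {"source": "xxxxxxx", "type": "xxxxxxx", "createdOn": 1502096901,
     "uuid": "25381aef-a7db-4b79-b599-89fd060fcf73"},
]
df = pd.DataFrame(rows)

# Sort newest first, then keep the first (newest) whole row of each source.
latest = (
    df.sort_values("createdOn", ascending=False)
      .groupby("source")
      .head(1)
)
print(latest)
```

Because head(1) returns whole rows rather than per-column aggregates, the type and uuid in each result row come from the same item.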

Print the list of columns:

print(list(obj))

Output:

['crawler', 'createdOn', 'data', 'permaname', 'source', 'tags', 'type', 'uuid']

I hope this helps.
