Getting older user balances from a system with ~10 million users and txns, filtering out the inactive users (i.e. those without any txns afterwards)

Question description  Votes: 0  Answers: 1

This looks more like a db.stackexchange question, but a scripted solution would also fit here. Please excuse the framing of the question.

Tables involved -

account

CREATE TABLE `account` (
  `id` bigint(15) NOT NULL AUTO_INCREMENT,
  `account_id` bigint(14) NOT NULL,
  `acc_complete_id` bigint(14) DEFAULT NULL,
  `uuid` varchar(400) NOT NULL,
  `type` int(11) DEFAULT NULL,
  `created` datetime DEFAULT NULL,
  `balance` decimal(19,2) DEFAULT '0.00',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uuid_UNIQUE` (`uuid`),
  UNIQUE KEY `account_id_UNIQUE` (`account_id`),
  UNIQUE KEY `acc_complete_id_UNIQUE` (`acc_complete_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

transaction

CREATE TABLE `transaction` (
  `id` bigint(19) NOT NULL AUTO_INCREMENT,
  `type` int(4) DEFAULT NULL,
  `created` datetime DEFAULT NULL,
  `amount` decimal(19,2) DEFAULT '0.00',
  `debit` bigint(14) DEFAULT NULL,
  `credit` bigint(14) DEFAULT NULL,
  `status` varchar(45) DEFAULT NULL,
  `debit_bal` decimal(19,2) DEFAULT '0.00',
  `credit_bal` decimal(19,2) DEFAULT '0.00',
  PRIMARY KEY (`id`),
  KEY `transaction_credit_index` (`credit`),
  KEY `transaction_debit_index` (`debit`),
  KEY `transaction_created_index` (`created`)
  -- indexes on `ref` and `narrative` left out: those columns are not part of this excerpt
) ENGINE=InnoDB DEFAULT CHARSET=latin1

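Table columns

  • debit_bal and credit_bal are the balances of the two accounts involved in a txn, recorded after that txn.

We currently find the total number of inactive users with zero balance (inactive meaning a user who has had no transactions in a certain period). But now, the most painful part: we need to fetch the data for past months (users with no activity in that period, along with the account balance as of that period).

The query currently used to get the count of inactive users with zero balance, subject to conditions on created date, type, etc. -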

SELECT
  count(DISTINCT( a.uuid )),
  Sum(a.balance) 
FROM
  account a 
WHERE
  a.balance = 0.00 and a.type = "1" 
  AND a.created <= '2018-02-28 18:29:59' 
  AND 
  (
    a.account_id + 100000000000 
  )
  NOT IN 
  (
    SELECT DISTINCT
( pt.debit ) 
    FROM
      transaction pt 
    WHERE
      pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59' 
      AND MOD(pt.debit, 100000000000) IN 
      (
        SELECT
          pa.account_id 
        FROM
          account pa 
        WHERE
          pa.type = "1" 
          AND pa.created <= '2018-02-28 18:29:59' 
      )
    UNION
    SELECT DISTINCT
( pt.credit ) 
    FROM
      transaction pt 
    WHERE
      pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59' 
      AND MOD(pt.credit, 100000000000) IN 
      (
        SELECT
          pa.account_id 
        FROM
          account pa 
        WHERE
          pa.type = "1" 
          AND pa.created <= '2018-02-28 18:29:59' 
      )
  )

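The above query returns the count of inactive users with balance in about 10 minutes.

Inactive users = users - (set of users who have done a debit UNION users who have received a credit).

However, I cannot run this query for earlier months, because the values I get would be based on the current balance, and the balance back then was not the same. The account type may not have been the same either, but we have found those accounts and updated them in a duplicate table.

Now, when I try to get the list of inactive users along with their current balance, by removing count() and adding a group by uuid at the end, the query runs for more than 15 hours, and the mysql thread state shows "Removing duplicates" most of the time.

Explain output -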

+----+--------------------+------------+------------+-------------+------------------------------------------------------------------------------------------+----------------------------------+---------+------+----------+----------+----------------------------------------------+
| id | select_type        | table      | partitions | type        | possible_keys                                                                            | key                              | key_len | ref  | rows     | filtered | Extra                                        |
+----+--------------------+------------+------------+-------------+------------------------------------------------------------------------------------------+----------------------------------+---------+------+----------+----------+----------------------------------------------+
|  1 | PRIMARY            | a          | NULL       | ALL         | PRIMARY,uuid_UNIQUE,account_id_UNIQUE,acc_complete_id_UNIQUE,created_index,updated_index | NULL                             | NULL    | NULL | 23745634 |     5.00 | Using where; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | pt         | NULL       | ref_or_null | transaction_debit_index,transaction_created_index                        | transaction_debit_index  | 9       | func |       32 |     7.52 | Using where                                  |
|  2 | DEPENDENT SUBQUERY | pa         | NULL       | eq_ref      | account_id_UNIQUE,created_index                                                          | account_id_UNIQUE                | 8       | func |        1 |     5.00 | Using index condition; Using where           |
|  4 | DEPENDENT UNION    | pt         | NULL       | ref_or_null | transaction_credit_index,transaction_created_index                       | transaction_credit_index | 9       | func |       22 |     7.52 | Using where                                  |
|  4 | DEPENDENT UNION    | pa         | NULL       | eq_ref      | account_id_UNIQUE,created_index                                                          | account_id_UNIQUE                | 8       | func |        1 |     5.00 | Using index condition; Using where           |
| NULL | UNION RESULT       | <union2,4> | NULL       | ALL         | NULL                                                                                     | NULL                             | NULL    | NULL |     NULL |     NULL | Using temporary                              |
+----+--------------------+------------+------------+-------------+------------------------------------------------------------------------------------------+----------------------------------+---------+------+----------+----------+----------------------------------------------+
6 rows in set, 1 warning (0.00 sec)

Now I need to get the list of users, and this takes a very long time -

SELECT DISTINCT
  ( a.uuid ),
  Sum(a.balance) 
FROM
  account a 
WHERE
  a.type = "1" 
  AND a.created <= '2018-02-28 18:29:59' 
  AND 
  (
    a.account_id + 100000000000 
  )
  NOT IN 
  (
    SELECT DISTINCT
( pt.debit ) 
    FROM
      transaction pt 
    WHERE
      pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59' 
      AND MOD(pt.debit, 100000000000) IN 
      (
        SELECT
          pa.account_id 
        FROM
          account pa 
        WHERE
          pa.type = "1" 
          AND pa.created <= '2018-02-28 18:29:59'
      )
    UNION
    SELECT DISTINCT
( pt.credit ) 
    FROM
      transaction pt 
    WHERE
      pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59' 
      AND MOD(pt.credit, 100000000000) IN 
      (
        SELECT
          pa.account_id 
        FROM
          account pa 
        WHERE
          pa.type = "1" 
          AND pa.created <= '2018-02-28 18:29:59'
      )
  )
GROUP BY
  a.id;

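It has taken almost 15 hours and is still running. That is far too long, because I need to do this for several months, and any mistake means I have to run it again.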
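One possible rewrite, offered as a sketch rather than a measured fix: the EXPLAIN output above shows the NOT IN subquery running as a DEPENDENT SUBQUERY for each of the roughly 23.7 million account rows, and the outer query adds a temporary table and a filesort. Since `uuid` is unique, listing accounts should not need DISTINCT or GROUP BY at all. The version below states the same condition as two NOT EXISTS probes. It assumes `account_id` is always below 100000000000, in which case the inner `MOD(...) IN (...)` filter is redundant for rows that already equal `a.account_id + 100000000000`. The two composite index names are hypothetical.

```sql
-- Hypothetical composite indexes, so that each probe below can be
-- answered by one index lookup on (account id, created range).
ALTER TABLE `transaction`
  ADD INDEX transaction_debit_created_index (debit, created),
  ADD INDEX transaction_credit_created_index (credit, created);

-- One row per inactive account. uuid is unique, so no DISTINCT / GROUP BY.
SELECT a.uuid, a.balance
FROM account a
WHERE a.type = 1
  AND a.created <= '2018-02-28 18:29:59'
  -- no debit by this account inside the window
  AND NOT EXISTS (
        SELECT 1
        FROM `transaction` pt
        WHERE pt.debit = a.account_id + 100000000000
          AND pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59')
  -- no credit to this account inside the window
  AND NOT EXISTS (
        SELECT 1
        FROM `transaction` pt
        WHERE pt.credit = a.account_id + 100000000000
          AND pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59');
```

Running EXPLAIN on this form before a full run would show whether the probes actually pick up the new indexes.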

Some sample data -

account table -

+------+------------+-----------------+---------------------+---------------------+---------------------+---------+
| id   | account_id | acc_complete_id | uuid                | last_updated        | created             | balance |
+------+------------+-----------------+---------------------+---------------------+---------------------+---------+
|   29 |      50536 |    100000050536 | 1026651502611722400 | 2020-01-09 12:43:49 | 2018-01-01 00:00:01 | 2092.10 |
| 1337 |      53071 |    100000053071 | 7266704751953077361 | 2019-12-26 11:45:54 | 2019-10-22 18:13:21 |   99.00 |
|   30 |      50673 |    100000050673 | 8799857402485889540 | 2020-01-05 13:21:16 | 2017-01-01 00:00:01 | 2166.10 |
+------+------------+-----------------+---------------------+---------------------+---------------------+---------+

transaction

+---------+---------------------+--------+--------------+--------------+-----------+------------+
| id      | created             | amount | debit        | credit       | debit_bal | credit_bal |
+---------+---------------------+--------+--------------+--------------+-----------+------------+
| 2001705 | 2019-12-07 14:14:18 |   1.00 | 100000050536 |            3 |   2092.00 | 2332445.91 |
| 2001869 | 2020-05-08 14:29:00 |   4.00 | 100000050673 | 200000052870 |   2088.10 |       4.00 |
| 2001874 | 2020-05-09 14:45:04 |   4.00 | 100000050673 | 200000052870 |   2084.10 |       8.00 |
| 2001875 | 2020-05-09 14:46:37 |   4.00 | 100000050673 | 200000052870 |   2080.10 |      12.00 |
| 2002018 | 2019-11-29 18:05:41 |  50.00 | 100000053071 | 300000050673 |      0.00 |    2170.10 |
| 2002019 | 2019-11-29 18:07:41 |   1.00 | 100000053071 | 300000050673 |    100.00 |    2170.10 |
| 2002020 | 2019-11-29 18:07:56 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002021 | 2019-11-29 18:15:22 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002022 | 2019-11-29 18:18:45 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002023 | 2019-11-29 18:20:41 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002024 | 2019-11-29 18:24:18 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002025 | 2019-11-29 18:26:19 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002026 | 2019-11-29 18:28:41 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002027 | 2019-11-29 18:29:37 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002028 | 2019-11-29 18:30:40 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002029 | 2019-11-29 18:35:55 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002030 | 2019-11-29 18:42:16 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002031 | 2019-12-02 13:12:01 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002032 | 2019-12-02 13:18:21 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002033 | 2019-12-02 13:27:53 |   1.00 | 100000053071 |            5 |    100.00 |  580037.00 |
| 2002034 | 2019-12-02 13:38:11 |   1.00 | 100000053071 |            5 |     99.00 |  580038.00 |
+---------+---------------------+--------+--------------+--------------+-----------+------------+

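Summary

  • So, I have to get the list of such users with their current balance from here. This is the bottleneck, and I cannot think of a way to break this part down to reach the final result.

  • Once I have the list of such users with their current balances, I can query, for each subsequent month, another table holding the total debits and credits per user, then do some addition and subtraction to derive each user's old balance, and add those up across all such users. The number of these users with txns in a given month hardly reaches double digits, so this part is quick.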
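The arithmetic in the second bullet can be sketched as below. It assumes, going by the sample data, that a credit raises the balance and a debit lowers it, and that `MOD(..., 100000000000)` maps the `debit` / `credit` values back to `account_id`. The cutoff date and the `status = 'SUCCESSFUL'` filter are illustrative assumptions, not confirmed requirements.

```sql
-- Old balance = current balance
--               - credits received after the cutoff
--               + debits made after the cutoff
SELECT a.account_id,
       a.balance
         - COALESCE(c.credited, 0)
         + COALESCE(d.debited, 0) AS old_balance
FROM account a
LEFT JOIN (
    -- total credited per account after the cutoff
    SELECT MOD(credit, 100000000000) AS acc_id, SUM(amount) AS credited
    FROM `transaction`
    WHERE created > '2019-11-30 18:29:59'
      AND status = 'SUCCESSFUL'
    GROUP BY MOD(credit, 100000000000)
) c ON c.acc_id = a.account_id
LEFT JOIN (
    -- total debited per account after the cutoff
    SELECT MOD(debit, 100000000000) AS acc_id, SUM(amount) AS debited
    FROM `transaction`
    WHERE created > '2019-11-30 18:29:59'
      AND status = 'SUCCESSFUL'
    GROUP BY MOD(debit, 100000000000)
) d ON d.acc_id = a.account_id
WHERE a.type = 1
  AND a.created <= '2018-02-28 18:29:59';
```

Accounts with no transactions after the cutoff fall through both LEFT JOINs with NULL totals, so COALESCE leaves their current balance unchanged.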

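I am now thinking about alternative ways to get the data. Note that we have isolated these tables and there is no live traffic on them now, so we can add more indexes if needed.

I do not have much time to try many approaches, but what I am considering next is adding flag fields to the account table, such as "nov_inactive", "dec_inactive", etc., indicating that the user was inactive in that month. I suspect that updating the duplicate table with the same selection criteria will take a similar amount of time -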

UPDATE
  account_copy a 
SET
  a.nov_updates = 1 
WHERE
  a.type = "1" 
  AND a.created <= '2018-02-28 18:29:59' 
  AND 
  (
    a.account_id + 100000000000 
  )
  NOT IN 
  (
    SELECT DISTINCT
( pt.debit ) 
    FROM
      transaction pt 
    WHERE
      pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59' 
      AND MOD(pt.debit, 100000000000) IN 
      (
        SELECT
          pa.account_id 
        FROM
          account pa 
        WHERE
          pa.type = "1" 
          AND pa.created <= '2018-02-28 18:29:59'
      )
    UNION
    SELECT DISTINCT
( pt.credit ) 
    FROM
      transaction pt 
    WHERE
      pt.created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59' 
      AND MOD(pt.credit, 100000000000) IN 
      (
        SELECT
          pa.account_id 
        FROM
          account pa 
        WHERE
          pa.type = "1" 
          AND pa.created <= '2018-02-28 18:29:59'
      )
  );
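If the flag approach is pursued, one option is to compute the set of active accounts once and reuse it, instead of re-evaluating the subquery for every account row. The sketch below is only an illustration: `active_nov` is a hypothetical helper table, `account_copy` is assumed to carry the same columns as `account` plus the flag, and the join keeps the original query's convention that only ids of the form `account_id + 100000000000` count as activity.

```sql
-- Hypothetical helper table: every debit/credit id seen inside the window.
CREATE TABLE active_nov (
  acc_complete_id BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

-- INSERT IGNORE plus the primary key collapses repeated ids.
INSERT IGNORE INTO active_nov (acc_complete_id)
SELECT debit
FROM `transaction`
WHERE debit IS NOT NULL
  AND created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59';

INSERT IGNORE INTO active_nov (acc_complete_id)
SELECT credit
FROM `transaction`
WHERE credit IS NOT NULL
  AND created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59';

-- Flag the accounts that have no matching row in the helper table.
UPDATE account_copy a
LEFT JOIN active_nov n
       ON n.acc_complete_id = a.account_id + 100000000000
SET a.nov_updates = 1
WHERE a.type = 1
  AND a.created <= '2018-02-28 18:29:59'
  AND n.acc_complete_id IS NULL;
```

The helper table scans `transaction` twice per reporting window, after which the UPDATE is a single pass over `account_copy` with primary-key lookups.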

Any ideas?

mysql optimization data-retrieval
1 Answer
0
votes

This is how we calculated the old balances -

select count(account_id), sum(if(temp2.old_balance is null,temp1.balance, temp2.old_balance)) 
from
         (
          select 
           pa.account_id, pa.balance, temp.acc_id as acc_id from account as pa force index (created_index)
           left join
                   ((select mod(debit,100000000000) as acc_id from transaction where created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59') 
                                  union 
                     (select mod(credit,100000000000) as acc_id from transaction where created BETWEEN '2018-02-28 18:29:59' AND '2019-11-30 18:29:59')
                    ) as temp
           on pa.account_id=temp.acc_id
           where pa.type = '1' AND pa.created <= '2018-02-28 18:29:59'
           having acc_id is null
          )
  as temp1
  left join 
  (
      select temp.acc_id,temp.txn_amt,b.balance,(b.balance-temp.txn_amt) as old_balance from  
      (
         select mod(temp.acc_id,100000000000) as acc_id, sum(if(type=1,temp.amount,0-temp.amount)) as txn_amt from 
         (
              select credit as acc_id,sum(amount) as amount, '1' as type from transaction where created > '2019-11-30 18:29:59' and status= "SUCCESSFUL" group by credit 
              UNION
              select debit as acc_id, sum(amount) as amount, '0' as type from transaction where created > '2019-11-30 18:29:59' and status= "SUCCESSFUL" group by debit
          ) as temp group by temp.acc_id
      ) as temp join account as b on temp.acc_id=b.account_id where b.created <= '2018-02-28 18:29:59' and type='1'
   ) as temp2 
   on temp1.account_id=temp2.acc_id

In the temp1 alias we get the current balances, and in the temp2 alias we get the old balances of those users who transacted after the reporting month.
