I have a database of stock information and am trying to mine data from it.
First I set up the indexes:
CREATE INDEX IF NOT EXISTS s1 on Income (symbol, period);
CREATE INDEX IF NOT EXISTS s2 on BalanceSheet (symbol, period);
CREATE INDEX IF NOT EXISTS s3 on CashFlow (symbol, period);
Then I create a temporary table:
DROP TABLE IF EXISTS _;
CREATE TABLE _(symbol, period);
INSERT INTO _(symbol, period) VALUES ('AAPL', 'Annual');
Then I run my select:
SELECT
a.yearmonth [Date], a.symbol, a.periodtype [Period],
-- Income and revenues
COALESCE(MAX(CASE WHEN a.statementitem = 'Revenues' THEN a.value END), 0) Revenue,
COALESCE(MAX(CASE WHEN a.statementitem = 'Other Operating Expense/(Income)' THEN a.value END),0) [Other Operating Expense/(Income)],
-- ...
-- ...
-- ...
COALESCE(MAX(CASE WHEN c.statementitem = 'Depreciation & Amortization, Total' THEN c.value END),0) [Depreciation & Amortization, Total], -- This one comes from CashFlow
-- ...
-- ...
-- ...
COALESCE(MAX(CASE WHEN b.statementitem = 'Cash And Equivalents' THEN b.value END),0) [Cash And Equivalents],
COALESCE(MAX(CASE WHEN b.statementitem = 'Total Cash & ST Investments' THEN b.value END),0) [Total Cash & ST Investments],
-- ...
-- ...
-- ...
FROM _
INNER JOIN Income a ON a.symbol = _.symbol AND a.periodtype = _.period
INNER JOIN BalanceSheet b ON b.symbol = _.symbol AND b.periodtype = _.period
INNER JOIN CashFlow c ON c.symbol=_.symbol AND c.periodtype = _.period
GROUP BY a.yearmonth, a.symbol, a.periodtype
Two things about the above:
How can I make it faster?
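The index on every table should be on (symbol, period). However, I don't think that will make a big difference.
As far as I can tell, there is not much you can do. The query has to aggregate a large amount of data, and that is probably where the time goes.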
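One way to check whether the index is actually used is to ask the database for its query plan. Below is a minimal sketch using Python's sqlite3 module. It assumes the database is SQLite (the question does not say so explicitly) and that the indexed column is named periodtype, as in the query, rather than period. The one-table schema and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical minimal schema: only the columns the question's query touches.
cur.execute("CREATE TABLE Income (symbol, periodtype, yearmonth, statementitem, value)")
# Assumption: the index is built on periodtype, the column the query filters on.
cur.execute("CREATE INDEX IF NOT EXISTS s1 ON Income (symbol, periodtype)")
cur.executemany(
    "INSERT INTO Income VALUES (?, ?, ?, ?, ?)",
    [
        ("AAPL", "Annual", "2020-09", "Revenues", 1.0),
        ("MSFT", "Annual", "2020-06", "Revenues", 2.0),
    ],
)

plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM Income WHERE symbol = 'AAPL' AND periodtype = 'Annual'"
).fetchall()

# The last column of each plan row is a human-readable description of the step.
details = [row[-1] for row in plan]
for d in details:
    print(d)

uses_index = any("USING INDEX s1" in d for d in details)
```

If the plan reports a scan of the table instead of a search using s1, the index is not helping that lookup.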
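All of your joins are INNER joins, but in effect you are doing a CROSS join of the tables Income, BalanceSheet and CashFlow, each filtered down to the rows where symbol = 'AAPL' and periodtype = 'Annual'. The only join conditions compare each table with the single row in _, so nothing relates the three tables to one another. I don't see the need for the temporary table. A cross join of 3 tables is always expensive, and the indexes may help. What you can do to make the cross join's result lighter is to drop the temporary table and filter before joining: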
...............................
FROM (SELECT * FROM Income WHERE symbol = 'AAPL' AND periodtype = 'Annual') a
CROSS JOIN (SELECT * FROM BalanceSheet WHERE symbol = 'AAPL' AND periodtype = 'Annual') b
CROSS JOIN (SELECT * FROM CashFlow WHERE symbol = 'AAPL' AND periodtype = 'Annual') c
GROUP BY a.yearmonth, a.symbol, a.periodtype
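To see why this behaves like a cross join, it helps to count the rows produced before GROUP BY is applied. The sketch below uses Python's sqlite3 module with a hypothetical schema and made-up rows (6 AAPL annual rows per table, plus one unrelated MSFT row), and compares the temp-table join from the question with the pre-filtered cross join. Both should produce the same number of rows, namely the product of the per-table counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical data: 2 periods x 3 statement items = 6 AAPL rows per table.
for table in ("Income", "BalanceSheet", "CashFlow"):
    cur.execute(
        f"CREATE TABLE {table} (symbol, periodtype, yearmonth, statementitem, value)"
    )
    rows = [
        ("AAPL", "Annual", ym, f"item{i}", 1.0)
        for ym in ("2019-09", "2020-09")
        for i in range(3)
    ]
    rows.append(("MSFT", "Annual", "2020-06", "item0", 1.0))  # filtered out later
    cur.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)

# The temporary table from the question.
cur.execute("CREATE TABLE _(symbol, period)")
cur.execute("INSERT INTO _(symbol, period) VALUES ('AAPL', 'Annual')")

# Join shape from the question: each table is matched only against _.
rows_via_temp = cur.execute("""
    SELECT COUNT(*) FROM _
    INNER JOIN Income a ON a.symbol = _.symbol AND a.periodtype = _.period
    INNER JOIN BalanceSheet b ON b.symbol = _.symbol AND b.periodtype = _.period
    INNER JOIN CashFlow c ON c.symbol = _.symbol AND c.periodtype = _.period
""").fetchone()[0]

# Join shape from the answer: filter each table first, then cross join.
rows_prefiltered = cur.execute("""
    SELECT COUNT(*)
    FROM (SELECT * FROM Income WHERE symbol = 'AAPL' AND periodtype = 'Annual') a
    CROSS JOIN (SELECT * FROM BalanceSheet WHERE symbol = 'AAPL' AND periodtype = 'Annual') b
    CROSS JOIN (SELECT * FROM CashFlow WHERE symbol = 'AAPL' AND periodtype = 'Annual') c
""").fetchone()[0]

print(rows_via_temp, rows_prefiltered)  # 6 * 6 * 6 rows in both cases
```

With real data the per-table counts are much larger, so the intermediate result grows as their product. That is the cost the aggregation then has to absorb.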