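I have a 2D array in VB.NET (for testing purposes it only has 1,000 rows, but in practice it can have up to 2.5 million).
The rows fall into three groups, depending on whether two fields, TD and UID, match what is already stored in SQL Server.
My original code simply loops through every row of the array.
Pseudocode showing the structure: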
for row_id = 0 to upper_bound
    "select ... where TD = " & Pricetable(0, row_id)
    executereader
    if sqlreader.Hasrows()
        ... (getting UID from reader and other validation)
        closereader
        If UID <> UID_VBarray
            "delete from ... "
            executenonquery
            InsertPrices(command, UID, row_id)
        end if
    else
        closereader
        InsertPrices(command, UID, row_id)
    end if
next row_id
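This is very slow: 1,200 records take about 3 minutes. I assume that is because of all the communication overhead between SQL Server and VB.NET.
So I want to minimize the number of queries, but the question is whether I should do the logic in SQL Server or in VB.NET.
Reading every record (TD, UID) from SQL Server into another array in VB.NET seems a bit excessive (not to mention I would probably need more RAM, lol), but the logic would be easy and I could then do the inserts.
Can I send the whole array to SQL Server? Is it advisable to send a single query containing ... IN / NOT IN (a list of 2.5 million elements), obviously building the string by concatenating in a loop?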
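I ended up converting the array to a DataTable and using SqlBulkCopy. After that I only issued SQL commands and let the logic live in SQL. Note: I am using dynamic SQL because SQL injection is not a threat here; this runs against the user's own local instance.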
For rowid = 0 To upper_bound
    If PriceTable(0, rowid) <> Nothing Then
        workrow = DTPriceTable.NewRow()
        For columnid = 0 To 14
            workrow(columnid) = PriceTable(columnid, rowid)
        Next
        DTPriceTable.Rows.Add(workrow)
        i += 1
        If i > 10000 Then
            i = 0
            SQLServerInfo.SQLUpdateInfo("Preparing records to be written", 0, 7, rowid / upper_bound)
        End If
    End If
Next
SQLServerInfo.SQLUpdateInfo("Write Prices to empty table", 1, 7)
Using bc As SqlBulkCopy = New SqlBulkCopy(myConn)
    bc.DestinationTableName = "dbo.NewPrices"
    Try
        bc.WriteToServer(DTPriceTable)
        Console.WriteLine("Wrote to db")
    Catch ex As Exception
        MsgBox("Failed to write to Db")
        MsgBox(ex.Message)
    End Try
End Using
SQLServerInfo.SQLUpdateInfo("Transferring splits and dividends to SavedInfo Table", 2, 7)
command.CommandText =
"Insert into SavedInfo
Select ticker, [date], ex_dividend, split_ratio, tickerdate
From [dbo].[NewPrices]
Where ex_dividend <> 0 Or split_ratio <> 1"
Dim DailySplitsandDiv As Integer = command.ExecuteNonQuery()
'Update the UID column
command.CommandText = "
update NewPrices
Set UID_All_Concat = CONCAT([tickerdate]
, [open]
, [high]
, [low]
, [close]
, [volume]
, [ex_dividend]
, [split_ratio]
, [adj_open]
, [adj_high]
, [adj_low]
, [adj_close]
, [adj_volume])"
command.ExecuteNonQuery()
'New tickerdates
SQLServerInfo.SQLUpdateInfo("Inserting new prices", 3, 7)
command.CommandText =
"Insert into Prices
Select
np.[ticker]
, np.[date]
, np.[open]
, np.[high]
, np.[low]
, np.[close]
, np.[volume]
, np.[ex_dividend]
, np.[split_ratio]
, np.[adj_open]
, np.[adj_high]
, np.[adj_low]
, np.[adj_close]
, np.[adj_volume]
, np.UID_All_Concat
, np.tickerdate
From [dbo].[NewPrices] as np
Left Join Prices on np.tickerdate = Prices.tickerdate
where Prices.tickerdate Is null"
Dim RowsInserted1 As Integer = command.ExecuteNonQuery()
'TD match UID no match (Historical data changed due to splits or dividends)
'First delete the records that need to be updated
SQLServerInfo.SQLUpdateInfo("Updating historical prices for splits and dividends", 4, 7)
command.CommandText =
"Delete Prices
From [dbo].[Prices]
Inner Join NewPrices np on np.tickerdate = Prices.tickerdate
where np.UID_All_Concat <> Prices.UID_All_Concat"
Dim rowsDeletedduetoSandD As Integer = command.ExecuteNonQuery()
SQLServerInfo.SQLUpdateInfo("Updating historical prices for splits and dividends", 5, 7)
command.CommandText =
"Insert into Prices
Select
np.[ticker]
, np.[date]
, np.[open]
, np.[high]
, np.[low]
, np.[close]
, np.[volume]
, np.[ex_dividend]
, np.[split_ratio]
, np.[adj_open]
, np.[adj_high]
, np.[adj_low]
, np.[adj_close]
, np.[adj_volume]
, np.UID_All_Concat
, np.tickerdate
From [dbo].[NewPrices] as np
Left Join Prices on np.tickerdate = Prices.tickerdate
where Prices.tickerdate is null"
Dim RowsInserted2 As Integer = command.ExecuteNonQuery()
'Cleanup NewPrices table
SQLServerInfo.SQLUpdateInfo("Cleanup newprices table", 6, 7)
command.CommandText = "Delete from dbo.NewPrices"
command.ExecuteNonQuery()
SQLServerInfo.SQLUpdateInfo("Finished", 7, 7, 100)
End Sub
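As the earlier comments said, SQL performs best for this task. I do want to mention a few tips for handling large amounts of data, though.
Dictionaries: get to know them well, because with large data sets they are faster than lists and, in some cases, faster than SQL. Depending on the results you need, it can pay off: I can load ~750k email records from SQL into a dictionary in under 4 seconds, and once they are on the client I can pull any one of them out instantly. Lookups take milliseconds!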
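To make the dictionary tip concrete for this question, here is a minimal sketch of the three-way split using a lookup keyed on TD. Everything except TD and UID is hypothetical: the enum, the `Classify` function, and the sample keys are made up for illustration, and the dictionary is filled by hand rather than from SQL Server.

```vbnet
Imports System
Imports System.Collections.Generic

Module ClassifyRowsSketch
    ' Hypothetical names; only TD and UID come from the question.
    Enum RowAction
        InsertNew        ' TD is not on the server yet
        ReplaceChanged   ' TD exists but its UID differs
        Skip             ' TD and UID both match
    End Enum

    ' Decide what to do with one array row using a single dictionary lookup.
    Function Classify(existing As Dictionary(Of String, String),
                      td As String, uid As String) As RowAction
        Dim storedUid As String = Nothing
        If Not existing.TryGetValue(td, storedUid) Then Return RowAction.InsertNew
        If storedUid <> uid Then Return RowAction.ReplaceChanged
        Return RowAction.Skip
    End Function

    Sub Main()
        ' Stand-in for the (TD, UID) pairs already stored in SQL Server.
        Dim existing As New Dictionary(Of String, String) From {
            {"AAPL20170103", "uid-a"},
            {"AAPL20170104", "uid-b"}
        }

        ' ToString() is called explicitly so the enum name is printed.
        Console.WriteLine(Classify(existing, "AAPL20170103", "uid-a").ToString()) ' Skip
        Console.WriteLine(Classify(existing, "AAPL20170104", "uid-x").ToString()) ' ReplaceChanged
        Console.WriteLine(Classify(existing, "AAPL20170105", "uid-c").ToString()) ' InsertNew
    End Sub
End Module
```

Each row costs one hash lookup instead of one query, which is where the speed comes from. The trade-off is that every existing (TD, UID) pair has to fit in client memory.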
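You are right about RAM: loading all the data onto the client machine is very expensive, not to mention that the copy can easily go bad, especially if the recordsets are being updated in the background.
If you do want to query SQL, remember that con.open ...>>stuff>> ... con.close
takes a long time to process, so try to avoid running a query for every record. Always fetch the data you need in one go, then (if needed) do the calculations.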
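As a rough sketch of "fetch once, then compute", the function below drains a single reader into a dictionary. It assumes both columns are strings and that the reader comes from one query returning TD and UID; `LoadUidLookup` and the sample values are hypothetical. An in-memory DataTable stands in for the reader you would get from SqlCommand.ExecuteReader, so the example runs without a database.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module LoadOnceSketch
    ' Reads every (TD, UID) pair from an already-executed reader into a dictionary.
    ' Assumes column 0 is TD and column 1 is UID, both strings.
    Function LoadUidLookup(reader As IDataReader) As Dictionary(Of String, String)
        Dim lookup As New Dictionary(Of String, String)
        While reader.Read()
            lookup(reader.GetString(0)) = reader.GetString(1)
        End While
        Return lookup
    End Function

    Sub Main()
        ' In-memory stand-in for the result of one "select TD, UID ..." query.
        Dim table As New DataTable()
        table.Columns.Add("TD", GetType(String))
        table.Columns.Add("UID", GetType(String))
        table.Rows.Add("AAPL20170103", "uid-a")
        table.Rows.Add("AAPL20170104", "uid-b")

        Using reader As IDataReader = table.CreateDataReader()
            Dim lookup As Dictionary(Of String, String) = LoadUidLookup(reader)
            Console.WriteLine(lookup.Count)                        ' 2
            Console.WriteLine(lookup.ContainsKey("AAPL20170104"))  ' True
        End Using
    End Sub
End Module
```

The connection is opened and closed once for the whole load, rather than once per row as in the original loop.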
I know this isn't really an answer, but I hope it gives you some good pointers and ideas :)
Chicks