最终结果:
我想知道如果字符串更长,下面的结果是否会改变。我在同一台计算机上进行了完全相同的测试,不同的是每个单元都有一个由34个字符而不是四个字符组成的随机字符串。这些是结果:
Comintern (Regexp): 136.1 ms
brettdj (Regexp): 139.9 ms
Slai (Regexp): 158.4 ms
*Original Regex: 161.0 ms*
Comintern (AN): 170.1 ms
Comintern (Hash): 183.6 ms
ThunderFrame: 232.9 ms
*Original replace: 372.9 ms*
*Original InStr: 478.1 ms*
CallumDA33: 1218.1 ms
这确实显示了Regex的速度-所有使用Regex.replace的解决方案都明显更快,最好的方法就是Comintern的实现。
总之,如果字符串很长,请使用数组;如果字符串很短,请使用剪贴板。如果不确定,最佳结果是使用数组,但这可能会牺牲短字符串的性能。
最终结果:
非常感谢您的所有建议,很明显,我还有很多东西要学习。昨天我一直在考虑这件事,所以我决定在家里重新运行所有内容。这是根据将每个字符串应用于30,000个四个字符串得出的最终结果。
我在家中的计算机是Intel i7 @ 3.6 GHz,8GB RAM,64位Windows 10和Excel2016。与以前相似的条件是,我在后台运行进程,但是在整个过程中我都没有积极地做任何事情。测试。
Original replace: 97.67 ms
Original InStr: 106.54 ms
Original Regex: 113.46 ms
ThunderFrame: 82.21 ms
Comintern (AN): 96.98 ms
Comintern (OR): 81.87 ms
Comintern (Hash): 101.18 ms
brettdj: 81.66 ms
CallumDA33: 201.64 ms
Slai: 68.38 ms
因此,我接受了Slai的回答,因为它显然是常规实施中最快的方法,但是我将根据实际数据重新运行它们,以检查是否仍然有效。
原始帖子:
我在Excel中有一个数组,该数组是零件编号的列表。我需要将数组的每个成员都设为字母数字,例如
ABC123-001 -> ABC123001
ABC123/001 -> ABC123001
ABC123001 -> ABC123001
最快的方法是什么?
对于上下文,我们的零件号可以采用不同的形式,因此我正在编写一个函数,该函数在给定范围内找到最佳匹配。此刻,使所有字母数字内容运行的功能部分大约需要50毫秒才能运行,而该功能的其余部分总共需要30毫秒。我也无法避免使用Excel。
我自己完成了一些工作(请参见下面的答案),但是主要问题是我必须一个接一个地遍历数组的每个元素-会有更好的方法吗?我也从未进行过测试,因此,对改进它们的任何反馈将不胜感激。
[这是到目前为止我尝试过的。
我正在使用MicroTimer,并且我的计算机具有Intel i5 @ 2.5GHz,4GB RAM,64位Windows7。我在后台运行进程,但是在此期间我没有积极进行任何其他操作这些都运行。
我使用此代码创建了30,000行随机符号:
=CHAR(RANDBETWEEN(1,60))&CHAR(RANDBETWEEN(48,57))&CHAR(RANDBETWEEN(37,140))&CHAR(RANDBETWEEN(37,140))
[里面的字符。)1。使用基于案例的循环。平均时间:175毫秒
使用char(61)
中的函数,我们将范围加载到数组中,将该函数应用于数组的每个元素,然后将其粘贴回去。代码:定义一串有效值。如果有效值出现在数组元素中,请一一检查:2。使用InStr()检查每个字符。平均时间:201毫秒
Function AlphaNumericOnly(strSource As Variant) As String
Dim i As Integer
Dim strResult As String
For i = 1 To Len(strSource)
Select Case Asc(Mid(strSource, i, 1))
Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
strResult = strResult & Mid(strSource, i, 1)
End Select
Next
AlphaNumericOnly = strResult
End Function
Sub Replace()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("Replace")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim i As Integer
For i = LBound(arr) To UBound(arr)
arr(i, 1) = AlphaNumericOnly(arr(i, 1))
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
定义一个正则表达式,并使用它来替换数组的每个元素。3。在数组上使用regex.Replace。时间:171毫秒
Sub InStr()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("InStr")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim validValues As String
validValues = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 'put numbers and capitals at the start as they are more likely'
Dim i As Integer, j As Integer
Dim result As String
For i = LBound(arr) To UBound(arr)
result = vbNullString
For j = 1 To Len(arr(i, 1))
If InStr(validValues, Mid(arr(i, 1), j, 1)) <> 0 Then
result = result & Mid(arr(i, 1), j, 1)
End If
Next j
arr(i, 1) = result
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
编辑:
@@ ThunderFrame-我们的零件号通常采用以下格式:所有数字(例如32523452)
- 字母和数字的混合(例如AB324K234或123H45645)
- 由非字母数字字符(例如ABC001-001,ABC001 / 001、123 / 4557-121)链接的字母和数字的混合]
我已经考虑过在启动替换字符串之前在每个字符串上使用regex.test,但是我不确定这是否只是复制字符串然后对其进行测试,在这种情况下,我也可以直接进行替换与。 @@ Slai-感谢您的链接-我将对此进行更详细的研究
Sub Regex()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("Regex")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.ignorecase = True
.Pattern = "[^\w]"
End With
Dim i As Integer
For i = LBound(arr) To UBound(arr)
arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
也匹配下划线和Unicode字母,因此在正则表达式中指定的更具体可以使其更快。 \w
我期望由于剪贴板的开销,它对于较小的值会较慢,而对于更多的值可能会由于所需的内存而较慢。禁用事件在我的测试中似乎没有什么作用,但可能值得尝试。
请注意,宏正在使用剪贴板时,另一个应用程序使用剪贴板的可能性很小。
如果早期绑定导致在不同计算机上运行相同的编译宏时出现问题,则可以搜索
'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testing Dim r As Range, s As String Set r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000 With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") r.Copy .GetFromClipboard Application.CutCopyMode = False s = .GetText .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text" With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp") .Global = True '.IgnoreCase = False ' .IgnoreCase is False by default .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters s = .Replace(s, vbNullString) End With .SetText s .PutInClipboard End With ' about 70% of the time is spent here in pasting the data r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1 'Debug.Print Timer - t
或删除引用并切换到后期绑定。
tl; dr-正则表达式破坏了VBA实现。如果这是代码挑战,则@brettj或@Slai应该赢得大奖。
有很多技巧可以使您的macro decompiler更快。首先,您可以通过将其视为字节数组而不是字符串来摆脱绝大多数函数调用。这将删除对AlphaNumericOnly
和Mid$
的所有调用。尽管它们是非常快的功能,但它们仍然增加了将开销推入和弹出调用堆栈的开销。这总共有数十万次迭代。
第二种优化是,如果可以避免,则不要使用Asc
语法。原因与它的编译方式有关-它没有像Case x To y
这样的测试进行编译,它实际上创建了一个具有这样的提前退出条件的循环:
Case = Condition >= x And Condition <= y
再次,不是huge表现出色,但总和起来。第三种优化是对测试进行排序,以使其按照数据集中最有可能的结果对电路进行排序。我在下面针对主要字母量身定制了示例,其中大多数都是大写的。使用不同的顺序可能会做得更好。放在一起,您将得到如下所示的内容:
Case = False For i = x To y If Condition = i Then Case = True End If Next
它是怎么做的?很好:
Public Function ByteAlphaNumeric(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) 'Load the array up. Dim bound As Long bound = UBound(chars) 'Size the outbound array. ReDim outVal(bound) Dim i As Long, pos As Long For i = 0 To bound Step 2 'Wide characters, only care about the ASCII range. Dim temp As Byte temp = chars(i) 'Pointer math isn't free. Cache it. Select Case True 'Order is important here. Case temp > 64 And temp < 91 outVal(pos) = temp pos = pos + 2 'Advance the output pointer. Case temp < 48 Case temp > 122 Case temp > 96 outVal(pos) = temp pos = pos + 2 Case temp < 58 outVal(pos) = temp pos = pos + 2 End Select Next 'This is likely the most expensive operation. ReDim Preserve outVal(pos) 'Trim the output array. ByteAlphaNumeric = outVal End Function
结果为随机包含15个字符的Public Sub Benchmark() Dim starting As Single, i As Long, dummy As String, sample As Variant sample = GetRandomString starting = Timer For i = 1 To 1000000 dummy = AlphaNumericOnlyOP(sample) Next i Debug.Print "OP's AlphaNumericOnly: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = AlphaNumericOnlyThunderframe(sample) Next i Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = AlphaNumeric(sample) Next i Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = ByteAlphaNumeric(sample) Next i Debug.Print "ByteAlphaNumeric: ", Timer - starting Dim cast As String cast = CStr(sample) starting = Timer For i = 1 To 1000000 dummy = ByteAlphaNumericString(cast) Next i Debug.Print "ByteAlphaNumericString: ", Timer - starting Set stripper = Nothing starting = Timer For i = 1 To 1000000 dummy = OptimizedRegex(sample) Next i Debug.Print "OptimizedRegex: ", Timer - starting End Sub Private Function GetRandomString() As Variant Dim chars(30) As Byte, i As Long Randomize For i = 0 To 30 Step 2 chars(i) = Int(96 * Rnd + 32) Next i Dim temp As String temp = chars GetRandomString = CVar(temp) End Function
:String
注意,我省略了转换为函数的微不足道的提交。您可能会注意到2个附加测试-OP`s AlphaNumericOnly: 6.565918 ThunderFrame`s AlphaNumericOnly: 3.617188 CallumDA33`s AlphaNumeric: 23.518070 ByteAlphaNumeric: 2.354980
与ByteAlphaNumericString
函数完全相同,但是它使用ByteAlphaNumeric
作为输入而不是String
,并且摆脱了强制转换。这不是小事:Variant
最后,是难以捉摸的ByteAlphaNumericString: 2.226074
函数(基本上是@brettj的函数形式的代码,用于比较定时):
OptimizedRegex
Private stripper As RegExp 'Module level Function OptimizedRegex(strSource As Variant) As String If stripper Is Nothing Then Set stripper = New RegExp With stripper .Global = True .Pattern = "[^0-9A-Za-z]" End With End If OptimizedRegex = stripper.Replace(strSource, vbNullString) End Function
编辑:奖励实施![我发现哈希表查找可能比
OptimizedRegex: 1.094727
结构快,所以我用Select Case
构建了一个:
Scripting.Dictionary
事实证明不是太破旧:Private hash As Scripting.Dictionary 'Module level Function HashLookups(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) Dim bound As Long bound = UBound(chars) ReDim outVal(bound) Dim i As Long, pos As Long With hash For i = 0 To bound Step 2 Dim temp As Byte temp = chars(i) If .Exists(temp) Then outVal(pos) = temp pos = pos + 2 End If Next End With ReDim Preserve outVal(pos) HashLookups = outVal End Function Private Sub LoadHashTable() Set hash = New Scripting.Dictionary Dim i As Long For i = 48 To 57 hash.Add i, vbNull Next For i = 65 To 90 hash.Add i, vbNull Next For i = 97 To 122 hash.Add i, vbNull Next End Sub 'Test code: starting = Timer LoadHashTable For i = 1 To 1000000 dummy = HashLookups(sample) Next i Debug.Print "HashLookups: ", Timer - starting
最终版本醒来,以为我会尝试向量查找而不是哈希查找(只需填充值的字节数组即可保留并用于测试)。这似乎很合理,因为它只是一个256个元素的数组-本质上是一个真值表:
HashLookups: 1.655273
假设查找表仅生成一次,它的时钟运行速度比上述任何其他纯VBA方法快10-15%。
Private lookup(255) As Boolean 'Module level
Function VectorLookup(source As Variant) As String
Dim chars() As Byte
Dim outVal() As Byte
chars = CStr(source)
Dim bound As Long
bound = UBound(chars)
ReDim outVal(bound)
Dim i As Long, pos As Long
For i = 0 To bound Step 2
Dim temp As Byte
temp = chars(i)
If lookup(temp) Then
outVal(pos) = temp
pos = pos + 2
End If
Next
ReDim Preserve outVal(pos)
VectorLookup = outVal
End Function
Private Sub GenerateTable()
Dim i As Long
For i = 48 To 57
lookup(i) = True
Next
For i = 65 To 90
lookup(i) = True
Next
For i = 97 To 122
lookup(i) = True
Next
End Sub
的傻子),但我通过早期的Mid$
有了更小的调整,获得了更好的性能:RegExp
而不是Value2
Value
是多余的code
.ignorecase = True
Sub Replace2()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("Replace")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
Dim objRegex As VBScript_RegExp_55.RegExp
Dim i As Long
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[^\w]"
End With
arr = inputRng.Value2
For i = LBound(arr) To UBound(arr)
arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)
Next i
outputRng.Value2 = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
我使用了一些优化方法,但是首先,您在一个循环中多次分配了Function AlphaNumericOnly(strSource As Variant) As String Dim i As Long Dim charCount As Long Dim strResult As String Dim char As String strResult = Space$(Len(strSource)) For i = 1 To Len(strSource) char = Mid$(strSource, i, 1) Select Case Asc(char) Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space charCount = charCount + 1 Mid$(strResult, charCount, 1) = char End Select Next AlphaNumericOnly = Left$(strResult, charCount) End Function
,这非常昂贵,而当字符串更长(并且循环运行更多次)时,则更加昂贵。最好使用strResult
。而且,使用$后缀的函数已针对字符串进行了优化,因此您也可以在其中获得更好的性能
优化RegEx版本
您的Regex方法具有合理的性能,但是您使用的是后期绑定Mid$
,它将作为早绑定,强类型引用更快。此外,您的Regex模式和选项每次都相同,您可以将regex对象声明为变量,并且仅在不存在时创建它,然后每次重新使用现有的regex。
我希望测试字符是否为字母的方法更快。我敢肯定,测试数字可以更快一些。
CreateObject
性能差是由于字符串连接,而VBA则不擅长。字符串越长,效果越差。
我下面的InStr方法的版本根本不使用串联。它比原始速度快很多倍。实际上,它的执行速度与后期Regex匹配。当然,早期绑定的Regex仍然要快两倍。但是此InStr版本完全是VBA固有的,并且非常非常快。相对于连接,源字符串越长,获取速度越快。
此方法还通过使用字符串函数的($)版本而不是变体版本来降低性能。 InStrB比InStr快一点。并且使用临时字符串变量(t和arx)也节省了大量时间。
Function AlphaNumeric(s As String) As String
Dim char As String, tempStr As String
Dim i As Integer
Dim t As Variant
For i = 1 To Len(s)
char = Mid(s, i, 1)
If IsLetter(char) Or IsNumber(char) Then
tempStr = tempStr & char
End If
Next i
AlphaNumeric = tempStr
End Function
Private Function IsLetter(s As String) As Boolean
If UCase(s) = s And LCase(s) = s Then
IsLetter = False
Else:
IsLetter = True
End If
End Function
Private Function IsNumber(s As String)
On Error GoTo 1
s = s * 1
IsNumber = True
Exit Function
1:
IsNumber = False
End Function