转动数组字母数字的每个成员的最快方法是什么?

问题描述 投票:13回答:6

最终结果:

我想知道如果字符串更长,下面的结果是否会改变。我在同一台计算机上进行了完全相同的测试,不同的是每个单元都有一个由34个字符而不是四个字符组成的随机字符串。这些是结果:

Comintern (Regexp):       136.1  ms  
brettdj (Regexp):         139.9  ms  
Slai (Regexp):            158.4  ms  
*Original Regex:          161.0  ms*    
Comintern (AN):           170.1  ms  
Comintern (Hash):         183.6  ms  
ThunderFrame:             232.9  ms    
*Original replace:        372.9  ms*  
*Original InStr:          478.1  ms*  
CallumDA33:              1218.1 ms

这确实显示了Regex的速度-所有使用Regex.replace的解决方案都明显更快,最好的方法就是Comintern的实现。

总之,如果字符串很长,请使用数组;如果字符串很短,请使用剪贴板。如果不确定,最佳结果是使用数组,但这可能会牺牲短字符串的性能。

最终结果:

非常感谢您的所有建议,很明显,我还有很多东西要学习。昨天我一直在考虑这件事,所以我决定在家里重新运行所有内容。这是根据将每个字符串应用于30,000个四个字符串得出的最终结果。

我在家中的计算机是Intel i7 @ 3.6 GHz,8GB RAM,64位Windows 10和Excel2016。与以前相似的条件是,我在后台运行进程,但是在整个过程中我都没有积极地做任何事情。测试。

Original replace:  97.67  ms
Original InStr:    106.54 ms
Original Regex:    113.46 ms
ThunderFrame:      82.21  ms
Comintern (AN):    96.98  ms
Comintern (OR):    81.87  ms
Comintern (Hash):  101.18 ms
brettdj:           81.66  ms
CallumDA33:        201.64 ms
Slai:              68.38  ms

因此,我接受了Slai的回答,因为它显然是常规实施中最快的方法,但是我将根据实际数据重新运行它们,以检查是否仍然有效。


原始帖子:

我在Excel中有一个数组,该数组是零件编号的列表。我需要将数组的每个成员都设为字母数字,例如

ABC123-001 -> ABC123001
ABC123/001 -> ABC123001
ABC123001  -> ABC123001

最快的方法是什么?

对于上下文,我们的零件号可以采用不同的形式,因此我正在编写一个函数,该函数在给定范围内找到最佳匹配。此刻,使所有字母数字内容运行的功能部分大约需要50毫秒才能运行,而该功能的其余部分总共需要30毫秒。我也无法避免使用Excel。

我自己完成了一些工作(请参见下面的答案),但是主要问题是我必须一个接一个地遍历数组的每个元素-会有更好的方法吗?我也从未进行过测试,因此,对改进它们的任何反馈将不胜感激。

[这是到目前为止我尝试过的。

我正在使用MicroTimer,并且我的计算机具有Intel i5 @ 2.5GHz,4GB RAM,64位Windows7。我在后台运行进程,但是在此期间我没有积极进行任何其他操作这些都运行。

我使用此代码创建了30,000行随机符号:

=CHAR(RANDBETWEEN(1,60))&CHAR(RANDBETWEEN(48,57))&CHAR(RANDBETWEEN(37,140))&CHAR(RANDBETWEEN(37,140))
[里面的字符。)

1。使用基于案例的循环。平均时间:175毫秒

使用char(61)中的函数,我们将范围加载到数组中,将该函数应用于数组的每个元素,然后将其粘贴回去。代码:

this post

2。使用InStr()检查每个字符。平均时间:201毫秒

定义一串有效值。如果有效值出现在数组元素中,请一一检查:

Function AlphaNumericOnly(strSource As Variant) As String Dim i As Integer Dim strResult As String For i = 1 To Len(strSource) Select Case Asc(Mid(strSource, i, 1)) Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space strResult = strResult & Mid(strSource, i, 1) End Select Next AlphaNumericOnly = strResult End Function Sub Replace() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("Replace") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant arr = inputRng Dim i As Integer For i = LBound(arr) To UBound(arr) arr(i, 1) = AlphaNumericOnly(arr(i, 1)) Next i outputRng = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub

3。在数组上使用regex.Replace。时间:171毫秒

定义一个正则表达式,并使用它来替换数组的每个元素。

Sub InStr() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("InStr") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant arr = inputRng Dim validValues As String validValues = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 'put numbers and capitals at the start as they are more likely' Dim i As Integer, j As Integer Dim result As String For i = LBound(arr) To UBound(arr) result = vbNullString For j = 1 To Len(arr(i, 1)) If InStr(validValues, Mid(arr(i, 1), j, 1)) <> 0 Then result = result & Mid(arr(i, 1), j, 1) End If Next j arr(i, 1) = result Next i outputRng = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub


编辑:

@@ ThunderFrame-我们的零件号通常采用以下格式:

    所有数字(例如32523452)
  • 字母和数字的混合(例如AB324K234或123H45645)
  • 由非字母数字字符(例如ABC001-001,ABC001 / 001、123 / 4557-121)链接的字母和数字的混合]
  • 我已经考虑过在启动替换字符串之前在每个字符串上使用regex.test,但是我不确定这是否只是复制字符串然后对其进行测试,在这种情况下,我也可以直接进行替换与。

    @@ Slai-感谢您的链接-我将对此进行更详细的研究

  • arrays regex excel vba excel-vba
    6个回答
    7
    投票
    不确定这是否会更快,因为它取决于太多因素,但可能值得测试。您可以从剪贴板中获取复制的Range文本并立即替换所有值,而不必分别替换Regex。请注意,Sub Regex() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("Regex") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant arr = inputRng Dim objRegex As Object Set objRegex = CreateObject("vbscript.regexp") With objRegex .Global = True .ignorecase = True .Pattern = "[^\w]" End With Dim i As Integer For i = LBound(arr) To UBound(arr) arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString) Next i outputRng = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub 也匹配下划线和Unicode字母,因此在正则表达式中指定的更具体可以使其更快。

    \w

    我期望由于剪贴板的开销,它对于较小的值会较慢,而对于更多的值可能会由于所需的内存而较慢。 

    禁用事件在我的测试中似乎没有什么作用,但可能值得尝试。

    请注意,宏正在使用剪贴板时,另一个应用程序使用剪贴板的可能性很小。

    如果早期绑定导致在不同计算机上运行相同的编译宏时出现问题,则可以搜索'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testing Dim r As Range, s As String Set r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000 With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") r.Copy .GetFromClipboard Application.CutCopyMode = False s = .GetText .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text" With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp") .Global = True '.IgnoreCase = False ' .IgnoreCase is False by default .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters s = .Replace(s, vbNullString) End With .SetText s .PutInClipboard End With ' about 70% of the time is spent here in pasting the data r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1 'Debug.Print Timer - t 或删除引用并切换到后期绑定。


    7
    投票

    tl; dr-正则表达式破坏了VBA实现。如果这是代码挑战,则@brettj或@Slai应该赢得大奖。

    有很多技巧可以使您的macro decompiler更快。

    首先,您可以通过将其视为字节数组而不是字符串来摆脱绝大多数函数调用。这将删除对AlphaNumericOnlyMid$的所有调用。尽管它们是非常快的功能,但它们仍然增加了将开销推入和弹出调用堆栈的开销。这总共有数十万次迭代。

    第二种优化是,如果可以避免,则不要使用Asc语法。原因与它的编译方式有关-它没有像Case x To y这样的测试进行编译,它实际上创建了一个具有这样的提前退出条件的循环:

    Case = Condition >= x And Condition <= y

    再次,不是

    huge表现出色,但总和起来。第三种优化是对测试进行排序,以使其按照数据集中最有可能的结果对电路进行排序。我在下面针对主要字母量身定制了示例,其中大多数都是大写的。使用不同的顺序可能会做得更好。放在一起,您将得到如下所示的内容:

    Case = False For i = x To y If Condition = i Then Case = True End If Next
    它是怎么做的?很好:

    Public Function ByteAlphaNumeric(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) 'Load the array up. Dim bound As Long bound = UBound(chars) 'Size the outbound array. ReDim outVal(bound) Dim i As Long, pos As Long For i = 0 To bound Step 2 'Wide characters, only care about the ASCII range. Dim temp As Byte temp = chars(i) 'Pointer math isn't free. Cache it. Select Case True 'Order is important here. Case temp > 64 And temp < 91 outVal(pos) = temp pos = pos + 2 'Advance the output pointer. Case temp < 48 Case temp > 122 Case temp > 96 outVal(pos) = temp pos = pos + 2 Case temp < 58 outVal(pos) = temp pos = pos + 2 End Select Next 'This is likely the most expensive operation. ReDim Preserve outVal(pos) 'Trim the output array. ByteAlphaNumeric = outVal End Function

    结果为随机包含15个字符的Public Sub Benchmark()
        Dim starting As Single, i As Long, dummy As String, sample As Variant
    
        sample = GetRandomString
    
        starting = Timer
        For i = 1 To 1000000
            dummy = AlphaNumericOnlyOP(sample)
        Next i
        Debug.Print "OP's AlphaNumericOnly: ", Timer - starting
    
        starting = Timer
        For i = 1 To 1000000
            dummy = AlphaNumericOnlyThunderframe(sample)
        Next i
        Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting
    
        starting = Timer
        For i = 1 To 1000000
            dummy = AlphaNumeric(sample)
        Next i
        Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting
    
        starting = Timer
        For i = 1 To 1000000
            dummy = ByteAlphaNumeric(sample)
        Next i
        Debug.Print "ByteAlphaNumeric: ", Timer - starting
    
        Dim cast As String
        cast = CStr(sample)
        starting = Timer
        For i = 1 To 1000000
            dummy = ByteAlphaNumericString(cast)
        Next i
        Debug.Print "ByteAlphaNumericString: ", Timer - starting
    
        Set stripper = Nothing
        starting = Timer
        For i = 1 To 1000000
            dummy = OptimizedRegex(sample)
        Next i
        Debug.Print "OptimizedRegex: ", Timer - starting
    
    End Sub
    
    Private Function GetRandomString() As Variant
        Dim chars(30) As Byte, i As Long
        Randomize
        For i = 0 To 30 Step 2
            chars(i) = Int(96 * Rnd + 32)
        Next i
        Dim temp As String
        temp = chars
        GetRandomString = CVar(temp)
    End Function
    

    String
    注意,我省略了转换为函数的微不足道的提交。您可能会注意到2个附加测试-OP`s AlphaNumericOnly:                     6.565918 
    ThunderFrame`s AlphaNumericOnly:           3.617188 
    CallumDA33`s AlphaNumeric:                23.518070 
    ByteAlphaNumeric:                          2.354980
    ByteAlphaNumericString函数完全相同,但是它使用ByteAlphaNumeric作为输入而不是String,并且摆脱了强制转换。这不是小事:

    Variant
    最后,是难以捉摸的ByteAlphaNumericString:                    2.226074
    函数(基本上是@brettj的函数形式的代码,用于比较定时):

    OptimizedRegex

    Private stripper As RegExp 'Module level Function OptimizedRegex(strSource As Variant) As String If stripper Is Nothing Then Set stripper = New RegExp With stripper .Global = True .Pattern = "[^0-9A-Za-z]" End With End If OptimizedRegex = stripper.Replace(strSource, vbNullString) End Function
    编辑:奖励实施!

    [我发现哈希表查找可能比OptimizedRegex: 1.094727 结构快,所以我用Select Case构建了一个:

    Scripting.Dictionary

    事实证明不是太破旧:

    Private hash As Scripting.Dictionary 'Module level Function HashLookups(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) Dim bound As Long bound = UBound(chars) ReDim outVal(bound) Dim i As Long, pos As Long With hash For i = 0 To bound Step 2 Dim temp As Byte temp = chars(i) If .Exists(temp) Then outVal(pos) = temp pos = pos + 2 End If Next End With ReDim Preserve outVal(pos) HashLookups = outVal End Function Private Sub LoadHashTable() Set hash = New Scripting.Dictionary Dim i As Long For i = 48 To 57 hash.Add i, vbNull Next For i = 65 To 90 hash.Add i, vbNull Next For i = 97 To 122 hash.Add i, vbNull Next End Sub 'Test code: starting = Timer LoadHashTable For i = 1 To 1000000 dummy = HashLookups(sample) Next i Debug.Print "HashLookups: ", Timer - starting
    最终版本

    醒来,以为我会尝试向量查找而不是哈希查找(只需填充值的字节数组即可保留并用于测试)。这似乎很合理,因为它只是一个256个元素的数组-本质上是一个真值表:

    HashLookups: 1.655273

    假设查找表仅生成一次,它的时钟运行速度比上述任何其他纯VBA方法快10-15%。

    5
    投票
    贷记给ThunderFrame(我是LHS Private lookup(255) As Boolean 'Module level Function VectorLookup(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) Dim bound As Long bound = UBound(chars) ReDim outVal(bound) Dim i As Long, pos As Long For i = 0 To bound Step 2 Dim temp As Byte temp = chars(i) If lookup(temp) Then outVal(pos) = temp pos = pos + 2 End If Next ReDim Preserve outVal(pos) VectorLookup = outVal End Function Private Sub GenerateTable() Dim i As Long For i = 48 To 57 lookup(i) = True Next For i = 65 To 90 lookup(i) = True Next For i = 97 To 122 lookup(i) = True Next End Sub 的傻子),但我通过早期的Mid$有了更小的调整,获得了更好的性能:

      使用RegExp而不是Value2
    • long而不是integer声明循环
    • Value是多余的
  • code

    .ignorecase = True

  • 4
    投票
    如果将您的第一个(也是目前效果最好的例程)的功能更改为以下内容,根据数据的不同,您将获得至少40-50%的性能提升:

    Sub Replace2() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("Replace") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant Dim objRegex As VBScript_RegExp_55.RegExp Dim i As Long Set objRegex = CreateObject("vbscript.regexp") With objRegex .Global = True .Pattern = "[^\w]" End With arr = inputRng.Value2 For i = LBound(arr) To UBound(arr) arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString) Next i outputRng.Value2 = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub

    我使用了一些优化方法,但是首先,您在一个循环中多次分配了Function AlphaNumericOnly(strSource As Variant) As String
        Dim i As Long
        Dim charCount As Long
        Dim strResult As String
        Dim char As String
        strResult = Space$(Len(strSource))
        For i = 1 To Len(strSource)
            char = Mid$(strSource, i, 1)
            Select Case Asc(char)
                Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
                    charCount = charCount + 1
                    Mid$(strResult, charCount, 1) = char
            End Select
        Next
        AlphaNumericOnly = Left$(strResult, charCount)
    End Function
    ,这非常昂贵,而当字符串更长(并且循环运行更多次)时,则更加昂贵。最好使用strResult

    而且,使用$后缀的函数已针对字符串进行了优化,因此您也可以在其中获得更好的性能

    优化RegEx版本

    您的Regex方法具有合理的性能,但是您使用的是后期绑定Mid$,它将作为早绑定,强类型引用更快。

    此外,您的Regex模式和选项每次都相同,您可以将regex对象声明为变量,并且仅在不存在时创建它,然后每次重新使用现有的regex。


    0
    投票
    如果没有其他内容,我将把它扔在那里,看看它的性能如何。我相信也可以整理一下。

    我希望测试字符是否为字母的方法更快。我敢肯定,测试数字可以更快一些。

    CreateObject


    0
    投票
    有趣的问题。原始InStr方法应该比这里显示的结果快得多。

    性能差是由于字符串连接,而VBA则不擅长。字符串越长,效果越差。

    我下面的InStr方法的版本根本不使用串联。它比原始速度快很多倍。实际上,它的执行速度与后期Regex匹配。当然,早期绑定的Regex仍然要快两倍。但是此InStr版本完全是VBA固有的,并且非常非常快。相对于连接,源字符串越长,获取速度越快。

    此方法还通过使用字符串函数的($)版本而不是变体版本来降低性能。 InStrB比InStr快一点。并且使用临时字符串变量(t和arx)也节省了大量时间。

    Function AlphaNumeric(s As String) As String Dim char As String, tempStr As String Dim i As Integer Dim t As Variant For i = 1 To Len(s) char = Mid(s, i, 1) If IsLetter(char) Or IsNumber(char) Then tempStr = tempStr & char End If Next i AlphaNumeric = tempStr End Function Private Function IsLetter(s As String) As Boolean If UCase(s) = s And LCase(s) = s Then IsLetter = False Else: IsLetter = True End If End Function Private Function IsNumber(s As String) On Error GoTo 1 s = s * 1 IsNumber = True Exit Function 1: IsNumber = False End Function

    © www.soinside.com 2019 - 2024. All rights reserved.