Split lines of text into words and decide by vote which one is correct

Question  Votes: -2  Answers: 1

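The code below splits each line into words, then collects the first word of every line into one list, the second word of every line into another list, and so on. From each list it then picks the most frequent word as the correct one.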

Module Module1

Sub Main()
    Dim correctLine As String = ""
    Dim line1 As String = "Canda has more than ones official language"
    Dim line2 As String = "Canada has more than one oficial languages"
    Dim line3 As String = "Canada has nore than one official lnguage"
    Dim line4 As String = "Canada has nore than one offical language"

    Dim wordsOfLine1() As String = line1.Split(" ")
    Dim wordsOfLine2() As String = line2.Split(" ")
    Dim wordsOfLine3() As String = line3.Split(" ")
    Dim wordsOfLine4() As String = line4.Split(" ")


    For i As Integer = 0 To wordsOfLine1.Length - 1
        Dim wordAllLinesTemp As New List(Of String)(New String() {wordsOfLine1(i), wordsOfLine2(i), wordsOfLine3(i), wordsOfLine4(i)})
        Dim counts = From n In wordAllLinesTemp
                     Group n By n Into Group
                     Order By Group.Count() Descending
                     Select Group.First
        correctLine = correctLine & counts.First & " "
    Next
    correctLine = correctLine.Remove(correctLine.Length - 1)
    Console.WriteLine(correctLine)
    Console.ReadKey()

End Sub
End Module

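My question: how can I make this work with lines that contain different numbers of words? Here every line is 7 words long, and the For loop relies on that length (Length - 1). Suppose line 3 contained only 5 words.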
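
One possible way to cope with lines of different lengths, sketched here under assumptions rather than taken from the original post: loop up to the longest line and let only the lines that actually have a word at a given position vote for it. The helper name `VoteByPosition` and the shortened third line are hypothetical. Ties go to the word seen first, and a word missing from the middle of a line would still shift every later position, which this sketch does not try to repair.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module VotingSketch

    ' Hypothetical helper: picks the most common word at each position,
    ' counting only the lines that actually have a word there.
    Function VoteByPosition(lines As IEnumerable(Of String)) As String
        ' Split every line on spaces, dropping empty entries caused by repeated spaces.
        Dim wordLists As List(Of String()) =
            lines.Select(Function(l) l.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)).ToList()

        ' Iterate up to the longest line instead of the length of the first line.
        Dim maxLength As Integer = wordLists.Max(Function(w) w.Length)
        Dim result As New List(Of String)

        For i As Integer = 0 To maxLength - 1
            Dim position As Integer = i ' local copy, so the lambdas do not capture the loop variable
            Dim winner As String =
                wordLists.Where(Function(w) position < w.Length).
                          Select(Function(w) w(position)).
                          GroupBy(Function(word) word).
                          OrderByDescending(Function(g) g.Count()).
                          First().Key
            result.Add(winner)
        Next

        Return String.Join(" ", result)
    End Function

    Sub Main()
        ' The third line is cut to 5 words to mimic the case in the question.
        Dim lines As String() = {
            "Canda has more than ones official language",
            "Canada has more than one oficial languages",
            "Canada has nore than one",
            "Canada has nore than one offical language"
        }
        ' Prints: Canada has more than one official language
        Console.WriteLine(VoteByPosition(lines))
    End Sub

End Module
```

With only three votes left at position 6, each spelling occurs once, so "official" wins only because it comes from the first line; more input lines would make that vote more reliable.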

vb.net text-parsing text-extraction
1 Answer
0
votes

Edit: I had accidentally selected the wrong index; the correct index should be the one with the shortest distance.

As I understand it, you are trying to find out which line is closest to the correct line.

You can compute the Levenshtein distance with the following code:

Public Function LevDist(ByVal s As String, ByVal t As String) As Integer
    Dim n As Integer = s.Length
    Dim m As Integer = t.Length
    ' VB array bounds are inclusive upper bounds, so (n, m) already gives (n + 1) x (m + 1) cells.
    Dim d(n, m) As Integer

    If n = 0 Then
        Return m
    End If

    If m = 0 Then
        Return n
    End If

    Dim i As Integer
    Dim j As Integer

    For i = 0 To n
        d(i, 0) = i
    Next

    For j = 0 To m
        d(0, j) = j
    Next

    For i = 1 To n
        For j = 1 To m

            Dim cost As Integer
            If t(j - 1) = s(i - 1) Then
                cost = 0
            Else
                cost = 1
            End If

            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1),
                               d(i - 1, j - 1) + cost)
        Next
    Next

    Return d(n, m)
End Function

This is then used to determine which line is closest:

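    ' Reference line to compare against. It has to be set to the expected text:
    ' while it is empty, LevDist simply returns the length of each line.
    Dim correctLine As String = ""
    Dim line1 As String = "Canda has more than ones official language"
    Dim line2 As String = "Canada has more than one oficial languages"
    Dim line3 As String = "Canada has nore than one official lnguage"
    Dim line4 As String = "Canada has nore than one offical language"
    ' Typed lists instead of ArrayList, so no conversions from Object are needed.
    Dim lineArray As New List(Of String)
    Dim countArray As New List(Of Integer)

    lineArray.Add(line1)
    lineArray.Add(line2)
    lineArray.Add(line3)
    lineArray.Add(line4)

    ' Distance of every candidate line to the reference line.
    For i As Integer = 0 To lineArray.Count - 1
        countArray.Add(LevDist(lineArray(i), correctLine))
    Next

    ' Find the index with the smallest distance (on a tie, the last such line wins).
    Dim shortest As Integer = Integer.MaxValue
    Dim correctIndex As Integer = 0
    For i As Integer = 0 To countArray.Count - 1
        If countArray(i) <= shortest Then
            correctIndex = i
            shortest = countArray(i)
        End If
    Next
    Console.WriteLine(lineArray(correctIndex))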
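
When no reference line is known in advance, one alternative (an assumption on my part, not something stated in the post) is to pick the line whose summed distance to all the other lines is smallest. The sketch below does that; `EditDistance` and `MostCentralLine` are hypothetical names, and the compact two-row distance function is included only so the example runs on its own.

```vb
Imports System
Imports System.Collections.Generic

Module ClosestLineSketch

    ' Compact two-row Levenshtein distance (hypothetical name): only the
    ' previous and the current row of the distance table are kept.
    Function EditDistance(s As String, t As String) As Integer
        Dim previous(t.Length) As Integer
        Dim current(t.Length) As Integer

        ' Distance from the empty prefix of s to each prefix of t.
        For j As Integer = 0 To t.Length
            previous(j) = j
        Next

        For i As Integer = 1 To s.Length
            current(0) = i
            For j As Integer = 1 To t.Length
                Dim cost As Integer = If(s(i - 1) = t(j - 1), 0, 1)
                current(j) = Math.Min(Math.Min(previous(j) + 1, current(j - 1) + 1),
                                      previous(j - 1) + cost)
            Next
            ' Swap the rows for the next iteration.
            Dim temp As Integer() = previous
            previous = current
            current = temp
        Next

        Return previous(t.Length)
    End Function

    ' Hypothetical helper: returns the line with the smallest total distance
    ' to all other lines. On a tie, the earliest line wins.
    Function MostCentralLine(lines As IList(Of String)) As String
        Dim bestIndex As Integer = 0
        Dim bestTotal As Integer = Integer.MaxValue

        For i As Integer = 0 To lines.Count - 1
            Dim total As Integer = 0
            For j As Integer = 0 To lines.Count - 1
                If i <> j Then total += EditDistance(lines(i), lines(j))
            Next
            If total < bestTotal Then
                bestTotal = total
                bestIndex = i
            End If
        Next

        Return lines(bestIndex)
    End Function

    Sub Main()
        Dim lines As String() = {
            "Canda has more than ones official language",
            "Canada has more than one oficial languages",
            "Canada has nore than one official lnguage",
            "Canada has nore than one offical language"
        }
        Console.WriteLine(MostCentralLine(lines))
    End Sub

End Module
```

Note that any whole-line choice can only return one of the input lines, typos included. If every line contains at least one error, as in this example, a per-word vote is needed to reconstruct a fully correct line.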