How do I get the collation of a SQL string in a CLR function?

Question

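I am writing a Levenshtein distance function in C# to calculate the edit distance between two strings. The problem is that I want to call the method multiple times with different collations, but only one collation ever makes it across the SQL-to-CLR interface: the database's default collation.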

Here is the code for the CLR function:

[SqlFunction(IsDeterministic = true, Name = "LevenshteinDistance")]
public static SqlInt64 Distance(SqlString textA, SqlString textB)
{
    // get a collation-aware comparer so string/character comparisons 
    // will match the inputs' specified collation
    var aCompareInfo = textA.CompareInfo;
    var compareOptions = ConvertCompareOptions(textA.SqlCompareOptions);
    var aLength = textA.Value.Length;
    var bLength = textB.Value.Length;

    // degenerate cases
    if (aCompareInfo.Compare(textA.Value, 0, aLength, textB.Value, 0, bLength, compareOptions) == 0) { return 0; }
    if (aLength == 0) { return bLength; }
    if (bLength == 0) { return aLength; }

    // create two work vectors of integer distances
    var previousDistances = new SqlInt64[Maximum(aLength, bLength) + 1];
    var currentDistances = new SqlInt64[Maximum(aLength, bLength) + 1];

    // initialize previousDistances (the previous row of distances)
    // this row is A[0][i]: edit distance for an empty textA
    // the distance is just the number of characters to delete from textB
    for (var i = 0; i < previousDistances.Length; i++)
    {
        previousDistances[i] = i;
    }

    for (var i = 0; i < aLength; i++)
    {
        // calculate currentDistances from the previous row previousDistances

        // first element of currentDistances is A[i+1][0]
        //   edit distance is delete (i+1) chars from textA to match empty textB
        currentDistances[0] = i + 1;

        // use formula to fill in the rest of the row
        for (var j = 0; j < bLength; j++)
        {
            var cost = (aCompareInfo.Compare(textA.Value, i, 1, textB.Value, j, 1, compareOptions) == 0) ? 0 : 1;
            currentDistances[j + 1] = Minimum(currentDistances[j] + 1, previousDistances[j + 1] + 1, previousDistances[j] + cost);
        }

        // copy currentDistances to previousDistances for next iteration
        for (var j = 0; j < previousDistances.Length; j++)
        {
            previousDistances[j] = currentDistances[j];
        }
    }

    return currentDistances[bLength];
}

After deploying the CLR assembly to SQL Server (2008 R2), I call it like this:

print dbo.LevenshteinDistance('abc' collate Latin1_General_CI_AI, 'ABC' collate Latin1_General_CI_AI)
print dbo.LevenshteinDistance('abc' collate Latin1_General_CS_AS_KS_WS, N'ABC' collate Latin1_General_CS_AS_KS_WS)

Both calls return zero (0). Because I specified a case-sensitive collation for the second call, I expected it to return three (3).

With CLR functions in SQL Server, is it possible to specify collations other than the database default and use them inside the CLR function? If so, how?

Answer 1

How do I get the collation of a SQL string in a CLR function?

Unfortunately, you can't. According to the "Parameter Collation" section of the TechNet page Collation and CLR Integration Data Types:

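When you create a common language runtime (CLR) routine, and a parameter of a CLR method bound to the routine is of type SqlString, SQL Server creates the instance of the parameter with the default collation of the database containing the calling routine. If a parameter is not a SqlType (for example, string rather than SqlString), the collation information from the database is not associated with the parameter.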

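So the behavior you are seeing in the CompareInfo and SqlCompareOptions properties of the textA input parameter, while unfortunate, frustrating, and hard to understand, is at least consistent with how the documentation says the system should work.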

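Hence, your solution of passing the properties in through separate input parameters is the right way to go (though you really should use the SqlTypes SqlInt32 and SqlBoolean for those parameters ;-)).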
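
As a minimal sketch of that suggestion, the collation flags could be accepted as SqlBoolean and the LCID as SqlInt32, so that SQL NULL is handled through the SqlTypes themselves. The class name CollationArgs and both helper names are hypothetical and are not part of the asker's code; treating NULL as "sensitive" is an assumption made for this example.

```csharp
using System.Data.SqlTypes;
using System.Globalization;

public static class CollationArgs
{
    // Hypothetical helper: builds CompareOptions from SqlBoolean flags.
    // Assumption: SQL NULL and false both mean "sensitive" (flag not set),
    // because SqlBoolean.IsTrue is false for NULL.
    public static CompareOptions ToCompareOptions(
        SqlBoolean caseInsensitive,
        SqlBoolean accentInsensitive,
        SqlBoolean kanaInsensitive,
        SqlBoolean widthInsensitive)
    {
        var options = CompareOptions.None;
        if (caseInsensitive.IsTrue) options |= CompareOptions.IgnoreCase;
        if (accentInsensitive.IsTrue) options |= CompareOptions.IgnoreNonSpace;
        if (kanaInsensitive.IsTrue) options |= CompareOptions.IgnoreKanaType;
        if (widthInsensitive.IsTrue) options |= CompareOptions.IgnoreWidth;
        return options;
    }

    // Hypothetical helper: uses the explicit LCID when one is supplied,
    // otherwise falls back to the LCID carried by the string argument
    // (the database default). Assumes fallbackText is not NULL.
    public static CompareInfo ToCompareInfo(SqlInt32 lcid, SqlString fallbackText)
    {
        int effectiveLcid = lcid.IsNull ? fallbackText.LCID : lcid.Value;
        return CultureInfo.GetCultureInfo(effectiveLcid).CompareInfo;
    }
}
```

With helpers like these, the SQL-facing method would declare SqlInt32 and SqlBoolean parameters and never need to unwrap nullable CLR types itself.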

Answer 2

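There is another approach that some might consider better, provided your solution never involves strings larger than 4K. Declare the parameter as object instead of SqlString. That is equivalent to SQL_VARIANT. Variants carry more overhead than standard types, but they can hold strings of any collation.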

SELECT dbo.ClrCollationTest(N'Anything' collate latin1_general_cs_as),
       dbo.ClrCollationTest(N'Anything' collate SQL_Latin1_General_CP1_CI_AS);

When the CLR function is coded like this, the query above returns 0 and 1 respectively:

public static SqlBoolean ClrCollationTest(object anything)
{
    if (anything is SqlString)
        return new SqlBoolean(((SqlString)anything).SqlCompareOptions.HasFlag(SqlCompareOptions.IgnoreCase));
    else throw new ArgumentException(anything.GetType().Name + " is not a valid parameter data type.  SqlString is required.");
}
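
Building on the same idea, here is a rough sketch of how the collation carried by an object (SQL_VARIANT) argument might be used for an actual comparison rather than just inspected. The class and method names are hypothetical. It relies on SqlString.CompareOptionsFromSqlCompareOptions to translate the flags, and it does not try to handle binary collations, which that method is not expected to accept.

```csharp
using System;
using System.Data.SqlTypes;
using System.Globalization;

public static class VariantCollation
{
    // Hypothetical helper: compares two boxed SqlString values using the
    // culture and compare options carried by the first argument.
    public static bool EqualUnderOwnCollation(object a, object b)
    {
        if (!(a is SqlString) || !(b is SqlString))
            throw new ArgumentException("Both arguments must be SqlString values.");

        var first = (SqlString)a;
        var second = (SqlString)b;

        // Assumption for this sketch: a NULL on either side is "not equal".
        if (first.IsNull || second.IsNull) return false;

        // Translate the SQL-side flags (case, accent, kana, width) into
        // the equivalent .NET CompareOptions.
        CompareOptions options =
            SqlString.CompareOptionsFromSqlCompareOptions(first.SqlCompareOptions);

        return first.CompareInfo.Compare(first.Value, second.Value, options) == 0;
    }
}
```

The same CompareInfo and CompareOptions pair could then be passed to the per-character comparisons in the Levenshtein loop.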

Answer 3

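Having found no alternatives on the internet and no answers to this question, I decided to pass the required collation properties as function parameters and to pick a CultureInfo object and CompareOptions based on those inputs, falling back to the default collation passed in from the database.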

[SqlFunction(IsDeterministic = true, Name = "LevenshteinDistance")]
public static SqlInt64 Distance(SqlString textA, SqlString textB, int? lcid, bool? caseInsensitive, bool? accentInsensitive, bool? kanaInsensitive, bool? widthInsensitive)
{
    // get a collation-aware comparer so string/character comparisons 
    // will match the inputs' specified collation
    //var aCompareInfo = textA.CompareInfo;
    var aCompareInfo = CultureInfo.GetCultureInfo(lcid ?? textA.LCID).CompareInfo;
    //var compareOptions = ConvertCompareOptions(textA.SqlCompareOptions);
    var compareOptions = GetCompareOptions(caseInsensitive, accentInsensitive, kanaInsensitive, widthInsensitive);

    // ...  more code ...

    // first comparison
    if (aCompareInfo.Compare(textA.Value, 0, aLength, textB.Value, 0, bLength, compareOptions) == 0) { return 0; }

    // ...  more code ...

    var cost = (aCompareInfo.Compare(textA.Value, i, 1, textB.Value, j, 1, compareOptions) == 0) ? 0 : 1;

    // ...  more code ...
}

private static CompareOptions GetCompareOptions(bool? caseInsensitive, bool? accentInsensitive, bool? kanaInsensitive, bool? widthInsensitive)
{
    var compareOptions = CompareOptions.None;

    compareOptions |= (caseInsensitive ?? false) ? CompareOptions.IgnoreCase : CompareOptions.None;
    compareOptions |= (accentInsensitive ?? false) ? CompareOptions.IgnoreNonSpace : CompareOptions.None;
    compareOptions |= (kanaInsensitive ?? false) ? CompareOptions.IgnoreKanaType : CompareOptions.None;
    compareOptions |= (widthInsensitive ?? false) ? CompareOptions.IgnoreWidth : CompareOptions.None;

    return compareOptions;
}

After updating my assembly and UDF declaration, I can call the function like this:

print dbo.LevenshteinDistance('abc', 'ABC', null, 1, 1, 1, 1)
print dbo.LevenshteinDistance('abc', 'ABC', null, 0, 0, 0, 0)

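Now the first call returns 0 (database default culture, everything insensitive) and the second call returns 3 (database default culture, everything sensitive).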
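
For readers who want to try the comparison logic outside SQL Server, the following is a self-contained sketch of a collation-aware Levenshtein distance over plain strings. It is not the elided code above: the class name is hypothetical, it uses int rows sized to the second string, and it swaps the two rows instead of copying them. Like the original, it compares one UTF-16 code unit at a time, so surrogate pairs and combining sequences are not treated as single characters.

```csharp
using System;
using System.Globalization;

public static class CollationAwareLevenshtein
{
    // Edit distance where "equal characters" is decided by the supplied
    // CompareInfo and CompareOptions instead of ordinal equality.
    public static int Distance(string a, string b,
                               CompareInfo compareInfo, CompareOptions options)
    {
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        // previous[j] = distance between the first i chars of a and first j of b
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++) previous[j] = j;

        for (var i = 0; i < a.Length; i++)
        {
            // Deleting i + 1 characters from a matches an empty prefix of b.
            current[0] = i + 1;

            for (var j = 0; j < b.Length; j++)
            {
                var cost = compareInfo.Compare(a, i, 1, b, j, 1, options) == 0 ? 0 : 1;
                current[j + 1] = Math.Min(
                    Math.Min(current[j] + 1, previous[j + 1] + 1),
                    previous[j] + cost);
            }

            // Swap rows so the row just computed becomes "previous".
            var swap = previous;
            previous = current;
            current = swap;
        }

        // After the final swap, the last computed row is in previous.
        return previous[b.Length];
    }
}
```

Calling Distance("abc", "ABC", ...) with the en-US CompareInfo should give 0 under CompareOptions.IgnoreCase and 3 under CompareOptions.None, mirroring the two SQL calls above.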
