How to compare strings by similarity without ignoring typos?

Problem description (votes: 2, answers: 2)

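I need to compare two strings by closeness: when string.equals on the whole string fails, I still need to compare the first name, middle name and/or last name.

I have found some comparison algorithms, but they all tolerate typos in the result, whereas I need to compare the exact input.

Examples: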

  1. Maria souza silva = Maria souza silva = OK
  2. Maria souza silva = Maria silva = OK
  3. Maria souza silva = Maria Carvalho = Nok
  4. Maria souza silva = Ana souza silva = Nok
  5. Maria de souza silva = Maria de = Nok
  6. Maria de souza silva = Maria souza = OK

I am trying something like this:

String name = "Maria da souza Silva";

String nameRequest = "Maria da Silva";

if(name.equalsIgnoreCase(nameRequest)){
    System.out.print("ok 0");
}

String[] names = name.split(" ");

int nameLenght = names.length-1;

if(nameRequest.startsWith(names[0])){
    System.out.println("ok 1, next");
} else {
    System.out.print("nok, stop");
}

if(nameRequest.endsWith(names[nameLenght])){
    System.out.print("ok 2");
}

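The output is "ok 1, next" followed by "ok 2".

The first and last names are handled, but I also need to compare the middle names while ignoring particles such as "de" / "da".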

java algorithm comparison
2 answers

1 vote

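I was going to use a pure regular expression, and there may be a way to do that, but this code produces the results you want: it accepts first and last, or first and middle, and ignores "de" and "da".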

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// target is the full stored name, source is the requested name.
private void checkName(String target, String source) {
    // First name, an optional "de"/"da" particle, then the remaining names.
    Pattern pattern = Pattern.compile("^(?<firstName>[^\\s]+)\\s((de|da)(\\s|$))?(?<otherName>.*)$");
    Matcher targetMatcher = pattern.matcher(target.trim().toLowerCase());
    Matcher sourceMatcher = pattern.matcher(source.trim().toLowerCase());
    if (!targetMatcher.matches() || !sourceMatcher.matches()) {
        System.out.println("Nok");
        return; // group() throws IllegalStateException after a failed match
    }

    boolean ok = true;
    if (!sourceMatcher.group("firstName").equals(targetMatcher.group("firstName"))) {
        ok = false;
    } else {
        String[] otherSourceName = sourceMatcher.group("otherName").split("\\s");
        String[] otherTargetName = targetMatcher.group("otherName").split("\\s");

        // Every requested name must appear in the target, in the same order.
        int targetIndex = 0;
        for (String s : otherSourceName) {
            boolean hit = false;
            for (; targetIndex < otherTargetName.length; targetIndex++) {
                if (s.equals(otherTargetName[targetIndex])) {
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                ok = false;
                break;
            }
        }
    }
    System.out.println(ok ? "ok" : "Nok");
}

For your examples, the output is:

ok
ok
Nok
Nok
Nok
ok
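
The same rule can also be sketched without a regular expression, returning a boolean instead of printing. This is only a rough variant under stated assumptions, not the method above: the class name NameMatcher and its helpers are hypothetical, particles are dropped wherever they occur (the regex above only skips one directly after the first name), and a request must contain at least one name besides the first name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NameMatcher {
    // Particles to ignore; only "de" and "da" are taken from the question.
    private static final Set<String> PARTICLES = new HashSet<>(Arrays.asList("de", "da"));

    // Lower-cases the name, splits it on whitespace and drops the particles.
    static List<String> tokens(String name) {
        List<String> result = new ArrayList<>();
        for (String part : name.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty() && !PARTICLES.contains(part)) {
                result.add(part);
            }
        }
        return result;
    }

    // True when the first names are equal and the remaining requested names
    // appear in the full name in the same order (exact match, no typo tolerance).
    static boolean matches(String fullName, String request) {
        List<String> full = tokens(fullName);
        List<String> req = tokens(request);
        if (full.size() < 2 || req.size() < 2) {
            return false; // assumption: a first name alone is never enough
        }
        if (!full.get(0).equals(req.get(0))) {
            return false;
        }
        int fi = 1;
        for (int ri = 1; ri < req.size(); ri++) {
            while (fi < full.size() && !full.get(fi).equals(req.get(ri))) {
                fi++;
            }
            if (fi == full.size()) {
                return false; // requested name not found in the rest of the full name
            }
            fi++; // consume the matched name so it cannot be reused
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("Maria souza silva", "Maria souza silva"));
        System.out.println(matches("Maria souza silva", "Maria silva"));
        System.out.println(matches("Maria souza silva", "Maria Carvalho"));
        System.out.println(matches("Maria souza silva", "Ana souza silva"));
        System.out.println(matches("Maria de souza silva", "Maria de"));
        System.out.println(matches("Maria de souza silva", "Maria souza"));
    }
}
```

For the six examples in the question this prints true, true, false, false, false, true.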

1 vote

You can use a regular expression like this:

String firstName = "maria";
String lastName = "silva";

String regex = ("^" + firstName + "([ ].*[ ]|[ ])" + lastName + "$");

System.out.println("maria de silva".matches(regex));
System.out.println("maria silva".matches(regex));
System.out.println("maria deb".matches(regex));
System.out.println("a silva".matches(regex));
System.out.println("mariasilva".matches(regex));

true
true
false
false
false

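The regular expression looks for the first name at the start of the string and the last name at the end, with either a single space between them or any characters enclosed by a space on each side.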
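
If the names come from user input, concatenating them straight into the pattern lets characters such as "." act as regex metacharacters. A minimal sketch of the same pattern built with Pattern.quote follows; the class FirstLastRegex and the method firstAndLast are hypothetical names, and the case-insensitive flags are an added assumption. Because ".*" accepts anything, this checks only the first and last names and does not compare middle names.

```java
import java.util.regex.Pattern;

class FirstLastRegex {
    // Builds "^first( anything | )last$" with both names treated as literal text.
    static Pattern firstAndLast(String firstName, String lastName) {
        return Pattern.compile(
                "^" + Pattern.quote(firstName) + "([ ].*[ ]|[ ])" + Pattern.quote(lastName) + "$",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    public static void main(String[] args) {
        Pattern p = firstAndLast("maria", "silva");
        System.out.println(p.matcher("Maria de Silva").matches());
        System.out.println(p.matcher("maria silva").matches());
        System.out.println(p.matcher("maria deb").matches());
        System.out.println(p.matcher("mariasilva").matches());
    }
}
```

This prints true, true, false, false.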
