A cleaner way to check whether a String is an ISO country or ISO language code in Java

Question · Votes: 12 · Answers: 2

Suppose you have a two-character String that should represent an ISO 3166 country code or an ISO 639 language code.

As you know, the Locale class has two methods, getISOLanguages and getISOCountries, which return String arrays of all ISO languages and all ISO countries respectively.

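To check whether a given String is a valid ISO language or ISO country, I have to look for a match in that array. Fine, I can do that with a binary search (for example Arrays.binarySearch) or a linear scan (for example Apache Commons ArrayUtils.contains).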
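The array-search approach described above might look like the following minimal sketch. The class name `ArraySearchSketch` and its method names are hypothetical. The sketch sorts its own copy of the array before searching, on the assumption that the order returned by `Locale` should not be relied upon for `Arrays.binarySearch`.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper illustrating the array-search approach from the question.
public final class ArraySearchSketch {
    private ArraySearchSketch() {}

    public static boolean isIsoCountry(String code) {
        return containsSorted(Locale.getISOCountries(), code);
    }

    public static boolean isIsoLanguage(String code) {
        return containsSorted(Locale.getISOLanguages(), code);
    }

    // Sorts the (freshly returned) array, then binary-searches it.
    // Sorting on every call is wasteful; it is done here only to keep the sketch short.
    private static boolean containsSorted(String[] codes, String key) {
        if (key == null) {
            return false; // binarySearch would throw on a null key
        }
        Arrays.sort(codes);
        return Arrays.binarySearch(codes, key) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isIsoCountry("US"));
        System.out.println(isIsoLanguage("en"));
    }
}
```

Note that the lookup is case-sensitive: country codes are upper case and language codes are lower case in the arrays that `Locale` returns.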

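The question is: does any utility (for example from Guava or Apache Commons) offer a cleaner way, such as a function that returns a boolean to validate a String as an ISO 639 language code or an ISO 3166 country code?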

For example:

public static boolean isValidISOLanguage(String s)
public static boolean isValidISOCountry(String s)
java validation guava apache-commons iso
2 Answers
20 votes

I wouldn't bother with a binary search or any third-party library; a HashSet does the job:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public final class IsoUtil {
    private static final Set<String> ISO_LANGUAGES = new HashSet<String>
        (Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> ISO_COUNTRIES = new HashSet<String>
        (Arrays.asList(Locale.getISOCountries()));

    private IsoUtil() {}

    public static boolean isValidISOLanguage(String s) {
        return ISO_LANGUAGES.contains(s);
    }

    public static boolean isValidISOCountry(String s) {
        return ISO_COUNTRIES.contains(s);
    }
}

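You could check the length of the string first, but I'm not sure I would bother, at least unless you want to protect yourself against performance attacks in which you are handed enormous strings that take a long time to hash.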
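If the length check is wanted, it could be added as in the sketch below. `GuardedIsoUtil` is a hypothetical variant of the class above, and the fixed length of 2 assumes that `Locale.getISOCountries()` and `Locale.getISOLanguages()` return only two-letter codes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical variant of IsoUtil that rejects null and wrong-length input before hashing.
public final class GuardedIsoUtil {
    private static final Set<String> ISO_LANGUAGES =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> ISO_COUNTRIES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    private GuardedIsoUtil() {}

    public static boolean isValidISOLanguage(String s) {
        // The cheap checks run first, so a huge string is never hashed.
        return s != null && s.length() == 2 && ISO_LANGUAGES.contains(s);
    }

    public static boolean isValidISOCountry(String s) {
        return s != null && s.length() == 2 && ISO_COUNTRIES.contains(s);
    }

    public static void main(String[] args) {
        System.out.println(isValidISOCountry("FR"));
        System.out.println(isValidISOLanguage("fr"));
    }
}
```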

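EDIT: If you do want to use a third-party library, ICU4J is the most likely contender, but it may well have a more up-to-date list than the one Locale supports, so you would probably want to move to using ICU4J everywhere.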


0 votes

As far as I know, no library has such a method, but you can at least declare one yourself:

import static java.util.Arrays.binarySearch;
import java.util.Locale;

/**
 * Validator of country code.
 * Uses binary search over array of sorted country codes.
 * Country code has two ASCII letters so we need at least two bytes to represent the code.
 * Two bytes are represented in Java by short type. This is useful for us because we can use Arrays.binarySearch(short[] a, short needle)
 * Each country code is converted to short via countryCodeNeedle() function.
 *
 * Average throughput of this method is 246.058 ops/ms, roughly half that of a HashSet lookup (523.678 ops/ms).
 * Complexity is O(log(N)) instead of O(1) for HashSet.
 * But it needs only 520 bytes of RAM for the list of country codes, instead of 22064 bytes (> 21 KB) for a HashSet of country codes.
 */
public class CountryValidator {
  /** Sorted array of country codes converted to short */
  private static final short[] COUNTRIES_SHORT = initShortArray(Locale.getISOCountries());

  public static boolean isValidCountryCode(String countryCode) {
    if (countryCode == null || countryCode.length() != 2 || countryCodeIsNotAlphaUppercase(countryCode)) {
      return false;
    }
    short needle = countryCodeNeedle(countryCode);
    return binarySearch(COUNTRIES_SHORT, needle) >= 0;
  }

  private static boolean countryCodeIsNotAlphaUppercase(String countryCode) {
    char c1 = countryCode.charAt(0);
    if (c1 < 'A' || c1 > 'Z') {
      return true;
    }
    char c2 = countryCode.charAt(1);
    return c2 < 'A' || c2 > 'Z';
  }

  /**
   * A country code has two ASCII letters, so we need at least two bytes to represent it.
   * Two bytes are represented in Java by the short type, so we convert the two letters of the country code to a short.
   * We could use something like:
   * short val = (short)((hi << 8) | lo);
   * But very similar logic already exists inside String.hashCode().
   * More importantly, each String object caches its hash code,
   * so converting a two-letter country code to a short is essentially free.
   * We can rely on String's hash code because its formula is specified in the String.hashCode() Javadoc.
   */
  private static short countryCodeNeedle(String countryCode) {
    return (short) countryCode.hashCode();
  }

  private static short[] initShortArray(String[] isoCountries) {
    short[] countriesShortArray = new short[isoCountries.length];
    for (int i = 0; i < isoCountries.length; i++) {
      countriesShortArray[i] = countryCodeNeedle(isoCountries[i]);
    }
    // binarySearch requires sorted input; sort explicitly rather than rely on the order of Locale.getISOCountries()
    java.util.Arrays.sort(countriesShortArray);
    return countriesShortArray;
  }
}

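Locale.getISOCountries() always creates a new array, so we should store the result in a static field to avoid unnecessary allocations. At the same time, a HashSet or TreeSet consumes a lot of memory, so this validator uses binary search over an array instead. It is a trade-off between speed and memory.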
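To see why the hash-code trick works for this input, the following sketch may help. It is an illustration, not part of the answer's code. `HashNeedleSketch` and its method names are hypothetical. For a two-character string, `String.hashCode()` is `31 * s.charAt(0) + s.charAt(1)`, which for upper-case ASCII letters fits in a short and preserves alphabetical order.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo of the two-letter-code-to-short mapping used by the validator above.
public final class HashNeedleSketch {
    private HashNeedleSketch() {}

    // Same conversion as countryCodeNeedle(): truncate the String hash to a short.
    public static short needle(String code) {
        return (short) code.hashCode();
    }

    // Returns true if the needles of alphabetically sorted codes are strictly ascending.
    public static boolean orderPreserved(String[] sortedCodes) {
        for (int i = 1; i < sortedCodes.length; i++) {
            if (needle(sortedCodes[i - 1]) >= needle(sortedCodes[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] codes = Locale.getISOCountries();
        Arrays.sort(codes); // sort alphabetically before comparing needle order
        System.out.println("US -> " + needle("US")); // 31 * 'U' + 'S' = 31 * 85 + 83 = 2718
        System.out.println("order preserved: " + orderPreserved(codes));
    }
}
```

The mapping is only collision-free because the validator first rejects anything that is not exactly two upper-case ASCII letters; for arbitrary strings, truncating the hash to a short would produce false positives.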
