Suppose you have a two-character String that should represent an ISO 639 language code or an ISO 3166 country code.
As you know, the Locale class has two functions, getISOLanguages and getISOCountries, which return String arrays of all ISO languages and all ISO countries, respectively.
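To check whether a specific String object is a valid ISO language or ISO country, I should look for a matching String in the corresponding array. OK, I can do that by searching the array, for example with a binary search via Arrays.binarySearch (which requires the array to be sorted) or with a linear scan via Apache Commons ArrayUtils.contains.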
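For comparison, here is a minimal sketch of the array-search approach the question describes, using only the JDK. The class name ArraySearchSketch is hypothetical. Because Arrays.binarySearch only gives defined results on sorted input, and the Locale Javadoc does not (as far as I know) promise any particular order, the sketch sorts a copy of each array once.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical class illustrating the question's array-search approach.
final class ArraySearchSketch {
    // Sorted once at class initialization and reused for every lookup.
    private static final String[] LANGUAGES = sortedCopy(Locale.getISOLanguages());
    private static final String[] COUNTRIES = sortedCopy(Locale.getISOCountries());

    private ArraySearchSketch() {}

    private static String[] sortedCopy(String[] codes) {
        String[] copy = codes.clone();
        // Arrays.binarySearch is undefined on unsorted input, so sort explicitly
        // instead of relying on whatever order Locale returns.
        Arrays.sort(copy);
        return copy;
    }

    static boolean isValidISOLanguage(String s) {
        // binarySearch would throw NullPointerException on a null key.
        return s != null && Arrays.binarySearch(LANGUAGES, s) >= 0;
    }

    static boolean isValidISOCountry(String s) {
        return s != null && Arrays.binarySearch(COUNTRIES, s) >= 0;
    }
}
```

The comparison is case-sensitive: language codes come back lowercase ("en") and country codes uppercase ("US"), so isValidISOCountry("us") returns false.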
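The question is: is there any utility (e.g. from the Guava or Apache Commons libraries) that offers a cleaner way, such as a function that returns a boolean indicating whether a String is a valid ISO 639 language code or ISO 3166 country code?
For example: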
public static boolean isValidISOLanguage(String s)
public static boolean isValidISOCountry(String s)
I wouldn't bother with either a binary search or a third-party library; a HashSet is fine for this:
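import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public final class IsoUtil {
    // Build the lookup sets once, at class initialization.
    private static final Set<String> ISO_LANGUAGES =
            new HashSet<String>(Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> ISO_COUNTRIES =
            new HashSet<String>(Arrays.asList(Locale.getISOCountries()));

    private IsoUtil() {}  // static utility class, no instances

    // Lookups are case-sensitive: languages are lowercase ("en"), countries uppercase ("US").
    public static boolean isValidISOLanguage(String s) {
        return ISO_LANGUAGES.contains(s);  // O(1) on average; null yields false
    }

    public static boolean isValidISOCountry(String s) {
        return ISO_COUNTRIES.contains(s);
    }
}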
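You could check the string length first, but I'm not sure I'd bother, at least not unless you want to protect yourself against performance attacks in which someone hands you enormous strings to hash.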
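If you do want that protection, a minimal sketch is the HashSet approach above with a length guard in front of it. The class name GuardedIsoUtil is hypothetical. It relies on the Locale Javadoc describing both lists as 2-letter codes, so any input that is not exactly two characters long is rejected before HashSet.contains() ever computes its hash.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical variant of IsoUtil with a length guard in front of the HashSet lookup.
final class GuardedIsoUtil {
    private static final Set<String> ISO_LANGUAGES =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> ISO_COUNTRIES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    private GuardedIsoUtil() {}

    // Both code lists consist of two-letter codes, so anything else can be
    // rejected before HashSet.contains() computes its hash.
    private static boolean hasCodeLength(String s) {
        return s != null && s.length() == 2;
    }

    static boolean isValidISOLanguage(String s) {
        return hasCodeLength(s) && ISO_LANGUAGES.contains(s);
    }

    static boolean isValidISOCountry(String s) {
        return hasCodeLength(s) && ISO_COUNTRIES.contains(s);
    }
}
```

The guard costs one length comparison per call, which is why it only pays off when inputs may come from an untrusted source.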
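Edit: If you do want to use a third-party library, ICU4J is the most likely contender. However, it may well have a more up-to-date list than the one Locale supports, so you would probably want to switch to ICU4J everywhere.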
As far as I know, no library offers such a method, but you can at least declare one yourself:
import static java.util.Arrays.binarySearch;

import java.util.Arrays;
import java.util.Locale;

/**
 * Validator of country codes.
 * Uses binary search over a sorted array of country codes.
 * A country code consists of two ASCII letters, so it needs at least two bytes.
 * Two bytes map to Java's short type, which lets us use Arrays.binarySearch(short[] a, short key).
 * Each country code is converted to a short by countryCodeNeedle().
 *
 * Measured average throughput is 246.058 ops/ms, roughly half that of a HashSet lookup (523.678 ops/ms).
 * Complexity is O(log N) instead of O(1) for a HashSet.
 * But it needs only 520 bytes of RAM to hold the country codes, instead of 22064 bytes (> 21 KB) for a HashSet.
 */
public class CountryValidator {
    /** Sorted array of country codes converted to short. */
    private static final short[] COUNTRIES_SHORT = initShortArray(Locale.getISOCountries());

    public static boolean isValidCountryCode(String countryCode) {
        if (countryCode == null || countryCode.length() != 2 || countryCodeIsNotAlphaUppercase(countryCode)) {
            return false;
        }
        short needle = countryCodeNeedle(countryCode);
        return binarySearch(COUNTRIES_SHORT, needle) >= 0;
    }

    private static boolean countryCodeIsNotAlphaUppercase(String countryCode) {
        char c1 = countryCode.charAt(0);
        if (c1 < 'A' || c1 > 'Z') {
            return true;
        }
        char c2 = countryCode.charAt(1);
        return c2 < 'A' || c2 > 'Z';
    }

    /**
     * A country code consists of two ASCII letters, so it fits in two bytes,
     * and two bytes map to Java's short type. We could pack them ourselves:
     *     short val = (short) ((hi << 8) | lo);
     * But String.hashCode() does very similar work, and a String caches its hash
     * after the first computation, so the conversion is cheap.
     * We can rely on the hash value because its formula (s[0]*31^(n-1) + ... + s[n-1])
     * is specified in the String.hashCode() Javadoc. For two uppercase letters it equals
     * 31 * c1 + c2, which lies in 2080..2880, fits in a short, and is distinct for distinct codes.
     */
    private static short countryCodeNeedle(String countryCode) {
        return (short) countryCode.hashCode();
    }

    private static short[] initShortArray(String[] isoCountries) {
        short[] countriesShortArray = new short[isoCountries.length];
        for (int i = 0; i < isoCountries.length; i++) {
            countriesShortArray[i] = countryCodeNeedle(isoCountries[i]);
        }
        // binarySearch requires a sorted array, and the order of getISOCountries()
        // is not documented, so sort explicitly instead of relying on it.
        Arrays.sort(countriesShortArray);
        return countriesShortArray;
    }
}
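The needle trick depends on (short) countryCode.hashCode() giving distinct, in-range values for distinct codes. The sketch below, a hypothetical helper that is not part of the answer, checks this exhaustively over all 676 pairs from "AA" to "ZZ": the hash equals 31 * c1 + c2, survives the cast to short unchanged, and increases strictly in alphabetical order, so no two codes can collide.

```java
// Hypothetical helper that checks the assumptions behind countryCodeNeedle().
final class HashNeedleCheck {
    private HashNeedleCheck() {}

    // Same conversion as countryCodeNeedle(): String.hashCode() truncated to short.
    static short needle(String code) {
        return (short) code.hashCode();
    }

    /**
     * Walks "AA", "AB", ..., "ZZ" in alphabetical order and returns true if, for
     * every pair, the hash equals 31 * c1 + c2, survives the cast to short
     * unchanged, and is strictly greater than the previous pair's value.
     */
    static boolean isStrictlyIncreasingOverUppercasePairs() {
        int previous = Integer.MIN_VALUE;
        for (char c1 = 'A'; c1 <= 'Z'; c1++) {
            for (char c2 = 'A'; c2 <= 'Z'; c2++) {
                String code = new String(new char[] {c1, c2});
                int hash = code.hashCode();
                if (hash != 31 * c1 + c2) {
                    return false; // hash differs from the documented formula
                }
                if (needle(code) != hash) {
                    return false; // cast to short lost information
                }
                if (hash <= previous) {
                    return false; // order not preserved, or two codes collide
                }
                previous = hash;
            }
        }
        return true;
    }
}
```

Since 'A' is 65 and 'Z' is 90, the needles range from 31 * 65 + 65 = 2080 for "AA" to 31 * 90 + 90 = 2880 for "ZZ", well inside the short range.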
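Locale.getISOCountries() creates a new array on every call, so we store the result in a static field to avoid needless allocations. At the same time, a HashSet or TreeSet consumes a lot of memory, so this validator uses binary search over an array instead. It is a trade-off between speed and memory.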
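A different technique on the same speed/memory trade-off, not taken from either answer: because there are only 26 × 26 = 676 possible two-letter uppercase codes, a java.util.BitSet indexed by the letter pair gives O(1) lookups while storing just 676 bits (eleven longs) of payload. Below is a minimal sketch with the hypothetical class name BitSetCountryValidator; the benchmark figures quoted above do not apply to it.

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical alternative: one bit per possible pair "AA".."ZZ".
final class BitSetCountryValidator {
    private static final int LETTERS = 26;
    private static final BitSet COUNTRIES = build(Locale.getISOCountries());

    private BitSetCountryValidator() {}

    private static BitSet build(String[] codes) {
        BitSet bits = new BitSet(LETTERS * LETTERS);
        for (String code : codes) {
            int index = indexOf(code);
            if (index >= 0) {
                bits.set(index);
            }
        }
        return bits;
    }

    /** Maps a two-letter uppercase ASCII code to 0..675, or -1 if it is not one. */
    private static int indexOf(String code) {
        if (code == null || code.length() != 2) {
            return -1;
        }
        int hi = code.charAt(0) - 'A';
        int lo = code.charAt(1) - 'A';
        if (hi < 0 || hi >= LETTERS || lo < 0 || lo >= LETTERS) {
            return -1;
        }
        return hi * LETTERS + lo;
    }

    static boolean isValidCountryCode(String code) {
        int index = indexOf(code);
        return index >= 0 && COUNTRIES.get(index);
    }
}
```

Like CountryValidator, it rejects null, wrong lengths, and lowercase input before touching the table.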