OR query on MongoDB from Java, combining "like", "line break" and "case insensitive"

Question

Here is a sample document from my MongoDB collection page_link_titles:

{
    "_id" : ObjectId("553b11f30b81511d64152416"),
    "id" : 36470831,
    "linkTitles" : [ 
        "Syrian civil war", 
        "Damascus", 
        "Geographic coordinate system", 
        "Bashar al-Assad", 
        "Al Jazeera English", 
        "Free Syrian Army", 
        ...

        "February 2012 Aleppo bombings", 
        "2012 Deir ez-Zor bombing", 
        "Aleppo University bombings"
    ]
}

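I want to find all documents whose linkTitles contain a phrase like '%term1%' or '%term2%' or (and so on). term1 and term2 must have a line break (that is, a word boundary) on both sides. For example, look at "Syrian civil war". If term1 = "war", I want this document returned as a query result, but if term1 = "yria", which is only part of a word in this document, it should not be returned.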

Here is my Java code:

for (String term : segment.terms) {
    DBObject clause1 = new BasicDBObject("linkTitles",
            java.util.regex.Pattern.compile("\\b"
                    + stprocess.singularize(term) + "\\b"));
    or.add(clause1);
}

DBObject mongoQuery = new BasicDBObject("$or", or);
DBCursor cursor = pageLinks.find(mongoQuery);

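In the line java.util.regex.Pattern.compile("\\b" + stprocess.singularize(term) + "\\b"); I have only handled the line break. I don't know how to write a regular expression that covers all of my conditions: line break, case insensitive, like.

Any ideas?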

Answer 1

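It is possible to write a regular expression that does what you want. You can also use a single regular expression instead of $or.

I am using the shell for a quick example and want to search for boxer or cat. First, insert the test data: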

db.test.drop()
db.test.insert([
{ "a" : "Boxer One" },
{ "a" : "A boxer dog" },
{ "a" : "A box shouldn't match" },
{ "a" : "should match BOXER" },
{ "a" : "wont match as this it the plural BOXERs" },
{ "a" : "also match on cat" }])

With the following regular expression we can search for all the terms at once:

                                       
      /(^|\b)(boxer|cat)(\b|$)/i       
       +---+ +-------+  +---+         
          |       |        |           
          |       |        |           
   Start or space |       Space or end 
                  |                    
              Search terms

Run a find like this:

db.test.find({a: /(^|\b)(boxer|cat)(\b|$)/i})

The query returns the following results:

{ "_id" : ObjectId("555f18eee7b6d1b7e622de36"), "a" : "Boxer One" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de37"), "a" : "A boxer dog" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de39"), "a" : "should match BOXER" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de3b"), "a" : "also match on cat" }
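
To check the pattern outside the database, here is a minimal Java sketch that applies the same regular expression to the six test strings using java.util.regex. The class name BoxerCatRegexCheck and the helper matching are hypothetical. Matcher.find() is used because a MongoDB regex match is unanchored, so a match anywhere in the string counts. MongoDB evaluates regular expressions with PCRE rather than Java's engine, so treat this as an approximation that should agree for a simple ASCII pattern like this one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BoxerCatRegexCheck {
    // Same pattern as the shell query; the /i flag becomes CASE_INSENSITIVE.
    static final Pattern REGEX =
            Pattern.compile("(^|\\b)(boxer|cat)(\\b|$)", Pattern.CASE_INSENSITIVE);

    // Returns the values in which the pattern is found anywhere (unanchored search).
    static List<String> matching(List<String> values) {
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (REGEX.matcher(value).find()) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The six values of field "a" from the test data above.
        List<String> values = Arrays.asList(
                "Boxer One",
                "A boxer dog",
                "A box shouldn't match",
                "should match BOXER",
                "wont match as this it the plural BOXERs",
                "also match on cat");
        for (String hit : matching(values)) {
            System.out.println(hit);
        }
    }
}
```

Because \b already matches at the start and end of a string when the adjacent character is a word character, the ^ and $ alternatives are redundant for terms like these, though harmless.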

In Java, you can build this query like this:

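// requires: import java.util.regex.Pattern;
// Join the singularized terms into one alternation: |term1|term2|...
StringBuilder singularizedTerms = new StringBuilder();
for (String term : terms) {
    singularizedTerms.append("|").append(stprocess.singularize(term));
}
// substring(1) drops the leading "|"
String regexPattern = String.format("(^|\\b)(%s)(\\b|$)", singularizedTerms.substring(1));
Pattern regex = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
// Pass the compiled pattern as the field value, as the question's code does
DBCursor cursor = pageLinks.find(new BasicDBObject("linkTitles", regex));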
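
The loop above concatenates the terms directly into the regular expression, so a term containing a metacharacter such as . or ( would change the pattern. One possible variant, sketched here under assumptions, wraps each term with Pattern.quote and joins them with String.join. The class TermRegexBuilder and the method buildPattern are hypothetical names, and the singularization step (stprocess.singularize in the question) is left out because that helper is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TermRegexBuilder {
    // Builds (^|\b)(term1|term2|...)(\b|$), case-insensitive,
    // with every term quoted so it is matched literally.
    static Pattern buildPattern(List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one term is required");
        }
        List<String> quoted = new ArrayList<>();
        for (String term : terms) {
            quoted.add(Pattern.quote(term));
        }
        String regex = String.format("(^|\\b)(%s)(\\b|$)", String.join("|", quoted));
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern pattern = buildPattern(Arrays.asList("war", "a.b"));
        // A whole-word hit in one of the question's link titles.
        System.out.println(pattern.matcher("Syrian civil war").find());
        // "axb" is not matched because the dot in "a.b" is treated literally.
        System.out.println(pattern.matcher("axb").find());
    }
}
```

Note that \b only behaves as a word boundary when the term starts and ends with a word character; terms that begin or end with punctuation would need a different delimiter rule.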

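There are two problems with this approach.

It will be slow: it cannot use an index effectively, so it scans the entire collection. If you have 10 million documents, it checks every one!
It won't match plurals: for example, it does not match the document containing "BOXERs", because our regular expression explicitly disallows partial matches!

Text indexes support this. Using an index makes the operation faster, and it also matches both plural and singular forms, for example: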

db.test.createIndex( { a: "text" } )
db.test.find({ $text: { $search: "boxer cat"}})

{ "_id" : ObjectId("555f18eee7b6d1b7e622de3b"), "a" : "also match on cat" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de3a"), "a" : "wont match as this it the plural BOXERs" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de36"), "a" : "Boxer One" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de37"), "a" : "A boxer dog" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de39"), "a" : "should match BOXER" }
