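An element on the page holds the content I need. After parsing the page with Nokogiri, I'm trying to pull it out of element.content, which gives me this: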
["\n \n \n \n itemId[0]=1234;\n \n \n \n \n \n \n \n My Project: First Edition\n \n ", "\n \n \n \n itemId[1]=2345;\n \n \n \n \n \n \n \n My Second Edition\n \n ", "\n \n \n \n itemId[2]=1234;\n \n \n \n \n \n \n \n Third\n \n \n"]
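I was able to get the itemId[0]=1234 part with a regex of (/itemId.\d+..\d{4}/), but I am completely stuck on how to get the name of the content. Any suggestions? Maybe I could parse the HTML with Ruby instead?
Given a string like this:
s = "\n \n \n \n itemId[0]=1234;\n \n \n \n \n \n \n \n My Project: First Edition\n \n "
You could do this:
m = s.match(/(itemId\[\d+\]=\d+);(.*)/m)
item = m[1]
# itemId[0]=1234
name = m[2].strip
# My Project: First Edition
Basically, you use (more or less) your existing expression to pull out the itemId... part, grab the rest of the string in multi-line mode ((.*) with /m, so that . matches newlines), and then strip off the offending whitespace outside of the regex. You don't have to build an unreadable regex that does everything you need; post-processing the match results is allowed and sometimes even encouraged.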
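The same idea can be applied to every string in the array from the question. What follows is only a minimal sketch: the helper name parse_item and the hash keys :index, :id and :name are made up for illustration, and the sample strings are the ones from the question with some of the whitespace shortened. It also assumes every entry has the form itemId[n]=digits; followed by the name. Entries that do not match are skipped rather than raising an error.

```ruby
# Sample data modelled on the question (runs of whitespace shortened).
contents = [
  "\n \n itemId[0]=1234;\n \n \n My Project: First Edition\n \n ",
  "\n \n itemId[1]=2345;\n \n \n My Second Edition\n \n ",
  "\n \n itemId[2]=1234;\n \n \n Third\n \n \n"
]

# Hypothetical helper: returns a hash for one string, or nil if it does not match.
def parse_item(text)
  # /m lets . match newlines, so (.*) captures everything after the semicolon.
  m = text.match(/itemId\[(\d+)\]=(\d+);(.*)/m)
  return nil unless m

  # Post-process outside the regex: convert the numbers, strip the whitespace.
  { index: m[1].to_i, id: m[2].to_i, name: m[3].strip }
end

# Parse every entry and drop the ones that did not match.
items = contents.map { |text| parse_item(text) }.compact

items.each { |item| puts "#{item[:index]}: #{item[:id]} #{item[:name]}" }
# 0: 1234 My Project: First Edition
# 1: 2345 My Second Edition
# 2: 1234 Third
```

Capturing the index and the ID as separate groups is a design choice, not a requirement. If you only need the raw itemId[0]=1234 text, the single capture group from the answer above works just as well.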