Ruby regexp - Get the Facebook video ID from different URLs with a single regular expression

Question  Votes: 0  Answers: 3

I want to extract the video ID from URLs that can take any of these forms:

https://www.facebook.com/{page-name}/videos/{video-id}/
https://www.facebook.com/{username}/videos/{video-id}/
https://www.facebook.com/video.php?id={video-id}
https://www.facebook.com/video.php?v={video-id}

How can I retrieve the video ID with a single Ruby regular expression?

I have not managed to turn it into a Ruby regex, but I (partially) managed to write it as a standard JS regex:

^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$

When I run the following code in Ruby, it raises an error:

text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
id = text.gsub( ^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$ )
ruby-on-rails ruby regex
3 Answers
1
vote

Here is the regex I came up with: /(?<=\/videos\/)\d+?(?=\/|$)|(?<=[?&]id=)\d+?(?=&|$)|(?<=[?&]v=)\d+?(?=&|$)/

Breaking it down, we get this:

(?<=\/videos\/)\d+(?=\/|$)|
(?<=[?&]id=)\d+(?=&|$)|
(?<=[?&]v=)\d+(?=&|$)

Each of the three alternatives follows the same simple structure: (?<=beforeMatch)target(?=afterMatch). Here is the first one as an example:

(?<=\/videos\/) # Positive lookbehind
\d+             # Matching the digits
(?=\/|$)        # Positive lookahead

So this means: match any run of digits (\d+), as long as it is preceded by \/videos\/ and followed by either \/ or the end of the line.

The same way, we can match after 'id=', 'v=' or 'videos/'.
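To see the lookaround structure in action, here is a minimal sketch that applies this answer's pattern with String#[], which returns the matched substring or nil. The constant name VIDEO_ID and the sample URLs are made up for illustration, the pattern is written with %r{} only so the slashes need no escaping, and it uses the greedy \d+ form from the breakdown above.

```ruby
# Lookaround-based pattern from this answer. Only the digits are part of the
# match; the lookbehind and lookahead assert context without consuming it.
VIDEO_ID = %r{(?<=/videos/)\d+(?=/|$)|(?<=[?&]id=)\d+(?=&|$)|(?<=[?&]v=)\d+(?=&|$)}

# Sample URLs are hypothetical, chosen to cover each alternative.
urls = [
  "https://www.facebook.com/some.page/videos/352355988613922/",
  "https://www.facebook.com/video.php?id=352355988613922",
  "https://www.facebook.com/video.php?v=352355988613922&type=2",
  "https://www.facebook.com/some.page/about"
]

# String#[] with a Regexp returns the whole match, or nil when nothing matches.
ids = urls.map { |url| url[VIDEO_ID] }
```

Because the match itself is just the digits, no capture group is needed. The last URL has no video ID, so its entry is nil.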

Full explanation:

(?<=\/videos\/) # Match as long as preceded by '\/videos\/'
\d+             # Matching the id digits
(?=\/|$)        # As long as it's followed by '\/' or the EOL
|               # Or
(?<=[?&]id=)    # Match as long as preceded by '?id' or '&id'
\d+             # Matching the id digits
(?=&|$)         # As long as it's followed by either '&' or the EOL
|               # Or
(?<=[?&]v=)     # Match as long as preceded by '?v' or '&v'
\d+             # Matching the id digits
(?=&|$)         # As long as it's followed by either '&' or the EOL

where 'EOL' means the end of the line.


0
votes

RE = %r[https://www.facebook.com/(?:.+?/)?video(?:.*?[/=])(.+?)(?:/?\z)]
%w[
  https://www.facebook.com/{page-name}/videos/{video-id}/
  https://www.facebook.com/{username}/videos/{video-id}/
  https://www.facebook.com/video.php?id={video-id}
  https://www.facebook.com/video.php?v={video-id}
].map { |url| url[RE, 1] }
#⇒ ["{video-id}", "{video-id}", "{video-id}", "{video-id}"]

0
votes

You can use:

^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$

That will match the beginning of the line and the start of the URL: ^https?:\/\/www\.facebook\.com\/

Followed by:

.*?          # Match any character zero or more times (non-greedy)
video        # Match video
(?:          # Non capturing group
  s          # Match s
  |          # Or
  \.php      # Match .php
  .*?        # Match any character zero or more times (non-greedy)
  [?&]       # Match ? or &
  (?:id|v)=  # Match id or v in non capturing group followed by =
)            # Close non capturing group
\/?          # Match optional /
(            # Capturing group (group 1)
  [^\/&\n]+  # Match not / or & or newline
)            # Close capturing group
.*           # Match any character zero or more times
$            # End of the line

This results in:

text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
id = text.gsub(/^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$/, "\\1")
puts id
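As a variation on the gsub call, the capture group can also be read directly with String#[], which returns nil when the URL does not match instead of handing back the unchanged input. This is only a sketch under a few assumptions: the helper name facebook_video_id and the constant FB_VIDEO are hypothetical, \A replaces ^ to anchor at the start of the string, and the trailing .*$ is dropped because nothing needs to be consumed after the ID when extracting. Note that the pattern sits inside a regexp literal (%r{} here, /.../ above), which is what the snippet in the question was missing.

```ruby
# Capture-group pattern based on this answer; %r{} avoids escaping each "/".
# Assumption: \A (start of string) is used in place of ^ (start of line).
FB_VIDEO = %r{\Ahttps?://www\.facebook\.com/.*?video(?:s|\.php.*?[?&](?:id|v)=)/?([^/&\n]+)}

# Hypothetical helper: returns the video ID as a String, or nil if the URL
# does not look like one of the supported Facebook video URLs.
def facebook_video_id(url)
  url[FB_VIDEO, 1] # second argument selects capture group 1
end

puts facebook_video_id("https://www.facebook.com/pili.morillo.56/videos/352355988613922/")
```

Returning nil for a non-matching URL makes the failure case easy to check with a plain conditional, whereas gsub would return the original string untouched.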
