I have a text column that contains values like these:
text2 (https://example.foo.com/url/path/site.js:1:44541)
text2 (https://example.foo.com/url/path/site.js:1:44541)
text2 https://example.foo.com/url/path/site.js:1:44541:1:25413)
text2 (https://exampel2.fooo.io/unique/bundle.js:2:172704)
text2 (ttps://exampel2.fooo.io/unique/bundle.js:2:173513)
text2 https://exampel2.fooo.io/unique/bundle.js:2:171673
text3 (https://exampel2.fooo.io/unique/bundle.js:2:171651)
I am trying to extract only the host and path from the values above, so that I am left with:
https://example.foo.com/url/path/site.js
https://exampel2.fooo.io/unique/bundle.js
I haven't been able to find or write a regular expression that does this. Any help would be appreciated.
Data
CREATE TABLE yourtable(
column1 VARCHAR(100) NOT NULL
);
INSERT INTO yourtable(column1) VALUES
('text2 (https://example.foo.com/url/path/site.js:1:44541)'),
('text2 (https://example.foo.com/url/path/site.js:1:44541)'),
('text2 https://example.foo.com/url/path/site.js:1:44541:1:25413)'),
('text2 (https://exampel2.fooo.io/unique/bundle.js:2:172704)'),
('text2 (ttps://exampel2.fooo.io/unique/bundle.js:2:173513)'),
('text2 https://exampel2.fooo.io/unique/bundle.js:2:171673'),
('text3 (https://exampel2.fooo.io/unique/bundle.js:2:171651)');
First, strip the textN prefix (and any whitespace after it) from the column:
SELECT regexp_replace(column1, 'text\d+\s*', '', 'g') AS cleaned_column
FROM yourtable
Then remove the parentheses, using the bracket expression [()] so that each ( and ) is matched:
regexp_replace(cleaned_column, '[()]', '', 'g')
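Why the square brackets matter: in a PostgreSQL regular expression a bare '()' is an empty capturing group that matches the empty string, so replacing it with '' leaves the value unchanged, while inside a bracket expression ( and ) are ordinary characters. A quick check on one of the sample values with the textN prefix already stripped:

```sql
-- Compare an empty group with a bracket expression on one sample value.
SELECT regexp_replace('(https://example.foo.com/url/path/site.js:1:44541)', '()', '', 'g')   AS empty_group,  -- unchanged
       regexp_replace('(https://example.foo.com/url/path/site.js:1:44541)', '[()]', '', 'g') AS bracket_expr; -- parentheses removed
```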
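Finally, remove the :line:column suffix, i.e. everything from the first ':' followed by a digit to the end of the string: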
regexp_replace(cleaned_column, ':\d+.*', '', 'g')
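The trailing .* is what lets this step cope with the messier rows: once a ':' followed by digits is found, everything to the end of the string is deleted, including a second :line:column pair or a stray ')'. The colon in 'https://' is never matched because it is followed by '/', not a digit. A minimal check on two sample values with the textN prefix already removed:

```sql
-- Strip the ":line:column" tail; ".*" also swallows a repeated suffix or a trailing ")".
SELECT regexp_replace(v, ':\d+.*', '', 'g') AS cleaned
FROM (VALUES
  ('https://example.foo.com/url/path/site.js:1:44541:1:25413)'),
  ('https://exampel2.fooo.io/unique/bundle.js:2:171673')
) AS t(v);
```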
Full solution:
select
regexp_replace(regexp_replace(cleaned_column, '[()]', '', 'g'), ':\d+.*', '', 'g')
from (
SELECT regexp_replace(column1, 'text\d+\s*', '', 'g') AS cleaned_column
FROM yourtable) a
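The full solution returns one cleaned value per input row, so duplicates remain. To get a list of unique URLs, as in the desired output, one option is to add SELECT DISTINCT. The sketch below inlines the sample rows as a CTE named yourtable so it runs on its own (drop the CTE to query the real table). Note that the sample row written as 'ttps://...' keeps its truncated scheme, so this returns three distinct values rather than the two listed in the question.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
-- Remove this CTE to run against the real yourtable.
WITH yourtable(column1) AS (
  VALUES
    ('text2 (https://example.foo.com/url/path/site.js:1:44541)'),
    ('text2 (https://example.foo.com/url/path/site.js:1:44541)'),
    ('text2 https://example.foo.com/url/path/site.js:1:44541:1:25413)'),
    ('text2 (https://exampel2.fooo.io/unique/bundle.js:2:172704)'),
    ('text2 (ttps://exampel2.fooo.io/unique/bundle.js:2:173513)'),
    ('text2 https://exampel2.fooo.io/unique/bundle.js:2:171673'),
    ('text3 (https://exampel2.fooo.io/unique/bundle.js:2:171651)')
)
SELECT DISTINCT
       regexp_replace(regexp_replace(cleaned_column, '[()]', '', 'g'),  -- drop parentheses
                      ':\d+.*', '', 'g') AS url                          -- drop ":line:col..." tail
FROM (
  SELECT regexp_replace(column1, 'text\d+\s*', '', 'g') AS cleaned_column  -- drop "textN " prefix
  FROM yourtable
) a
ORDER BY url;
```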
Or, as a single nested expression:
select
regexp_replace(regexp_replace( regexp_replace(column1, 'text\d+\s*', '', 'g'), '[()]', '', 'g'), ':\d+.*', '', 'g')
FROM yourtable
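Rather than deleting the unwanted parts in three passes, a different technique (not the approach used above) is to extract the URL directly with substring(... from pattern), which returns the whole match when the pattern has no parenthesized group. This sketch assumes every value contains a scheme of lowercase letters followed by '://', and that the path ends at the first ':', ')' or whitespace after it; values that do not match come back as NULL. The CTE again stands in for the real table.

```sql
-- Extract "scheme://host/path" directly: lowercase letters, "://", then everything
-- up to the first ':', ')' or whitespace. Rows with no match yield NULL.
WITH yourtable(column1) AS (
  VALUES
    ('text2 (https://example.foo.com/url/path/site.js:1:44541)'),
    ('text2 (https://example.foo.com/url/path/site.js:1:44541)'),
    ('text2 https://example.foo.com/url/path/site.js:1:44541:1:25413)'),
    ('text2 (https://exampel2.fooo.io/unique/bundle.js:2:172704)'),
    ('text2 (ttps://exampel2.fooo.io/unique/bundle.js:2:173513)'),
    ('text2 https://exampel2.fooo.io/unique/bundle.js:2:171673'),
    ('text3 (https://exampel2.fooo.io/unique/bundle.js:2:171651)')
)
SELECT DISTINCT substring(column1 from '[a-z]+://[^:)[:space:]]+') AS url
FROM yourtable
ORDER BY url;
```

Because the pattern describes what to keep instead of what to remove, it does not depend on the textN prefix or on whether the parentheses are present.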