Unable to get the netloc from URLs stored in a CSV

Question (votes: 0, answers: 1)

I have a CSV file with about 4k rows of URLs. I want to get the netloc of each URL and plot a bar chart showing the top 5 netlocs among those URLs.

#i have used this code
!pip install wget
link_to_data = 'https://github.com/tulip-lab/sit742/raw/master/Assessment/2020/data/JobPostings.csv'
DataSet = wget.download(link_to_data)
from urllib.parse import urlparse
text = pd.read_csv('JobPostings.csv')
a= pd.DataFrame(text['url'])
pars=[]
for i in a:
  c = urlparse(i)
  x = c.netloc
  pars = x
print(pars)

I am new to Stack Overflow and Python. Apologies if I have not followed the guidelines for asking questions.

Thanks in advance.

python url tokenize
1 answer
0 votes

This should run without errors. The comments explain the lines that differ from your code.

import wget
from urllib.parse import urlparse
import pandas as pd # You had not imported pandas prior to using it!

link_to_data = 'https://github.com/tulip-lab/sit742/raw/master/Assessment/2020/data/JobPostings.csv'
DataSet = wget.download(link_to_data)
print('Downloaded') # Optional, but useful for knowing when the file has been downloaded

text = pd.read_csv('JobPostings.csv')
a= pd.DataFrame(text['url'])

pars=[]
for i in range(len(a)):
    c = urlparse(a.at[i,'url']) # a.at[i,'url'] is getting the url element
    x = c.netloc
    pars.append(x) # the append() function appends the element x in the list pars
print(pars)
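
The code above only collects the netlocs; the question also asks for the top 5. A minimal sketch of that counting step follows, using only the standard library. The helper name `top_netlocs` and the sample URLs are hypothetical and stand in for the values of `text['url']`; I have not checked them against the actual JobPostings.csv.

```python
from collections import Counter
from urllib.parse import urlparse


def top_netlocs(urls, n=5):
    """Return the n most common netlocs among urls as (netloc, count) pairs."""
    counts = Counter()
    for url in urls:
        # Skip missing values (e.g. NaN from pandas) and other non-strings.
        if not isinstance(url, str):
            continue
        netloc = urlparse(url.strip()).netloc
        if netloc:  # a URL without a scheme parses to an empty netloc
            counts[netloc] += 1
    return counts.most_common(n)


# Hypothetical sample data standing in for text['url'].
sample_urls = [
    "https://www.example.com/jobs/1",
    "https://www.example.com/jobs/2",
    "https://careers.example.org/post?id=7",
    "http://www.example.com/about",
    "example.net/no-scheme",  # no scheme, so netloc is empty and it is skipped
    None,                     # missing value, skipped
]

top = top_netlocs(sample_urls)
print(top)
```

With the real data, `top_netlocs(text['url'])` should behave the same way, since iterating over a pandas Series yields its values (iterating over a DataFrame, as in the question, yields the column names instead). For the bar chart, one option is `pd.Series(pars).value_counts().head(5).plot(kind='bar')`, assuming matplotlib is installed.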