I have a CSV file with about 4,000 rows of URLs. I want to extract the netloc of each URL and plot a bar chart showing the 5 most frequent netlocs.
# I have used this code
!pip install wget
link_to_data = 'https://github.com/tulip-lab/sit742/raw/master/Assessment/2020/data/JobPostings.csv'
DataSet = wget.download(link_to_data)
from urllib.parse import urlparse
text = pd.read_csv('JobPostings.csv')
a = pd.DataFrame(text['url'])
pars = []
for i in a:
    c = urlparse(i)
    x = c.netloc
    pars = x
print(pars)
I am new to Stack Overflow and Python. Apologies if I have not followed the guidelines for asking questions.
Thank you in advance.
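This should run without errors. The lines that differ from your code are explained in comments. In your version, `for i in a` iterates over the DataFrame's column names rather than its rows, and `pars = x` overwrites the list instead of appending to it.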
import wget
from urllib.parse import urlparse
import pandas as pd  # You had not imported pandas before using it!
link_to_data = 'https://github.com/tulip-lab/sit742/raw/master/Assessment/2020/data/JobPostings.csv'
DataSet = wget.download(link_to_data)
print('Downloaded')  # Optional, but it is good to know when the file has been downloaded
text = pd.read_csv('JobPostings.csv')
a = pd.DataFrame(text['url'])
pars = []
for i in range(len(a)):
    c = urlparse(a.at[i, 'url'])  # a.at[i, 'url'] gets the URL in row i
    x = c.netloc
    pars.append(x)  # append() adds the element x to the end of the list pars
print(pars)
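The code above collects the netlocs but stops short of the top-5 bar chart the question asks for. One possible way to finish the task is sketched below. The helper names `top_netlocs` and `plot_top_netlocs` and the sample URLs are hypothetical, introduced only for illustration; in practice the list would come from something like `text['url'].tolist()`. Counting uses only the standard library, and the plotting helper assumes matplotlib is installed.

```python
from collections import Counter
from urllib.parse import urlparse


def top_netlocs(urls, n=5):
    """Return the n most common network locations as (netloc, count) pairs."""
    # Skip non-string entries (e.g. NaN from pandas) before parsing
    netlocs = [urlparse(u).netloc for u in urls if isinstance(u, str)]
    # Drop empty netlocs, which urlparse returns for strings without a scheme
    return Counter(nl for nl in netlocs if nl).most_common(n)


def plot_top_netlocs(pairs, title="Top netlocs"):
    """Draw a bar chart from (netloc, count) pairs; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    labels = [p[0] for p in pairs]
    counts = [p[1] for p in pairs]
    plt.bar(labels, counts)
    plt.xticks(rotation=45, ha="right")
    plt.ylabel("Number of URLs")
    plt.title(title)
    plt.tight_layout()
    plt.show()


# Hypothetical sample data standing in for text['url'].tolist()
sample_urls = [
    "https://www.example.com/jobs/1",
    "https://www.example.com/jobs/2",
    "https://jobs.example.org/a",
    "http://example.net/x",
    "https://jobs.example.org/b",
    "https://www.example.com/jobs/3",
]
top = top_netlocs(sample_urls)
print(top)
```

Calling `plot_top_netlocs(top)` would then draw the chart. If you would rather stay in pandas, mapping the `url` column through `urlparse` and calling `value_counts().head(5)` on the result should give the same counts.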