I am trying to solve this exercise:
"Repeat the previous exercise using urllib: (1) retrieve the document from a URL, (2) display up to 3000 characters, and (3) count the total number of characters in the document. Don't bother with the headers for this exercise; simply display the first 3000 characters of the document contents."
This is what I came up with. It gives me the result, but I would like to know whether there is a way to do this without using a list.
import urllib.request, urllib.parse, urllib.error

user_url = input("Enter a link: ")
if len(user_url) < 1:
    user_url = 'http://data.pr4e.org/romeo-full.txt'

try:
    fhand = urllib.request.urlopen(user_url)
except:
    print("Enter a proper URL", user_url)
    quit()

lst = list()
count = 0
for line in fhand:
    words = line.decode().split()
    for word in words:
        # print(word)
        for char in word:
            count = count + 1
            lst.append(char)
print(lst[:3001])
print(count)
Why is there no accepted answer? There are two answers here for "(2) display up to 3000 characters", and, if you want the length of the file, for "(3) count the total number of characters in the document". (This works at least for txt documents.) You can use the code below.
import urllib.request, urllib.parse, urllib.error
user_url = 'http://data.pr4e.org/romeo-full.txt'
fhand = urllib.request.urlopen(user_url)
result = fhand.read()
print(len(result))
This is ForceBru's code, slightly modified.
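To answer the original "without a list" question directly, the whole document can be decoded into a single string and sliced. A minimal sketch (the `show_and_count` helper name is my own):

```python
import urllib.request

def show_and_count(fhand, limit=3000):
    # Read the whole document, decode it once, and slice the string: no list needed.
    text = fhand.read().decode()
    print(text[:limit])   # show at most `limit` characters
    return len(text)      # total character count

# with the exercise's default document:
# count = show_and_count(urllib.request.urlopen('http://data.pr4e.org/romeo-full.txt'))
```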
You can do this:
fhand = urllib.request.urlopen(user_url)
result = fhand.read(3000) # read 3000 BYTES (since it's not specified what a 'character' is)
Or read everything, decode it, and output 3000 characters:
result = fhand.read().decode()[:3000] # note that whitespace is a character too
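The difference between the two matters for non-ASCII text: byte counts and character counts diverge as soon as multibyte UTF-8 sequences appear. A small illustration:

```python
data = "café".encode("utf-8")   # 'é' takes two bytes in UTF-8

print(len(data))            # 5 bytes
print(len(data.decode()))   # 4 characters
print(data.decode()[:4])    # slicing after decoding keeps characters intact
```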
If you need to strip the whitespace and keep only the first 3000 characters, here is one way.
char_count = 3000
complete_str = ""
for line in fhand:
    new_line = line.decode().replace(" ", "")
    if len(complete_str) + len(new_line) <= char_count:
        complete_str = complete_str + new_line
    else:
        # take only as many characters as are still needed to reach the limit
        complete_str = complete_str + new_line[:char_count - len(complete_str)]
        break
print(complete_str)
Since we only need to print the complete file if it is <= 3000 characters, we can specify the length (in bytes) to be read and printed.
import urllib.request, urllib.parse, urllib.error
url = input("Type the full url you want to connect: ")
fhand = urllib.request.urlopen(url)
content = fhand.read()
print(content.decode()[:3000].strip())
print("\nDocument length is {}".format(len(content)))
import urllib.request, urllib.parse, urllib.error

# ask the user for the url
user_url = input('Enter a url: ')

# urlopen() makes the connection to the webserver (HOST),
# encodes the HTTP REQUEST, sends the HTTP REQUEST,
# retrieves the headers but keeps them for us somewhere else,
# and returns an object that works like a file handle
try:
    file_handle = urllib.request.urlopen(user_url)
except:
    print('Invalid URL.')
    quit()

# read each line
# lines are in bytes (UTF-8),
# so we MANUALLY HAVE TO DECODE THEM TO UNICODE
data_count = 0
for line in file_handle:
    data_count += len(line)
    if data_count <= 3000:
        print(line.decode().strip())
print(data_count)
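Note that this loop compares a running byte count and prints whole lines, so the output can stop short of or run slightly past 3000 characters. If an exact character limit is needed, one option (a sketch; `print_first_chars` is a made-up name) is:

```python
import io

def print_first_chars(fhand, limit=3000):
    # decode line by line, print exactly `limit` characters,
    # and still count every character in the document
    remaining = limit
    total = 0
    for line in fhand:
        text = line.decode()
        total += len(text)
        if remaining > 0:
            print(text[:remaining], end="")
            remaining -= len(text[:remaining])
    return total

# usage with a small in-memory document instead of a real URL:
print(print_first_chars(io.BytesIO(b"hello\nworld\n"), limit=8))  # → 12
```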