Display a specific number of characters

Question (votes: 0, answers: 5)

I'm trying to solve this exercise:

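"Use urllib to replicate the previous exercise of (1) retrieving the document from a URL, (2) displaying up to 3000 characters, and (3) counting the overall number of characters in the document. Don't worry about the headers for this exercise; simply show the first 3000 characters of the document contents."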

This is what I came up with. It gives me the result, but I'd like to know whether there is a way to do this without using a list.

import urllib.request, urllib.parse, urllib.error


user_url = input("Enter a link: ")
if len(user_url) < 1:
    user_url = 'http://data.pr4e.org/romeo-full.txt'  # default document
try:
    fhand = urllib.request.urlopen(user_url)
except Exception:  # malformed URL, unreachable host, HTTP error, ...
    print("Enter a proper URL", user_url)
    quit()

lst = list()
count = 0
for line in fhand:
    words = line.decode().split()  # split() drops all whitespace
    for word in words:
        for char in word:
            count = count + 1  # counts non-whitespace characters only
            lst.append(char)
print(lst[:3000])  # first 3000 characters (the slice end is exclusive)
print(count)
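A list-free variant of the loop above, as a sketch: keep a running count plus a string that stops growing once it holds `limit` characters. The function name `preview_and_count` is hypothetical, and like the question's code it skips whitespace (split() with no argument). io.BytesIO stands in for the urlopen() response so the example runs offline.

```python
import io


def preview_and_count(lines, limit=3000):
    """Return (first `limit` non-whitespace characters, total non-whitespace count).

    `lines` is any iterable of UTF-8 byte lines, e.g. the object returned by
    urllib.request.urlopen(). Whitespace is skipped, mirroring the split()
    in the question's loop.
    """
    preview = ""
    count = 0
    for line in lines:
        text = "".join(line.decode().split())  # drop all whitespace
        count += len(text)
        if len(preview) < limit:
            # append only as many characters as still fit under the limit
            preview += text[: limit - len(preview)]
    return preview, count


# Offline usage example: io.BytesIO stands in for the HTTP response.
fake_response = io.BytesIO(b"But soft what light\nthrough yonder window breaks\n")
preview, count = preview_and_count(fake_response, limit=10)
print(preview)
print(count)
```

Against the real document the call would be `preview_and_count(urllib.request.urlopen(user_url))`; no per-character list is ever built.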
5 Answers
0 votes

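No accepted answer, why? There are 2 answers for "(2) display up to 3000 characters". If you also want the length of the file for "(3) count the overall number of characters in the document", you can use the code below (this works at least for txt documents).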

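import urllib.request, urllib.parse, urllib.error

user_url = 'http://data.pr4e.org/romeo-full.txt'
fhand = urllib.request.urlopen(user_url)
result = fhand.read()  # the whole document as bytes
print(len(result))  # number of BYTES; equals the character count only for single-byte (e.g. ASCII) text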

This is ForceBru's code, slightly modified.
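Why "at least for txt documents": read() returns bytes, so len(result) counts bytes, not characters. A small illustration, assuming UTF-8 text (the strings are made up):

```python
# len() on the bytes returned by read() counts bytes, not characters.
# The two numbers agree only when every character fits in one byte (e.g. plain ASCII).
ascii_bytes = "Romeo".encode("utf-8")
utf8_bytes = "Roméo".encode("utf-8")  # 'é' is two bytes in UTF-8

print(len(ascii_bytes), len(ascii_bytes.decode("utf-8")))  # 5 5
print(len(utf8_bytes), len(utf8_bytes.decode("utf-8")))    # 6 5
```

To count characters rather than bytes, decode first: len(result.decode()).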


0 votes

You can do this:

import urllib.request

fhand = urllib.request.urlopen(user_url)
result = fhand.read(3000) # read 3000 BYTES (since it's not specified what a 'character' is)

Or read everything, decode it, and output 3000 characters:

result = fhand.read().decode()[:3000] # note that whitespace is a character too
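The two options differ for non-ASCII text: read(3000) can stop in the middle of a multi-byte character, while decode()[:3000] always cuts on a character boundary. A sketch with io.BytesIO standing in for the urlopen() response (both provide read(n)); the sample text is made up and assumed to be UTF-8:

```python
import io

# io.BytesIO stands in for the urlopen() response; both support read(n).
body = "naïve café".encode("utf-8")

first_bytes = io.BytesIO(body).read(3)              # 3 BYTES: b'na\xc3' (half of 'ï')
first_chars = io.BytesIO(body).read().decode()[:3]  # 3 CHARACTERS: 'naï'

print(first_bytes)
print(first_chars)

try:
    first_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # the last byte starts a two-byte character whose second byte was never read
    print("cannot decode a truncated character:", exc.reason)
```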

0 votes

If you need to leave spaces out of the characters and keep only 3000 characters, here is one way:

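import urllib.request

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo-full.txt')

char_count = 3000  # maximum number of characters to keep
complete_str = ""
for line in fhand:
    new_line = line.decode().replace(" ", "")  # removes spaces only; newlines are kept
    if len(complete_str) + len(new_line) <= char_count:
        complete_str = complete_str + new_line
    else:
        # take only as many characters as still fit, then stop reading
        complete_str = complete_str + new_line[:char_count - len(complete_str)]
        break

print(complete_str)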
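Note that replace(" ", "") removes only the space character; tabs and newlines still count toward the 3000. If all whitespace should be skipped (as split() does in the question's code), one option is "".join(line.split()). A small comparison on a made-up line:

```python
line = "But soft, what light\tthrough yonder\n"

spaces_removed = line.replace(" ", "")          # tab and newline survive
all_whitespace_removed = "".join(line.split())  # split() with no argument splits on any whitespace

print(repr(spaces_removed))
print(repr(all_whitespace_removed))
```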

0 votes

Since we only need to print the complete file if it is <= 3000 characters, we can specify the length (in bytes) to be read and printed.

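import urllib.request, urllib.parse, urllib.error

url = input("Type the full url you want to connect: ")
fhand = urllib.request.urlopen(url)
content = fhand.read()  # whole document as bytes

# first 3000 bytes (the slice end is exclusive); errors="ignore" drops a
# multi-byte character that the cut may have split in half
print(content[:3000].decode(errors="ignore").strip())

print("\nDocument length is {}".format(len(content)))  # length in bytes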

0 votes
import urllib.request, urllib.parse, urllib.error

# ask the user for the url
user_url = input('Enter a url: ')


# urlopen() makes the connection to the web server (HOST),
# encodes the HTTP REQUEST and sends it,
# retrieves the headers but keeps them separate from the body,
# and returns a file-like object
try:
    file_handle = urllib.request.urlopen(user_url)
except Exception:  # malformed URL, unreachable host, HTTP error, ...
    print('Invalid URL.')
    quit()


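# read each line: lines arrive as UTF-8 encoded bytes,
# so we have to decode them to str manually before printing
data_count = 0
for line in file_handle:
    data_count += len(line)  # counts BYTES, not decoded characters
    if data_count <= 3000:
        # print whole lines only while the running total stays within 3000 bytes
        print(line.decode().strip())

print(data_count)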
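The loop above counts bytes and skips the line that crosses the 3000 limit entirely. A sketch that counts decoded characters and prints exactly the first `limit` characters, truncating the last line, could look like this; `show_and_count` is a hypothetical name, and it assumes the document is UTF-8 (decoding line by line is safe there because the newline byte never occurs inside a multi-byte character).

```python
import io


def show_and_count(lines, limit=3000):
    """Print the first `limit` characters of `lines`; return the total character count.

    `lines` is any iterable of UTF-8 byte lines, such as the object returned by
    urllib.request.urlopen(). Whitespace, including newlines, counts as characters.
    """
    shown = 0
    total = 0
    for raw in lines:
        text = raw.decode("utf-8")  # one complete line, so no character is split
        total += len(text)
        if shown < limit:
            part = text[: limit - shown]  # truncate the line that crosses the limit
            print(part, end="")
            shown += len(part)
    print()
    return total


# Offline usage example: io.BytesIO stands in for the HTTP response.
total = show_and_count(io.BytesIO("Roméo\nJuliet\n".encode("utf-8")), limit=8)
print(total)
```

With a real URL the call would be `show_and_count(urllib.request.urlopen(user_url))`; the document is streamed line by line rather than read into memory at once.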