Pickling a scraped list (BS4) vs. preset data (why is it not working?)

Question Votes: 0 Answers: 1

I'm trying to save this scraped data to a file (pickle it), but I can't figure out why this code won't let me pickle it:

from urllib.request import Request, urlopen

import pickle

from bs4 import BeautifulSoup

dvdArray = []

url = "https://www.imdb.com/list/ls016522954/?ref_=nv_tvv_dvd"

req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
webpage = web_byte.decode('utf-8')
html_soup = BeautifulSoup(webpage, 'html5lib')
dvdNames = html_soup.find_all("div", class_="lister-item-content")
for dvd in dvdNames:
    dvdArray.append(dvd.a.string)
viewtitles = input("Finished! Do you want to view the DVD titles? (Y/N): ")
if viewtitles.casefold() == "y":
    num = 1
    for name in dvdArray:
        print(str(num) + " - " + name)
        num += 1
elif viewtitles.casefold() == "n":
    print("Not showing titles!")
else:
    print("That is not an option!")
saveToFile = input("Do you want to save / update the data? (Y/N): ")
if saveToFile.casefold() == "y":
    with open("IMDBDVDNames.dat", "wb") as f:
        pickle.dump(dvdArray, f)
elif saveToFile.casefold() == "n":
    print("Data not saved!")
else:
    print("That's not one of the options!")

I tried adding sys.setrecursionlimit(1000000) and it made no difference (FYI). The error I get is "maximum recursion depth exceeded while pickling an object". But when I run this code:

import pickle

testarray = []

if input("1 or 2?: ") == "1":
    testarray = ['1917', 'Onward', 'The Hunt', 'The Invisible Man', 'Human Capital', 'Dolittle', 'Birds of Prey: And the Fantabulous Emancipation of One Harley Quinn', 'The Gentlemen', 'Bloodshot', 'The Way Back', 'Clemency', 'The Grudge', 'I Still Believe', 'The Song of Names', 'Treadstone', 'Vivarium', 'Star Wars: Episode IX - The Rise of Skywalker', 'The Current War', 'Downhill', 'The Call of the Wild', 'Resistance', 'Banana Split', 'Bad Boys for Life', 'Sonic the Hedgehog', 'Mr. Robot', 'The Purge', 'VFW', 'The Other Lamb', 'Slay the Dragon', 'Clover', 'Lazy Susan', 'Rogue Warfare: The Hunt', 'Like a Boss', 'Little Women', 'Cats', 'Madam Secretary', 'Escape from Pretoria', 'The Cold Blue', 'The Night Clerk', 'Same Boat', 'The 420 Movie: Mary & Jane', 'Manou the Swift', 'Gold Dust', 'Sea Fever', 'Miles Davis: Birth of the Cool', 'The Lost Husband', 'Stray Dolls', 'Mortal Kombat Legends: Scorpions Revenge', 'Just Mercy', 'The Righteous Gemstones', 'Criminal Minds', 'Underwater', 'Final Kill', 'Green Rush', 'Butt Boy', 'The Quarry', 'Abe', 'Bad Therapy', 'Yip Man 4', 'The Last Full Measure', 'Looking for Alaska', 'The Turning', 'True History of the Kelly Gang', 'To the Stars', 'Robert the Bruce', 'Papa, sdokhni', 'The Rhythm Section', 'Arrow', 'The Assistant', 'Guns Akimbo', 'The Dark Red', 'Dreamkatcher', 'Fantasy Island', 'The Etruscan Smile', "A Nun's Curse", 'Allagash']
    with open("test.dat", "wb") as f:
        pickle.dump(testarray, f)
else:
    with open("test.dat", "rb") as f:
        testarray = pickle.load(f)

print(testarray)

with exactly the same information (at least I hope it's the same; I did a print(dvdArray) and got the list, FYI), it lets me pickle it.

Can anyone tell me why this happens and how to fix it?

I know I'm scraping the data from a website and converting it to a list, but I can't figure out what causes the error in example 1 but not in example 2.

Any help would be appreciated.

Thanks,

lttlejiver

python list beautifulsoup pickle
1 Answer

0 votes

In case anyone is curious, adding strip() when appending to dvdArray did the trick!

dvdArray.append(dvd.a.string.strip())
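As for why this works (my understanding, based on how BeautifulSoup represents text): dvd.a.string is not a plain str but a NavigableString, a str subclass that keeps references back into the parse tree, so pickling it tries to serialize the entire soup and recurses. .strip() (like str()) returns a plain, detached str. A minimal sketch using a made-up HTML snippet, not the real IMDB markup:

```python
import pickle
from bs4 import BeautifulSoup

# Hypothetical stand-in for one scraped list entry
html = '<div class="lister-item-content"><a>\n1917\n</a></div>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("div", class_="lister-item-content")

# tag.a.string is a NavigableString: a str subclass that holds references
# back into the parse tree, which is why pickling it can exceed the
# recursion limit.
print(type(tag.a.string).__name__)  # NavigableString

# .strip() (or str()) returns a plain, detached str that pickles cleanly.
title = tag.a.string.strip()
data = pickle.dumps([title])
assert pickle.loads(data) == ["1917"]
```

So str(dvd.a.string) would have worked just as well; .strip() simply has the bonus of also removing the surrounding whitespace.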