Does Beautiful Soup automatically convert strings to a time format?

Question  Votes: 1  Answers: 1

I am trying to scrape a div that contains "time" information from a website (using BeautifulSoup + Selenium):

from bs4 import BeautifulSoup as bs
from selenium import webdriver

# Headless Chrome setup (Selenium 3-style API, matching the find_element_by_* calls used later)
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-notifications")
options.add_experimental_option('useAutomationExtension', False)
options.binary_location = '/usr/bin/google-chrome-stable'
chrome_driver_binary = "/usr/bin/chromedriver"
driver = webdriver.Chrome(chrome_driver_binary, options=options)

# Set base URL (San Francisco)
base_url = 'https://www.bandsintown.com/?place_id=ChIJIQBpAG2ahYAR_6128GcTUEo&page='


events = []
eventContainerBucket = []

for i in range(1, 35):
    # Cycle through the result pages
    page_url = base_url + str(i)
    driver.get(page_url)
    print(page_url)

    # Get the event links
    event_list = driver.find_elements_by_css_selector('div[class^=_3buUBPWBhUz9KBQqgXm-gf] a[class^=_3UX9sLQPbNUbfbaigy35li]')
    # Collect the href attribute of each event in event_list
    events.extend(event.get_attribute("href") for event in event_list)


# Iterate through all events and open each one.
item = {}
allEvents = []
for event in events:
    driver.get(event)

    soup = bs(driver.find_element_by_css_selector('[class^=Y_sOCKLIZzxDZWauPTJlk]').get_attribute('outerHTML'), 'html.parser')
    soup2 = bs(driver.find_element_by_css_selector('[class^=_2j34xcqD4slSOyTCMbA1dY]').get_attribute('outerHTML'), 'html.parser')

    # Get time
    time = soup.select_one('img + div + div').text
    print(time)

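I don't want the time converted to UTC. I just want to extract the raw text every time, i.e. 9:00 PM. I have already tried parsing the raw string immediately so that it grabs only the string: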

time = soup.select_one('img + div + div').text
' '.join(time.split(' ')[0:2])
#time.replace('UTC','')

print(time)

But it still prints in UTC, i.e. 2:00 AM UTC.

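Is there a way to extract just the raw string before it is automatically converted to a time? I don't want to deal with time zones, and I don't think I need to for this project. I only need the raw string.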
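
One detail in the snippet above is worth spelling out: `' '.join(time.split(' ')[0:2])` builds a new string but never assigns it, and Python strings are immutable, so `print(time)` still shows the full text. A minimal sketch of the assignment fix, assuming the scraped text looks like "2:00 AM UTC" (the helper name `strip_timezone` is hypothetical, introduced only for illustration):

```python
def strip_timezone(time_text):
    """Keep the first two space-separated tokens (clock time and AM/PM)."""
    return " ".join(time_text.split(" ")[0:2])


# Assumed sample of the scraped text; the real page may format it differently.
time_text = "2:00 AM UTC"

# The result has to be assigned, otherwise the original string is unchanged.
trimmed = strip_timezone(time_text)
print(trimmed)
```

Note that this only drops the timezone label. The clock value is still whatever the page rendered, so it does not turn 2:00 AM back into a local 9:00 PM.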

python selenium beautifulsoup timezone utc
1 answer
0
votes

I'm not sure why you are using Beautiful Soup's select. Could you use Selenium to get the element's text instead?

for event in events:
    driver.get(event)
    # using the locator from your example above, although it did not work for me
    element = driver.find_element_by_css_selector('[class^=Y_sOCKLIZzxDZWauPTJlk]')

    # Get time
    time = element.text
    print(time)

Output:

6:00 PM PDT

I'm not sure whether this is what you are looking for, but I hope it helps. Good luck!
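
On the title question itself: Beautiful Soup does no date or time parsing, and `.text` returns the characters found in the markup it was handed. If the site's JavaScript formats the time for the browser's timezone (an assumption about this page; a headless Chrome on a server often runs in UTC), the string is already "2:00 AM UTC" before Beautiful Soup sees it. A small sketch with made-up markup standing in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure and class names differ.
html = """
<div class="event">
  <img src="clock.png">
  <div>Friday</div>
  <div>2:00 AM UTC</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selector as in the question: the second div following the img.
raw_time = soup.select_one("img + div + div").text
print(repr(raw_time))
```

The value comes back as a plain `str`, unchanged from the markup, which suggests that any timezone difference originates in how the browser rendered the page rather than in the parsing step.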
