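I am trying to add OCR to my Twitter monitor using tesseract. My question is: how do I fetch an image from a user and immediately run OCR on it? I monitor a few Twitter accounts for their newest tweets; when a new tweet contains a URL, I open it in the browser. Now I also want to check whether the tweet contains an image and print its contents to the console. My code is as follows: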
import tweepy
import re
import webbrowser
import time
import urllib.request
from datetime import datetime

# a bunch of access keys
keys = [(example_keys)]
# which key is in use right now
key_index = 0
test = 0
url_store = ''

# Function to extract url from newest tweet
def get_tweets(username, tweet_mode='extended'):
    global url_store
    # Authorization to consumer key and consumer secret
    auth = tweepy.OAuthHandler(keys[key_index][0], keys[key_index][1])
    # Access to user's access key and access secret
    auth.set_access_token(keys[key_index][2], keys[key_index][3])
    # Calling api
    api = tweepy.API(auth)
    # try to get latest tweet until rate limit is reached
    try:
        # Get newest tweet from profile
        tweets = api.user_timeline(screen_name=username, count=1)
        tweet = [tweet.text for tweet in tweets][0]
        print(tweet)
        # regex through tweet for url
        url = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', str(tweet))
        # check if url was found and isn't the same as the url from the last tweet
        if url != [] and url[0] != url_store:
            # store url in variable
            url_store = url[0]
            # open the url in webbrowser
            webbrowser.open(url[0])
            # save the html dom to a text file
            urllib.request.urlretrieve(url[0], "test.txt")
    # when rate limit is reached
    except tweepy.TweepError:
        # select the next key from array
        changeKeys()
    # right now function always returns false
    return False

def changeKeys():
    global key_index
    # increment key_index by 1 unless end of key array is reached -> start from the beginning
    if key_index >= len(keys) - 1:
        key_index = 0
    else:
        key_index += 1

def getIMG():
    # not implemented yet: this is where the image lookup and OCR should go
    pass

# Driver code
if __name__ == '__main__':
    # boolean if url was found (right now it's always false)
    found = False
    # never-ending while loop
    while not found:
        # get tweets from specific twitter handle
        found = get_tweets("Trump")
        time.sleep(0.02)

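Good question. Running a regex over the tweet text is the wrong way to find images.
Every tweet carries "entities"; see https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/entities-object
You can use them to get the images directly from the tweet.
For example: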
tweet.entities['urls']
gives you all the URLs in the tweet. Attached images are listed under the separate media key of the same object.
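
As a rough sketch of how the entities could feed the OCR step: the first helper below pulls photo URLs out of an entities dictionary, and the second downloads one image and passes it to Tesseract. The function names `extract_image_urls` and `ocr_image_url` are made up for this illustration. The OCR part assumes the third-party packages `pytesseract` and `Pillow` plus a local Tesseract install, none of which appear in the question, and the whole sketch assumes `tweet.entities` is a plain dict, as it is in tweepy 3.x.

```python
import io
import urllib.request


def extract_image_urls(entities):
    """Return the HTTPS URLs of all photos listed in a tweet's entities dict."""
    # 'media' is only present when the tweet has attachments, so default to [].
    media = entities.get("media", [])
    return [
        item["media_url_https"]
        for item in media
        if item.get("type") == "photo" and "media_url_https" in item
    ]


def ocr_image_url(image_url):
    """Download one image and return the text Tesseract recognizes in it."""
    # Imported lazily so extract_image_urls works without the OCR packages.
    # Assumption: pytesseract and Pillow are installed (not part of the question).
    from PIL import Image
    import pytesseract

    with urllib.request.urlopen(image_url) as response:
        image = Image.open(io.BytesIO(response.read()))
    return pytesseract.image_to_string(image)
```

Inside `get_tweets` this might be used as `for image_url in extract_image_urls(tweets[0].entities): print(ocr_image_url(image_url))`. Note that `entities` lists only the first photo of a tweet; when several images are attached, the full set is in `extended_entities`.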