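I'm working on my first Python project, for a device that reads a string from OCR and outputs braille. The braille device can only output 6 letters at a time. I'm trying to scan each character in a list of 6-character-long strings.
For simplicity, right now I just want to print "this is (insert character)" for each character in that list. In the real device, the output step would run code that tells the first two motors to form the character in braille, then does the same for the remaining 5 characters with the other 10 motors, with a short delay between each 6-character string. How do I scan each 6-character string and repeat that for the rest of the strings in the list?
Here is where I am so far: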
from PIL import Image
import pytesseract
img = Image.open('img file path')
text = pytesseract.image_to_string(img, lang='eng', config='--psm 6').split('\n')
oneLineStr = ' '.join(text)
# displays: The quick brown fox jumps over the lazy dog.
print(oneLineStr)
arr6elem = []
for idx in range(0, len(oneLineStr), 6):
    arr6elem.append(oneLineStr[idx:idx + 6])
# displays: ['The qu', 'ick br', 'own fo', 'x jump', 's over', ' the l', 'azy do', 'g.']
print(arr6elem)
# Don't know what to do from this point
# Want to scan each 6-element string in list and for each string, see which elements it consists of
# (capital/lower case characters, numbers, spaces, commas, apostrophes, periods, etc.)
# Then, print "this is a" for letter a, or "this is a colon" for :, etc.
# So that output looks like:
# ["'this is T', 'this is h', 'this is e', 'this is a space', 'this is q', 'this is u'", "'this is i', 'this is c'...]
A dictionary should do the trick:
punctuation = {
    ' ': 'a space',
    ',': 'a comma',
    "'": 'an apostrophe',
    '.': 'a period'
}
for word in arr6elem:
    for char in word:
        print('This is {}'.format(punctuation.get(char, char)))
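Once you have filled punctuation with every entry you need, the loop looks up each character in it and falls back to the character itself when there is no entry.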
Output:
# This is T
# This is h
# This is e
# This is a space
# This is q
# This is u
# This is i
# This is c
# This is k
# This is a space
# This is b
# This is r
# This is o
# This is w
# This is n
# This is a space
# This is f
# ...
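If you want the descriptions grouped per 6-character string, with a pause between groups, the same lookup can be wrapped in a small function. This is only a rough sketch: `describe_chunks` and its `delay` parameter are names made up for illustration, the mapping is the partial one from above, and `time.sleep` merely stands in for whatever the motor code would need between groups.

```python
import time

# Partial mapping, as above; extend it with every symbol the OCR may return.
PUNCTUATION = {
    ' ': 'a space',
    ',': 'a comma',
    "'": 'an apostrophe',
    '.': 'a period',
}


def describe_chunks(chunks, delay=0.0):
    """Return one list of descriptions per chunk (hypothetical helper)."""
    described = []
    for i, chunk in enumerate(chunks):
        # Look up each character, falling back to the character itself.
        described.append(
            ['this is {}'.format(PUNCTUATION.get(ch, ch)) for ch in chunk]
        )
        # Assumed placeholder for the short pause between 6-character groups.
        if delay and i < len(chunks) - 1:
            time.sleep(delay)
    return described


result = describe_chunks(['The qu', 'ick br', 'g.'])
for group in result:
    print(group)
```

In the real device you would probably replace the list comprehension with the calls that drive the motors, and pass something like `delay=0.5` so each group stays readable before the next one appears.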