Counting specific punctuation marks in a given text without using regex or other modules

Question

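I have a text file containing a large amount of text written as paragraphs. I need to count certain punctuation marks:

Without using any module, not even regex
Count , and ;
Also count ' and -, but only under certain circumstances. Specifically:
- Count ' marks, but only when they appear as apostrophes surrounded by letters, i.e. indicating a contraction such as "shouldn't" or "won't". (The apostrophe is included as an indication of more informal writing, perhaps direct speech.)
- Count - signs, but only when they are surrounded by letters, indicating a compound word such as "self-esteem".
Any other punctuation or characters, e.g. digits, should be regarded as white space, so they serve only to end words.
Note: some of the texts we will use include a double hyphen, i.e. --. This is to be regarded as a space character.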

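I first created a string that stored several punctuation marks, e.g. punctuation_string = ";./'-", but that gave me the combined total; what I need is a separate count for each punctuation mark. As a result, I have to change the certain_char variable once for every mark.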

with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text_lis = f.read().split()
punctuation_count = {}
certain_char = "/"
freq_count = 0
for word in text_lis:
    for char in word:
        if char in certain_char:
            freq_count += 1
punctuation_count[certain_char] = freq_count

I need the values displayed like this:

and so on, but what I get is the total (71).

Answer 1

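You will need to create a dictionary in which each entry stores the count for one punctuation mark. For commas and semicolons, we can simply do a string search to count the occurrences in each word. The ' and - marks need to be handled slightly differently.

This should handle all cases: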

with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text_words = f.read().split()
punctuation_count = {}
punctuation_count[','] = 0
punctuation_count[';'] = 0
punctuation_count["'"] = 0
punctuation_count['-'] = 0


def search_for_single_quotes(word):
    single_quote = "'"
    search_char_index = word.find(single_quote)
    search_char_count = word.count(single_quote)
    if search_char_index == -1 or search_char_count != 1:
        return
    index_before = search_char_index - 1
    index_after = search_char_index + 1
    # Check if the characters before and after the quote are letters,
    # and the letter after the quote is the last character of the word.
    # Will detect `won't`, `shouldn't`, but not `ab'cd`, `y'ess`
    if index_before >= 0 and word[index_before].isalpha() and \
            index_after == len(word) - 1 and word[index_after].isalpha():
        punctuation_count[single_quote] += 1


def search_for_hyphens(word):
    hyphen = "-"
    search_char_index = word.find(hyphen)
    if search_char_index == -1:
        return
    index_before = search_char_index - 1
    index_after = search_char_index + 1
    # Check if the characters before and after the hyphen are letters.
    # You can also change it to check for letters as well as numbers
    # depending on your use case.
    if index_before >= 0 and word[index_before].isalpha() and \
            index_after < len(word) and word[index_after].isalpha():
        punctuation_count[hyphen] += 1


for word in text_words:
    for search_char in [',', ';']:
        search_char_count = word.count(search_char)
        punctuation_count[search_char] += search_char_count
    search_for_single_quotes(word)
    search_for_hyphens(word)


print(punctuation_count)
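
The helper functions above look only at the first apostrophe or hyphen found in each word, so a word such as "mother-in-law" or a contraction followed by a comma may be missed. A minimal sketch of a character-by-character alternative is shown below. The function name `count_marks` and the sample sentence are hypothetical, chosen only for illustration; the sketch assumes that "surrounded by letters" means `str.isalpha()` is true for both neighbouring characters.

```python
def count_marks(text):
    """Count , and ; plus letter-surrounded ' and - (hypothetical helper)."""
    counts = {',': 0, ';': 0, "'": 0, '-': 0}
    # The question says a double hyphen is to be treated as a space.
    text = text.replace('--', ' ')
    for i, ch in enumerate(text):
        if ch in ',;':
            counts[ch] += 1
        elif ch in "'-":
            # Count only when both neighbours exist and are letters.
            if (0 < i < len(text) - 1
                    and text[i - 1].isalpha()
                    and text[i + 1].isalpha()):
                counts[ch] += 1
    return counts


sample = "She won't go, he said; it's a well-known, self-made plan -- isn't it?"
print(count_marks(sample))
# prints {',': 2, ';': 1, "'": 3, '-': 2}
```

Because the scan works on the whole text rather than on whitespace-split words, every apostrophe and hyphen is checked against its own neighbours, and digits or other punctuation next to a mark simply fail the letter test.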

Answer 2

The following should work:

with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text = f.read()

text = text.replace("--", " ")

for symbol in "-'":
    text = text.replace(symbol + " ", "")
    text = text.replace(" " + symbol, "")

for symbol in ".,/'-":
    print(symbol, text.count(symbol))
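
To try the replace-then-count idea without a file, the same steps can be wrapped in a function and run on an in-memory string. This is only a sketch: the name `count_by_replacing`, the `symbols` parameter, and the sample string are hypothetical, and the default of `",;'-"` is an assumption based on the four marks the question asks about.

```python
def count_by_replacing(text, symbols=",;'-"):
    """Hypothetical helper mirroring the replace-then-count steps."""
    # A double hyphen acts as a space.
    text = text.replace("--", " ")
    for symbol in "-'":
        # Drop marks that touch a space, i.e. that are not inside a word.
        text = text.replace(symbol + " ", "")
        text = text.replace(" " + symbol, "")
    # One count per symbol, in the order given by `symbols`.
    return {symbol: text.count(symbol) for symbol in symbols}


print(count_by_replacing("won't stop - self-esteem; 'quoted' text, a--b"))
# prints {',': 1, ';': 1, "'": 1, '-': 1}
```

This variant only filters out marks that sit next to a space, so a hyphen between digits (for example `5-4`) would still be counted. The question's rule that digits behave like white space would need an additional check.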

Answer 3

Since you do not want to import anything, this will be slow and take some time, but it should work:

def count_punctuation(file_path):  # pass your file path as the parameter
    with open(file_path) as file:
        lines = file.readlines()  # read all lines of the document
    # a dictionary saves the number of occurrences of each searched character
    search_values = {',': 0, ';': 0, "'": 0, '-': 0}
    # characters treated as whitespace; add to this string whatever you need
    whitespaces = ' \t\n0123456789'

    for line in lines:
        line = line.replace('--', ' ')  # a double hyphen counts as a space
        for ch_index in range(len(line)):
            ch = line[ch_index]
            if ch == ',' or ch == ';':
                search_values[ch] += 1
            elif ch == "'" or ch == '-':
                # count only if both neighbours exist and are not whitespace
                if 0 < ch_index < len(line) - 1 \
                        and line[ch_index - 1] not in whitespaces \
                        and line[ch_index + 1] not in whitespaces:
                    search_values[ch] += 1

    for key in search_values:
        print(key + ': ' + str(search_values[key]))
    return search_values

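This is obviously not optimal, and a regular expression would be a better fit here, but it should work.

Feel free to ask if you have any questions.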
