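Given an unindented string as input, perform the following steps:
1. Identify the list items at the highest level of the hierarchy in the string. These top-level items can be identified by the following criteria:
2. For each top-level item identified in step 1:
   a. Group it with all subsequent lower-level items until the next top-level item is encountered. Lower-level items can be identified by the following criteria:
   b. Concatenate the top-level item and its associated lower-level items into a single string, preserving the formatting and delimiters exactly as they appear in the input string.
3. Return the resulting groups as a Python list in which each element is a string containing one top-level item joined with its associated lower-level items.
4. Exclude from the output any text that appears before the first top-level item or after the last top-level item. Only the content between the first and last top-level items should be included in the output list.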
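The steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the identification criteria are not spelled out here, so the two patterns `TOP_LEVEL` and `LOWER_LEVEL` and the function name `group_top_level_items` are hypothetical placeholders (numbered lines such as `1.` or `2)` are treated as top-level, and bullets or single letters such as `-`, `*`, `a.` as lower-level).

```python
import re

# Assumed criteria (not taken from the post): a top-level item starts with a
# number followed by "." or ")"; a lower-level item starts with a bullet
# character or a single letter followed by "." or ")".
TOP_LEVEL = re.compile(r"^\s*\d+[.)]\s+")
LOWER_LEVEL = re.compile(r"^\s*(?:[-*+•]|[A-Za-z][.)])\s+")


def group_top_level_items(text):
    """Return one string per top-level item, joined with the lines that follow it."""
    groups = []  # each entry holds the lines belonging to one top-level item
    for line in text.split("\n"):
        if TOP_LEVEL.match(line):
            groups.append([line])        # a top-level line opens a new group
        elif groups:
            groups[-1].append(line)      # any other line joins the current group
        # lines before the first top-level item are skipped

    if groups:
        # Trim trailing lines of the last group that are not lower-level items,
        # so that text following the list is left out.
        last = groups[-1]
        while len(last) > 1 and not LOWER_LEVEL.match(last[-1]):
            last.pop()

    # Re-join with the original newline separator and drop trailing newlines.
    return ["\n".join(lines).rstrip("\n") for lines in groups]
```

With the input `"Intro text\n1. First\n- a\n- b\n2. Second\na. x\nb. y\nTrailing note"` this returns two strings, one per numbered item, with the intro and the trailing note left out. Swapping in the real criteria should only require changing the two patterns.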
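The goal is a Python function that takes an unindented string as input, identifies the top-level items and their associated lower-level items according to the specified criteria, concatenates each top-level item with its lower-level items into a single string while keeping the original formatting and delimiters, and returns the resulting groups as a Python list. The output list should match the required format, with each element representing one top-level item and its associated lower-level items.
Please explain how to write a Python function that achieves this. The explanation should cover the steps involved, any necessary data structures or algorithms, and considerations for handling different scenarios and edge cases.
I have tried to write such a function, but my attempts were unsuccessful: none of them produces the expected output for the given input.
To help test and verify a solution, I have created and included below a large number of example inputs with their expected outputs. These test cases cover a variety of scenarios and edge cases to ensure the function is robust.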
Attempt 1:
def process_list_hierarchy(text):
    # Helper function to determine the indentation level
    def get_indentation_level(line):
        return len(line) - len(line.lstrip())

    # Helper function to parse the input text into a list of lines with their hierarchy levels
    def parse_hierarchy(text):
        lines = text.split('\n')
        hierarchy = []
        for line in lines:
            if line.strip():  # Ignore empty lines
                level = get_indentation_level(line)
                hierarchy.append((level, line.strip()))
        return hierarchy

    # Helper function to build a tree structure from the hierarchy levels
    def build_tree(hierarchy):
        tree = []
        stack = [(-1, tree)]  # Start with a dummy root level
        for level, content in hierarchy:
            # Find the correct parent level
            while stack and stack[-1][0] >= level:
                stack.pop()
            # Create a new node and add it to its parent's children
            node = {'content': content, 'children': []}
            stack[-1][1].append(node)
            stack.append((level, node['children']))
        return tree

    # Helper function to combine the tree into a single list
    def combine_tree(tree, combined_list=None, level=0):
        # Default to None so that each top-level call starts with a fresh list
        if combined_list is None:
            combined_list = []
        for node in tree:
            combined_list.append((' ' * level) + node['content'])
            if node['children']:
                combine_tree(node['children'], combined_list, level + 1)
        return combined_list

    # Parse the input text into a hierarchy
    hierarchy = parse_hierarchy(text)
    # Build a tree structure from the hierarchy
    tree = build_tree(hierarchy)
    # Combine the tree into a single list while maintaining the hierarchy
    combined_list = combine_tree(tree)
    # Return the combined list as a string
    return '\n'.join(combined_list)
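One thing that may explain why an indentation-based approach does not help here: on an unindented string, the indentation helper reports the same level for every line, so the tree stays flat. A small check, using a hypothetical sample input that is not one of the test cases:

```python
def get_indentation_level(line):
    # Number of leading whitespace characters, as in the attempt above.
    return len(line) - len(line.lstrip())


# Hypothetical unindented input: numbered top-level items with bullet sub-items.
sample = "1. Fruit\n- apple\n- pear\n2. Veg\n- leek"
levels = [get_indentation_level(line) for line in sample.split("\n")]
print(levels)  # every entry is 0, so no line is ever nested under another
```

Since the level is always 0, the hierarchy probably has to be inferred from the list markers themselves rather than from whitespace.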
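Attempt 2:
import re
from itertools import groupby

def organize_hierarchically(items):
    # items is expected to be a list of lines
    def get_level(item):
        # The "level" is the length of the leading marker (a number, "-" or "*")
        match = re.match(r'^(\d+\.?|\-|\*)', item)
        return len(match.group()) if match else 0

    grouped_items = []
    # groupby collects consecutive lines that share the same level
    for level, group in groupby(items, key=get_level):
        if level == 1:
            grouped_items.append('\n'.join(group))
        else:
            grouped_items[-1] += '\n' + '\n'.join(group)
    return grouped_items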
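It may help to look at what `get_level` actually returns for typical lines. The snippet below reuses the same pattern on a hypothetical input (the sample lines are an assumption, not one of the test cases) and shows how `itertools.groupby` splits it into consecutive runs:

```python
import re
from itertools import groupby


def get_level(item):
    # Same pattern as in the attempt: the level is the length of the matched marker.
    match = re.match(r'^(\d+\.?|\-|\*)', item)
    return len(match.group()) if match else 0


lines = ["1. Fruit", "- apple", "- pear", "2. Veg", "- leek"]  # hypothetical input
levels = [get_level(line) for line in lines]
runs = [(level, list(group)) for level, group in groupby(lines, key=get_level)]
print(levels)  # [2, 1, 1, 2, 1]
print(runs)
```

A numbered line such as `1. Fruit` matches `1.` and gets level 2, while a bullet gets level 1. So the `level == 1` test selects the bullet runs rather than the numbered items, and when the first run is a numbered item the `else` branch indexes into an empty list.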
Attempt 3:
from bs4 import BeautifulSoup
import nltk

def extract_sub_objectives(input_text):
    soup = BeautifulSoup(input_text, 'html.parser')
    text_content = soup.get_text()
    # Tokenize the text into sentences
    # (sent_tokenize needs the NLTK Punkt tokenizer data, installed via nltk.download)
    sentences = nltk.sent_tokenize(text_content)
    # Initialize an empty list to store the sub-objectives
    sub_objectives = []
    # Iterate through the sentences and extract sub-objectives
    current_sub_objective = ""
    for sentence in sentences:
        if sentence.startswith(("1.", "2.", "3.", "4.")):
            if current_sub_objective:
                sub_objectives.append(current_sub_objective)
                current_sub_objective = ""
            current_sub_objective += sentence + "\n"
        elif current_sub_objective:
            current_sub_objective += sentence + "\n"
    # Append the last sub-objective, if any
    if current_sub_objective:
        sub_objectives.append(current_sub_objective)
    return sub_objectives
Attempt 4:
def extract_sub_objectives(input_text, preserve_formatting=False):
    # Modified to strip both single and double quotes
    input_text = input_text.strip('\'"')
    messages = []
    messages.append("Debug: Starting to process the input text.")
    # Debug message to show the input text after stripping quotes
    messages.append(f"Debug: Input text after stripping quotes: '{input_text}'")
    # Define possible starting characters for new sub-objectives
    start_chars = [str(i) + '.' for i in range(1, 100)]  # Now includes up to two-digit numbering
    messages.append(f"Debug: Start characters defined: {start_chars}")
    # Define a broader range of continuation characters
    continuation_chars = ['-', '*', '+', '•', '>', '→', '—']  # Expanded list
    messages.append(f"Debug: Continuation characters defined: {continuation_chars}")
    # Replace escaped newline characters with actual newline characters
    input_text = input_text.replace('\\n', '\n')
    # Split the input text into lines
    lines = input_text.split('\n')
    messages.append(f"Debug: Input text split into lines: {lines}")
    # Initialize an empty list to store the sub-objectives
    sub_objectives = []
    # Initialize an empty string to store the current sub-objective
    current_sub_objective = ''
    # Initialize a counter for the number of continuations in the current sub-objective
    continuation_count = 0

    # Function to determine if a line is a new sub-objective
    def is_new_sub_objective(line):
        # Strip away leading quotation marks and whitespace
        line = line.strip('\'"').strip()
        return any(line.startswith(start_char) for start_char in start_chars)

    # Function to determine if a line is a continuation
    def is_continuation(line, prev_line):
        if not prev_line:
            return False
        # Check if the line starts with an alphanumeric followed by a period or parenthesis
        if len(line) > 1 and line[0].isalnum() and (line[1] == '.' or line[1] == ')'):
            # Check if it follows the sequence of the previous line
            if line[0].isdigit() and prev_line[0].isdigit() and int(line[0]) == int(prev_line[0]) + 1:
                return False
            elif line[0].isalpha() and prev_line[0].isalpha() and ord(line[0].lower()) == ord(prev_line[0].lower()) + 1:
                return False
            else:
                return True
        # Add a condition to check for lower-case letters followed by a full stop
        # (the length guard avoids an IndexError on empty or one-character lines)
        if len(line) > 1 and line[0].islower() and line[1] == '.':
            return True
        return any(line.startswith(continuation_char) for continuation_char in continuation_chars)

    # Iterate over each line
    for i, line in enumerate(lines):
        prev_line = lines[i - 1] if i > 0 else ''
        # Check if the line is a new sub-objective
        if is_new_sub_objective(line):
            messages.append(f"Debug: Found a new sub-objective at line {i + 1}: '{line}'")
            # If we have a current sub-objective, check the continuation count
            if current_sub_objective:
                if continuation_count < 2:
                    messages.append(f"Debug: Sub-objective does not meet the continuation criterion: '{current_sub_objective}'")
                    for message in messages:
                        print(message)
                    return None
                # Check the preserve_formatting parameter before adding
                sub_objectives.append(
                    current_sub_objective.strip() if not preserve_formatting else current_sub_objective)
                messages.append(f"Debug: Added a sub-objective to the list. Current count: {len(sub_objectives)}.")
            # Reset the current sub-objective to the new one and reset the continuation count
            current_sub_objective = line
            continuation_count = 0
        # Check if the line is a continuation
        elif is_continuation(line, prev_line):
            messages.append(f"Debug: Line {i + 1} is a continuation of the previous line: '{line}'")
            # Add the line to the current sub-objective, checking preserve_formatting
            current_sub_objective += '\n' + line if preserve_formatting else ' ' + line.strip()
            # Increment the continuation count
            continuation_count += 1
        # Handle lines that are part of the current sub-objective but don't start with a continuation character
        elif current_sub_objective:
            messages.append(f"Debug: Line {i + 1} is part of the current sub-objective: '{line}'")
            # Add the line to the current sub-objective, checking preserve_formatting
            current_sub_objective += '\n' + line if preserve_formatting else ' ' + line.strip()

    # If we have a current sub-objective, check the continuation count before adding it to the list
    if current_sub_objective:
        if continuation_count < 2:
            messages.append(f"Debug: Sub-objective does not meet the continuation criterion: '{current_sub_objective}'")
            for message in messages:
                print(message)
            return None
        # Check the preserve_formatting parameter before adding
        sub_objectives.append(current_sub_objective.strip() if not preserve_formatting else current_sub_objective)
        messages.append(f"Debug: Added the final sub-objective to the list. Final count: {len(sub_objectives)}.")
    # Print the debug messages if no sub-objectives are found
    if not sub_objectives:
        for message in messages:
            print(message)
    return sub_objectives
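As far as I understand, this should work:
def parse_list(items):
    def helper(items, level):
        result = []
        i = 0
        while i < len(items):
            item = items[i]
            if item.startswith(' ' * level):
                if '.' in item:
                    key, value = item.split('.', 1)
                    # Parse the following lines as children, then skip the ones consumed
                    # (assigning the returned index to i directly would reset the loop position)
                    subitems, consumed = helper(items[i + 1:], level + 1)
                    i += consumed
                    result.append({key.strip(): value.strip(), 'children': subitems})
                else:
                    result.append({'item': item.strip(), 'children': []})
            else:
                break
            i += 1
        return result, i

    # Every line is stripped here, so no item starts with a space at level 1 or deeper
    items = [item.strip() for item in items.split('\n') if item.strip()]
    parsed, _ = helper(items, 0)
    return parsed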
# Example usage:
unindented_string = """
Item 1
Subitem 1.1
Subitem 1.2
Item 2
Subitem 2.1
Subitem 2.2
Subsubitem 2.2.1
Subsubitem 2.2.2
Item 3
Subitem 3.1
Subitem 3.2
"""
parsed_list = parse_list(unindented_string)
print(parsed_list)
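Because the example string carries no indentation, one possible alternative is to read the nesting depth from the numbering at the end of each line. The sketch below assumes that every line ends in a dotted number such as `2.2.1`; the names `depth_of` and `group_by_numbering` are hypothetical, and real inputs with other marker styles would need different rules.

```python
import re

# Assumption: each line ends with a dotted number such as "2.2.1", and the
# number of components gives the depth ("Item 1" -> 1, "Subitem 1.1" -> 2).
NUMBERING = re.compile(r"(\d+(?:\.\d+)*)\s*$")


def depth_of(line):
    """Return the depth implied by the trailing numbering, or 0 if there is none."""
    match = NUMBERING.search(line)
    return match.group(1).count(".") + 1 if match else 0


def group_by_numbering(text):
    """Group each depth-1 line with the deeper lines that follow it."""
    groups = []
    for line in text.split("\n"):
        if not line.strip():
            continue                     # skip blank lines
        if depth_of(line) == 1:
            groups.append([line])        # a depth-1 line starts a new group
        elif groups:
            groups[-1].append(line)      # other lines join the current group
    return ["\n".join(lines) for lines in groups]
```

Applied to `unindented_string` from the example, this yields three strings, one each for `Item 1`, `Item 2`, and `Item 3`, with their sub-items attached and the original line text left untouched.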