I have two text files: Speech.txt and Script.txt. Speech.txt contains a list of filenames of audio files and Script.txt contains the relevant transcript. Script.txt contains transcripts for all ... I want to write a Python script that compares the filenames to the transcripts and returns a text file containing the file path, filename, extension and transcript.

Sample of Speech.txt:
0x000f4a03.wav
0x000f4a07.wav
0x000f4a0f.wav

Sample of Script.txt:
0x000f4a0f | | And unites the clans against Nilfgaard?
0x000f4a11 | | Of course. He's already decreed new longships be built.
0x000f4a03 | | Thinking long-term, then. Think she'll succeed?
0x000f4a05 | | She's got a powerful ally. In me.
0x000f4a07 | | Son's King of Skellige. Congratulations to you.

Expected Output:
C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?

Code (work in progress):
f1=open(r'C:/Speech.txt',"r", encoding='utf8')
f2=open(r'C:/script.txt',"r", encoding='utf8')
for line1 in f1:
    for line2 in f2:
        if line1[0:10]==line2[0:10]:
            print('C:/Speech/' + line2[0:10] + '.wav' + '|' + line2[26:-1])
f1.close()
f2.close()

The above code seems to only work for the first line in Speech.txt and then stops. I want it to run through the entire file, i.e. line 2, line 3, etc. I also haven't figured out how to output the results into a text file. I can only print out the results at the moment. Any help would be appreciated!

EDIT: Links to Script.txt and Speech.txt.

Using pandas is another approach as well, since this seems like your typical join problem. Then it is just a matter of selecting the columns you want and saving them out.

I would read the Script.txt contents into a dictionary, then use this dictionary as you iterate the lines from Speech.txt, and only print lines that exist. This avoids the need to iterate the file multiple times, which could be quite slow if you have large files. It's also much easier to use With Statement Context Managers to open your files, since you don't need to call .close() on them yourself. pathlib.PurePath.stem is used to get the filename from your .wav files.

You can load the lines into lists with the readlines() method and then iterate over them. This avoids the problem that Kuldeep Singh Sidhu correctly identified of the pointer reaching the end of the file.
f1=open(r'C:/Speech.txt',"r", encoding='utf8')
f2=open(r'C:/script.txt',"r", encoding='utf8')
lines1 = f1.readlines()
lines2 = f2.readlines()
f1.close()
f2.close()
# Compare every speech filename against every transcript line
with open("output.txt","w", encoding='utf8') as outfile:
    for line1 in lines1:
        for line2 in lines2:
            if line1[0:10]==line2[0:10]:
                # write() takes a single string, so the newline is appended to it
                outfile.write('C:/Speech/' + line2[0:10] + '.wav' + '|' + line2[26:-1] + "\n")
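To see why the original nested loop stopped after the first line, here is a minimal sketch. It uses io.StringIO objects as stand-ins for the two open files, which is an assumption made only to keep the example self-contained; the sample lines come from the question and the function names are hypothetical. A file object is an iterator, so once the inner loop has consumed it, later passes of the outer loop find nothing left to read. The lists returned by readlines() can be iterated any number of times.

```python
import io

SPEECH = "0x000f4a03.wav\n0x000f4a07.wav\n"
SCRIPT = (
    "0x000f4a07 | | Son's King of Skellige. Congratulations to you.\n"
    "0x000f4a03 | | Thinking long-term, then. Think she'll succeed?\n"
)


def match_with_file_objects():
    # io.StringIO stands in for the open files (stand-in for illustration).
    f1, f2 = io.StringIO(SPEECH), io.StringIO(SCRIPT)
    found = []
    for line1 in f1:
        for line2 in f2:  # f2 is exhausted after the first outer pass
            if line1[0:10] == line2[0:10]:
                found.append(line1[0:10])
    return found


def match_with_lists():
    lines1 = io.StringIO(SPEECH).readlines()
    lines2 = io.StringIO(SCRIPT).readlines()
    found = []
    for line1 in lines1:
        for line2 in lines2:  # a list is iterated from the start every time
            if line1[0:10] == line2[0:10]:
                found.append(line1[0:10])
    return found


print(match_with_file_objects())  # only the first speech line is matched
print(match_with_lists())         # both speech lines are matched
```

Calling f2.seek(0) before each inner loop would also work on real files, but it re-reads the file for every speech line.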
I find pathlib easier to use than the os.path.basename and os.path.splitext functions, although this is personal preference and all will work. If we want to write the output to a text file, we can open another output file in write mode using mode="w".

The pandas approach:
import pandas as pd
df = pd.read_csv('speech.txt', header=None, names=['name'])
df1 = pd.read_csv('script.txt', sep='|', header=None, names=['name', 'blank', 'description'])
df1['name'] = df1.name.str.strip() + '.wav'
final = pd.merge(df, df1, how='left', left_on='name', right_on='name')
final['name'] = 'C:/Speech/' + final['name']
print(final)
name blank description
0 C:/Speech/0x000f4a03.wav Thinking long-term, then. Think she'll succeed?
1 C:/Speech/0x000f4a07.wav Son's King of Skellige. Congratulations to you.
2 C:/Speech/0x000f4a0f.wav And unites the clans against Nilfgaard?
final = final[['name', 'description']]
final.to_csv('some_name.csv', index=False, sep='|')
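One detail of the merge above: with how='left', a speech file that has no transcript stays in the result with a missing description. The following is a hedged sketch of an inner join that keeps only matched rows. The in-memory sample data, the extra 0x000f4aff.wav entry and the variable names are assumptions for illustration, not part of the original answer.

```python
import io

import pandas as pd

# In-memory stand-ins for Speech.txt and Script.txt (illustrative data).
speech_txt = "0x000f4a03.wav\n0x000f4a07.wav\n0x000f4aff.wav\n"
script_txt = (
    "0x000f4a07 | | Son's King of Skellige. Congratulations to you.\n"
    "0x000f4a03 | | Thinking long-term, then. Think she'll succeed?\n"
)

speech = pd.read_csv(io.StringIO(speech_txt), header=None, names=["name"])
script = pd.read_csv(
    io.StringIO(script_txt),
    sep="|",
    header=None,
    names=["name", "blank", "description"],
)

# Normalise the join key and trim the padding around the transcript text.
script["name"] = script["name"].str.strip() + ".wav"
script["description"] = script["description"].str.strip()

# how="inner" drops speech files without a transcript (0x000f4aff.wav here).
merged = speech.merge(script[["name", "description"]], on="name", how="inner")
lines = ("C:/Speech/" + merged["name"] + "|" + merged["description"]).tolist()

for line in lines:
    print(line)
```

Writing the lines out directly, rather than through to_csv, sidesteps any quoting that the CSV writer might apply to the transcript text.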
See Reading and Writing Files from the documentation for more information on how to read and write files in Python.

Demo of the dictionary approach:
from pathlib import Path

with open("Speech.txt") as speech_file, open("Script.txt") as script_file:
    # Map each transcript key (e.g. 0x000f4a03) to its text
    script_dict = {}
    for line in script_file:
        key, _, text = map(str.strip, line.split("|"))
        script_dict[key] = text

    # Print only the speech files that have a transcript
    for line in map(str.strip, speech_file):
        filename = Path(line).stem
        if filename in script_dict:
            # Backslashes are doubled so they are not read as escape sequences
            print(f"C:\\Speech\\{line}|{script_dict[filename]}")
Output:
C:\Speech\0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:\Speech\0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:\Speech\0x000f4a0f.wav|And unites the clans against Nilfgaard?

Writing the matches to output.txt instead of printing them:
from pathlib import Path

with open("Speech.txt") as speech_file, open("Script.txt") as script_file, open("output.txt", mode="w") as output_file:
    # Map each transcript key (e.g. 0x000f4a03) to its text
    script_dict = {}
    for line in script_file:
        key, _, text = map(str.strip, line.split("|"))
        script_dict[key] = text

    # Write only the speech files that have a transcript
    for line in map(str.strip, speech_file):
        filename = Path(line).stem
        if filename in script_dict:
            # Backslashes are doubled so they are not read as escape sequences
            output_file.write(f"C:\\Speech\\{line}|{script_dict[filename]}\n")

output.txt:
C:\Speech\0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:\Speech\0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:\Speech\0x000f4a0f.wav|And unites the clans against Nilfgaard?

For each line of the Speech.txt file, you need to check if it exists or not in the Script.txt file. Considering that the content of Script.txt fits in memory, you should load its content to avoid reading it every time.

Once the content of Script.txt is loaded, you simply process each line of Speech.txt, search it in the dictionary and print it when required.

Next, I provide the code. Notice that:
- I have added debug information. You can hide it by executing python -O script.py
- I use os.path.splitext(var)[0] to remove the extension from the filename
- I strip every processed line to get rid of spaces

Code:
#!/usr/bin/python
# -*- coding: utf-8 -*-

# For better print formatting
from __future__ import print_function

# Imports
import sys
import os


#
# HELPER METHODS
#
def load_script_file(script_file_path):
    # Parse each line of the script file and load to a dictionary
    d = {}
    with open(script_file_path, "r") as f:
        for transcript_info in f:
            if __debug__:
                print("Loading line: " + str(transcript_info))
            speech_filename, _, transcription = transcript_info.split("|")
            speech_filename = speech_filename.strip()
            transcription = transcription.strip()
            d[speech_filename] = transcription

    if __debug__:
        print("Loaded values: " + str(d))
    return d


#
# MAIN METHODS
#
def main(speech_file_path, script_file_path, output_file):
    # Load the script data into a dictionary
    speech_to_transcript = load_script_file(script_file_path)

    # Check each speech entry
    with open(speech_file_path, "r") as f:
        for speech_audio_file in f:
            speech_audio_file = speech_audio_file.strip()
            if __debug__:
                print()
                print("Checking speech file: " + str(speech_audio_file))

            # Remove extension
            speech_code = os.path.splitext(speech_audio_file)[0]
            if __debug__:
                print(" + Obtained filename: " + speech_code)

            # Find entry in transcript
            if speech_code in speech_to_transcript.keys():
                if __debug__:
                    print(" + Filename registered. Loading transcript")
                transcript = speech_to_transcript[speech_code]
                if __debug__:
                    print(" + Transcript: " + str(transcript))

                # Print information
                output_line = "C:/Speech/" + speech_audio_file + "|" + transcript
                if output_file is None:
                    print(output_line)
                else:
                    with open(output_file, 'a') as fw:
                        fw.write(output_line + "\n")
            else:
                if __debug__:
                    print(" + Filename not registered")


#
# ENTRY POINT
#
if __name__ == '__main__':
    # Parse arguments
    args = sys.argv[1:]
    speech = str(args[0])
    script = str(args[1])
    if len(args) == 3:
        output = str(args[2])
    else:
        output = None

    # Log arguments if required
    if __debug__:
        print("Running with:")
        print(" - SPEECH FILE = " + str(speech))
        print(" - SCRIPT FILE = " + str(script))
        print(" - OUTPUT FILE = " + str(output))
        print()

    # Execute main
    main(speech, script, output)
Debug output:
$ python speech_transcript.py ./Speech.txt ./Script.txt
Running with:
- SPEECH FILE = ./Speech.txt
- SCRIPT FILE = ./Script.txt
Loaded values: {'0x000f4a03': "Thinking long-term, then. Think she'll succeed?", '0x000f4a11': "Of course. He's already decreed new longships be built.", '0x000f4a05': "She's got a powerful ally. In me.", '0x000f4a07': "Son's King of Skellige. Congratulations to you.", '0x000f4a0f': 'And unites the clans against Nilfgaard?'}
Checking speech file: 0x000f4a03.wav
+ Obtained filename: 0x000f4a03
+ Filename registered. Loading transcript
+ Transcript: Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
Checking speech file: 0x000f4a07.wav
+ Obtained filename: 0x000f4a07
+ Filename registered. Loading transcript
+ Transcript: Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
Checking speech file: 0x000f4a0f.wav
+ Obtained filename: 0x000f4a0f
+ Filename registered. Loading transcript
+ Transcript: And unites the clans against Nilfgaard?
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?
Output with debug information disabled (python -O), first printed and then written to a file:
$ python -O speech_transcript.py ./Speech.txt ./Script.txt
C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?
$ python -O speech_transcript.py ./Speech.txt ./Script.txt ./output.txt
$ more output.txt
C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?
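
The dictionary lookup used by the answers above can also be wrapped in a small function that takes lines instead of file paths, which makes it easy to check without touching the disk. This is a sketch, not any answerer's code: build_lines, its parameters and the default prefix are hypothetical names chosen here, and it assumes each Script.txt line has the form key | | text.

```python
import os


def build_lines(speech_lines, script_lines, prefix="C:/Speech/"):
    """Return 'path|transcript' strings for speech files that have a transcript."""
    transcripts = {}
    for raw in script_lines:
        # Split on the first two pipes only, so a '|' inside the text survives.
        parts = raw.split("|", 2)
        if len(parts) == 3:  # skip blank or malformed lines
            transcripts[parts[0].strip()] = parts[2].strip()

    result = []
    for raw in speech_lines:
        name = raw.strip()
        key = os.path.splitext(name)[0]  # drop the .wav extension
        if key in transcripts:
            result.append(prefix + name + "|" + transcripts[key])
    return result


speech = ["0x000f4a03.wav\n", "0x000f4a07.wav\n", "0x000f4aff.wav\n"]
script = [
    "0x000f4a07 | | Son's King of Skellige. Congratulations to you.\n",
    "\n",
    "0x000f4a03 | | Thinking long-term, then. Think she'll succeed?\n",
]

for line in build_lines(speech, script):
    print(line)
```

With real files, the two arguments can be the open file objects themselves, since iterating a file yields its lines.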