Question description    Votes: 1    Answers: 1

I have two text files: Speech.txt and Script.txt. Speech.txt contains a list of filenames of audio files and Script.txt contains the relevant transcript. Script.txt contains transcripts for all characters and items, but I only want the transcripts of one specific character. I want to write a Python script that compares the filenames to the transcripts and returns a text file containing the file path, filename, extension and transcript, with the fields separated by |.

Sample of Speech.txt:

0x000f4a03.wav
0x000f4a07.wav
0x000f4a0f.wav

Sample of Script.txt:

0x000f4a0f |            | And unites the clans against Nilfgaard?
0x000f4a11 |            | Of course. He's already decreed new longships be built.
0x000f4a03 |            | Thinking long-term, then. Think she'll succeed?
0x000f4a05 |            | She's got a powerful ally. In me.
0x000f4a07 |            | Son's King of Skellige. Congratulations to you.

Expected Output:

C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?

Code (work in progress):

f1=open(r'C:/Speech.txt',"r", encoding='utf8')
f2=open(r'C:/script.txt',"r", encoding='utf8')
for line1 in f1:
    for line2 in f2:
        if line1[0:10]==line2[0:10]:
              print('C:/Speech/' + line2[0:10] + '.wav' + '|' + line2[26:-1])
f1.close()
f2.close()

The above code seems to only work for the first line in Speech.txt and then stops. I want it to run through the entire file, i.e. line 2, line 3, etc. I also haven't figured out how to output the results into a text file. I can only print out the results at the moment. Any help would be appreciated!

EDIT: Links to Script.txt and Speech.txt.

python string-comparison

Answers

1 vote

You can load the lines into lists with the readlines() method and then iterate over them. This avoids the problem that Kuldeep Singh Sidhu correctly identified of the pointer reaching the end of the file.

f1=open(r'C:/Speech.txt',"r", encoding='utf8')
f2=open(r'C:/script.txt',"r", encoding='utf8')
# Read both files into lists so the inner loop can run once per outer line
lines1 = f1.readlines()
lines2 = f2.readlines()
f1.close()
f2.close()

with open("output.txt","w", encoding='utf8') as outfile:
    for line1 in lines1:
        for line2 in lines2:
            if line1[0:10]==line2[0:10]:
                # write() takes a single string, so the newline is appended with +
                outfile.write('C:/Speech/' + line2[0:10] + '.wav' + '|' + line2[26:-1] + "\n")

1 vote

Using pandas is another approach as well, since this seems like your typical join problem.

import pandas as pd

# One filename per line in speech.txt; three |-separated columns in script.txt
df = pd.read_csv('speech.txt', header=None, names=['name'])
df1 = pd.read_csv('script.txt', sep='|', header=None, names=['name', 'blank', 'description'])

# Make the keys comparable: strip the padding and add the extension
df1['name'] = df1.name.str.strip() + '.wav'

# Left join keeps every row of speech.txt
final = pd.merge(df, df1, how='left', left_on='name', right_on='name')
final['name'] = 'C:/Speech/' + final['name']

print(final)

                       name         blank                                       description
0  C:/Speech/0x000f4a03.wav                 Thinking long-term, then. Think she'll succeed?
1  C:/Speech/0x000f4a07.wav                 Son's King of Skellige. Congratulations to you.
2  C:/Speech/0x000f4a0f.wav                         And unites the clans against Nilfgaard?

Then it is just a matter of selecting the columns you want and saving them out.

final = final[['name', 'description']]
final.to_csv('some_name.csv', index=False, sep='|')

1 vote

I would read the Script.txt contents into a dictionary, then use this dictionary as you iterate the lines from Speech.txt, and only print lines that exist. This avoids the need to iterate the file multiple times, which could be quite slow if you have large files.

Demo:

from pathlib import Path

with open("Speech.txt") as speech_file, open("Script.txt") as script_file:
    # Map each key (filename without extension) to its transcript
    script_dict = {}
    for line in script_file:
        key, _, text = map(str.strip, line.split("|"))
        script_dict[key] = text

    for line in map(str.strip, speech_file):
        filename = Path(line).stem
        if filename in script_dict:
            # Backslashes are doubled so they are not read as escape sequences
            print(f"C:\\Speech\\{line}|{script_dict[filename]}")

Output:

C:\Speech\0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:\Speech\0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:\Speech\0x000f4a0f.wav|And unites the clans against Nilfgaard?

It's also much easier to use With Statement Context Managers to open your files, since you don't need to call .close() yourself. I use pathlib.PurePath.stem to get the filename from your .wav files. I find this easier to use than the os.path.basename and os.path.splitext functions, although this is personal preference and all will work.

If we want to write the output to a text file, we can open another output file in write mode using mode="w":

from pathlib import Path

with open("Speech.txt") as speech_file, open("Script.txt") as script_file, open("output.txt", mode="w") as output_file:
    script_dict = {}
    for line in script_file:
        key, _, text = map(str.strip, line.split("|"))
        script_dict[key] = text

    for line in map(str.strip, speech_file):
        filename = Path(line).stem
        if filename in script_dict:
            output_file.write(f"C:\\Speech\\{line}|{script_dict[filename]}\n")

output.txt:

C:\Speech\0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:\Speech\0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:\Speech\0x000f4a0f.wav|And unites the clans against Nilfgaard?

See Reading and Writing Files from the documentation for more information on how to read and write files in Python.

1 vote

For each line of the Speech.txt file, you need to check whether it exists in the Script.txt file. Considering that the content of Script.txt fits in memory, you should load its content to avoid reading it every time.

Once the content of Script.txt is loaded, you simply process each line of Speech.txt, search for it in the dictionary and print it when required.

Next, I provide the code. Notice that:

  • I have added debug information. You can hide it by executing python -O script.py
  • I use os.path.splitext(var)[0] to remove the extension from the filename
  • I strip every processed line to get rid of spaces and line breaks

Code:

#!/usr/bin/python

# -*- coding: utf-8 -*-

# For better print formatting
from __future__ import print_function

# Imports
import sys
import os


#
# HELPER METHODS
#
def load_script_file(script_file_path):
    # Parse each line of the script file and load to a dictionary
    d = {}
    with open(script_file_path, "r") as f:
        for transcript_info in f:
            if __debug__:
                print("Loading line: " + str(transcript_info))
            speech_filename, _, transcription = transcript_info.split("|")
            speech_filename = speech_filename.strip()
            transcription = transcription.strip()
            d[speech_filename] = transcription

    if __debug__:
        print("Loaded values: " + str(d))
    return d


#
# MAIN METHODS
#

def main(speech_file_path, script_file_path, output_file):
    # Load the script data into a dictionary
    speech_to_transcript = load_script_file(script_file_path)

    # Check each speech entry
    with open(speech_file_path, "r") as f:
        for speech_audio_file in f:
            speech_audio_file = speech_audio_file.strip()
            if __debug__:
                print()
                print("Checking speech file: " + str(speech_audio_file))

            # Remove extension
            speech_code = os.path.splitext(speech_audio_file)[0]
            if __debug__:
                print(" + Obtained filename: " + speech_code)

            # Find entry in transcript
            if speech_code in speech_to_transcript.keys():
                if __debug__:
                    print(" + Filename registered. Loading transcript")
                transcript = speech_to_transcript[speech_code]
                if __debug__:
                    print(" + Transcript: " + str(transcript))

                # Print information
                output_line = "C:/Speech/" + speech_audio_file + "|" + transcript
                if output_file is None:
                    print(output_line)
                else:
                    with open(output_file, 'a') as fw:
                        fw.write(output_line + "\n")
            else:
                if __debug__:
                    print(" + Filename not registered")


#
# ENTRY POINT
#
if __name__ == '__main__':
    # Parse arguments
    args = sys.argv[1:]
    speech = str(args[0])
    script = str(args[1])
    if len(args) == 3:
        output = str(args[2])
    else:
        output = None

    # Log arguments if required
    if __debug__:
        print("Running with:")
        print(" - SPEECH FILE = " + str(speech))
        print(" - SCRIPT FILE = " + str(script))
        print(" - OUTPUT FILE = " + str(output))
        print()

    # Execute main
    main(speech, script, output)
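
One thing to be aware of in the script above: the output file is opened in append mode ('a') for every matching line, so running the script twice with the same output path adds the matches a second time. A minimal sketch of a variation that opens the output once in write mode instead. The helper name `write_matches`, its parameters, and the small demo data are illustrative assumptions, not part of the script above.

```python
import os
import tempfile


def write_matches(speech_names, transcripts, output_path, prefix="C:/Speech/"):
    """Hypothetical helper: write one 'path|transcript' line per matching name.

    speech_names is an iterable of filenames such as '0x000f4a03.wav';
    transcripts maps a filename without its extension to the transcript text.
    """
    written = 0
    # Opening once in "w" mode truncates whatever a previous run left behind
    with open(output_path, "w", encoding="utf8") as fw:
        for name in speech_names:
            name = name.strip()
            code = os.path.splitext(name)[0]
            if code in transcripts:
                fw.write(prefix + name + "|" + transcripts[code] + "\n")
                written += 1
    return written


# Demo on a throwaway file: two runs still leave one line per match
demo_transcripts = {
    "0x000f4a03": "Thinking long-term, then. Think she'll succeed?",
    "0x000f4a07": "Son's King of Skellige. Congratulations to you.",
}
demo_names = ["0x000f4a03.wav\n", "0x000f4a07.wav\n", "0x000f4a0f.wav\n"]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.txt")
    write_matches(demo_names, demo_transcripts, demo_path)
    demo_count = write_matches(demo_names, demo_transcripts, demo_path)
    with open(demo_path, encoding="utf8") as f:
        demo_lines = f.read().splitlines()
```

Whether truncating is what you want depends on the use case; append mode is the right choice if several runs are meant to accumulate into one file.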

Debug output (script run without -O):

$ python speech_transcript.py ./Speech.txt ./Script.txt
Running with:
 - SPEECH FILE = ./Speech.txt
 - SCRIPT FILE = ./Script.txt

Loaded values: {'0x000f4a03': "Thinking long-term, then. Think she'll succeed?", '0x000f4a11': "Of course. He's already decreed new longships be built.", '0x000f4a05': "She's got a powerful ally. In me.", '0x000f4a07': "Son's King of Skellige. Congratulations to you.", '0x000f4a0f': 'And unites the clans against Nilfgaard?'}

Checking speech file: 0x000f4a03.wav
 + Obtained filename: 0x000f4a03
 + Filename registered. Loading transcript
 + Transcript: Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?

Checking speech file: 0x000f4a07.wav
 + Obtained filename: 0x000f4a07
 + Filename registered. Loading transcript
 + Transcript: Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.

Checking speech file: 0x000f4a0f.wav
 + Obtained filename: 0x000f4a0f
 + Filename registered. Loading transcript
 + Transcript: And unites the clans against Nilfgaard?
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?

Output with -O (printed to the console, then written to a file):

$ python -O speech_transcript.py ./Speech.txt ./Script.txt 
C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?

$ python -O speech_transcript.py ./Speech.txt ./Script.txt ./output.txt
$ more output.txt 
C:/Speech/0x000f4a03.wav|Thinking long-term, then. Think she'll succeed?
C:/Speech/0x000f4a07.wav|Son's King of Skellige. Congratulations to you.
C:/Speech/0x000f4a0f.wav|And unites the clans against Nilfgaard?
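
As a footnote on why the nested loop in the question stops after the first line of Speech.txt: a file object is an iterator, so the inner loop consumes Script.txt completely during the first pass of the outer loop and has nothing left to yield afterwards. The sketch below reproduces this with in-memory files, assuming `io.StringIO` as a stand-in for `open()`; the function names and the shortened sample data are illustrative only.

```python
import io

SPEECH = "0x000f4a03.wav\n0x000f4a07.wav\n0x000f4a0f.wav\n"
SCRIPT = (
    "0x000f4a0f |            | And unites the clans against Nilfgaard?\n"
    "0x000f4a03 |            | Thinking long-term, then. Think she'll succeed?\n"
    "0x000f4a07 |            | Son's King of Skellige. Congratulations to you.\n"
)


def nested_file_loop(speech_text, script_text):
    """Mimic the question's loop: iterate file-like objects directly."""
    f1, f2 = io.StringIO(speech_text), io.StringIO(script_text)
    matches = []
    for line1 in f1:
        # f2 is read to the end during the first outer pass; on every later
        # pass this inner loop body never runs.
        for line2 in f2:
            if line1[0:10] == line2[0:10]:
                matches.append(line1.strip())
    return matches


def nested_list_loop(speech_text, script_text):
    """Same comparison over lists, which can be iterated any number of times."""
    lines1 = io.StringIO(speech_text).readlines()
    lines2 = io.StringIO(script_text).readlines()
    matches = []
    for line1 in lines1:
        for line2 in lines2:
            if line1[0:10] == line2[0:10]:
                matches.append(line1.strip())
    return matches


print(nested_file_loop(SPEECH, SCRIPT))
print(nested_list_loop(SPEECH, SCRIPT))
```

The first call finds only 0x000f4a03.wav, while the second finds all three filenames. Reading the lines into lists or loading Script.txt into a dictionary, as the answers do, both avoid the exhausted iterator.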