How can I use a Perl script to extract the ID of a FASTA record whose sequence matches a string?


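I am new to Perl programming and I am stuck on my script.

I am trying to search a FASTA file for a motif and, if it is found, print the ID of the protein that contains it.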

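I can load my file, but after I enter the motif nothing happens. I get the following warning: Use of uninitialized value $data[0] in concatenation (.) or string at test.pl line 36, <STDIN> line 2.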

Here is my code:

#!/usr/bin/perl -w
# Searching for motifs
print "Please type the filename of the protein sequence data: ";
$proteinfilename = <STDIN>;
# Remove the newline from the protein filename
chomp $proteinfilename;
# open the file, or exit
unless ( open(FA, $proteinfilename) ) {
print "Cannot open file \"$proteinfilename\"\n\n";
exit;
}
@protein = <FA>; # Read the protein sequence data from the file, and store it into the array variable @protein

my (@description, @ID, @data);

while (my $protein = <FA>) {
    chomp($protein);
    @description = split (/\s+/, $protein);
    push (@ID, $description[0]);   
}
# Close the file 
close FA;
my %params = map { $_ => 1 } @ID;
# Put the protein sequence data into a single string, as it's easier to search for a motif in a string than in an array of lines
$protein = join( '', @protein);
# Remove whitespace
$protein =~ s/\s//g;

# ask for a motif or exit if no motif is entered.
do {
print "Enter a motif to search for: ";
$motif = <STDIN>;
# Remove the newline at the end of $motif
chomp $motif;

# Look for the motif
@data = split (/\s+/, $protein);

if ( $protein =~ /$motif/ ) {
    print $description[0]."\n" if(exists($params{$data[0]}));
}
# exit on an empty user input
} until ( $motif =~ /^\s*$/ );
# exit the program
exit;

An example of the input is:

sp|O60341|KDM1A_HUMAN Lysine-specific histone demethylase 1A OS=Homo sapiens OX=9606 GN=KDM1A PE=1 SV=2 MLSGKKAAAAAAAAAAAAAAATGTEAGPGTAGGSENGSEVAAQPAGLSGPAEVGPGAVGERTPRKKEPPRASPPGGLAEPPGSAGPQAGPTVVPGSATPMETGIAETPEGRRTSRRKRAKVEY

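Suppose I want to find the motif 'PMET' in the given sequence. If it is present, I want to get the ID as output -> O60341

Thank you very much! Any feedback is greatly appreciated.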

string perl design-patterns fasta motif
1 Answer

Here is sample code that handles an input file consisting of a single line.

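use strict;
use warnings;

# Read the motif from standard input and strip the trailing newline
my $motif = <STDIN>;
chomp($motif);

# The whole record (header and sequence) held in one string
my $str = "sp|O60341|KDM1A_HUMAN Lysine-specific histone demethylase 1A OS=Homo sapiens OX=9606 GN=KDM1A PE=1 SV=2 MLSGKKAAAAAAAAAAAATGTEAGPGTAGGSENGSEVAAQPAGLSGPAEVGPGAVGERTP RKKEPPRASPPGGLAEPPGSAGPQAGPTVVPGSATPMETGIAETPEGRRTSRRKRAKVEY";

# The motif is used as a regular expression
if ($str =~ m/$motif/)
{
    # The ID is the second '|'-delimited field, captured in $2
    if ($str =~ m/^([^|]+)\|([^|]+)\|/)
    {
        print "Expected Value: $2\n";
    }
}
else { print "Not matched...\n"; }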

Input>$: PMET

Expected Value: O60341
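
In the script from the question, `@protein = <FA>` already reads the file to the end, so the `while` loop that follows gets no lines and `@ID` stays empty. A minimal sketch of how the single-line idea above might extend to a real multi-record file is shown here. It assumes each record starts with a `>` header line in the UniProt style `>db|ACCESSION|ENTRY_NAME ...` and that the sequence may wrap over several lines. The helper name `ids_with_motif` and the in-memory demo data are made up for illustration; this is one possible approach, not the only one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the IDs of all records whose sequence
# (not the header) matches $motif. $fh is any readable filehandle.
sub ids_with_motif {
    my ($fh, $motif) = @_;
    # An empty pattern has special meaning in Perl, so refuse it early
    return if !defined $motif || $motif eq '';

    my (@hits, $id, $seq);

    # Test the record collected so far and keep its ID on a match
    my $flush = sub {
        push @hits, $id if defined $id && $seq =~ /$motif/;
    };

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $flush->();                  # finish the previous record
            # Assumption: the ID is the second '|'-delimited field
            ($id) = $line =~ /^>[^|]*\|([^|]+)\|/;
            $seq = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;           # join wrapped sequence lines
            $seq .= $line;
        }
    }
    $flush->();                          # finish the last record
    return @hits;
}

# Demo on an in-memory copy of the record used above
my $fasta = <<'END';
>sp|O60341|KDM1A_HUMAN Lysine-specific histone demethylase 1A OS=Homo sapiens OX=9606 GN=KDM1A PE=1 SV=2
MLSGKKAAAAAAAAAAAATGTEAGPGTAGGSENGSEVAAQPAGLSGPAEVGPGAVGERTP
RKKEPPRASPPGGLAEPPGSAGPQAGPTVVPGSATPMETGIAETPEGRRTSRRKRAKVEY
END

open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
print "$_\n" for ids_with_motif($fh, 'PMET');
close $fh;
```

To run this against a file on disk, open it with `open my $fh, '<', $filename` and pass that handle instead. Because the motif is matched only against the joined sequence, a motif that happens to occur in the description text does not produce a false hit, and a motif that spans a line wrap is still found.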
