我有一个data.frame,它有一堆行,带有唯一的ID,后跟一个氨基酸序列。我想知道是否有办法将行分成单独的唯一data.frame。
这是一个例子
bigdf
>ENSCAFP00000018847.4
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.3
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.2
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.1
QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
如果我可以将新data.frames的名称作为他们的ID,那将是很好的,所以希望结果看起来像这样
ENSCAFP00000018847.4
>ENSCAFP00000018847.4
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
ENSCAFP00000018847.3
>ENSCAFP00000018847.3
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
ENSCAFP00000018847.2
>ENSCAFP00000018847.2
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
ENSCAFP00000018847.1
>ENSCAFP00000018847.1 QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTADLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
我知道这应该是一件奇怪的事情,但是需要为成千上万种不同的氨基酸序列做这件事,所以如果我能找到一种方法将它们全部分解为R将会很酷
dput(df[1:3, c(1)])
c("> ENSCAFP00000018847.4 MFFINIISLIIPILLAVAFLTLVERKVLGYMQLRKGPNIVGPYGLLQPIADAVKLFTKEPLRPLTSSMSMFILAPILALSLALTMWIPLPMPYPLINMNLGVLFMLAMSSLAVYSILWSGWASNSKYALIGALRAVAQTISYEVTLAIILLSVLLMNGSFTLSTLIITQEHMWLIFPAWPLAMMWFISTLAETNRAPFDLTEGESELVSGFNVEYAAGPFALFFLAEYANIIMMNILTTILFFGAFHNPFMPELYSINFTMKTLLLTICFLWIRASYPRFRYDQLMHLLWKNFLPLTLALCMWHVALPIITASIPPQT",
"> ENSCAFP00000018847.3 MKPPILIIIMATIMTGTMIVMLSSHWLLIWIGFEMNMLAIIPILMKKYNPRAMEASTKYFLTQATASMLLMMGVTINLLYSGQWVISKISNPIASIMMTTALTMKLGLSPFHFWVPEVTQGITLMSGMILLTWQKIAPMSILYQISPSINTNLLMLMALTSVLVGGWGGLNQTQLRKIMAYSSIAHMGWMAAIITYNPTMMVLNLTLYILMTLSTFMLFMLNSSTTTLSLSHMWNKFPLITSMILILMLSLGGLPPLSGFIPKWMIIQELTKNNMIIIPTLMAITALLNLYFYLRLTYSTALTMFPSTNNMKMKWQFEYTKKATLLPPLIITSTMLLPLTPMLSVLD",
"> ENSCAFP00000018847.2 MFINRWLFSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFAHWFPLFSGYTLNDTWAKIHFTIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTTWNTVSSMGSFISLTAVMLMIFMIWEAFASKREVAMVELTTTNIEWLHGCPPPYHTFEEPTYVIQK"
)
您可以将所有行放在命名的数据框列表中,然后使用list2env()
将它们放在全局环境中,如下所示:
dfs <- apply(bigdf, MARGIN = 1, as.data.frame)
names(dfs) <- str_sub(bigdf[,1], start = 1, end = 20)
list2env(dfs, envir = .GlobalEnv)
你可以在行中使用apply
函数和as.data.frame
:
mydfs <- apply(df, 1, as.data.frame)
mydfs将作为单个数据帧的行列表。请注意他们将被胁迫。