如何将data.frame的行分成单独的data.frames

问题描述 投票:0回答:2

我有一个data.frame,它有一堆行,带有唯一的ID,后跟一个氨基酸序列。我想知道是否有办法将行分成单独的唯一data.frame。

这是一个例子

bigdf

>ENSCAFP00000018847.4  
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.3  
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.2  
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK
>ENSCAFP00000018847.1  
QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

如果我可以将新data.frames的名称作为他们的ID,那将是很好的,所以希望结果看起来像这样

ENSCAFP00000018847.4

>ENSCAFP00000018847.4  
FGHFGHFGHFGHFHFGHFGHFGHFGHFHFGHFGHFHFGHFGHFHFHFHFGTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

ENSCAFP00000018847.3

>ENSCAFP00000018847.3  
VCXVNSFRERYTRIOUHFSDAADSSAASAAAAGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

ENSCAFP00000018847.2

>ENSCAFP00000018847.2  
ASDASDADASDASDASDASDASSADASASRPGPVVTANHVEEPAMTPGVRTNSEGAFQTA
DLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

ENSCAFP00000018847.1

>ENSCAFP00000018847.1 QWEQWEQWEQWEWQREWRQWEQWRQRQQRERPGPVVTANHVEEPAMTPGVRTNSEGAFQTADLLETSVPSHMPLETQTLSPQTFDWTLILANSNSEAETRDTKTTFPAMEGRAFTKMTPSK

我知道这应该是一件奇怪的事情,但是需要为成千上万种不同的氨基酸序列做这件事,所以如果我能找到一种方法将它们全部分解为R将会很酷

dput(df[1:3, c(1)])
c("> ENSCAFP00000018847.4 MFFINIISLIIPILLAVAFLTLVERKVLGYMQLRKGPNIVGPYGLLQPIADAVKLFTKEPLRPLTSSMSMFILAPILALSLALTMWIPLPMPYPLINMNLGVLFMLAMSSLAVYSILWSGWASNSKYALIGALRAVAQTISYEVTLAIILLSVLLMNGSFTLSTLIITQEHMWLIFPAWPLAMMWFISTLAETNRAPFDLTEGESELVSGFNVEYAAGPFALFFLAEYANIIMMNILTTILFFGAFHNPFMPELYSINFTMKTLLLTICFLWIRASYPRFRYDQLMHLLWKNFLPLTLALCMWHVALPIITASIPPQT", 
"> ENSCAFP00000018847.3 MKPPILIIIMATIMTGTMIVMLSSHWLLIWIGFEMNMLAIIPILMKKYNPRAMEASTKYFLTQATASMLLMMGVTINLLYSGQWVISKISNPIASIMMTTALTMKLGLSPFHFWVPEVTQGITLMSGMILLTWQKIAPMSILYQISPSINTNLLMLMALTSVLVGGWGGLNQTQLRKIMAYSSIAHMGWMAAIITYNPTMMVLNLTLYILMTLSTFMLFMLNSSTTTLSLSHMWNKFPLITSMILILMLSLGGLPPLSGFIPKWMIIQELTKNNMIIIPTLMAITALLNLYFYLRLTYSTALTMFPSTNNMKMKWQFEYTKKATLLPPLIITSTMLLPLTPMLSVLD", 
"> ENSCAFP00000018847.2 MFINRWLFSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMLWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFAHWFPLFSGYTLNDTWAKIHFTIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTTWNTVSSMGSFISLTAVMLMIFMIWEAFASKREVAMVELTTTNIEWLHGCPPPYHTFEEPTYVIQK"
)
r
2个回答
1
投票

您可以将所有行放在命名的数据框列表中,然后使用list2env()将它们放在全局环境中,如下所示:

dfs <- apply(bigdf, MARGIN = 1, as.data.frame) names(dfs) <- str_sub(bigdf[,1], start = 1, end = 20) list2env(dfs, envir = .GlobalEnv)


1
投票

你可以在行中使用apply函数和as.data.frame

mydfs <- apply(df, 1, as.data.frame)

mydfs将作为单个数据帧的行列表。请注意他们将被胁迫。

© www.soinside.com 2019 - 2024. All rights reserved.