How to remove a list of specific libsvm values from a separate list

def parseline(line):
    line = line.values.flatten().tolist() # flatten labeled point pandas dataframe to python list
    strLine1 = listToString(line) # custom function just converts list to string for regex operations.
    strLine2 = re.sub(r"^1:1 |2:\d+.\d+ ","",strLine1) # filter string to eliminate first two indices; python string
    splitLine = strLine2.replace("0    ", "").split(" ") # eliminate specific val; split on spaces; python list of strings

    positive = 0 # variable for presence/absence of something instantiated

    for feature in splitLine:
        featureIndex = feature.split(":")[0]
        featureValue = feature.split(":")[1]

        if featureIndex in toRemove: # toRemove is a list of vals to eliminate from each line; this works
            positive = 1 

        newLine = ""

        if positive == 1:
            newLine = [i for i in toRemove not in splitLine] # goal here is to remove values found in the toRemove from the newLine 
            newLine = "1" + " " + newLine
            print(newLine)
        else:
            newLine = "0" + " " + strLine2

        return newLine

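This is some code from a project I am working on. I have successfully generated a list containing the values I do not want in each line. That list is called "toRemove".

The conditional "if featureIndex in toRemove" works. A print statement confirms this: it prints "This index needs removing from final list" next to every "featureIndex" found in "toRemove".

The problem is in the second conditional (if positive == 1 vs. else). The "if positive == 1" branch returns a list that is just a copy of "toRemove", while the "else" branch returns the correct line.

For example: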

'if positive == 1:' list output:
['20', '68', '112', '264', '384', '449', '454', '749', '839',...] #this is just a copy of the 'toRemove' list

'else:' list output:
0 3:0.0 4:1 12:1 36710:1 36725:1 36791:1 86715:1 98190:1
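
One possible reading of this symptom, assuming "toRemove" is a list of index strings and "splitLine" is a list of "index:value" strings: a comprehension that iterates over "toRemove" can only ever yield items from "toRemove". To filter the line, the comprehension would need to iterate over the line's features and test each feature's index. A minimal sketch with made-up sample data (the names `to_remove`, `split_line`, and `kept` are illustrative, not the asker's):

```python
# Hypothetical sample data, shaped like the values described in the question.
to_remove = ["20", "68", "112"]
split_line = ["3:00", "4:1", "9:1", "20:1", "40:1"]

# Iterating over to_remove can only ever yield elements of to_remove.
from_to_remove = [i for i in to_remove]

# Iterating over the line's features and testing each index filters the line.
kept = [f for f in split_line if f.split(":")[0] not in to_remove]

print(from_to_remove)  # ['20', '68', '112']
print(kept)            # ['3:00', '4:1', '9:1', '40:1']
```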

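I initially tried to treat this as a data type problem, hence the bookkeeping comments next to the conversion statements.

Where exactly am I going wrong here?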

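EDIT: The input file sent through the 'parseline' function has the following format:

1:1 2:00 3:00 4:1 9:1 20:1 40:1... # say index 20 is one of the indices in 'toRemove'
1:1 2:10 3:00 45:1 85:1 99:1 100:1... # say none of the index vals in this line are in 'toRemove'

"parseline(line)" removes indices 1 and 2, then goes through the "toRemove" list and removes those items from the line, outputting a "newLine" string for each line in the original input file.

For the same two example inputs, the output of 'newLine' should be: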

1 3:00 4:1 9:1 40:1... #notice index 20 is gone, and its presence in the list is accounted for by the 1 

0 3:00 45:1 85:1 99:1 100:1... #notice since none of the indices in the original list were in the 'toRemove' list, 
python list-comprehension sparse-matrix libsvm
1 Answer

It was a data type issue. The problem has been solved. Thanks, everyone.
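
The answer does not say which types were involved. A plausible reading, and only an assumption, is that "newLine" was still a list when it was concatenated with the string "1 ", and that the filtering and the return statement sat inside the for loop. The sketch below shows one way the function could be written to produce the output described in the question. It takes each line as a plain string rather than a pandas row, assumes the entries for indices 1 and 2 always come first, and uses the illustrative names `parse_line` and `to_remove`; it is not the asker's final code.

```python
import re


def parse_line(line, to_remove):
    """Drop indices 1 and 2, remove features listed in to_remove, and
    prefix the line with 1 if anything was removed, else 0."""
    # Strip the leading "1:<value>" and "2:<value>" entries (assumed to come first).
    body = re.sub(r"^1:\S+\s+2:\S+\s*", "", line.strip())
    features = body.split()

    remove_set = set(to_remove)  # index strings, e.g. {"20", "68"}
    kept = [f for f in features if f.split(":", 1)[0] not in remove_set]

    # The flag records whether at least one feature was filtered out.
    flag = "1" if len(kept) < len(features) else "0"

    # Join the list back into a string before combining it with the flag.
    return " ".join([flag] + kept)


to_remove = ["20", "68", "112"]
print(parse_line("1:1 2:00 3:00 4:1 9:1 20:1 40:1", to_remove))
print(parse_line("1:1 2:10 3:00 45:1 85:1 99:1 100:1", to_remove))
```

Building the result with " ".join keeps everything a string, which avoids the TypeError that Python raises when a str and a list are added together.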
