PowerShell or Python 3, CSV files: remove rows based on duplicates in one column, with IF/ELSE conditions on another column

Question (votes: 0, answers: 1)

My coding is a bit weak, but I have some experience with both PowerShell and Python, so I'm open to either.

This may be hard to describe, so I created a fake dataset that will hopefully make it clearer.

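What I want to do is deduplicate the rows of every CSV in a directory based on NAME, in the following order of priority: if NARRATIVE = "CAUGHT", I want to keep that row; else if NARRATIVE contains a URL, I want to keep that row; else, if neither of those is true, I want to keep the last/bottom entry.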

I feel like I'm closest in PowerShell, so I'll use that for this example, but if you can solve this in Python I'm completely fine with that too. Where am I failing?

gci -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Foreach-Object {Select-Object where $_.NARRATIVE -Contains "Caught"} | export-csv test1.csv -NoTypeInformation

Main dataset:

SITE,DATE,URL,SITE2,NAME,NARRATIVE
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME2,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME5,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME7,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,only visited http://thisismyhouse.com once
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME2,NAME2 did some stuff and here's how/why
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME5,NAME5 just sat there
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,NAME3 was really important right here
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6,NAME6 fell down and couldn’t get up
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,NAME3 was MOST important right here
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Dropped the beat
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,After the game NAME9 went home
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,"while NAME4 was at the store, they found a grape"
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME7,NAME7 got hit in the head
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,NAME9 spends a lot of time on http://dungeondepths.com
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,On Friday the 13th NAME1 got a tattoo
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,For dinner NAME4 ordered pizza
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Fired the Bass Cannon
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,NAME9 is rebooting
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6 ,NAME6 broke their leg
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Put the needle on the record

Desired result:

SITE,DATE,URL,SITE2,NAME,NARRATIVE
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME2,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME5,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME7,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,NAME9 spends a lot of time on http://dungeondepths.com
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Put the needle on the record
python-3.x powershell csv duplicates multiple-conditions
1 Answer

1 vote

Now that I fully understand it, try this (with a few assumptions):
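What follows is a minimal Python sketch of the priority rules described in the question. It is not the answerer's original code, which is not included here. It rests on a few assumptions that the question does not confirm: NAME values are compared after stripping surrounding whitespace (so "NAME6 " counts as "NAME6"), "contains a URL" means the text contains `http://` or `https://`, ties within the same priority go to the later row, and surviving rows are written in their original file order. The function names `rank`, `dedupe_rows`, `dedupe_csv`, and `dedupe_directory` are made up for illustration.

```python
import csv
import re
from pathlib import Path

# Assumption: a narrative "contains a URL" if it includes http:// or https://
URL_RE = re.compile(r"https?://", re.IGNORECASE)


def rank(narrative):
    """Lower is better: 0 = CAUGHT, 1 = contains a URL, 2 = anything else."""
    text = narrative.strip()
    if text.upper() == "CAUGHT":
        return 0
    if URL_RE.search(text):
        return 1
    return 2


def dedupe_rows(rows):
    """Keep one row per NAME, following the priority order in the question."""
    best = {}  # name -> (rank, original index, row)
    for index, row in enumerate(rows):
        name = row["NAME"].strip()  # assumption: "NAME6 " means "NAME6"
        r = rank(row["NARRATIVE"])
        current = best.get(name)
        # A better rank wins; on a tie the later row replaces the earlier one.
        if current is None or r <= current[0]:
            best[name] = (r, index, row)
    # Emit the survivors in their original file order.
    return [row for _, _, row in sorted(best.values(), key=lambda t: t[1])]


def dedupe_csv(src, dst):
    """Read one CSV, deduplicate it, and write the result to dst."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(dedupe_rows(rows))


def dedupe_directory(src_dir, out_dir):
    """Deduplicate every *.csv in src_dir into out_dir (created if missing)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.csv"):
        dedupe_csv(path, out_dir / path.name)
```

Calling `dedupe_directory(".", "deduped")` would process every CSV in the current folder and write the results to a separate `deduped` folder (a hypothetical name), leaving the inputs untouched. Sorting the survivors by their original position is what puts the NAME9 row ahead of the NAME8 row, as in the desired result.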
