C#: Create a 2D array from a CSV file and get word counts for a specified column


I have a CSV file that looks like this:

,Location_Code,Location_Desc,Type_Code,Fault_type,Prod_Number,Model,Causer,Auditor,Prio,Capture_Date,Steer,Engine,Country,Current shift number,VIN,Comment,Shift,Year,Fault location C_Code,Fault location C_Desc,Fault type C_Code,Fault type C_Desc,Comment R,Baumuster Sales desc.,Baumuster Technical desc.,T24
0,09122,Engine,42,Poor fit,7117215,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0092,55SWF8DB7KU316971,,A,2019,,,,,,C 300,205 E20 G,
1,09122,Engine,42,Poor fit,7117235,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0122,55SWF8DB2KU316991,,A,2019,,,,,,C 300,205 E20 G,
2,09122,Transmission,42,Poor fit,7117237,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0126,55SWF8DB6KU316993,,A,2019,,,,,,C 300,205 E20 G,

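I want to write code that tokenizes the values of a selected column and returns the word counts for that column header as dictionary-style key-value pairs. I would also like the word counts sorted by value in descending order. For example: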

Location_Desc

Engine: 2

Transmission: 1

Here is my code so far:

            int colNumber;
            for(colNumber=0; colNumber<columns.Length; colNumber++)
            {
                if ( columns[colNumber].Equals(columnHeader))
                {
                    break;
                }
            }

            Debug.WriteLine("Column Number: " + colNumber);
            for(int i=0; i<inputCsv.Length; i++)
            {
                string[] row = inputCsv[i].Split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
                string column = row[colNumber];
                Debug.WriteLine(row.ToString());
            }

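I am able to find the column header with the for loop, but I can neither ignore the commas inside quotes nor get the values under the column header (what Python's Pandas calls a Series).

Any help is much appreciated!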

c# string multidimensional-array tokenize word-count
1 Answer

I would probably store your counts in a Dictionary<string, Dictionary<string, long>> rather than in a 2D array. You can then easily access the counts for each column.
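As a rough illustration of that structure, the sketch below fills a nested dictionary by hand with the three Location_Desc cells from the sample rows. The `Increment` helper is hypothetical (it is not part of the answer), and the snippet assumes C# 9 or later so that top-level statements are available.

```csharp
using System;
using System.Collections.Generic;

// Outer key: column name. Inner dictionary: cell value -> number of occurrences.
var counts = new Dictionary<string, Dictionary<string, long>>();

// Hypothetical helper (not from the answer): bump the count for one cell.
void Increment(string column, string value)
{
    if (!counts.TryGetValue(column, out var inner))
    {
        inner = new Dictionary<string, long>();
        counts[column] = inner;
    }

    // 'current' stays 0 when the value has not been seen yet.
    inner.TryGetValue(value, out var current);
    inner[value] = current + 1;
}

// The three Location_Desc cells from the sample rows in the question.
Increment("Location_Desc", "Engine");
Increment("Location_Desc", "Engine");
Increment("Location_Desc", "Transmission");

Console.WriteLine(counts["Location_Desc"]["Engine"]);       // 2
Console.WriteLine(counts["Location_Desc"]["Transmission"]); // 1
```

Looking up a column is then a single indexer call on the outer dictionary, which is the main convenience over a 2D array here.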

Using the CsvHelper NuGet package, we can create a class that models the CSV file. Details on how its methods are used can be found in the CsvHelper documentation.

using System;
using CsvHelper.Configuration.Attributes;

public class CsvModel
{
    [Name("Location_Code")]
    public long LocationCode { get; set; }
    [Name("Location_Desc")]
    public string LocationDesc { get; set; }
    [Name("Type_Code")]
    public long TypeCode { get; set; }
    [Name("Fault_type")]
    public string FaultType { get; set; }
    [Name("Prod_Number")]
    public long ProdNumber { get; set; }
    public string Model { get; set; }
    public string Causer { get; set; }
    public string Auditor { get; set; }
    public long Prio { get; set; }
    [Name("Capture_Date")]
    public DateTime CaptureDate { get; set; }
    public long Steer { get; set; }
    public long Engine { get; set; }
    public long Country { get; set; }
    [Name("Current shift number")]
    public string CurrentShiftNumber { get; set; }
    public string VIN { get; set; }
    public string Comment { get; set; }
    public string Shift { get; set; }
    public long Year { get; set; }
    [Name("Fault location C_Code")]
    public string FaultLocationCCode { get; set; }
    [Name("Fault location C_Desc")]
    public string FaultLocationCDesc { get; set; }
    [Name("Fault type C_Code")]
    public string FaultTypeCCode { get; set; }
    [Name("Fault type C_Desc")]
    public string FaultTypeCDesc { get; set; }
    [Name("Comment R")]
    public string CommentR { get; set; }
    [Name("Baumuster Sales desc.")]
    public string BaumusterSalesDesc { get; set; }
    [Name("Baumuster Technical desc.")]
    public string BaumusterTechnicalDesc { get; set; }
    public string T24 { get; set; }
}

We can then read the records into an IEnumerable<CsvModel> with GetRecords<T>:

// Requires: using System.Globalization; using System.IO; using CsvHelper;
var path = "C:\\data.csv";
using var reader = new StreamReader(path);
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
var records = csv.GetRecords<CsvModel>();

Next, use reflection to count the values of each column into a Dictionary<string, Dictionary<string, long>>:

var recordCounts = new Dictionary<string, Dictionary<string, long>>();

foreach (var record in records)
{
    var properties = record.GetType().GetProperties();

    foreach (var property in properties)
    {
        var propertyName = property.Name;
        if (!recordCounts.ContainsKey(propertyName))
        {
            recordCounts.Add(propertyName, new Dictionary<string, long>());
        }

        var propertyValue = property.GetValue(record, null);
        // Null-conditional call: a null property value must not throw here.
        var propertyKey = propertyValue?.ToString();
        if (!string.IsNullOrEmpty(propertyKey))
        {
            var count = recordCounts[propertyName].GetValueOrDefault(propertyKey, 0) + 1;
            recordCounts[propertyName][propertyKey] = count;
        }
    }
}

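We can then sort the column counts in descending order by creating a new dictionary with LINQ:

var sortedRecordCounts = recordCounts
    .ToDictionary(
        kvp => kvp.Key,
        kvp => new SortedDictionary<string, long>(
            kvp.Value
                .OrderByDescending(inner => inner.Value)
                .ToDictionary(inner => inner.Key, inner => inner.Value)));

This uses Enumerable.ToDictionary to create the dictionaries (inner and outer) and Enumerable.OrderByDescending to sort the counts in descending order. The inner result is wrapped in a SortedDictionary to get a guaranteed enumeration order, because the order of a plain Dictionary is not guaranteed. Note, however, that SortedDictionary<TKey, TValue> orders its entries by key, so the inner entries come out ordered by value name rather than by count; to keep the descending count order, store the result of OrderByDescending in a list instead.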
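One way to keep the counts in descending order is to hold the sorted pairs in a list rather than in another dictionary. The sketch below is a swapped-in variation, not the answer's own code: the `locationCounts` sample data and the `ThenBy` tie-breaker are assumptions added for illustration, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample counts for one column, standing in for one inner dictionary.
var locationCounts = new Dictionary<string, long>
{
    ["Transmission"] = 1,
    ["Engine"] = 2,
};

// OrderByDescending yields the pairs by count; ToList preserves that order.
List<KeyValuePair<string, long>> byCountDescending = locationCounts
    .OrderByDescending(pair => pair.Value)
    .ThenBy(pair => pair.Key, StringComparer.Ordinal) // deterministic order for ties
    .ToList();

foreach (var pair in byCountDescending)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
// Prints:
// Engine: 2
// Transmission: 1
```

A list keeps insertion order by definition, so the ordering produced by LINQ survives any later enumeration.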

We can then iterate over this dictionary to display the record counts, reporting any column for which no valid (non-null, non-empty) values were found.
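A minimal sketch of such a display loop follows. The `recordCounts` contents are illustrative stand-ins keyed by property name (as the reflection code does), the "No valid values found" message is an assumed wording, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the counts built by the reflection loop; values are illustrative only.
var recordCounts = new Dictionary<string, Dictionary<string, long>>
{
    ["LocationDesc"] = new Dictionary<string, long> { ["Engine"] = 2, ["Transmission"] = 1 },
    ["Comment"] = new Dictionary<string, long>(), // a column whose cells were all empty
};

var lines = new List<string>();
foreach (var column in recordCounts)
{
    lines.Add(column.Key);

    if (column.Value.Count == 0)
    {
        // Assumed wording for columns without any countable value.
        lines.Add("  No valid values found");
        continue;
    }

    // Sort at display time so the output order does not depend on the dictionary.
    foreach (var entry in column.Value.OrderByDescending(e => e.Value))
    {
        lines.Add($"  {entry.Key}: {entry.Value}");
    }
}

Console.WriteLine(string.Join(Environment.NewLine, lines));
```

Sorting inside the loop means the counts are shown in descending order regardless of which dictionary type holds them.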