How can I use LINQ on a two-dimensional array?

Question

I have a two-dimensional byte array that looks like this:

0 0 0 0 1

1 1 1 1 0

0 0 1 1 1

1 0 1 0 1

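Each value in the array can only be 0 or 1. The simplified example above shows 4 rows of 5 columns each. I am trying to figure out how to use LINQ to return the index of the row that has the greatest number of 1s set, which for the example above should be 1.

The following non-LINQ C# code solves the problem: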

static int GetMaxIndex(byte[,] TwoDArray)
{
   // This method finds the row with the greatest number of 1s set.
   //
   int NumRows = TwoDArray.GetLength(0);
   int NumCols = TwoDArray.GetLength(1);
   int RowCount, MaxRowCount = 0, MaxRowIndex = 0;
   //
   for (int LoopR = 0; LoopR < NumRows; LoopR++)
   {
      RowCount = 0;
      for (int LoopC = 0; LoopC < NumCols; LoopC++)
      {
         if (TwoDArray[LoopR, LoopC] != 0)
            RowCount++;
      }
      if (RowCount > MaxRowCount)
      {
         MaxRowCount = RowCount;
         MaxRowIndex = LoopR;
      }
   }
   return MaxRowIndex;
}

static void Main()
{
   byte[,] Array2D = new byte[4, 5] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };
   int MaxInd = GetMaxIndex(Array2D);
   Console.WriteLine("MaxInd = {0}", MaxInd);
}

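So, my questions are:

How can LINQ be used to solve this problem? Would using LINQ here be less efficient than the non-LINQ code above?
Is it possible to solve this with PLINQ? Or would it be more efficient to use the Task Parallel Library (TPL) directly on the code above and hand the counting of 1s in each row to a separate thread, assuming each row has at least 1,000 columns?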

Answer 1

Working with multidimensional arrays in LINQ is awkward, but you can do it like this:

var arr = new [,] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };

var data =
    Enumerable.Range(0, 4)
        .Select(
            row =>
                new
                {
                    index = row,
                    count = Enumerable.Range(0, 5).Select(col => arr[row, col]).Count(x => x == 1)
                })
        .OrderByDescending(x => x.count)
        .Select(x => x.index)
        .First();

Answer 2

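Here is how I would do it. It is more or less the same as the other answers, but without any Enumerable.Range (not that there is anything wrong with those (I use them all the time)... it just makes the code more indented in this case).

This one also includes the PLINQ part. The async/await side of the TPL is not a good fit here, because the work is compute-bound and async/await is better suited to I/O-bound operations. If you use async/await instead of PLINQ, your code will end up running sequentially. That is because async/await does not go parallel until the thread is released (at which point it can start the next task, which can then run in parallel), and purely synchronous functions (such as CPU-bound work) never actually wait; they just run to completion. Basically, it finishes the first item in your list before starting the next one, so execution is sequential. PLINQ explicitly starts parallel tasks and does not have this problem.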

// arry is your 2D byte array (byte[,] arry)
var maxIndex = arry
    .Cast<byte>() // flatten the array into a row-major sequence of bytes
    .Select((b, i) => new // attach the flat index while the sequence is still ordered
        {
            value = b,
            index = i
        })
    .AsParallel() // make the transition to PLINQ (remove this to not use it)
    .GroupBy(g => g.index / arry.GetLength(1)) // group it by rows
    .Select(g => new
        {
            sum = g.Sum(g2 => (int)g2.value), // sum each row
            index = g.Key // the group key is the row index
        })
    .OrderByDescending(g => g.sum) // largest sum first
    .ThenBy(g => g.index) // break ties by the lowest row index
    .Select(g => g.index) // grab the index
    .First(); // index of the row with the largest sum

In terms of efficiency, your for loop will probably give better results. The question I would ask is: which way is more readable and clearer?

Answer 3

1) You can do it with LINQ this way:

private static int GetMaxIndex(byte[,] TwoDArray) {
    return Enumerable.Range(0, TwoDArray.GetLength(0))
                     .Select(
                         x => new {
                             Index = x,
                             Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
                         })
                     .OrderByDescending(x => x.Count)
                     .First()
                     .Index;
}

You will have to test it to see whether LINQ is faster or slower.
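One rough way to run that test is a Stopwatch comparison. The sketch below is only an illustration under stated assumptions: the names RowBenchmark, MaxIndexLoop, MaxIndexLinq, MakeRandom and TimeMs are hypothetical, the 1000 x 1000 size and the 20 iterations are arbitrary, and the timings will vary by machine and runtime. A dedicated benchmarking tool would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class RowBenchmark
{
    // Plain nested loops, equivalent to the question's GetMaxIndex.
    public static int MaxIndexLoop(byte[,] a)
    {
        int best = 0, bestCount = 0;
        for (int r = 0; r < a.GetLength(0); r++)
        {
            int count = 0;
            for (int c = 0; c < a.GetLength(1); c++)
                if (a[r, c] != 0) count++;
            if (count > bestCount) { bestCount = count; best = r; }
        }
        return best;
    }

    // LINQ version, equivalent to the query in this answer.
    public static int MaxIndexLinq(byte[,] a)
    {
        return Enumerable.Range(0, a.GetLength(0))
            .Select(r => new
            {
                Index = r,
                Count = Enumerable.Range(0, a.GetLength(1)).Count(c => a[r, c] == 1)
            })
            .OrderByDescending(x => x.Count)
            .First()
            .Index;
    }

    // Hypothetical helper: builds a random 0/1 matrix (fixed seed for repeatability).
    public static byte[,] MakeRandom(int rows, int cols, int seed)
    {
        var rng = new Random(seed);
        var a = new byte[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                a[r, c] = (byte)rng.Next(2);
        return a;
    }

    // Runs f repeatedly and returns the elapsed time in milliseconds.
    public static long TimeMs(Func<byte[,], int> f, byte[,] a, int iterations)
    {
        f(a); // warm-up call so JIT compilation is not measured
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) f(a);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        byte[,] data = MakeRandom(1000, 1000, 42);
        Console.WriteLine("loop: {0} ms", TimeMs(MaxIndexLoop, data, 20));
        Console.WriteLine("LINQ: {0} ms", TimeMs(MaxIndexLinq, data, 20));
    }
}
```

Both methods return the lowest row index on a tie, so they can also be checked against each other for correctness before comparing times.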

2) You can use PLINQ. Just use ParallelEnumerable.Range as the row index generator:

private static int GetMaxIndex2(byte[,] TwoDArray) {
    return ParallelEnumerable.Range(0, TwoDArray.GetLength(0))
                             .Select(
                                 x => new {
                                     Index = x,
                                     Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
                                 })
                             .OrderByDescending(x => x.Count)
                             .First()
                             .Index;
}
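The question also asks about using the TPL directly. Parallel.For is part of the TPL and is meant for CPU-bound loops, so one possible shape is to count each row in parallel and then pick the maximum sequentially. This is a minimal sketch, not a measured recommendation: the names ParallelRows and GetMaxIndexParallel are hypothetical, and whether it beats the plain loop depends on the array size and the machine.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelRows
{
    public static int GetMaxIndexParallel(byte[,] a)
    {
        int rows = a.GetLength(0);
        int cols = a.GetLength(1);
        var counts = new int[rows];

        // Each iteration writes only counts[r], so no locking is needed.
        Parallel.For(0, rows, r =>
        {
            int count = 0;
            for (int c = 0; c < cols; c++)
                count += a[r, c]; // values are 0 or 1, so the sum is the count of 1s
            counts[r] = count;
        });

        // Sequential scan for the maximum; ties resolve to the lowest row index.
        int best = 0;
        for (int r = 1; r < rows; r++)
            if (counts[r] > counts[best]) best = r;
        return best;
    }

    static void Main()
    {
        byte[,] array2D = new byte[4, 5] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };
        Console.WriteLine("MaxInd = {0}", GetMaxIndexParallel(array2D));
    }
}
```

Splitting the work per row keeps the parallel part free of shared state; the final scan is cheap because it touches only one integer per row.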

Answer 4

// This code is extracted from
// http://www.codeproject.com/Articles/170662/Using-LINQ-and-Extension-Methods-in-C-to-Sort-Vect
private static IEnumerable<T[]> ConvertToSingleDimension<T>(T[,] source)
{
    T[] arRow;
    for (int row = 0; row < source.GetLength(0); ++row)
    {
        arRow = new T[source.GetLength(1)];
        for (int col = 0; col < source.GetLength(1); ++col)
            arRow[col] = source[row, col];
        yield return arRow;
    }
}


// Pair each row (byte[]) with its row index in an anonymous type { Values, Index } for the LINQ query
var result = (from item in ConvertToSingleDimension(Array2D).Select((i, index) => new {Values = i, Index = index})
             orderby item.Values.Sum(i => i) descending, item.Index
             select item.Index).FirstOrDefault();

Answer 5

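Judging from the question, this is really a two-part answer, depending on what "more efficient" means for your code. The loop as presented is already very lean on resources, but it could express its intent more clearly.

Depending on the size of the data being moved, even at 10 times this size, PLINQ will be more resource-intensive, simply because spinning up a thread is a lot of work.

1.) Using LINQ can make this method more readable

Most LINQ queries on 2D arrays that I have come across convert the array to a jagged array (an array of arrays) before searching. Here is a helper method that does the conversion for us and helps make this look cleaner.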

public static T[][] GetJagged<T>(this T[,] raw)
    {
        int lenX = raw.GetLength(0);
        int lenY = raw.GetLength(1);

        T[][] jagged = new T[lenX][];

        for (int x = 0; x < lenX; x++)
        {
            jagged[x] = new T[lenY];
            for (int y = 0; y < lenY; y++)
            {
                jagged[x][y] = raw[x, y];
            }
        }

        return jagged;
    }

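Now all we have to do is query each row, which is now a 1D array, and return its sum. Here I use the selector (b => b), which essentially says: if there is a byte, select it for the Sum method.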

static int GetMaxIndexLINQ(byte[,] TwoDArray)
    {
        byte[][] jagged = TwoDArray.GetJagged();

        // Materialize the row sums once so the query is not evaluated twice.
        int[] rowSums = (from bitRows in jagged
                         select bitRows.Sum((b) => b)).ToArray();

        int maxSum = rowSums.Max();
        int MaxRowIndex = Array.IndexOf(rowSums, maxSum);
        return MaxRowIndex;
    }

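This version reads well; even a reader who is new to coding can easily grasp the gist of what is happening here.

I would argue that making your code more readable makes it more efficient. Teamwork makes the dream work, and the faster a teammate can clearly understand what is happening in your code, the better for everyone.

2.) Optimizing for performance

As I said before, not much is happening here that could be made leaner; any extra method call or unnecessary check will only slow the process down.

That said, one small change gives a simple optimization. Because this example deals only with 1s and 0s, we can turn the compiler's internal optimizations to our advantage. Rather than checking whether a value is 0, simply add it to the running total, which is typically faster because the inner loop no longer needs a branch.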

static int GetMaxIndex_EvenBetter(byte[,] TwoDArray)
    {
        int NumRows = TwoDArray.GetLength(0);
        int NumCols = TwoDArray.GetLength(1);
        int RowCount, MaxRowCount = 0, MaxRowIndex = 0;

        for (int row = 0; row < NumRows; row++)
        {
            RowCount = 0;

            for (int col = 0; col < NumCols; col++)
            {
                RowCount += TwoDArray[row, col]; //See my change here
            }
            if (RowCount > MaxRowCount)
            {
                MaxRowCount = RowCount;
                MaxRowIndex = row;
            }
        }

        return MaxRowIndex;
    }

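In most other cases you are not working with only 1s and 0s, so you DO want to check those values before adding them; here, however, it is unnecessary.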
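To make that caveat concrete, the sketch below compares counting non-zero values with summing them on an array that contains a value other than 0 or 1. The names SumVersusCount, MaxIndexByCount and MaxIndexBySum are hypothetical, and the input is made up purely for illustration.

```csharp
using System;

static class SumVersusCount
{
    // Counts the non-zero entries per row, as in the question's original loop.
    public static int MaxIndexByCount(byte[,] a)
    {
        int best = 0, bestCount = 0;
        for (int r = 0; r < a.GetLength(0); r++)
        {
            int count = 0;
            for (int c = 0; c < a.GetLength(1); c++)
                if (a[r, c] != 0) count++;
            if (count > bestCount) { bestCount = count; best = r; }
        }
        return best;
    }

    // Adds the raw values per row, as in the optimized loop above.
    public static int MaxIndexBySum(byte[,] a)
    {
        int best = 0, bestSum = 0;
        for (int r = 0; r < a.GetLength(0); r++)
        {
            int sum = 0;
            for (int c = 0; c < a.GetLength(1); c++)
                sum += a[r, c];
            if (sum > bestSum) { bestSum = sum; best = r; }
        }
        return best;
    }

    static void Main()
    {
        // Row 0 holds one non-zero value (3); row 1 holds two (1 and 1).
        byte[,] data = new byte[,] { { 3, 0, 0 }, { 1, 1, 0 } };
        Console.WriteLine("by count: {0}", MaxIndexByCount(data)); // prints "by count: 1"
        Console.WriteLine("by sum:   {0}", MaxIndexBySum(data));   // prints "by sum:   0"
    }
}
```

With strictly 0/1 data the two methods always agree, which is why the shortcut is safe for the array in the question.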
