Why did sort in awk mess up one of the records in my array?

Question (votes: 2, answers: 2)

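I'm working on a homework assignment, and this is my first time using awk. I'm trying to sort an array in descending order, and I seem to have managed it... mostly. The output should show each person's name, position, and total sales from the database file. The totals sort fine, but when I use asort, one person's name and position (Davy Jones, whose record should be in the middle) is dropped and the formatting gets messed up. Here is my code: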

BEGIN {
    printf("%4s  %22s  %15s\n", "Name", "Position", "Sales Amount");
    printf("=============================================\n");
    FS = ":";
}

/^[0-9]*:[a-z]*:[A-Z || a-z || -]*:[0-9]*\.[0-9]*$/ {
    productPRICEar[pprice_key++] = $4;
}

/^[0-9]*:[A-Z || a-z]*:[A-Z || a-z]*$/ {
    associateNUMar[anum_key++] = $1;
    associateNAMEar[aname_key++] = $2;
    associatePOSar[apos_key++] = $3;
}

/^[0-9]*:[0-9]*:[0-9]*:[0-9]*\/[0-9]*\/[0-9]*:[0-9]*$/ {
    transactionIDar[tID_key++] = $1;
    productIDar[pID_key++] = $2;
    quantityar[quant_key++] = $3;
    associateIDar[aID_key++] = $5;
}

END {
    # Create an empty array value for each associate
    for (key in associateNUMar) {
        associate_total[key] = 0; # Stores the total sales made by the associate
    }

    # For each transaction
    for(transaction in transactionIDar) {
        # Declare variables
        belongs_to = associateIDar[transaction]; # Who the transaction belongs to
        product_id = productIDar[transaction]; # ID of the product sold in the transaction
        quantity_sold = quantityar[transaction]; # Quantity of the product sold in the transaction
        transaction_total = productPRICEar[product_id-1] * quantity_sold; # Total revenue from the transaction.

        # For each associate
        for (associate in associateNUMar) {
            # If this is the associate the current transaction belongs to
            if (associateNUMar[associate] == belongs_to) {
                current_total = associate_total[associate]; # Get the associate's current sales total
                associate_total[associate] = current_total + transaction_total; # Add the transaction total to the associate's sales total
            }
        }
    }
    print "\nUnsorted\n=============================================";
    # For each associate's sales total
    for(key2 in associate_total) {
        # Retrieve the associate's records
        associate_name = associateNAMEar[key2]; # Associate's name
        associate_position = associatePOSar[key2]; # Associate's position
        associate_salestotal = associate_total[key2]; # Associate's sales total

        printf("%-18s  %-13s  %10.2f\n", associate_name, associate_position, associate_salestotal);
    }


    n = asort(associate_total);
    print "\nSorted\n=============================================";
    # For each associate's sales total
    for (key2=n; key2>=1; key2--) {
        # Retrieve the associate's records
        associate_name = associateNAMEar[key2]; # Associate's name
        associate_position = associatePOSar[key2]; # Associate's position
        associate_salestotal = associate_total[key2]; # Associate's sales total

        printf("%-18s  %-13s  %10.2f\n", associate_name, associate_position, associate_salestotal);
    }
}

Here is the database:

1:software:Word Processor:55.00
2:software:Bad Wolf Video Game:19.99
3:software:Return to Gallifrey Video Game:59.99
4:vehicle:TARDIS:999999.99
5:hardware:sonic screwdriver:9999.99
6:merchandise:company t-shirt:20.00

1:Davy Jones:Security
2:Ricky Davis:Developer
3:Samantha Smith:Salesperson
4:Matt Smith:Doctor
5:David Tennant:Doctor
6:Buckminster Fuller:Engineer
7:Clara Oswald:Nurse
8:Amelia Pond:Nurse

1:1:1:01/02/2015:2
2:2:1:02/04/2017:2
3:3:1:03/06/2018:5
4:4:1:11/05/2018:5
5:1:1:01/12/2018:2
6:2:2:02/11/2018:2
7:3:1:05/13/2018:6
8:6:3:06/24/2018:1
9:5:1:02/02/2016:5
10:1:1:05/01/2017:5
11:2:1:11/05/2018:5
12:3:1:12/06/2018:5
13:2:1:02/12/2018:5
14:1:1:10/16/2018:5
15:6:4:05/18/2018:3
16:5:1:06/28/2018:6
17:1:1:07/30/2018:5
18:2:1:08/04/2018:7
19:3:1:09/07/2018:5
20:6:1:10/17/2018:4
21:6:1:10/17/2018:8
22:2:1:08/04/2018:7
23:3:1:09/07/2018:5

Below is my output. It shows the unsorted and sorted versions. The sorted version is clearly messed up.

Name                Position     Sales Amount
=============================================

Unsorted
=============================================
Davy Jones          Security            60.00
Ricky Davis         Developer          169.97
Samantha Smith      Salesperson         80.00
Matt Smith          Doctor              20.00
David Tennant       Doctor         1010444.92
Buckminster Fuller  Engineer         10059.98
Clara Oswald        Nurse               39.98
Amelia Pond         Nurse               20.00

Sorted
=============================================
                                   1010444.92
Amelia Pond         Nurse            10059.98
Clara Oswald        Nurse              169.97
Buckminster Fuller  Engineer            80.00
David Tennant       Doctor              60.00
Matt Smith          Doctor              39.98
Samantha Smith      Salesperson         20.00
Ricky Davis         Developer           20.00

Here is the logic of the program:

1. Products - each product record has the following fields
    1. Product id - an integer uniquely identifying a product
    2. Product category - a string describing the category of the product
    3. Description - a string describing the product
    4. Price - floating point number with 2 decimal places - how much does this product cost?

2. Associates - each record for an associate will have the following fields:
    1. Associate id - an integer uniquely identifying the associate
    2. Name - a string containing the name of the associate
    3. Position - a string describing the job position of the associate

3. Sales - each record for a sale will have the following fields
    1. Transaction id - integer uniquely identifying the transaction
    2. Product id - the product id of the product sold in this transaction
    3. Quantity - integer quantifying how many of the specified product were sold
    4. Date - date of the transaction in the format mm/dd/yyyy
    5. Associate id - the associate id of the associate that made this sale

I'm not sure why asort is doing this, and I'm completely stuck. Please let me know what the problem is so that I can fix it.

awk gnu
2 Answers

0 votes

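Could you please try the following. Since you did not show the complete code or a complete sample Input_file, this is only a snippet, so it has to be placed inside your awk code. It is also important to note that it requires GNU awk.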

PROCINFO["sorted_in"] = "@val_num_asc"  # GNU awk only: visit keys in ascending order of their values
print "\nSorted\n=============================================";
# For each associate's sales total.
# Loop over associate_total so that the totals, not the names, decide the order.
for (key2 in associate_total) {
    # Retrieve the associate's records
    associate_name = associateNAMEar[key2]; # Associate's name
    associate_position = associatePOSar[key2]; # Associate's position
    associate_salestotal = associate_total[key2]; # Associate's sales total
    printf("%-18s  %-13s  %10.2f\n", associate_name, associate_position, associate_salestotal);
}

For descending order, use @val_num_desc in the code above.



Sample run: the following example shows this on sample input. Note that in the split call I used , as the delimiter to fill the array named a.
awk 'BEGIN{
    split("Matt Smith,Davy Jones,Clara Oswald ",a,",")

    print "ascending:"
    PROCINFO["sorted_in"]="@val_str_asc"
    for (i in a) print a[i]

    print "\ndescending:"
    PROCINFO["sorted_in"]="@val_str_desc"
    for (i in a) print a[i]
}'

ascending:
Clara Oswald
Davy Jones
Matt Smith

descending:
Matt Smith
Davy Jones
Clara Oswald


Example with asort:

awk 'BEGIN{
    split("Matt Smith,Davy Jones,Clara Oswald ",a,",")

    print "ascending:"
    num=asort(a)
    for (i=1;i<=num;i++) print a[i]

    print "\ndescending:"
    for (i=num;i>=1;i--) print a[i]
}'

0 votes

First, the asort function rewrites the indices of the array. For example, if we have an array like this:

a["foo"] = "world"
a[9] = "there"
a[3.5] = "hello"

then after sorting it looks like this:

a[1] = "hello"
a[2] = "there"
a[3] = "world"

Note that our original indices have been destroyed: "world" is no longer under the key "foo", and so on.

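Your first loop iterates over the original keys in associate_total. asort then overwrites those keys, replacing them with the natural numbers from 1 to n. Your second loop steps numerically through those natural numbers. It appears that the original keys were not the set of natural numbers from 1 to n.

For example, if the original keys run from 0 to n-1, then item 0 is no longer reachable, and the new associate_total array contains a key n that does not map to any item in the other arrays.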

So we can reproduce a problem similar to yours with this program:

BEGIN {
   for (i = 0; i < 3; i++)
     key[i];

   color[0] = "red"
   color[1] = "green"
   color[2] = "blue"

   for (i in key)
     print i, color[i]

   print "---"

   asort(key)

   for (i in key)
     print i, color[i]
}

Output:

$ awk -f asort.awk
0 red
1 green
2 blue
---
1 green
2 blue
3

Oops, where did red go after asort?

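My program includes the key values in its output, which highlights the root of the problem: the key set has clearly changed. You may also want to print the keys in your own program while you debug it.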
