C中的Anagram测试仪

Question

我正在尝试在C中实现anagram测试器。在调用程序时，用户输入双引号中的两个单词，如“listen”和“silent”。我几乎得到它的工作，但我有一个帮助函数我写的一些麻烦，我写了两个输入词中的空格。以下是此功能的代码：

void noSpaces(char word[100]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space
    there. 
    */
    for (int i = 0; i < 100; i++) {
        while (word[i] == ' ') {
            for (int j = i; j < 100; j++) {
                word[j] = word[j+1];
            }
        }
    }
}

现在，当我将输入单词从main函数传递给此帮助程序时，它工作正常。问题是第二次调用此函数。当我在第二个输入上调用此函数时，如果k是第一个输入中的空格数，则该函数将删除第二个输入的第一个k字母。例如，键入./anagram " banana" "banana"会给我一个假阴性，如果我添加一个print语句来查看在调用noSpaces之后输入发生了什么，我会得到以下结果：

banana
anana

以下是完整程序的代码：

#include <stdio.h>

int main(int argc, char *argv[]) {
    //this if statement checks for empty entry
    if (isEmpty(argv[1]) == 0 || isEmpty(argv[2]) == 0) {
        //puts("one of these strings is empty");
        return 1;
    }
    //call to noSpaces to eliminate spaces in each word
    noSpaces(argv[1]);
    noSpaces(argv[2]);
    //call to sortWords
    sortWords(argv[1]);
    sortWords(argv[2]);
    int result = compare(argv[1], argv[2]);
    /*
    if (result == 1) {
        puts("Not anagrams");
    } else {
        puts("Anagrams");
    }
    */
    return result;
}

int compare(char word1[100], char word2[100]) {
    /*
    This is a function that accepts two sorted 
    char arrays (see 'sortWords' below) and
    returns 1 if it finds a different character
    at entry i in either array, or 0 if at no 
    index the arrays have a different character.
    */
    int counter = 0;
    while (word1[counter] != '\0' && word2[counter] != '\0') {
        if (word1[counter] != word2[counter]) {
            //printf("not anagrams\n");
            return 1;
        }
        counter++;
    }
    // printf("anagrams\n");
    return 0;
}

void sortWords(char word[100]) {
    /*
    This is a function to sort the input char arrays
    it's a simple bubble sort on the array elements.
    'sortWords' function accepts a char array and returns void,
    sorting the entries in alphabetical order
    being careful about ignoring the 'special character'
    '\0'.
    */
    for (int j = 0; j < 100; j++) {
        int i = 0;
        while (word[i + 1] != '\0') {
            if (word[i] > word[i + 1]) {
                char dummy = word[i + 1];
                word[i + 1] = word[i];
                word[i] = dummy;
            }
            i++;
        }
    }
}

void noSpaces(char word[100]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space there. 
    */
    for (int i = 0; i < 100; i++) {
        while (word[i] == ' ') {
            for (int j = i; j < 100; j++) {
                word[j] = word[j + 1];
            }
        }
    }
}

int isEmpty(char word[100]) {
    // if a word consists of the empty character, it's empty
    //otherwise, it isn't
    if (word[0] == '\0') {
        return 0;
    }
    return 1;
}

我知道有一个库可以处理字符串，但我真的想避免使用它。我已经走到这一步而不需要它了，而且我觉得这个问题基本上已经解决了，但是对于一件我看不到的小事。

我来自java背景，我是C的新手，如果这解释了我所做的任何错误。

Answer 1

在C中，字符串是带有空终止符的char数组，即值为0的字节通常表示为'\0'。你不应该假设任何特定的长度，如100。实际上，编译器会忽略函数原型参数中的数组大小。您可以通过扫描null终止符来确定长度，这是strlen()高效执行的操作，或者您可以编写代码以避免多次扫描，在null终止符处停止。您应该确保您的函数适用于空字符串，这是一个具有单个空字节的数组。以下是代码中的问题：

在函数noSpaces中，迭代超出字符串的末尾，修改可能属于下一个字符串的内存。该程序具有未定义的行为。

你应该停在字符串的末尾。还使用2个索引变量以线性时间执行：

void noSpaces(char word[]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space
    there. 
    */
    int i, j;
    for (i = j = 0; word[i] != '\0'; i++) {
        if (word[i] != ' ') {
            word[j++] = word[i];
        }
    }
    word[j] = '\0';
}

您可以简化compare以平均使用第三个测试：

int compare(const char word1[], const char word2[]) {
    /*
    This is a function that accepts two sorted 
    char arrays (see 'sortWords' below) and
    returns 1 if it finds a different character
    at entry i in either array, or 0 if at no 
    index the arrays have a different character.
    */
    for (int i = 0; word1[i] == word2[i]; i++) {
        if (word1[i]) == '\0')
            //printf("anagrams\n");
            return 0;
        }
    }
    // printf("not anagrams\n");
    return 1;
}

sortWords对空字符串有未定义的行为，因为您在索引char处读取了1，超出了数组的末尾。这是一个更正版本：

void sortWords(char word[]) {
    /*
    This is a function to sort the input char arrays
    it's a simple bubble sort on the array elements.
    'sortWords' function accepts a char array and returns void,
    sorting the entries in alphabetical order
    being careful about ignoring the 'special character'
    '\0'.
    */
    for (int j = 0; word[j] != '\0'; j++) {
        for (int i = 1; i < j; i++) {
            if (word[i - 1] > word[i]) {
                char dummy = word[i - 1];
                word[i - 1] = word[i];
                word[i] = dummy;
            }
        }
    }
}

您应该在使用前声明函数，或者在使用前交替定义它们。您的代码编译是因为编译器接受旧样式C，其中从第一个调用站点传递的参数推断出尚未看到函数的原型。这种做法容易出错且过时。

你的排序函数具有二次时间复杂度，对于很长的字符串来说这可能非常慢，但是单词不应该太大，所以这不是问题。

不修改参数字符串会更好。您可以使用具有相同时间复杂度的其中一个字符串的副本执行测试。

这是一个直接的方法：

#include <stdio.h>

int check_anagrams(const char word1[], const char word2[]) {
    /*
       This function accepts two strings and returns 1 if they
       are anagrams of one another, ignoring spaces.
       The strings are not modified.
     */
    int i, j, len1, letters1, letters2;

    /* compute the length and number of letters of word1 */
    for (len1 = letters1 = 0; word1[len1] != '\0'; len1++) {
        if (word1[len1] != ' ')
            letters1++;
    }

    /* create a copy of word1 in automatic storage */
    char copy[len1];    /* this is an array, not a string */
    for (i = 0; i < len1; i++)
        copy[i] = word1[i];

    for (j = letters2 = 0; word2[j] != '\0'; j++) {
        char temp = word2[j];
        if (temp != ' ') {
            letters2++;
            for (i = 0; i < len1; i++) {
                if (copy[i] == temp) {
                    copy[i] = '\0';
                    break;
                }
            }
            if (i == len1) {
                /* letter was not found */
                return 0;
            }
        }
    }
    if (letters1 != letters2)
        return 0;
    return 1;
}

int main(int argc, char *argv[]) {
    const char *s1 = " listen";
    const char *s2 = "silent   ";
    if (argc >= 3) {
        s1 = argv[1];
        s2 = argv[2];
    }
    int result = check_anagrams(s1, s2);
    if (result == 0) {
        printf("\"%s\" and \"%s\" are not anagrams\n", s1, s2);
    } else {
        printf("\"%s\" and \"%s\" are anagrams\n", s1, s2);
    }
    return result;
}

Answer 2

您在辅助函数中出现了逻辑错误。你开始从word[j]开始复制，而不是在第二个单词的开头复制，所以你要删除与前导空格一样多的前导字符，就像你在输出中看到的那样。

请注意，j=i和i计算外循环中前导空格的数量。

顺便说一句，你应该只有两个循环。把while条件放在第一个for循环中，如下所示：for (int i = 0; i<100 && word[i]==' '; i++)。

要修复逻辑错误，需要在最里面的循环中使用另一个初始化为零的迭代器k，并使用word[k] = word[j+1]。我认为那会奏效。

Answer 3

如果argv [1]缓冲区长度小于100，则argv [1]和argv [2]上的缓冲区溢出有问题。所以我认为你应该使用带有strlen（word）的for循环就足够了。当您使用100 in for循环的静态长度时，该单词将从另一个内存位置获取数据并使您的程序处于未定义的行为。其他功能也有同样的问题。我的意思是sortWords和比较函数。

这是我在你的noSpaces函数中的修改，它应该工作。

void noSpaces(char word [100]){
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space
    there.
    */
    for(int i =0; i<strlen(word)-1; i++){
        while(word[i]==' '){
            for(int j = i ; j<strlen(word); j++){
                word[j] = word [j+1];
            }
        }
    }
}

Answer 4

而不是试图删除空格和排序，这是O（N lg N）运行时间。您可以通过构建一个表示单词中每个字母计数的数组来执行O（N）操作。并且在执行此操作时忽略空格。

// Iterate over each character in the string
// For each char in string, increment the count of that character
// in the lettercount array.
// Return the number of unfiltered letters that were counted
int fillLetterCountTable(const char* string, int* lettercount)
{
    int len = strlen(string);
    int valid = 0;

    for (int i = 0; i < len; i++)
    {
       unsigned char index = (unsigned char)(string1[i]);
       if (index ==  ' ')  // ignore spaces
       {
           continue;
       }
       counts[index] += 1;
       valid++;
    }

    return valid;
}

// compare if two strings are anagrams of each other
// return true if string1 and string2 are anagrams, false otherwise
bool compare(const char* string1, const char* string2)
{
    int lettercount1[256] = {0};
    int lettercount2[256] = {0};

    int valid1 = fillLetterCountTable(string1, lettercount1);
    int valid2 = fillLetterCountTable(string2, lettercount2);

    if (valid1 != valid2)
        return false;

    // memcmp(lettercount1, lettercount2, sizeof(lettercount1));
    for (int i = 0; i < 256; i++)
    {
        if (counts1[i] != counts2[i])
            return false;
    }
    return true;
}

C中的Anagram测试仪

问题描述投票：2回答：4

4个回答

最新问题

C中的Anagram测试仪

问题描述 投票：2回答：4

4个回答

最新问题

问题描述投票：2回答：4