Using pointers to find tags in a file


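I'm still quite new to pointers, so maybe what I'm attempting doesn't even make sense...? I'm working on a project that requires me to find the HTML tags in a file (read from disk) by searching for where each tag starts and ends.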

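So far I have read the whole file into a one-dimensional char array, and I have to search that array, strictly using pointers, for every tag that starts with the identifier '<' and ends with '>'. If there is a '/' or a space between the identifiers, it is not a tag and we do not need to keep a reference to it. The array is set up so that the formatting of the original file is preserved.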

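Once a tag is found, we have to search the array for matches and count how many identical tags there are. I am not allowed to store the found tags in an array, but I can store pointers to the start and end of each tag in arrays of pointers. Any guidance would be appreciated. Here is code I have that probably works, but I would like some practice with pointers...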

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    FILE *fptr;
    char maxLength[10] = "";// this is where each tag less than 10 characters long will be held
    char htags[100][10]; // Array that will be used to store each tag found
    int tagCounts[100]; // Array that will count the amount of times a tag shows up
    for (int i = 0; i < 100; i++)
        tagCounts[i] = 0;
    int n = 0;
    char *filename = argv[1]; // Input file to be read
    fptr = fopen(filename, "r"); // Read through the file
    if (fptr == NULL)
    {
        printf("File not found...");
        return 0;
    }
    int c;
    if (fptr != NULL)
    {
        while ((c = fgetc(fptr)) != EOF) // read one character at a time until the end of the file
        {

            if (c == '<' && !feof(fptr))// searching for tags that start with '<'
            {
                int ch = fgetc(fptr); // int, so that EOF can be told apart from a real character
                if (ch != '/') // checking to see if it is actually a tag
                {
                    int i = 0;
                    strcpy(maxLength, ""); // this is what we will use to store the tags when we find one
                    while (ch != EOF && i < 9 && ch != 10 && ch != '>' && ch != ' ') // i < 9 keeps room for '\0'
                    {
                        maxLength[i] = ch; // adds the next character to str[] until
                        // it gets to the end of the tag
                        ch = fgetc(fptr);
                        i++; // increment i so that the characters don't overlap in the array
                    }

                    maxLength[i] = '\0'; // Used to indicate the termination of the character string for a tag

                    int number = 1;

                    for (int i = 0; i < n; i++)// Checking to see if the tag has already
                                               // been found before in the file
                    {
                        if (strcmp(htags[i], maxLength) == 0) // tag has been found before
                        {
                            tagCounts[i]++; // increase the count by 1
                            number = 0;
                            break;
                        }
                    }

                    if (number == 1)
                    {
                        strcpy(htags[n], maxLength); // store the newly found tag in the array
                        tagCounts[n]++;
                        n++; // A new tag has been found, so we must create another element in the array and subsequently increase its count
                    }
                }
            }
        }
    }
    for (int i = 0; i < n; i++)
    {
        printf("Tag-> %s -> Appeared %d time(s)\n", htags[i], tagCounts[i]);
    }
}

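The above is what the program is supposed to do (take a text file and count the tags), but now I have to strictly use pointers to reference the elements in the array, get the string in between, and then count the matching ones...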

The arrays I have initialized for the following program are:

char buffer[100000]; 
char *sTags[100], *eTags[100];

This is what I could come up with, but my understanding of the material has simply hit a wall.

    char *contents = buffer;    

    if(buffer != NULL){
       char *c = contents;

        if(*c == '<'){
            char *ch = contents;

            if(*ch != '/'){
                while(*ch != '\0' && *ch != '>' && *ch != ' '){
                    contents++; //nothing is telling us it isn't a tag, so find the end
                    //found the tag, so get the location right before it?
                    if(*ch == '>'){
                        *sTags = contents--;
                    }
                    *sTags = '\0';
                }
                //This is where I am confused.
                //How would I go about comparing the string to other strings
                //throughout the file if it is not stored somewhere..?
            }
        }
    }
    printf("%s", sTags);

Tags: c, arrays, pointers

1 Answer

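Two things to consider.

1) isspace(c)

You may want to use isspace(c) rather than c == ' '. https://www.techonthenet.com/c_language/standard_library_functions/ctype_h/isspace.php

It looks like a (c == ' ') check on its own would allow a tag to run past the end of a line. For example, what would your code do with a four-line file like this?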

<alpha>
<be
ta>
<gamma>

Would it find a tag for each of "<alpha>", the split "<be" / "ta>", and "<gamma>"? Hmm... is that what your (c == 10) check is for?

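2) strncmp()

I don't fully understand the pointers-only requirement of not storing copies of the tags. But strncmp() may help. https://www.tutorialspoint.com/c_standard_library/c_function_strncmp.htm If you have begin and end pointers for a tag, and end points at the tag's last character, the length is (end - beg + 1).

So, in your first, lengthy main() approach, you loop over the known tags (if any) to check for a match.

The pointer-based approach can loop over your known tags in the same way, and you only need to call strncmp() when the candidate tag has the same length as the known tag.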

Finally

Finally, bonus points for printf() debugging. I find those calls super useful in my own code. Don't hesitate to print things out.

If you don't already have a small test file for debugging, I encourage you to make one.

Good luck!
