C word count from file

I need some help with a program that I am writing for my Systems Programming class. It is in C and I have very, very little experience with C. I need to merge three text files with the format:

word1
word2
word3
...
wordX

I am also to bring each of the words from all three files and put them into a 2D array (an array of string-arrays), then use some sort of sorting method on them.

I shouldn’t need help with the sorting, but I don’t know how to get the word count from each of the text files or put them into an array.


This is the function I have for counting the words in the file. It doesn’t compile on gcc (probably for obvious reasons, but I don’t know them). Do I even have the right idea?

int countWords(FILE f){
   int count = 0;
   char ch;
   while ((ch = fgetc(f)) != EOF){
       if (ch == '\n')
           count++;
       //return count; originally here, but shouldn't be.
   }
       return count;
}

EDIT: I suppose I could just find a way to count the lines in the program, but I’m not sure if the approach would be any different from what I am trying to do here. (I have never really been that good at working with text files.)


I got it to count all of the lines in the program. I guess I’m a little rusty.


#include <stdlib.h>
#include <stdio.h>

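int countWords(FILE *f){
   int count = 0;
   int ch;   /* int rather than char, so EOF can be told apart from real characters */
   while ((ch = fgetc(f)) != EOF){
       if (ch == '\n')
           count++;
   }
   return count;
}
int main(void){

   int wordCount = 0;
   FILE *rFile = fopen("american0.txt", "r");
   if (rFile == NULL){
       perror("american0.txt");
       return EXIT_FAILURE;
   }
   wordCount += countWords(rFile);
   fclose(rFile);
   printf("%d\n", wordCount);
   return 0;
}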

I kind of forgot about the pointer thing with FILE *fileName.
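For the other half of the question (getting the words into an array), here is a minimal sketch rather than a definitive answer. It assumes the words are separated by whitespace and are at most 63 characters long. The names appendWords and MAX_WORD_LEN are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 63   /* assumed upper bound; keep in sync with "%63s" below */

/* Append every whitespace-separated word read from f to the array *words,
 * which already holds `count` words. The array grows by one slot per word.
 * Returns the new word count, or -1 if an allocation fails. */
long appendWords(FILE *f, char ***words, long count)
{
    char buf[MAX_WORD_LEN + 1];

    while (fscanf(f, "%63s", buf) == 1) {
        char **grown = realloc(*words, (size_t)(count + 1) * sizeof **words);
        if (grown == NULL)
            return -1;
        *words = grown;

        /* Store a private copy of the word, including its terminating '\0'. */
        (*words)[count] = malloc(strlen(buf) + 1);
        if ((*words)[count] == NULL)
            return -1;
        strcpy((*words)[count], buf);
        count++;
    }
    return count;
}
```

Start with a NULL array and a count of 0, then call appendWords once per file, passing in the count returned by the previous call. That leaves the words of all three files in one array, ready for whatever sorting method you choose.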

In this article, we are going to learn how to count words in a text file in C. We will read one word from the text file in each iteration and increment a counter.

C fscanf function


The fscanf function is available in the C standard library (declared in <stdio.h>). It reads formatted input from a stream. The syntax of the fscanf function is:

Syntax

int fscanf(FILE *stream, const char *format, ...)

Parameters

  • stream − This is the pointer to a FILE object that identifies the stream.
  • format − This is the C string that contains one or more of the following items − whitespace characters, non-whitespace characters, and format specifiers. A format specifier has the form %[*][width][modifiers]type.
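As a small illustration of the signature above, the sketch below counts words by checking the return value of fscanf, which is the number of items it assigned. The function name count_words_fscanf and the 63-character word limit are assumptions made for this example.

```c
#include <stdio.h>

/* Count whitespace-separated words in an already opened stream.
 * fscanf returns the number of items it assigned, so the loop ends
 * at end of file or on a read error. */
long count_words_fscanf(FILE *stream)
{
    char word[64];      /* assumed maximum word length of 63 characters */
    long count = 0;

    /* The width 63 keeps fscanf from writing past the end of the buffer. */
    while (fscanf(stream, "%63s", word) == 1)
        count++;

    return count;
}
```

With this width limit, a word longer than 63 characters is read in several pieces and therefore counted more than once, so the buffer size should be chosen to suit the input.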

1. How to count words in a text file in C using fscanf


In this C program, we use the fscanf function to read the text file. The first thing we do is open the file in read mode, using the fopen() function with the "r" mode.

  • The next step is to find the size of the file, so that we can allocate a buffer large enough to hold any word in it. We use the stat() function to find the file size.
  • Once we have the size and a buffer allocated for it, we start reading the file with the fscanf() function.
  • We keep reading the file word by word until we reach the end of the file. We pass fscanf a scanset format, "%[^-\n ] ", which reads characters until it meets a hyphen, a newline, or a space. The trailing space in the format then skips the whitespace that follows, so each call returns one word. The call looks like:
fscanf(in_file, "%[^-\n ] ", file_contents)
  • We maintain a variable that counts the words in the file. Each time a word is read, we increment the counter.

C Program to count words in a text file in C


To run this program, we need a text file named Readme.txt in the same folder as our code. The content of the file is:

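Hello My name is
John
danny

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

const char* filename = "Readme.txt";

int main(int argc, char *argv[])
{
    int wordcount = 0;
    FILE *in_file = fopen(filename, "r");
    if (!in_file)
    {
        perror("fopen");
        return EXIT_FAILURE;
    }

    struct stat sb;
    if (stat(filename, &sb) == -1)
    {
        perror("stat");
        fclose(in_file);
        return EXIT_FAILURE;
    }

    /* One extra byte for the terminating '\0': a word can be as long as the file. */
    char *file_contents = malloc(sb.st_size + 1);
    if (!file_contents)
    {
        perror("malloc");
        fclose(in_file);
        return EXIT_FAILURE;
    }

    /* Read until a '-', a newline, or a space; the trailing space in the
       format skips the whitespace after the word. Comparing against 1
       (rather than EOF) also ends the loop when the scanset matches nothing,
       for example when the next character is '-'. */
    while (fscanf(in_file, "%[^-\n ] ", file_contents) == 1)
    {
        wordcount++;
        printf("%s\n", file_contents);
    }

    printf("The file have total words = %d\n", wordcount);

    free(file_contents);
    fclose(in_file);
    return 0;
}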

Output

Hello
My
name
is
John
danny
The file have total words = 6

Write a C program to count the number of characters, words, and lines in a text file.

Example

Source file

I love programming.
Working with files in C programming is fun.
I am learning C programming at Codeforwin.

Output

Total characters = 106
Total words      = 18
Total lines      = 3

Required knowledge

Basic Input Output, Pointers, String, File Handling

Step by step descriptive logic to count characters, words and lines in a text file.

  1. Open source file in r (read) mode.
  2. Initialize three variables characters = 0, words = 0 and lines = 0 to store counts.
  3. Read a character from file and store it to some variable say ch.
  4. Increment characters count.
    Increment words count if current character is whitespace character i.e. if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\0').
    Increment lines count if current character is new line character i.e. if (ch == '\n' || ch == '\0').
  5. Repeat step 3-4 till file has reached end.
  6. Finally after file has reached end increment words and lines count by one if total characters > 0 to make sure you count last word and line.
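The steps above count one word per whitespace character, so consecutive spaces or a trailing newline inflate the totals. The sketch below is a variation, not the logic described above: it counts a word only when a run of non-whitespace characters begins. The struct file_stats type and the count_stats function are hypothetical names introduced for this example.

```c
#include <ctype.h>
#include <stdio.h>

struct file_stats {      /* hypothetical helper type for this sketch */
    long characters;
    long words;
    long lines;
};

struct file_stats count_stats(FILE *file)
{
    struct file_stats s = {0, 0, 0};
    int ch;              /* int, so that EOF is distinguishable from any character */
    int prev = '\n';     /* most recent character read */
    int in_word = 0;     /* nonzero while inside a run of non-whitespace */

    while ((ch = fgetc(file)) != EOF) {
        s.characters++;
        if (ch == '\n')
            s.lines++;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;     /* first character of a new word */
            s.words++;
        }
        prev = ch;
    }

    /* Count a final line that does not end with a newline. */
    if (s.characters > 0 && prev != '\n')
        s.lines++;

    return s;
}
```

For the three-line source file shown in the example, saved without a trailing newline, this gives the same totals as the step-by-step logic. The two approaches differ only on input with repeated whitespace or a final newline.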


Program to count characters, words and lines in a file

/**
 * C program to count number of characters, words and lines in a text file.
 */

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE * file;
    char path[100];

    int ch;    /* int, not char, so that EOF can be distinguished from valid characters */
    int characters, words, lines;


    /* Input path of the source file */
    printf("Enter source file path: ");
    scanf("%99s", path);    /* width limit prevents overflowing path */

    /* Open source file in 'r' mode */
    file = fopen(path, "r");


    /* Check if file opened successfully */
    if (file == NULL)
    {
        printf("\nUnable to open file.\n");
        printf("Please check if file exists and you have read privilege.\n");

        exit(EXIT_FAILURE);
    }

    /*
     * Logic to count characters, words and lines.
     */
    characters = words = lines = 0;
    while ((ch = fgetc(file)) != EOF)
    {
        characters++;

        /* Check new line */
        if (ch == '\n' || ch == '\0')
            lines++;

        /* Check words */
        if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\0')
            words++;
    }

    /* Increment words and lines for last word */
    if (characters > 0)
    {
        words++;
        lines++;
    }

    /* Print file statistics */
    printf("\n");
    printf("Total characters = %d\n", characters);
    printf("Total words      = %d\n", words);
    printf("Total lines      = %d\n", lines);


    /* Close files to release resources */
    fclose(file);

    return 0;
}

Suppose datafile3.txt contains

I love programming.
Working with files in C programming is fun.
I am learning C programming at Codeforwin.

Output

Enter source file path: datafile3.txt

Total characters = 106
Total words      = 18
Total lines      = 3


Yes, this can be simplified to:

#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <unordered_map>

int main()
{
     std::ifstream   inputFile("Bob");
     std::unordered_map<std::string, int>  count;

     std::for_each(std::istream_iterator<std::string>(inputFile),
                   std::istream_iterator<std::string>(),
                   [&count](std::string const& word){++count[word];});
}

Why this works:

operator>>

When you read a string from a stream with operator>>, it reads one whitespace-separated word. Try it.

 #include <iostream>
 #include <string>

 int main()
 {
     std::string  line;
     std::cin >> line;
     std::cout << line << "\n";
 }

If you run that and type a line of text, it will only print out the first space-separated word.

std::istream_iterator

The standard provides an iterator for streams. std::istream_iterator<X> will read an object of type X from the stream using operator>>.

This allows you to use streams just like you would any other container when using standard algorithms. The standard algorithms take two iterators to represent a container (begin and end or potentially any two points in the container).

So by using std::istream_iterator<std::string> you can treat a stream like a container of space separated words and use it in an algorithm.

 #include <iostream>
 #include <iterator>
 #include <string>

 int main()
 {
     std::string  line;
     std::istream_iterator<std::string> iterator(std::cin);

     line = *iterator;   // de-reference the iterator.
                         // Which reads the stream with operator >>
     std::cout << line << "\n";
 }

std::for_each

I use std::for_each above because it is trivial to use. But with a tiny bit of work you can use the range-based for loop introduced in C++11 (as this just calls std::begin and std::end on the object to get the bounds of the loop).

But let's look at std::for_each first.

std::for_each(begin, end, action);

Basically it loops from begin to end and performs action on the result of de-referencing the iterator.

 // In my case action was a lambda
 [&count](std::string const& word){++count[word];}

It captures count from the current context to be used in the function. And de-referencing the std::istream_iterator<std::string> returns a reference to a std::string object. So we can now use that to increment the count for each word.

Note: count is a std::unordered_map, so looking up a value automatically inserts it if it does not already exist (using a default value, which for int is zero). The lambda then increments that value in the map.

Range based for

A quick search to use range based for with std::istream_iterator gives me this:

#include <fstream>
#include <istream>
#include <iterator>
#include <string>
#include <unordered_map>

template <typename T>
struct irange
{
    irange(std::istream& in): d_in(in) {}
    std::istream& d_in;
};
template <typename T>
std::istream_iterator<T> begin(irange<T> r) {
    return std::istream_iterator<T>(r.d_in);
}
template <typename T>
std::istream_iterator<T> end(irange<T>) {
    return std::istream_iterator<T>();
}

int main()
{
     std::ifstream   inputFile("Bob");
     std::unordered_map<std::string, int>  count;

     for(auto const& word : irange<std::string>(inputFile)) {
         ++count[word];
     }
}

Issues with this technique.

We use space to separate words. So any punctuation is going to screw things up. Not to worry. C++ allows you to define what is a space in any given context. So we just need to tell the stream what is a space.

https://stackoverflow.com/a/6154217/14065

Review of code

Sure.

struct StringOccurrence //stores word and number of occurrences
{
    std::string m_str;
    unsigned int m_count;
    StringOccurrence(const char* str, unsigned int count) : m_str(str), m_count(count) {};
};

But you can do this with a number of standard types.

typedef std::pair<std::string, unsigned int> StringOccurrence;

You are doing this to store the value in a vector. But a better way to store this is in a map. Because maps are organised internally for lookup, finding a word is a lot faster: std::map gives access in O(log n), and std::unordered_map gives access in O(1) on average.

I hate bad comments.
Bad comments are worse than no comments because they need to be maintained and the compiler will not help you maintain them.

    if (!in) //check if file path is valid

Not quite, but close enough I suppose. But I don’t really need the comment to tell me that. The code seems pretty self explanatory.

Not sure if -1 is a good value. It will really depend on the OS you are running on. 0 is the only value that means success; anything else is considered an error. At the OS level, -1 will probably be truncated to 255 on most systems (but not all).

        return -1;

If you run this:

> cat xrt.cpp

int main()
{
    return -1;
}
> g++ xrt.cpp
> ./a.out
> echo $?         # Echoes the exit code of the last command.
255
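To make the truncation concrete, here is a minimal sketch in C (the same macros are available in C++ through <cstdlib>). It assumes a POSIX-style system where the parent process sees only the low 8 bits of the value returned from main. The function names status_seen_by_shell and portable_status are invented for this illustration.

```c
#include <stdlib.h>

/* Assumption: a POSIX-style system, where the shell sees only the
 * low 8 bits of the value returned from main. */
int status_seen_by_shell(int returned)
{
    return returned & 0xFF;
}

/* Portable alternative: report only success or failure, using the
 * macros the standard library provides for exactly this purpose. */
int portable_status(int ok)
{
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Under that assumption, status_seen_by_shell(-1) evaluates to 255, which matches the shell session above. Returning EXIT_FAILURE instead avoids relying on how a particular OS maps negative values.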

I don’t think you need to copy the whole thing into memory.

    std::vector<std::string>vec;
    std::string lineBuff;
    while (std::getline(in, lineBuff)) // write multiline text to vector of strings
    {
        vec.push_back(lineBuff);
    }

Just read a line at a time and process that.

Don’t use pointers in C++

    std::vector<StringOccurrence*> strOc;

C++ has much better ways to handle dynamic memory allocation, and raw pointers are never the way to go.

When you iterate from begin -> end of something, you can use the new range-based for instead.

    for (auto it = vec.begin(); it < vec.end(); it++)

    // easier to write and read:

   for(auto const& val : vec)

Going to comment on your comments again.

    for (auto it = vec.begin(); it < vec.end(); it++) //itterate through each line

Not very useful. I can see from the code that you are iterating over every line.
You should restrict your comments to WHY you are doing something.

Space ' ' is not the only whitespace character! What about tab \t, carriage return \r, or vertical tab \v? You should test for whitespace using standard library routines.

std::isspace(c)
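The same classification routine exists in C as isspace, declared in <ctype.h>. As a minimal sketch of testing with the library instead of comparing against ' ', the function below counts the words in a string. The name count_words_in_string is made up for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the words in a NUL-terminated string, treating every character
 * that isspace() classifies as whitespace as a separator. */
size_t count_words_in_string(const char *s)
{
    size_t words = 0;
    int in_word = 0;    /* nonzero while inside a run of non-whitespace */

    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: passing a negative char value to
         * isspace is undefined behaviour. */
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

In the default "C" locale, isspace accepts space, tab, newline, vertical tab, form feed, and carriage return, so input such as "one\ttwo\r\nthree" is split correctly without listing each character by hand.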

I have used goto probably twice in the last ten years. One of those times was probably wrong.

                        goto end; //skip next step (need fix?)

Loops and conditions will always be better and easier to read.

We have a leak here:

                strOc.push_back(new StringOccurrence(stringBuff.c_str(), 1));

I see a new (but no delete). See above about using pointers. There is no need to use a pointer here. Just use a normal object; it will be moved into the vector.

A file is a physical storage location on disk and a directory is a logical path which is used to organise the files. A file exists within a directory.

The three operations that we can perform on file are as follows −

  • Open a file.
  • Process file (read, write, modify).
  • Save and close file.
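The three operations can be sketched in C as follows. This is an illustration under simple assumptions, not a complete program: the helper names write_text and count_characters are hypothetical, and error handling is reduced to a return code.

```c
#include <stdio.h>

/* Write text to the file at path, replacing any existing content.
 * Returns 0 on success and -1 on failure. */
int write_text(const char *path, const char *text)
{
    FILE *file = fopen(path, "w");              /* 1. open the file */
    if (file == NULL)
        return -1;

    int failed = (fputs(text, file) == EOF);    /* 2. process (write) */

    if (fclose(file) == EOF)                    /* 3. save and close */
        failed = 1;
    return failed ? -1 : 0;
}

/* Count the characters in the file at path.
 * Returns -1 if the file cannot be opened. */
long count_characters(const char *path)
{
    FILE *file = fopen(path, "r");              /* 1. open the file */
    if (file == NULL)
        return -1;

    long n = 0;
    while (fgetc(file) != EOF)                  /* 2. process (read) */
        n++;

    fclose(file);                               /* 3. close */
    return n;
}
```

Checking the result of fclose matters when writing, because buffered data is only flushed to disk at that point and a failure there means the file may be incomplete.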

Example

Consider an example given below −

  • Open a file in write mode.
  • Enter statements in the file.

The input file is as follows −

Hi welcome to my world
This is C programming tutorial
From tutorials Point

The output is as follows −

Number of characters = 72

Total words = 13

Total lines = 3

Program

Following is the C program to count characters, lines and number of words in a file


#include <stdio.h>
#include <stdlib.h>
int main(){
   FILE * file;
   char path[100];
   int ch;   /* int, not char, so EOF can be distinguished from valid characters */
   int characters, words, lines;
   file=fopen("counting.txt","w");
   if (file == NULL){
      printf("\nUnable to open file.\n");
      exit(EXIT_FAILURE);
   }
   printf("enter the text.press cntrl Z:\n");
   while((ch = getchar())!=EOF){
      putc(ch,file);
   }
   fclose(file);
   clearerr(stdin);   /* reset the end-of-file state so that scanf can read again */
   printf("Enter source file path: ");
   scanf("%99s", path);
   file = fopen(path, "r");
   if (file == NULL){
      printf("\nUnable to open file.\n");
      exit(EXIT_FAILURE);
   }
   characters = words = lines = 0;
   while ((ch = fgetc(file)) != EOF){
      characters++;
      if (ch == '\n' || ch == '\0')
         lines++;
      if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\0')
         words++;
   }
   if (characters > 0){
      words++;
      lines++;
   }
   printf("\n");
   printf("Total characters = %d\n", characters);
   printf("Total words = %d\n", words);
   printf("Total lines = %d\n", lines);
   fclose(file);
   return 0;
}

Output

When the above program is executed, it produces the following result −

enter the text.press cntrl Z:
Hi welcome to Tutorials Point
C programming Articles
Best tutorial In the world
Try to have look on it
All The Best
^Z
Enter source file path: counting.txt

Total characters = 116
Total words = 23
Total lines = 6
