Word count files linux

wc (short for word count) is a command-line tool on Unix/Linux operating systems that prints the newline, word, byte, and character counts of the files given as arguments to standard output, plus a total line when more than one file is named.

When you pass one or more file names, the wc command prints each file name after the requested counts. If you pass no file name, wc reads standard input and prints only the counts, without a name.
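
The difference between the two invocations can be seen in a minimal sketch. The file name sample.txt and its contents are made up for illustration.

```shell
# Create a throwaway sample file (hypothetical name and contents).
demo=$(mktemp -d)
printf 'one two\nthree\n' > "$demo/sample.txt"

# File operand: the file name follows the count.
with_name=$(wc -l "$demo/sample.txt")

# Standard input: only the count is printed.
count_only=$(wc -l < "$demo/sample.txt")

echo "$with_name"
echo "$count_only"
rm -r "$demo"
```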

In this article, we will discuss how to use the wc command to calculate a file’s newlines, words, characters, or byte count with practical examples.

wc Command Syntax

The syntax of the wc command is shown below.

# wc [options] filenames

The following are the options provided by the wc command.

  • wc -l – prints the number of lines (newline characters) in a file.
  • wc -w – prints the number of words in a file.
  • wc -c – prints the number of bytes in a file.
  • wc -m – prints the number of characters in a file.
  • wc -L – prints the length of the longest line in a file.

Let’s see how to use the ‘wc‘ command with its most common options. We use the ‘tecmint.txt‘ file for testing the commands.

Let’s view the contents of the tecmint.txt file using the cat command as shown below.

$ cat tecmint.txt

Red Hat
CentOS
AlmaLinux
Rocky Linux
Fedora
Debian
Scientific Linux
OpenSuse
Ubuntu
Xubuntu
Linux Mint
Deepin Linux
Slackware
Mandriva

1. A Basic Example of WC Command

The ‘wc‘ command without any options prints three counts for the ‘tecmint.txt‘ file. The three numbers shown below are 12 (number of lines), 16 (number of words), and 112 (number of bytes).

$ wc tecmint.txt

12  16 112 tecmint.txt
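
If a script needs the three numbers separately, one possible approach is to read from standard input (so no file name is printed) and split the result on whitespace. This is only a sketch; the file distros.txt and the variable names are assumptions made for the example.

```shell
demo=$(mktemp -d)
printf 'Red Hat\nCentOS\nFedora\n' > "$demo/distros.txt"

# Reading from standard input keeps the file name out of the output,
# so the three fields can be split on whitespace.
set -- $(wc < "$demo/distros.txt")
lines=$1 words=$2 bytes=$3

echo "lines=$lines words=$words bytes=$bytes"
rm -r "$demo"
```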

2. Count Number of Lines in a File

Use the ‘-l‘ option to print the number of newlines in a file.

In the output, the first field is the count and the second field is the name of the file.

$ wc -l tecmint.txt

12 tecmint.txt

3. Count Number of Words in a File

The -w argument with the wc command prints the number of words in a file. Type the following command to count the words in a file.

$ wc -w tecmint.txt

16 tecmint.txt

4. Count Number of Characters in a File

The -m option prints the total number of characters in a file.

$ wc -m tecmint.txt

112 tecmint.txt

5. Count Number of Bytes in a File

The -c option prints the number of bytes in a file.

$ wc -c tecmint.txt

112 tecmint.txt
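
For plain ASCII text the -m and -c counts are identical, as above. They differ only when the text contains multibyte characters. A small sketch, assuming the input is UTF-8 encoded (the character count depends on the current locale):

```shell
# "é" is encoded as two bytes (0xC3 0xA9) in UTF-8.
bytes=$(printf 'caf\303\251\n' | wc -c)
chars=$(printf 'caf\303\251\n' | wc -m)

echo "bytes: $((bytes))"   # c, a, f, two bytes for é, newline
echo "chars: $((chars))"   # one less than bytes in a UTF-8 locale
```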

6. Display Length of Longest Line in File

The ‘-L‘ option prints the length (number of characters) of the longest line in a file.

Here, the longest line in the file is ‘Scientific Linux‘.

$ wc -L tecmint.txt

16 tecmint.txt
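
The -L option is an extension and is not part of POSIX, so it may be missing on some systems. As a rough sketch, a similar number can be obtained with awk; the sample text below is an assumption, and awk counts characters rather than display width, so results can differ for lines containing tabs.

```shell
text='Fedora
Scientific Linux
Debian'

# wc -L where available (an extension, e.g. in GNU coreutils).
printf '%s\n' "$text" | wc -L

# Approximation with awk: remember the longest line seen so far.
longest=$(printf '%s\n' "$text" |
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
echo "$longest"
```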

7. Check wc Command Options

For more information and help on the wc command, simply run the ‘wc --help‘ or ‘man wc‘ from the command line.

$ wc --help
OR
$ man wc

wc Command Usage

Usage: wc [OPTION]... [FILE]...
  or:  wc [OPTION]... --files0-from=F
Print newline, word, and byte counts for each FILE, and a total line if
more than one FILE is specified.  A word is a non-zero-length sequence of
characters delimited by white space.

With no FILE, or when FILE is -, read standard input.

The options below may be used to select which counts are printed, always in
the following order: newline, word, character, byte, maximum line length.
  -c, --bytes            print the byte counts
  -m, --chars            print the character counts
  -l, --lines            print the newline counts
      --files0-from=F    read input from the files specified by
                           NUL-terminated names in file F;
                           If F is - then read names from standard input
  -L, --max-line-length  print the maximum display width
  -w, --words            print the word counts
      --help     display this help and exit
      --version  output version information and exit

GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Full documentation at: <https://www.gnu.org/software/coreutils/wc>
or available locally via: info '(coreutils) wc invocation'

In this article, you’ve learned about the wc command, which is a simple command-line utility to count the number of lines, words, characters, and bytes in text files. There are many other such Linux commands worth learning to master your command-line skills.

The wc program counts «words», but those are not necessarily the «words» that many people would see when they examine a file. The vi program, for instance, uses a different measure of «words», delimiting them based on their character classes, while wc simply counts things separated by whitespace. The two measures can be radically different. Consider this example:

first,second

vi sees three words (first and second as well as the comma separating them), while wc sees one (there is no whitespace on that line). There are many ways to count words; some are less useful than others.

While Perl would be better suited to writing a counter for the vi-style words, here is a quick example using sed, tr and wc (moderately portable using literal carriage returns ^M):

#!/bin/sh
in_words="[[:alnum:]_]"
in_punct="[][{}\\|:\"';<>,./?\`~!@#\$%^&*()+=-]"
sed     -e "s/\($in_words\)\($in_punct\)/\1^M\2/g" \
        -e "s/\($in_punct\)\($in_words\)/\1^M\2/g" \
        -e "s/[[:space:]]/^M/g" \
        "$@" |
tr '\r' '\n' |
sed     -e '/^$/d' |
wc      -l

Comparing counts:

  • Running the script on itself gives me 76 words.
  • The example in Perl by @cuonglm gives 31.
  • Using wc gives 28.

For reference, POSIX vi says:

In the POSIX locale, vi shall recognize five kinds of words:

  1. A maximal sequence of letters, digits, and underscores, delimited at both ends by:

    • Characters other than letters, digits, or underscores

    • The beginning or end of a line

    • The beginning or end of the edit buffer

  2. A maximal sequence of characters other than letters, digits, underscores, or <blank> characters, delimited at both ends by:

    • A letter, digit, underscore
    • <blank> characters
    • The beginning or end of a line
    • The beginning or end of the edit buffer
  3. One or more sequential blank lines

  4. The first character in the edit buffer

  5. The last non-<newline> in the edit buffer
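
The first two kinds of words in this list can be approximated on the command line. The following is only a sketch, assuming a grep that supports -o and -E (GNU and BSD grep do); it ignores the blank-line and buffer-edge rules.

```shell
line='first,second'

# wc: whitespace-delimited words.
wc_words=$(printf '%s\n' "$line" | wc -w)

# Rough vi-style count: runs of word characters, or runs of other
# non-blank characters, each reported on its own line by grep -o.
vi_words=$(printf '%s\n' "$line" |
    grep -oE '[[:alnum:]_]+|[^[:alnum:]_[:space:]]+' | wc -l)

echo "wc: $((wc_words))  vi-style: $((vi_words))"
```

For the example line this gives one word for wc and three for the vi-style count, matching the description above.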

I’m trying to count a particular word occurrence in a whole directory. Is this possible?

Say for example there is a directory with 100 files all of whose files may have the word “aaa” in them. How would I count the number of “aaa” in all the files under that directory?

I tried something like:

 zegrep "xception" `find . -name '*auth*application*' | wc -l 

But it’s not working.

asked May 26, 2011 at 7:20 by Ashish Sharma

grep -roh aaa . | wc -w

Grep recursively all files and directories in the current dir searching for aaa, and output only the matches, not the entire line. Then, just use wc to count how many words are there.
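
A self-contained way to try this out, assuming a throwaway directory with made-up contents (the file names are hypothetical). Because -o prints every match on its own line, wc -l gives the same count as wc -w as long as the pattern contains no whitespace.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'aaa bbb aaa\n' > "$demo/a.txt"
printf 'ccc aaa\n'     > "$demo/sub/b.txt"

# -r recurse, -o print each match on its own line, -h omit file names.
total=$(grep -roh aaa "$demo" | wc -l)
echo "$((total))"
rm -r "$demo"
```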

answered May 26, 2011 at 8:30 by Carlos Campderrós

Another solution based on find and grep.

find . -type f -exec grep -o aaa {} \; | wc -l

Should correctly handle filenames with spaces in them.
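
A variation on the same idea, sketched with made-up files: ending -exec with + instead of \; hands many files to a single grep process. With several files grep prefixes each match with the file name, so -h is added to keep the output clean; the line count is the same either way.

```shell
demo=$(mktemp -d)
printf 'aaa aaa\n' > "$demo/my file.txt"
printf 'aaa\n'     > "$demo/b.txt"

# "{} +" passes many files to one grep; -h keeps file names out of the output.
total=$(find "$demo" -type f -exec grep -oh aaa {} + | wc -l)
echo "$((total))"
rm -r "$demo"
```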

answered May 28, 2011 at 14:35 by Fredrik Pihl

Use grep in its simplest way. Try grep --help for more info.


  1. To count the lines that contain a word in a particular file:

    grep -c <word> <file_name>
    

    Example:

    grep -c 'aaa' abc_report.csv
    

    Output:

    445
    

  2. To count the lines that contain a word in each file of the whole directory:

    grep -c -R <word>
    

    Example:

    grep -c -R 'aaa'
    

    Output:

    abc_report.csv:445
    lmn_report.csv:129
    pqr_report.csv:445
    my_folder/xyz_report.csv:408
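
Note that grep -c reports the number of matching lines, so a line containing the word twice is counted once. A small sketch of the difference, using a made-up file:

```shell
demo=$(mktemp -d)
printf 'aaa aaa\nbbb\naaa\n' > "$demo/report.txt"

# -c counts lines with at least one match.
matching_lines=$(grep -c aaa "$demo/report.txt")

# -o prints every match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o aaa "$demo/report.txt" | wc -l)

echo "lines: $matching_lines  occurrences: $((occurrences))"
rm -r "$demo"
```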
    

answered Mar 13, 2016 at 3:22 by Parag Tyagi

Let’s use AWK!

$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency

This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:

$ cat your_file.txt | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (non-recursively), you can do this:

$ cat * | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (and its sub-directories), you can do this:

$ find . -type f | xargs cat | wordfrequency | grep yourword
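
Two details are worth sketching here. Plain grep yourword also matches longer words that contain it, and find | xargs breaks on file names with spaces. The variant below is an illustration under assumptions, not the original author's code: the function is rewritten in POSIX syntax and skips empty fields, the sample files are made up, and the final awk filter selects the exact word.

```shell
# Same idea as above, in POSIX function syntax, skipping empty fields.
wordfrequency() {
    awk 'BEGIN { FS = "[^a-zA-Z]+" }
         { for (i = 1; i <= NF; i++) if ($i != "") words[tolower($i)]++ }
         END { for (w in words) printf("%3d %s\n", words[w], w) }' | sort -rn
}

demo=$(mktemp -d)
mkdir "$demo/sub dir"
printf 'Work works work\n' > "$demo/a.txt"
printf 'work\n'            > "$demo/sub dir/b.txt"

# -exec cat {} + copes with spaces in names; the awk filter matches
# the word exactly, so "works" is not counted as "work".
count=$(find "$demo" -type f -exec cat {} + | wordfrequency |
    awk '$2 == "work" { print $1 }')
echo "$count"
rm -r "$demo"
```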

Source: AWK-ward Ruby

answered Dec 15, 2014 at 22:40 by Sheharyar

find . -type f | xargs perl -p -e 's/ /\n/g' | grep aaa | wc -l

answered May 26, 2011 at 7:33 by Vijay

cat the files together and grep the output: cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'

If you want ‘exceptional’ to match, don’t use the ‘\<‘ and ‘\>’ around the word.

answered May 26, 2011 at 7:27 by jcomeau_ictx

How about starting with:

cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l

as in the following transcript:

pax$ cat file1
this is a file number 1

pax$ cat file2
And this file is file number 2,
a slightly larger file

pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4

The sed converts spaces to newlines (you may want to include other space characters as well such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.

Now there may be edge cases where this script doesn’t work but it should be okay for the vast majority of situations.

If you wanted a whole tree (not just a single directory level), you can use something like:

( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l
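
Using \n in the replacement part of a sed substitution is not supported by every sed implementation. As a hedged alternative, the word splitting can be done with tr; the sketch below recreates the two files from the transcript in a temporary directory and counts whole-word matches with grep -x.

```shell
demo=$(mktemp -d)
printf 'this is a file number 1\n' > "$demo/file1"
printf 'And this file is file number 2,\na slightly larger file\n' > "$demo/file2"

# tr -s turns each run of whitespace into a single newline (one word per line);
# grep -x matches whole lines only, -c counts them.
count=$(cat "$demo/file1" "$demo/file2" | tr -s '[:space:]' '\n' | grep -cx 'file')
echo "$count"
rm -r "$demo"
```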

answered May 26, 2011 at 7:28 by paxdiablo

There’s also a grep regex syntax for matching words only:

# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l
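
To illustrate what the word anchors change, here is a small sketch on a made-up line of text. It assumes a grep that understands the \< and \> anchors (GNU grep does; -w is a commonly available alternative).

```shell
text='aaa aaab xaaa'

# Substring matches: every "aaa", even inside longer words.
any=$(printf '%s\n' "$text" | grep -o 'aaa' | wc -l)

# Whole-word matches only, using the \< and \> anchors.
whole=$(printf '%s\n' "$text" | grep -o '\<aaa\>' | wc -l)

echo "substring: $((any))  whole word: $((whole))"
```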

For a different word matching regex syntax see:

man re_format | less -p '\[\[:<:\]\]'

answered May 28, 2011 at 18:20 by tim

wc stands for word count. As the name implies, it is mainly used for counting purpose.

  • It is used to find the number of lines, words, bytes and characters in the files specified as arguments.
  • By default it displays four-columnar output.
  • The first column shows the number of lines in the file, the second column the number of words, the third column the number of bytes, and the fourth column is the file name given as argument.

Syntax:

wc [OPTION]... [FILE]...

Let us consider two files having name state.txt and capital.txt containing 5 names of the Indian states and capitals respectively.

$ cat state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh

$ cat capital.txt
Hyderabad
Itanagar
Dispur
Patna
Raipur

Passing only one file name in the argument.

$ wc state.txt
 5  7 58 state.txt
       OR
$ wc capital.txt
 5  5 39 capital.txt

Passing more than one file name in the argument.

$ wc state.txt capital.txt
  5   7  58 state.txt
  5   5  39 capital.txt
 10  12  97 total
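
When only the grand total is needed in a script, two simple approaches are sketched below. The files are recreated in a temporary directory with the contents shown earlier; the variable names are assumptions for the example.

```shell
demo=$(mktemp -d)
printf 'Andhra Pradesh\nArunachal Pradesh\nAssam\nBihar\nChhattisgarh\n' > "$demo/state.txt"
printf 'Hyderabad\nItanagar\nDispur\nPatna\nRaipur\n' > "$demo/capital.txt"

# The last line of multi-file output is the "total" row.
total_row=$(wc -w "$demo/state.txt" "$demo/capital.txt" | tail -n 1)

# Concatenating first gives the grand total as a bare number.
total_words=$(cat "$demo/state.txt" "$demo/capital.txt" | wc -w)

echo "$total_row"
echo "$((total_words))"
rm -r "$demo"
```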

Note: When more than one file name is specified in the argument, the command displays four-columnar output for each individual file plus one extra row showing the total number of lines, words and bytes of all the files specified, followed by the keyword total.

Options:

1. -l: This option prints the number of lines present in a file. With this option the wc command displays two-columnar output: the 1st column shows the number of lines present in a file and the 2nd is the file name.

With one file name
$ wc -l state.txt
5 state.txt

With more than one file name
$ wc -l state.txt capital.txt
  5 state.txt
  5 capital.txt
 10 total

2. -w: This option prints the number of words present in a file. With this option wc command displays two-columnar output, 1st column shows number of words present in a file and 2nd is the file name.

With one file name
$ wc -w state.txt
7 state.txt

With more than one file name
$ wc -w state.txt capital.txt
  7 state.txt
  5 capital.txt
 12 total

3. -c: This option displays the count of bytes present in a file. With this option it displays two-columnar output: the 1st column shows the number of bytes present in a file and the 2nd is the file name.

With one file name
$ wc -c state.txt
58 state.txt

With more than one file name
$ wc -c state.txt capital.txt
 58 state.txt
 39 capital.txt
 97 total

4. -m: Using -m option ‘wc’ command displays count of characters from a file.

With one file name
$ wc -m state.txt
58 state.txt

With more than one file name
$ wc -m state.txt capital.txt
 58 state.txt
 39 capital.txt
 97 total

5. -L: The ‘wc’ command accepts the -L option, which prints the length (number of characters) of the longest line in a file. Here the longest line is Arunachal Pradesh in the file state.txt and Hyderabad in the file capital.txt. With this option, if more than one file name is specified, the last row does not display a total; it displays the maximum of the values in the first column of the individual files.

Note: A character is the smallest unit of information that includes space, tab and newline.

With one file name
$ wc -L state.txt
17 state.txt

With more than one file name
$ wc -L state.txt capital.txt
 17 state.txt
 10 capital.txt
 17 total

6. --version: This option is used to display the version of wc which is currently running on your system.

$ wc --version
wc (GNU coreutils) 8.26
Packaged by Cygwin (8.26-1)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin and David MacKenzie.

Applications of wc Command

1. To count all files and folders present in directory: The ls command in Unix lists the files and folders present in a directory; when its output is piped to the wc command with the -l option, it displays the count of the (non-hidden) files and folders present in that directory.

$ ls gfg
a.txt 
b.txt  
c.txt  
d.txt  
e.txt  
geeksforgeeks  
India

$ ls gfg | wc -l
7
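
Hidden entries (names starting with a dot) are not listed by plain ls, so they are not counted. A minimal sketch of the difference, using a made-up directory; it assumes the usual ls behaviour for a regular user.

```shell
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/.hidden"

visible=$(ls "$demo" | wc -l)     # hidden entries are normally skipped
all=$(ls -A "$demo" | wc -l)      # -A includes them, except . and ..

echo "visible: $((visible))  all: $((all))"
rm -r "$demo"
```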

2. Display only the word count of a file: This can be done with the -w option, wc -w file_name, but that command shows two-columnar output: the count of words and the file name.

$ wc -w  state.txt
7 state.txt

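So to display the 1st column only, pipe (|) the output of wc -w to the cut command with the -d and -f options (cut -c1 would keep only the first character, which breaks for counts of 10 or more). Or use input redirection (<).

$ wc -w  state.txt | cut -d' ' -f1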
7
      OR
$ wc -w < state.txt
7
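
In a script, the redirection form is convenient for storing the count in a variable. A short sketch, recreating state.txt in a temporary directory (the variable name is an assumption):

```shell
demo=$(mktemp -d)
printf 'Andhra Pradesh\nArunachal Pradesh\nAssam\nBihar\nChhattisgarh\n' > "$demo/state.txt"

# Redirection keeps the file name out of the output entirely.
words=$(wc -w < "$demo/state.txt")

# Some wc implementations pad the number with spaces; arithmetic
# expansion normalizes it to a bare integer.
words=$((words))

echo "The file has $words words"
rm -r "$demo"
```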

This article is contributed by Akash Gupta.

Say I need to find out how many words are in each file that has the word ‘work’ in its name.

I know that to find files with ‘work’ in the name, it would be ls *work*. And to figure out the number of words in a file it would be wc -w.

However I tried the following and it seems to be just displaying the number of files, not the number of words combined in all files (which I need):

ls *work* | wc -w

So say if there are 14 files that have ‘work’ in the name, it would display 14, not the number of words.

asked Sep 29, 2013 at 0:45 by John Stacen

The syntax is wc -w [FILE]. If you don’t pass FILE but pipe in the output of ls *work*, wc only counts what it reads on stdin, which is the list of file names.

You need to pipe in the text itself:

cat *work* | wc -w

Alternatively, you could execute wc with find -exec. But be aware that this could show multiple «total» sums, as find will call wc multiple times if there are lots of files.

find ./ -type f -name "*work*" -exec wc -w {} +
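
One way to sidestep the multiple «total» lines is to concatenate the matching files and count once. This is a sketch with made-up files, not part of the original answer:

```shell
demo=$(mktemp -d)
printf 'a b c\n' > "$demo/work1.txt"
printf 'd e\n'   > "$demo/my work.txt"
printf 'x\n'     > "$demo/other.txt"

# Concatenate every matching file, then count once: a single grand total.
total=$(find "$demo" -type f -name '*work*' -exec cat {} + | wc -w)
echo "$((total))"
rm -r "$demo"
```

The trade-off is that the per-file counts are lost; only the combined number is printed.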

answered Sep 29, 2013 at 1:10 by Germar

You can run wc with multiple files by using the shell glob *work*, which expands to every non-hidden file in the working directory whose name contains ‘work’ and passes them to wc as parameters.

wc -w *work*

answered Apr 18, 2015 at 17:24 by maresmar

There are some good answers here but for this I like to use:

ls | grep work | xargs wc -w

answered Apr 6, 2022 at 9:25 by T MacN
