Word counts in LaTeX

The prior answers are (I believe) more than adequate for the original question. But for the benefit of others who find this via search, I would like to provide more information.

“Word count” can mean many things. It is not necessarily determined by counting word boundaries (spaces and line breaks).

One widely-used measure, at least for U.S. English, is to visualize an old-fashioned typewriter, where each keystroke generates a character (including quote, period, comma, and space). Carriage return is also a character. Then, take the number of characters, and divide by six. This assumes an average word length (in U.S. English) of five letters, plus a space.
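
The typewriter rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a standard tool; the function name estimate_words and the chars_per_word parameter are names I made up for the example.

```python
def estimate_words(text: str, chars_per_word: int = 6) -> int:
    """Estimate a word count as (number of keystrokes) / 6.

    Every character counts as one keystroke, including punctuation,
    spaces, and line breaks (the typewriter's carriage return).
    """
    return round(len(text) / chars_per_word)


# 600 keystrokes correspond to an estimate of 100 words.
print(estimate_words("x" * 600))
```

For short snippets the estimate is noisy; it is meant for lengthy flowing text.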

The above definition is useful for estimating how many pages will be used in a lengthy, printed book or manuscript. Of course, if you are preparing a PDF with TeX, you know exactly how many pages it uses.

Note that this criterion is not useful for academic papers containing illustrations, tables, and images.

I do not know whether MS Word counts word boundaries, or characters/6. In theory, the result should be almost the same, for lengthy flowing text (U.S. English).

I recently wrote a book, for which the page count measured by characters/6 was 220. The actual page count, using TeX with 5.5″x8.5″ layout, was 240 pages including blanks. Not a bad estimate.

You may ask: In the case of a term paper, why not specify number of pages instead of word count? The obvious answer is that the number of pages can be gamed using different fonts, font sizes, or leading.

  • Posted by Henry on September 15, 2015

    This article was originally published on the ShareLaTeX blog and is reproduced here for archival purposes.

    We have just released a long-requested feature: word count. You can now find out how many words over the limit you are for your next assignment. This feature is built on top of the great work of Einar Andreas Rødland and his TeXcount project.

    To perform a word count on your LaTeX project you first need to compile. You will then be able to perform the word count which is in the left hand menu.

latex-wordcount

Table of contents

  • About
  • Requirements
  • Commands
    • Quick summary
    • Document wordcount
    • Document wordcount (more accurate)
    • File wordcount
    • Section wordcount
    • Selection wordcount
  • Settings
    • Enable shell escape
    • Maximum section depth

About

Given the nature of LaTeX, it is practically impossible to get an accurate word count for any non-trivial document. However, this package provides several methods of estimating the word count using tools that come with a TeX Live distribution, such as TeXcount and wordcount.tex.

Most of the provided commands use TeXcount, as it is the quicker and more lenient program. However, it is likely to underestimate the wordcount most of the time. The one command that uses wordcount.tex only works on the entire document and requires generating a log file that grows larger as the document gets longer. This one will likely overestimate the word count, so (to reiterate the earlier point) these methods are approximations only and you as the user must decide how to interpret the results.

  • Note: I recently added code folding based on section commands. This is experimental, must be explicitly enabled to use, and will likely be removed at some point. For now though, I’ll leave it here because this package already has section range finding logic, so implementing folding based on this range was easy enough.

Requirements

  • Most commands need the texcount command line program. This comes with a standard TeX Live installation. I believe it also comes with MiKTeX, but cannot confirm.
    • Windows: TeXcount may need to be set up following the instructions given here.
  • The Document wordcount (more accurate) command only works on a UNIX shell, as it currently uses commands such as echo and grep. Most Windows users will be unable to use this command for now.
  • The above command also requires that the absolute path to the root file has no spaces in it. E.g., /Users/username/my TeX files/main.tex will not work.

Commands

Quick summary

  • Document wordcount: counts entire document, from start of root file, including all input files.
  • Document wordcount (more accurate): more accurate version of Document wordcount, but takes longer and requires generation of a log file containing a line for every character in the output.
  • File wordcount: counts only current file, and does not count input files (the only one that doesn’t).
  • Section wordcount: counts the current section, including all input files.
  • Selection wordcount: counts the current selection, including input files.

Document wordcount

This will count using TeXcount, which uses a set of LaTeX-specific rules to better estimate the word count. It’s quite fast, as it does not compile the document, so it is useful if you want a rough estimate of the word count. Note that it will not know any macro definitions, so take this into consideration if you use macros that expand into a group of words. It will probably underestimate the word count in most cases. See its homepage here.

Document wordcount (more accurate)

Forgive the name. (This does not work on files with spaces in the path; see the note under Requirements.)

This command runs a different program, and actually compiles your document. The output should be a file called wordcount.log located in the same directory as the root file (the one that gets compiled; magic comments are followed). Basically, it redefines several TeX commands to force the log file to include each character and word. A simple search command then finds the number of characters and words, and prints the result.

  • Note: because the file is compiled, and it prints every character to the log file, this method is slow. It also takes a lot of space, so be mindful of that. The taken space increases as the document gets bigger.

This method is more accurate than TeXcount; it ‘knows’ macro definitions, so (for example) it will correctly give the number of words inserted by the lipsum command.

However, it makes no distinction between the type of any text. Tables, captions, math formulas (each individual term; even superscripts are counted), code (in a minted block), etc. will all be included in the count.

File wordcount

Back to TeXcount, this command will run it on the currently open file (not the root one). It will also ignore any input, include, etc. commands that would be honoured by Document wordcount.

Section wordcount

First, the text in the current section is gathered. This section is determined by looking back from the cursor position to the nearest section command, where a section command is one of the following: part, chapter, section, subsection, subsubsection, paragraph, subparagraph. However, only section commands at the level set by Maximum section depth or shallower will be recognised. When one is found, the section start (which will eventually be passed to texcount) is set to the beginning of that section command.

Next, it looks for the next section command of the same or a shallower level than the starting one. When found, it sets the end of the section range to just before this section command.

  • If not found in the current file, it will stop there. It will not attempt to determine the document structure and work out where the source continues.

Finally, it copies the text in this range and pastes it into a temporary file created by the npm tmp package. It then runs texcount on this temporary file, but set up so it thinks it’s in the same directory as the original file, ensuring relative file paths still work (as input files are included if they are within the section).
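
As a rough illustration of the range-finding logic described above, here is a minimal Python sketch. It is not the package's actual code; the names LEVELS and section_range, and the simple line-based matching, are assumptions made for this example. It also does not skip section commands that appear in comments.

```python
import re

# Section commands from shallowest (index 0) to deepest (index 6).
LEVELS = ["part", "chapter", "section", "subsection",
          "subsubsection", "paragraph", "subparagraph"]


def section_range(lines, cursor, max_depth="subsection"):
    """Return (start, end) line indices of the section containing `cursor`.

    `end` is exclusive. Commands deeper than `max_depth` are ignored.
    Returns None if no recognised section command precedes the cursor.
    """
    allowed = LEVELS[: LEVELS.index(max_depth) + 1]
    pattern = re.compile(r"\\(" + "|".join(allowed) + r")\*?\s*[\[{]")

    def level_at(i):
        match = pattern.search(lines[i])
        return LEVELS.index(match.group(1)) if match else None

    # Look back from the cursor for the nearest recognised section command.
    start = next((i for i in range(cursor, -1, -1)
                  if level_at(i) is not None), None)
    if start is None:
        return None
    start_level = level_at(start)

    # Look forward for the next command at the same or a shallower level.
    for i in range(start + 1, len(lines)):
        level = level_at(i)
        if level is not None and level <= start_level:
            return start, i
    return start, len(lines)  # not found: stop at the end of this file


doc = [
    r"\section{S:1}",
    "(1)",
    r"\subsection{SS:1-1}",
    "(2)",
    r"\subsection{SS:1-2}",
    "(3)",
    r"\section{S:2}",
]
print(section_range(doc, cursor=3))
```

The lines between start and end are what would be written to the temporary file and handed to texcount.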

Selection wordcount

Similar to Section wordcount, this command will count the words in a selected region. Specifically, it will create a temporary file in the system temp directory and write the selected text to it. It will then run texcount on this temporary file, specifying the directory as that of the original file. This way, input statements (and others) will still work if they are completely within the selection. TeXcount will not see the outside of the selection, so if the selection cuts into a macro it may not behave as expected.

In both this case and the above, the temporary file is deleted as soon as the results are returned.
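
The temporary-file approach used by the section and selection commands could look roughly like the following Python sketch. This is an assumption-laden illustration, not the package's implementation: the function names are hypothetical, the exact texcount flags the package passes are not stated here, and the trailing path separator on -dir is added on the assumption that texcount prepends that path to included file names as given.

```python
import os
import subprocess
import tempfile


def build_texcount_cmd(tmp_path, original_dir):
    """Build a texcount call for the temporary file.

    -inc follows input/include commands; -dir points texcount at the
    original file's directory so relative paths still resolve (assumed
    flag choice, see the note above).
    """
    work_dir = os.path.join(original_dir, "")  # ensure a trailing separator
    return ["texcount", "-inc", "-total", f"-dir={work_dir}", tmp_path]


def count_selection(text, original_dir, run=subprocess.run):
    """Write `text` to a temp file, run texcount on it, return its output."""
    fd, tmp_path = tempfile.mkstemp(suffix=".tex")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        cmd = build_texcount_cmd(tmp_path, original_dir)
        result = run(cmd, capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(tmp_path)  # delete the temp file once results are returned
```

The run parameter exists only so the function can be exercised without texcount installed; in normal use it defaults to subprocess.run.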

Settings

Enable shell escape

The shell escape flag is generally used to allow LaTeX to execute arbitrary code. This can be useful for packages such as minted, where the syntax colours are determined using an external program.

This option is not necessary for TeXcount, as it works by parsing the source. However, wordcount.tex requires the document be compiled, so any packages that require shell escape need this flag enabled.

Maximum section depth

When counting the words in a section, this setting determines the deepest section command to look for. For example, consider the following document layout

\section{S:1}
  (1)
  \subsection{SS:1-1}
    (2)
  \subsection{SS:1-2}
    (3)
\section{S:2}

A setting of subsection will cause the count when the cursor is at (1) to be for the entire S:1 to S:2, the count at (2) to be between SS:1-1 and SS:1-2, and the count at (3) to be between SS:1-2 and S:2.

A setting of section will cause all three locations to count the same area (S:1 to S:2), because the deeper subsection commands will be ignored.

Ideas

  • Config file could be used for custom section commands and their ‘level’; section goes from current section (by looking above) to next section command of same or greater level.

The concept of a word count for a mathematical document is usually not appropriate. A more appropriate assessment is to provide some guidance on the page size, line spacing and font size to be used and then define a limit in terms of pages after excluding certain material, e.g. one might exclude figures, tables, appendices, front and back matter from the count.

Nevertheless sometimes a word count, or at least some estimate of a word count, is required or of some interest.

The standard Linux command wc counts the lines, words and bytes in a file. However, this will grossly overestimate the word count of many LaTeX documents, because a large number of the “words” are actually LaTeX commands and maths. To get a more accurate estimate you need to count only the actual words in the document.

Note that the accuracy of any of these methods depends on how your document has been written, on your use of LaTeX macros, and on what your (or your examiners’) definition of a word count is. On the last point, only your examiners or academic administration team can formally say how a word count will be assessed or applied.
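
To make the gap between a raw wc-style count and a filtered count concrete, here is a deliberately crude Python sketch. It is not how untex, detex or TeXcount work; the function naive_word_count and its regular expressions are assumptions for illustration and will mishandle many real documents (display maths, verbatim blocks, user macros).

```python
import re


def naive_word_count(tex: str) -> int:
    """Very rough word count: drop comments, inline maths and command names."""
    text = re.sub(r"(?<!\\)%.*", "", tex)                 # strip comments
    text = re.sub(r"\$[^$]*\$", " ", text)                # strip inline maths
    text = re.sub(r"\\(begin|end)\{[^}]*\}", " ", text)   # strip environment markers
    text = re.sub(r"\\[A-Za-z]+\*?(\[[^\]]*\])?", " ", text)  # strip command names
    text = re.sub(r"[{}]", " ", text)                     # keep argument text
    return len(re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", text))


source = r"""
\section{Results} % heading
We find that $x^2 + y^2$ grows \emph{very} quickly.
"""
# Whitespace-separated tokens (what wc -w sees) versus the filtered count.
print(len(source.split()), naive_word_count(source))
```

Even on this two-line sample the whitespace count is noticeably higher than the filtered one, because the comment and the maths tokens are counted as words.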

Using Overleaf

Obtaining a word count in Overleaf is as simple as selecting it from the menu. It uses the texcount utility (see below for some more info on that).

Using Kile

Kile is a LaTeX editor. If you open your document in Kile and select Statistics from the File menu, you will find a word count, among other figures.

Using Texmaker

Texmaker’s integrated PDF viewer has a word count feature: right-click in the PDF document and select Number of words in the document.

Using untex

Use untex first to remove the tex codes and then count the words, e.g.

untex file.tex | wc -w

The accuracy of the estimate will depend to a degree on how many latex macros of your own you have which it fails to handle well.

Using TeXcount

TeXcount is another system that aims to parse the latex document and count the words, e.g. one can run

texcount.pl -inc -html -v -sum file.tex > results.html

which produces an HTML file that you can view in a web browser to see the overall counts it has done and what parts of the document it has included or excluded for the count.

Note that TeXcount also provides a web-based word count service.

As with untex the accuracy of the estimate will depend to a degree on how many latex macros of your own you have which it fails to handle well.

Using Postscript or PDF document

An alternative approach is to count the words in the PostScript or PDF file by converting it to plain text first, e.g.

dvips -o - file.dvi | ps2ascii | wc -w

pdftotext file.pdf - | egrep -e '\w\w\w+' | iconv -f ISO-8859-15 -t UTF-8 | wc -w

I’m currently searching for an application or a script that does a correct word count for a LaTeX document.

Up till now, I have only encountered scripts that only work on a single file but what I want is a script that can safely ignore LaTeX keywords and also traverse linked files…ie follow include and input links to produce a correct word-count for the whole document.

With vim, I currently use ggVGg CTRL+G but obviously that shows the count for the current file and does not ignore LaTeX keywords.

Does anyone know of any script (or application) that can do this job?

Thom Wiggers

asked Jun 4, 2010 at 14:20

Andreas Grech

I use texcount. The webpage has a Perl script to download (and a manual).

It will include tex files that are included (input or include) in the document (see -inc), supports macros, and has many other nice features.

When following included files you will get detail about each separate file as well as a total. For example here is the total output for a 12 page document of mine:

TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19

If you’re only interested in the total, use the -total argument.
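
If you want to post-process a summary like the one shown above (for example, to add up only the categories that count toward your limit), a small parser is enough. The sketch below is my own illustration; parse_texcount_summary is a hypothetical helper, and it assumes the summary uses the “Label: number” layout shown in this answer.

```python
import re


def parse_texcount_summary(output: str) -> dict:
    """Turn 'Label: number' lines from a texcount summary into a dict."""
    counts = {}
    for line in output.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


summary = """TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19
"""
counts = parse_texcount_summary(summary)
# Which categories count toward a limit is a policy choice, not a texcount rule.
total = (counts["Words in text"]
         + counts["Words in headers"]
         + counts["Words in float captions"])
print(total)
```

Whether headers and captions belong in the total depends on the rules you are writing to.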

answered Jun 7, 2010 at 13:16

Geoff

I went with icio’s comment and did a word-count on the pdf itself by piping the output of pdftotext to wc:

pdftotext file.pdf - | wc -w

answered Jun 4, 2010 at 14:47

Andreas Grech

latex file.tex
dvips -o - file.dvi | ps2ascii | wc -w

should give you a fairly accurate word count.

answered Jun 4, 2010 at 14:28

aioobe

To add to @aioobe,

If you use pdflatex, just do

pdftops file.pdf
ps2ascii file.ps|wc -w

I compared this count to the count in Microsoft Word in a 1599 word document (according to Word). pdftotext produced a text with 1700+ words. texcount did not include the references and produced 1088 words. ps2ascii returned 1603 words. 4 more than in Word.

I say that’s a pretty good count. I am not sure where’s the 4 word difference, though. :)

answered Feb 15, 2014 at 5:36

fiacobelli

In the Texmaker interface you can get the word count by right-clicking in the PDF preview.

answered Apr 18, 2016 at 17:37

Franck Dernoncourt

Overleaf has a word count feature, in both Overleaf v2 and Overleaf v1.

answered Jan 5, 2019 at 4:36

Franck Dernoncourt

I use the following VIM script:

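function! WC()
    " Shell-escape the current file name so paths with spaces work
    let filename = shellescape(expand("%"))
    " Strip TeX markup with detex, count words, and trim wc's padding
    let cmd = "detex " . filename . " | wc -w | perl -pe 'chomp; s/ +//;'"
    let result = system(cmd)
    echo result . " words"
endfunction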

… but it doesn’t follow links. This would basically entail parsing the TeX file to get all linked files, wouldn’t it?

The advantage over the other answers is that it doesn’t have to produce an output file (PDF or PS) to compute the word count so it’s potentially (depending on usage) much more efficient.

Although icio’s comment is theoretically correct, I found that the above method gives quite accurate estimates for the number of words. For most texts, it’s well within the 5% margin that is used in many assignments.
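
For what it’s worth, following the links does not need a full TeX parser if you only care about plain \input{...} and \include{...} commands. The Python sketch below is an illustration under that assumption; collect_tex_files is a name I made up, and it ignores things like \subfile, \import, or file names built by macros. Paths are resolved relative to the root file’s directory, as LaTeX does when run from there.

```python
import os
import re

INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def collect_tex_files(path, base_dir=None, seen=None):
    """Return `path` plus every file reachable via \\input or \\include."""
    path = os.path.abspath(path)
    base_dir = os.path.dirname(path) if base_dir is None else base_dir
    seen = [] if seen is None else seen
    if path in seen or not os.path.isfile(path):
        return seen
    seen.append(path)
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = re.sub(r"(?<!\\)%.*", "", line)  # ignore commented-out includes
            for name in INCLUDE_RE.findall(line):
                if not name.endswith(".tex"):
                    name += ".tex"  # LaTeX adds the extension when it is omitted
                collect_tex_files(os.path.join(base_dir, name), base_dir, seen)
    return seen
```

Each file in the returned list could then be passed through detex and wc in turn, and the counts added up.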

answered Jun 4, 2010 at 14:50

Konrad Rudolph

If the use of a vim plugin suits you, the vimtex plugin has integrated the texcount tool quite nicely.

Here is an excerpt from their documentation:

:VimtexCountLetters       Shows the number of letters/characters or words in
:VimtexCountWords         the current project or in the selected region. The
                          count is created with `texcount` through a call on
                          the main project file similar to: >

                            texcount -nosub -sum [-letter] -merge -q -1 FILE
<
                          Note: Default arguments may be controlled with
                                |g:vimtex_texcount_custom_arg|.

                          Note: One may access the information through the
                                function `vimtex#misc#wordcount(opts)`, where
                                `opts` is a dictionary with the following
                                keys (defaults indicated): >

                                'range' : [1, line('$')]
                                'count_letters' : 0/1
                                'detailed' : 0
<
                                If `detailed` is 0, then it only returns the
                                total count. This makes it possible to use for
                                e.g. statusline functions. If the `opts` dict
                                is not passed, then the defaults are assumed.

                                             *VimtexCountLetters!*
                                             *VimtexCountWords!*
:VimtexCountLetters!      Similar to |VimtexCountLetters|/|VimtexCountWords|, but
:VimtexCountWords!        show separate reports for included files.  I.e.
                          presents the result of: >

                            texcount -nosub -sum [-letter] -inc FILE
<
                                             *VimtexImapsList*
                                             *<plug>(vimtex-imaps-list)*

The nice part about this is how extensible it is. On top of counting the number of words in your current file, you can make a visual selection (say two or three paragraphs) and then only apply the command to your selection.

answered Feb 17, 2020 at 2:41

Benjamin Chausse

For a very basic article class document I just look at the number of matches for a regex to find words. I use Sublime Text, so this method may not work for you in a different editor, but I just hit Ctrl+F (Command+F on Mac) and then, with regex enabled, search for

(^|\s+|"|((h|f|te){)|\()\w+

which should ignore text declaring a floating environment or captions on figures as well as most kinds of basic equations and usepackage declarations, while including quotations and parentheticals. It also counts footnotes and emphasized text and will count hyperref links as one word. It’s not perfect, but it’s typically accurate to within a few dozen words or so. You could refine it to work for you, but a script is probably a better solution, since LaTeX source code isn’t a regular language. Just thought I’d throw this up here.

answered Nov 17, 2017 at 20:06

ocket8888
