The Linguistic Inquiry and Word Count

Introducing the Linguistic Inquiry and Word Count

by Dr. Ryan Nichols, Philosophy, Cal State Fullerton, Orange County CA

As I write this column there are, remarkably, no YouTube guides to using the Linguistic Inquiry and Word Count. This is a shame, since the Linguistic Inquiry and Word Count, ‘LIWC’ (pronounced ‘luke’) for short, is one of the best text analysis tools available.

LIWC2007 logo represented with some word categories. Source: Author image

LIWC allows users to look under the hood of works of literature. When uploading a text to LIWC, the user will receive an output containing more than 70 columns of data. For example, if I upload this blog post to LIWC, it might return the result that 17.32% of the text falls under LIWC’s cognition category while only 1.2% falls under the religion category, and so on. This is useful information for several reasons illustrated in this and the following post.

LIWC’s design has made it a favorite among psychologists, but it also finds use in marketing, Twitter analysis, mental health diagnostics and much more. Psychologists across the world have developed LIWC dictionaries in their native languages. As of this writing, supported languages include Arabic, Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, Serbian, Spanish, and Turkish. LIWC is also affordable: LIWClite7 is $30 USD, while LIWC2007, the full version, is $90 USD. (Compared to shareware text analysis software, this is not cheap, but proceeds from LIWC go to the University of Texas Department of Psychology to support its work.)

Another key reason for praising LIWC is the quality of LIWC’s dictionary design. The LIWC2007 dictionary contains 4500 words and word stems. Each is filed into one or more subdictionaries. Subdictionaries represent one of the 55 word categories through which LIWC compiles a text. For example, the word “cried” is part of “five word categories: sadness, negative emotion, overall affect, verb, and past tense verb. Hence, if it is found in the target text, each of these five subdictionary scale scores will be incremented” (Pennebaker et al., 2007, p. 4). What makes this so special is that Professor Jamie Pennebaker and developers psychometrically validated the subdictionaries with great effort. This means that values across LIWC categories have been shown to correlate with big-five personality traits (Pennebaker & King, 1999; Mehl, Gosling, & Pennebaker, 2006).
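The increment rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy dictionary below is not the LIWC2007 lexicon, and apart from the five categories quoted for “cried”, its entries and the `score` helper are hypothetical.

```python
from collections import Counter

# Toy dictionary, NOT the real LIWC lexicon. Only the "cried" entry mirrors
# the five categories quoted from Pennebaker et al. (2007); "the" is made up.
TOY_DICTIONARY = {
    "cried": ["sadness", "negative emotion", "overall affect",
              "verb", "past tense verb"],
    "the": ["article"],
}


def score(tokens, dictionary=TOY_DICTIONARY):
    """Increment every category a token belongs to; skip unknown tokens."""
    counts = Counter()
    for token in tokens:
        # A token filed under several subdictionaries bumps each of them.
        counts.update(dictionary.get(token.lower(), []))
    return counts


counts = score(["The", "child", "cried"])
print(counts)
```

Because “cried” is filed under five subdictionaries, a single occurrence raises five scale scores at once, which is why category totals can add up to more than the number of words in the text.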

The psychometric validation of LIWC categories is significant because it allows LIWC users to draw justified inferences from word frequencies to the psychological states of authors. Yet the potential for LIWC’s use in the humanities, and in the study of religion in particular, is largely untapped. CERC is using it for a few projects. In a pilot research project designed to test the application of LIWC to research questions in the humanities, Justin Lynn, Ben Purzycki and I compiled a large corpus of literary texts from three genres (Science Fiction, Fantasy, and Mystery) in order to test the interpretations of humanities scholars about genre. In a research project about contemporary Protestantism, Oliver Gunther, Carson Logan and I compiled about 400 sermons drawn from 12 denominations in order to test whether the language across the denominations, in particular their use of supernatural agency terms, correlated strongly with known differences in theological orientation and known categories in the sociology of religion.

In two upcoming posts about LIWC we will describe each of these projects in more detail, to give a sense of the questions a humanist can pursue with the Linguistic Inquiry and Word Count. In the meantime, given the dearth of instructional videos about LIWC, I recorded a video introduction here.

Lawrence Erlbaum Associates, Incorporated, 1999

Language, whether spoken or written, is an important window into people’s emotional and cognitive worlds. Text analysis of these narratives, focusing on specific words or classes of words, has been used in numerous research studies, including studies of emotional, cognitive, structural, and process components of individuals’ verbal and written language. It was in this research context that the LIWC program was developed. The program analyzes text files on a word-by-word basis, calculating the percentage of words that match each of several language dimensions. Its output is a text file that can be opened in any of a variety of applications, including word processors and spreadsheet programs. The program has 68 pre-set dimensions (output variables), including linguistic dimensions, word categories tapping psychological constructs, and personal concern categories, and can accommodate user-defined dimensions as well. Easy to install and use, this software offers researchers in social, personality, clinical, and applied psychology a valuable tool for quantifying the rich but often slippery data provided in the form of personal narratives. The software comes complete on one 3½-inch diskette and runs on any Windows-based computer.

Linguistic Inquiry and Word Count (LIWC)

Brief description

Linguistic Inquiry and Word Count (LIWC; pronounced “Luke”) is a text analysis program that calculates the percentage of words in a given text that fall into one or more of over 80 linguistic, psychological and topical categories indicating various social, cognitive, and affective processes. You can use LIWC, for example, to determine the degree to which a text uses positive or negative emotion words, self-references or causal words.

The core of the program is a dictionary containing words that belong to these categories. Dictionaries for many languages are available; it is also possible to define your own dictionary, for example to define one or more categories that are not included in the standard dictionary.

Instruction

Operator’s Manual LIWC 2015
Extensive online software manual.

Introduction to Linguistic Inquiry and Word Count (Centre for Human Evolution, Cognition and Culture at University of British Columbia).
Video clip introducing the use of LIWC.

Availability

LIWC is available on VU PCs for staff and students of the Faculty of Humanities (with limitations on concurrent access).

More information

LIWC website.

How it works. Brief background information about LIWC.

Tausczik, Y.R. & Pennebaker, J.W. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29 (1), 24-54. DOI: 10.1177/0261927X09351676
This article reviews several computerized text analysis methods and describes how LIWC was created and validated.

Project description

Linguistic Inquiry and Word Count (LIWC) analyzer.

The LIWC lexicon is proprietary, so it is not included in this repository,
but this Python package requires it.
The lexicon data can be acquired (purchased) from liwc.net.
This package reads from the LIWC2007_English100131.dic (MD5: 2a8c06ee3748218aa89b975574b4e84d) file,
which must be available on any system where this package is used.

The LIWC2007 .dic format looks like this:

%
1   funct
2   pronoun
[...]
%
a   1   10
abdomen*    146 147
about   1   16  17
[...]
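
To make the layout above concrete, here is a minimal sketch of a reader for this format. It is not the `liwc` package's own loader: `parse_dic` and `lookup` are hypothetical helpers, the sample ids and the `body` category are invented for illustration, and only the plain `word  id  id ...` form shown above is handled.

```python
def parse_dic(text):
    """Parse LIWC2007-style .dic text into (categories, entries).

    categories: dict mapping category id (str) -> category name
    entries:    dict mapping word or stem pattern -> list of category names
    """
    categories, entries = {}, {}
    section = 0  # 0 = before first '%', 1 = category header, 2 = word list
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "%":
            section += 1
            continue
        fields = line.split()
        if section == 1:
            categories[fields[0]] = fields[1]
        elif section == 2:
            # Assumption: ids missing from the header are silently skipped.
            entries[fields[0]] = [categories[i] for i in fields[1:]
                                  if i in categories]
    return categories, entries


def lookup(token, entries):
    """Return categories for a token; a trailing '*' matches any suffix."""
    matched = list(entries.get(token, []))
    for pattern, names in entries.items():
        if pattern.endswith("*") and token.startswith(pattern[:-1]):
            matched.extend(names)
    return matched


# Hypothetical sample: ids and the "body" category are invented.
SAMPLE = """%
1   funct
2   pronoun
3   body
%
a   1
i   1   2
abdomen*    3
"""

categories, entries = parse_dic(SAMPLE)
print(lookup("abdomens", entries))
```

The linear scan over stem patterns keeps the sketch short; a real loader would more plausibly use a prefix tree so that lookups stay fast on a lexicon with thousands of entries.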

Setup

Install from PyPI:

pip install -U liwc

Example

import re
from collections import Counter

def tokenize(text):
    # you may want to use a smarter tokenizer
    for match in re.finditer(r'\w+', text, re.UNICODE):
        yield match.group(0)

import liwc
parse, category_names = liwc.load_token_parser('LIWC2007_English100131.dic')
# parse is a function from a token of text (a string) to a list of
# matching LIWC categories (a list of strings)
# category_names is all LIWC categories in the lexicon (a list of strings)
gettysburg = '''Four score and seven years ago our fathers brought forth on
  this continent a new nation, conceived in liberty, and dedicated to the
  proposition that all men are created equal. Now we are engaged in a great
  civil war, testing whether that nation, or any nation so conceived and so
  dedicated, can long endure. We are met on a great battlefield of that war.
  We have come to dedicate a portion of that field, as a final resting place
  for those who here gave their lives that that nation might live. It is
  altogether fitting and proper that we should do this.'''
gettysburg_tokens = tokenize(gettysburg)
# now flatmap over all the categories in all of the tokens using a generator:
gettysburg_counts = Counter(category for token in gettysburg_tokens for category in parse(token))
# and print the results:
print(gettysburg_counts)
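
LIWC output is usually read as percentages of words, whereas the Counter above holds raw counts. The conversion can be sketched as follows; `to_percentages` is a hypothetical helper rather than part of the `liwc` package, and the counts are made up so that the snippet runs without the proprietary lexicon.

```python
from collections import Counter


def to_percentages(category_counts, total_tokens):
    """Convert raw category counts to percentages of all tokens."""
    if total_tokens == 0:
        return {}
    return {
        category: 100.0 * count / total_tokens
        for category, count in category_counts.items()
    }


# Made-up counts standing in for the result of running parse() over a text.
example_counts = Counter({"funct": 5, "pronoun": 2})
example_percentages = to_percentages(example_counts, total_tokens=10)
print(example_percentages)
```

Since `tokenize` returns a generator, it would need to be materialized first (for example with `list(tokenize(text))`) so that the token total is known before counting. If the lexicon stores only lowercase entries, as the sample entries above suggest, lowercasing the text before tokenizing is likely needed for capitalized words to match.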

License

Copyright (c) 2012-2019 Christopher Brown.
MIT Licensed.
