Word count in a dictionary

Although using Counter from the collections library as suggested by @Michael is a better approach, I am adding this answer just to improve your code. (I believe this will be a good answer for a new Python learner.)

From the comment in your code, it seems you want to improve it. I think you are already able to read the file content into words (although I usually avoid the read() function and use for line in file_descriptor: style code instead).

Because words is a string, in the loop for i in words: the loop variable i is not a word but a single character. You are iterating over the characters of the string rather than over its words. To see this, consider the following snippet:

>>> for i in "Hi, h r u?":
...  print i
... 
H
i
,
 
h
 
r
 
u
?
>>> 

Iterating over the string character by character is not what you wanted. To iterate word by word, use the split method of Python's string class.
str.split(sep=None, maxsplit=-1) returns a list of the words in the string, using sep as the separator (it splits on runs of whitespace if sep is left unspecified) and optionally limiting the number of splits to maxsplit.
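A short sketch of how the separator and the split limit behave, written for Python 3. The sample strings and variable names are made up for illustration:

```python
text = "Hi,  how are   you?"

# No arguments: split on runs of whitespace, never producing empty strings
default_split = text.split()

# Explicit separator: every single space is a boundary, so repeated
# spaces produce empty strings in the result
space_split = text.split(" ")

# maxsplit limits the number of splits; the remainder stays in one piece
limited_split = "a,b,c,d".split(",", 2)

print(default_split)
print(space_split)
print(limited_split)
```

This is why calling split() without arguments is usually the safer choice for counting words.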

Notice the code examples below:

Split:

>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?']

loop with split:

>>> for i in "Hi, how are you?".split():
...  print i
... 
Hi,
how
are
you?

That looks like what you need, except for the word Hi,: by default split() splits on whitespace, so Hi, is kept as a single string, and obviously you don’t want that.

To count the frequency of words in the file, one good solution is to use a regex. But first, to keep the answer simple, I will use the replace() method. str.replace(old, new[, count]) returns a copy of the string in which occurrences of old have been replaced with new, optionally restricting the number of replacements to count.

Now check code example below to see what I suggested:

>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?'] # it has , with Hi
>>> "Hi, how are you?".replace(',', ' ').split()
['Hi', 'how', 'are', 'you?'] # , replaced by space then split

loop:

>>> for word in "Hi, how are you?".replace(',', ' ').split():
...  print word
... 
Hi
how
are
you?

Now, how to count frequency:

One way is to use Counter as @Michael suggested, but to keep your approach of starting from an empty dict, do something like the code sample below:

words = f.read()
wordfreq = {}
for word in words.replace(',', ' ').split():
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1
    #                ^^ add 1 to 0 or old value from dict 

What am I doing? Because wordfreq is initially empty, you can’t read wordfreq[word] the first time a word appears (it raises a KeyError). So I used the setdefault dict method.

dict.setdefault(key, default=None) is similar to get(), but will set dict[key]=default if key is not already in dict. So for the first time when a new word comes, I set it with 0 in dict using setdefault then add 1 and assign to the same dict.
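The difference between get() and setdefault() can be seen in a small sketch. It is written for Python 3, and the sample words are made up for illustration:

```python
wordfreq = {}

# get() only looks the key up; the dictionary is left untouched
looked_up = wordfreq.get("hello", 0)
after_get = dict(wordfreq)

# setdefault() returns the same default, but also stores it under the key
stored = wordfreq.setdefault("hello", 0)
after_setdefault = dict(wordfreq)

# The counting idiom from the answer: read the old value (or 0), add 1
for word in ["hello", "world", "hello"]:
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1

print(after_get)         # {}
print(after_setdefault)  # {'hello': 0}
print(wordfreq)          # {'hello': 2, 'world': 1}
```

Since the result is assigned back to wordfreq[word] anyway, get(word, 0) would work equally well in the counting line.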

I have written equivalent code using a with open block instead of a bare open call, so the file is closed automatically.

import os

# open() does not expand '~', so expand it explicitly
with open(os.path.expanduser('~/Desktop/file')) as f:
    words = f.read()
    wordfreq = {}
    for word in words.replace(',', ' ').split():
        wordfreq[word] = wordfreq.setdefault(word, 0) + 1
print(wordfreq)

That runs like this:

$ cat file  # file is 
this is the textfile, and it is used to take words and count
$ python work.py  # indented manually 
{'and': 2, 'count': 1, 'used': 1, 'this': 1, 'is': 2, 
 'it': 1, 'to': 1, 'take': 1, 'words': 1, 
 'the': 1, 'textfile': 1}

Using re.split(pattern, string, maxsplit=0, flags=0)

Just change the for loop to for i in re.split(r"[,\s]+", words): and that should produce the correct output.

Edit: it is better to find all runs of alphanumeric characters, because the text may contain more than one kind of punctuation symbol.

>>> re.findall(r'[\w]+', words) # manually indent output
['this', 'is', 'the', 'textfile', 'and', 
  'it', 'is', 'used', 'to', 'take', 'words', 'and', 'count']

Use the for loop as: for word in re.findall(r'[\w]+', words):
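To see why findall is the more robust choice, here is a small Python 3 sketch on a made-up sentence, with the backslashes of the patterns written out in full (\s for whitespace, \w for word characters):

```python
import re

words = "Hi, how are you? Fine; thanks."

# Splitting on commas and whitespace still leaves '?', ';' and '.' attached
split_result = re.split(r"[,\s]+", words)

# Collecting runs of word characters drops every punctuation mark
findall_result = re.findall(r"\w+", words)

print(split_result)
print(findall_result)
```

Note that \w also matches digits and the underscore, so tokens such as file_1 are kept whole.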

How would I write code without using read():

File is:

$ cat file
This is the text file, and it is used to take words and count. And multiple
Lines can be present in this file.
It is also possible that Same words repeated in with capital letters.

Code is:

$ cat work.py
import re
wordfreq = {}
with open('file') as f:
    for line in f:
        for word in re.findall(r'[\w]+', line.lower()):
            wordfreq[word] = wordfreq.setdefault(word, 0) + 1
  
print wordfreq

I used lower() to convert uppercase letters to lowercase, so that words differing only in case are counted together.

output:

$ python work.py  # manually strip output
{'and': 3, 'letters': 1, 'text': 1, 'is': 3, 
 'it': 2, 'file': 2, 'in': 2, 'also': 1, 'same': 1, 
 'to': 1, 'take': 1, 'capital': 1, 'be': 1, 'used': 1, 
 'multiple': 1, 'that': 1, 'possible': 1, 'repeated': 1, 
 'words': 2, 'with': 1, 'present': 1, 'count': 1, 'this': 2, 
 'lines': 1, 'can': 1, 'the': 1}
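For comparison, the Counter approach mentioned at the top of this answer could look roughly like the sketch below. It is written for Python 3 and, purely for illustration, uses an in-memory copy of the sample file text instead of opening a file:

```python
import re
from collections import Counter

# Assumption: the same three lines as the sample file shown above
text = """This is the text file, and it is used to take words and count. And multiple
Lines can be present in this file.
It is also possible that Same words repeated in with capital letters."""

# Counter is a dict subclass that counts the items of an iterable
wordfreq = Counter(re.findall(r"\w+", text.lower()))

print(wordfreq["and"])
print(wordfreq["file"])
print(wordfreq.most_common(3))
```

A Counter returns 0 for a word it has never seen, so no setdefault or get call is needed.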

Finding the Frequency of every word from an Input using Dictionary in Python

Hey coder! In this article, we will learn to find the frequency of every word in the input using the dictionary data structure.

A dictionary stores data in the form of key: value pairs, where every key is unique. An empty dictionary can be created with {} or dict().
We store a value under a key and use the same key to retrieve it.

In this program, we are going to store different words as keys and the frequencies of each word as the value to the respective key.

The get method of the dictionary returns the value stored under a key. If there is no such key, it returns a default value; if no default is specified, None is returned.

Syntax of get   –  dict_name.get( key [, default])
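A minimal sketch contrasting plain indexing with get. The dictionary contents are made up for illustration:

```python
count = {"the": 2}

# Indexing a missing key raises KeyError
try:
    count["of"]
    missing_raises = False
except KeyError:
    missing_raises = True

existing = count.get("the")        # value stored under an existing key
no_default = count.get("of")       # missing key, no default given
with_default = count.get("of", 0)  # missing key, default supplied

print(missing_raises, existing, no_default, with_default)
```

get never inserts the missing key, so count still holds only the key "the" afterwards.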

In this program, we set the default value to zero and increase the value stored under the key by one each time the word occurs in the input.

Program: Frequency of every word from an Input using Dictionary in Python

Declare a Dictionary object count to store the set of pairs of word: frequency.

Prompt for the input from the user and store it into a variable input_line.

Split the input_line into a list of words using split() member and store them to the variable list_of_words.

Using a for loop, iterate over each word in list_of_words as a variable word for each iteration.

Read the current count with count.get(word, 0), which returns 0 for a word not seen before, add 1, and store the result back in count[word].

Finally, display the words and their frequencies using a for loop, iterating through the keys in the count as key variable and printing key and count[key].

count = {}
input_line = input("Enter a Line : ")
list_of_words = input_line.split()
for word in list_of_words:
    count[word] = count.get(word, 0) + 1
print('Word Frequency')
for key in count.keys():
    print(key, count[key])

Input :

Today we have learnt how to find the frequency of each and every word of input line from the user using a dictionary in Python

Output :

Word Frequency
Today 1
we 1
have 1
learnt 1
how 1
to 1
find 1
the 2
frequency 1
of 2
each 1
and 1
every 1
word 1
input 1
line 1
from 1
user 1
using 1
a 1
dictionary 1
in 1
Python 1

Dictionaries are one of the most useful data types in Python. A dictionary holds data in the form of key:value pairs. This article presents a solution to Python file word count using a dictionary.

Text File

Get the text file in which you want to count the repetitions of each word. For testing purposes, create a file containing your favourite story or any other text.

Python File Word Count using Dictionary

Let’s build this program step by step. We are going to create a function that accepts the file name as a parameter.

def word_count(f):
    # Create an empty dictionary to store each word and its count
    d = dict()

    # Open the file for reading; the with block closes it afterwards
    with open(f) as fl:
        # Read the whole file into a variable
        fl1 = fl.read()

    # Split on any whitespace (spaces, tabs and newlines)
    for c in fl1.split():
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d

The function above returns a dictionary of words and their counts in the form of key:value pairs.

Call Function by Passing Parameter

Call the function and assign the result to a variable, then sort the (word, count) pairs by count in descending order.

word_cnt = word_count('text.txt')
word_cnt_sorted = sorted(word_cnt.items(), key=lambda x: x[1], reverse=True)
print(word_cnt_sorted)
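The whole flow can be tried without an existing text.txt. The sketch below is written for Python 3, defines its own small variant of word_count, and writes a made-up sample to a temporary file; the sample text and the names sample_path and pair are assumptions for illustration only:

```python
import os
import tempfile

def word_count(path):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    with open(path) as handle:
        for word in handle.read().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# Write a small made-up sample so the sketch does not depend on text.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat and the dog\nthe end\n")
    sample_path = tmp.name

word_cnt = word_count(sample_path)
os.remove(sample_path)

# items() yields (word, count) pairs; sort on the count, largest first
word_cnt_sorted = sorted(word_cnt.items(), key=lambda pair: pair[1], reverse=True)
print(word_cnt_sorted)
```

Because sorted returns a list of tuples rather than a dictionary, the result keeps the descending order when printed.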

Tutorialsrack
08/05/2020



In this Python program, we will learn how to count the words in a string and put the result into a dictionary. We take the string as input from the user, count the frequency of each word, and store the words and their frequencies in a dictionary as key-value pairs. We do this in two ways: the first uses the built-in dict() and zip() functions together with the split() and count() methods, and the second uses only split() and count() with a for loop.

Here is the source code of the program to count words in a string and put it into a dictionary.

Program 1: Python Program to Count Words in a String using Dictionary Using count(), dict() and zip() function

In this program, split() splits the string into a list of words, count() counts how many times each word occurs in that list, zip() pairs each word with its frequency, and dict() builds the dictionary from those pairs.

# Python Program to Count Words in a String using Dictionary
# Using count(), dict() and zip() function

# Take the input from the user
string = input("Enter any String: ")
words = []

# To Avoid Case-sensitiveness of the string
words = string.lower().split()

frequency = [words.count(i) for i in words]

myDict = dict(zip(words, frequency))
print("Dictionary Items :  ",  myDict)

Enter any String: The first second was alright, but the second second was tough.

Dictionary Items :   {'the': 2, 'first': 1, 'second': 3, 'was': 2, 'alright,': 1, 'but': 1, 'tough.': 1}

Program 2: Python Program to Count Words in a String using Dictionary Using For Loop and count() Function

In this program, split() splits the string into a list of words, a for loop iterates over that list, and count() gives the frequency of each word in the list.

# Python Program to Count Words in a String using Dictionary
# Using For Loop and count() Function

# Take the Input from the User
string = input("Enter any String: ")
words = []

# To Avoid Case-sensitiveness of the string
words = string.lower().split()
myDict = {}
for key in words:
    myDict[key] = words.count(key)

# Print the resulting dictionary
print("Dictionary Items:  ",  myDict)

Enter any String: The first second was alright, but the second second was tough.

Dictionary Items:   {'the': 2, 'first': 1, 'second': 3, 'was': 2, 'alright,': 1, 'but': 1, 'tough.': 1}
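Both programs call words.count() once for every word in the list, repeats included, and each call rescans the whole list, so the work grows with the square of the number of words. A possible single-pass alternative is sketched below; the function name count_words is hypothetical and the sentence is the sample input from above, hard-coded instead of read with input():

```python
def count_words(text):
    """Count words in one pass, without calling list.count() per word."""
    frequency = {}
    for word in text.lower().split():
        # get() returns 0 the first time a word is seen
        frequency[word] = frequency.get(word, 0) + 1
    return frequency

sentence = "The first second was alright, but the second second was tough."
my_dict = count_words(sentence)
print("Dictionary Items:", my_dict)
```

The result is the same dictionary as in the two programs above, including the punctuation that split() leaves attached to 'alright,' and 'tough.'.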

It is often necessary to count the occurrences of each word in a text file. To do so, we use a dictionary that stores each word as a key and its count as the corresponding value. We iterate through each word in the file and add it to the dictionary with a count of 1. If the word is already present in the dictionary, we increment its count by 1.

File sample.txt

First, we create a text file in which we want to count the words in Python. Let this file be sample.txt with the following contents

Mango banana apple pear
Banana grapes strawberry
Apple pear mango banana
Kiwi apple mango strawberry

Example 1: Count occurrences of each word in a given text file

Here, we use a Python loop to read the file line by line. We convert each line to lowercase so that words differing only in case are counted together, then split it into words and count each one.

Python3

text = open("sample.txt", "r")
d = dict()

for line in text:
    line = line.strip()
    line = line.lower()
    words = line.split(" ")

    for word in words:
        if word in d:
            d[word] = d[word] + 1
        else:
            d[word] = 1

for key in list(d.keys()):
    print(key, ":", d[key])

Output:

mango : 3
banana : 3
apple : 3
pear : 2
grapes : 1
strawberry : 2
kiwi : 1

Example 2: Count occurrences of specific words in a given text file

In this example, we will count the occurrences of the word “apple” in the text file. The comparison is case-sensitive, so “Apple” is not counted.

Python3

word = "apple"
count = 0

with open("sample.txt", 'r') as f:
    for line in f:
        words = line.split()
        for i in words:
            if i == word:
                count = count + 1

print("Occurrences of the word", word, ":", count)

Output:

Occurrences of the word apple : 2

Example 3: Count total occurrences of words in a given text file

In this example, we will count the total number of words present in a text file.

Python3

count = 0
f = open("sample.txt", "r")

for line in f:
    # split() with no argument ignores the trailing newline and blank lines
    words = line.split()
    count += len(words)

print("Total Number of Words: " + str(count))
f.close()

Output:

Total Number of Words: 15
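The choice between split(" ") and split() matters for this total as soon as the file contains a blank line. The sketch below illustrates this with io.StringIO standing in for the opened file; the extra blank line in the sample is an assumption added for the demonstration:

```python
import io

# io.StringIO stands in for the opened file; note the blank third line
sample = io.StringIO(
    "Mango banana apple pear\n"
    "Banana grapes strawberry\n"
    "\n"
    "Apple pear mango banana\n"
    "Kiwi apple mango strawberry\n"
)

count_space = 0
count_default = 0
for line in sample:
    count_space += len(line.split(" "))  # a blank line "\n" counts as one word
    count_default += len(line.split())   # a blank line contributes nothing

print(count_space, count_default)  # 16 15
```

With split(" ") the lone newline is treated as a word, which inflates the total by one for every blank line.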

Consider files with punctuation

Sample.txt:

Mango! banana apple pear.
Banana, grapes strawberry.
Apple- pear mango banana.
Kiwi "apple" mango strawberry.

Code:

Python3

import string

text = open("sample.txt", "r")
d = dict()

for line in text:
    line = line.strip()
    line = line.lower()
    # Delete every ASCII punctuation character from the line
    line = line.translate(str.maketrans("", "", string.punctuation))
    words = line.split(" ")

    for word in words:
        if word in d:
            d[word] = d[word] + 1
        else:
            d[word] = 1

for key in list(d.keys()):
    print(key, ":", d[key])

Output:

mango : 3
banana : 3
apple : 3
pear : 2
grapes : 1
strawberry : 2
kiwi : 1
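The punctuation-stripping step can be tried on its own. The sketch below applies a translation table to two of the sample lines, held in a list rather than read from a file; the name remove_punct is chosen here for illustration:

```python
import string

# Translation table that deletes every ASCII punctuation character
remove_punct = str.maketrans("", "", string.punctuation)

lines = [
    "Mango! banana apple pear.",
    'Kiwi "apple" mango strawberry.',
]
cleaned = [line.lower().translate(remove_punct).split() for line in lines]
print(cleaned)
```

string.punctuation covers ASCII symbols only, so typographic characters such as curly quotes would stay attached to the words.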
