Although using Counter from the collections library, as suggested by @Michael, is a better approach, I am adding this answer just to improve your code. (I believe this will be a good answer for a new Python learner.)
From the comment in your code, it seems you want to improve your code. I think you are able to read the file content into words (although I usually avoid the read() function and use for line in file_descriptor: style code).
As words is a string, in the loop for i in words: the loop variable i is not a word but a character. You are iterating over the characters of the string words instead of over its words. To understand this, notice the following code snippet:
>>> for i in "Hi, h r u?":
...     print(i)
...
H
i
,
h
r
u
?
>>>
Because iterating over the string character by character is not what you wanted, you should use the split method of Python's string class to iterate word by word.
str.split(sep=None, maxsplit=-1) returns a list of the words in the string, using sep as the separator (it splits on runs of whitespace if sep is left unspecified), optionally limiting the number of splits to maxsplit.
Notice the code examples below:
Split:
>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?']
loop with split:
>>> for i in "Hi, how are you?".split():
...     print(i)
...
Hi,
how
are
you?
This looks like what you need, except for the word Hi, because split() splits on whitespace by default, so Hi, is kept as a single string (comma included), and obviously you don't want that.
To count the frequency of words in the file, one good solution is to use a regex. But first, to keep the answer simple, I will use the replace() method. str.replace(old, new[, count]) returns a copy of the string in which the occurrences of old have been replaced with new, optionally restricting the number of replacements to count.
Now check code example below to see what I suggested:
>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?'] # it has , with Hi
>>> "Hi, how are you?".replace(',', ' ').split()
['Hi', 'how', 'are', 'you?'] # , replaced by space then split
loop:
>>> for word in "Hi, how are you?".replace(',', ' ').split():
...     print(word)
...
Hi
how
are
you?
Now, how to count frequency:
One way is to use Counter, as @Michael suggested, but to keep your approach, in which you start from an empty dict, do something like this code sample:
words = f.read()
wordfreq = {}
for word in words.replace(',', ' ').split():
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1
    #              ^^ add 1 to 0 or old value from dict
What am I doing? Because wordfreq is initially empty, you can't read wordfreq[word] the first time a word appears (it raises a KeyError). So I used the setdefault dict method.
dict.setdefault(key, default=None) is similar to get(), but it also sets dict[key] = default if key is not already in the dict. So the first time a new word comes up, setdefault stores it with 0 in the dict; I then add 1 and assign the result back to the same key.
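A minimal sketch of that behaviour, using a made-up word list (sample_words is only for illustration and is not part of the original code):

```python
# Hypothetical sample data, used only to illustrate setdefault().
sample_words = ["is", "the", "is"]

wordfreq = {}
for word in sample_words:
    # setdefault() returns the stored value, inserting 0 first if the key is missing.
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1

print(wordfreq)

# Without setdefault(), reading a missing key raises KeyError.
empty = {}
try:
    empty["is"] += 1
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError for key:", missing_key)
```

The first loop never fails because the key is always present by the time its value is read.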
I have written equivalent code using with open instead of a bare open.
import os
# open() does not expand '~', so expand it explicitly
with open(os.path.expanduser('~/Desktop/file')) as f:
    words = f.read()
wordfreq = {}
for word in words.replace(',', ' ').split():
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1
print(wordfreq)
That runs like this:
$ cat file # file is
this is the textfile, and it is used to take words and count
$ python work.py # indented manually
{'and': 2, 'count': 1, 'used': 1, 'this': 1, 'is': 2,
'it': 1, 'to': 1, 'take': 1, 'words': 1,
'the': 1, 'textfile': 1}
Using re.split(pattern, string, maxsplit=0, flags=0)
Just change the for loop to: for i in re.split(r"[,\s]+", words): and that should produce the correct output.
Edit: it is better to find all runs of alphanumeric characters, because you may have more than one kind of punctuation symbol.
>>> re.findall(r'[\w]+', words) # manually indent output
['this', 'is', 'the', 'textfile', 'and',
'it', 'is', 'used', 'to', 'take', 'words', 'and', 'count']
Use the for loop as: for word in re.findall(r'[\w]+', words):
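To see why findall is the safer choice, here is a small sketch comparing the two calls on a made-up sentence (the variable names text, by_split, and by_findall are only for illustration):

```python
import re

# Hypothetical sample sentence with several kinds of punctuation.
text = "Hi, how are you? Fine; thanks!"

# Splitting on commas and whitespace leaves other punctuation attached.
by_split = re.split(r"[,\s]+", text)

# Collecting runs of word characters drops all punctuation.
by_findall = re.findall(r"\w+", text)

print(by_split)
print(by_findall)
```

With re.split you would have to list every punctuation character in the pattern, while \w+ only describes what a word looks like.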
How would I write the code without using read():
File is:
$ cat file
This is the text file, and it is used to take words and count. And multiple
Lines can be present in this file.
It is also possible that Same words repeated in with capital letters.
Code is:
$ cat work.py
import re
wordfreq = {}
with open('file') as f:
    for line in f:
        for word in re.findall(r'[\w]+', line.lower()):
            wordfreq[word] = wordfreq.setdefault(word, 0) + 1
print(wordfreq)
I used lower() to convert uppercase letters to lowercase, so that Same and same are counted as one word.
output:
$ python work.py # output wrapped manually
{'and': 3, 'letters': 1, 'text': 1, 'is': 3,
'it': 2, 'file': 2, 'in': 2, 'also': 1, 'same': 1,
'to': 1, 'take': 1, 'capital': 1, 'be': 1, 'used': 1,
'multiple': 1, 'that': 1, 'possible': 1, 'repeated': 1,
'words': 2, 'with': 1, 'present': 1, 'count': 1, 'this': 2,
'lines': 1, 'can': 1, 'the': 1}
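For comparison, the Counter approach mentioned at the top of this answer might look roughly like the sketch below. It is written for Python 3, and the text variable is a made-up stand-in for the file contents:

```python
from collections import Counter
import re

# Hypothetical sample text standing in for the file contents.
text = "This is the text file, and it is used to take words and count."

# Counter counts the hashable items produced by any iterable.
wordfreq = Counter(re.findall(r"\w+", text.lower()))

# most_common(n) returns the n most frequent (word, count) pairs.
print(wordfreq.most_common(2))
```

A Counter also returns 0 for words it has never seen, so no setdefault call is needed.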
Finding the Frequency of every word from an Input using Dictionary in Python
Hey Coder, in this article we will learn to find the frequency of every word in the input using the dictionary data structure.
A dictionary stores data in the form key: value, where every key is unique. Either {} or dict() can be used to create a dictionary.
We can store a value under a key and use the same key to retrieve the value.
In this program, we store the different words as keys and the frequency of each word as the value of the respective key.
The get method of a dictionary returns the value stored for a key. If there is no such key, it returns a default value; if no default is specified, None is returned.
Syntax of get: dict_name.get(key[, default])
In this program, we set the default value to zero and increase the value of the key by one each time the word occurs in the input.
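As a quick illustration of get, here is a small sketch with a made-up dictionary (the names present, missing, and no_default are only for this example):

```python
count = {"the": 2}

# get() never raises for a missing key and never modifies the dictionary.
present = count.get("the", 0)    # key exists, so the stored value is returned
missing = count.get("of", 0)     # key is absent, so the default is returned
no_default = count.get("of")     # key is absent and no default is given

print(present, missing, no_default)
```

Because get leaves the dictionary unchanged, the program has to assign the incremented value back to count[word] itself.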
Program: Frequency of every word from an Input using Dictionary in Python
Declare a Dictionary object count to store the set of pairs of word: frequency.
Prompt for the input from the user and store it into a variable input_line.
Split the input_line into a list of words using split() member and store them to the variable list_of_words.
Using a for loop, iterate over each word in list_of_words as a variable word for each iteration.
Using get member of the dictionary count, get the value of the key using count.get(word,0) and increase the value by 1 and update the new value of the key word to count[word].
Finally, display the words and their frequencies using a for loop, iterating through the keys in the count as key variable and printing key and count[key].
count = {}
input_line = input("Enter a Line : ")
list_of_words = input_line.split()
for word in list_of_words:
    count[word] = count.get(word, 0) + 1
print('Word Frequency')
for key in count.keys():
    print(key, count[key])
Input :
Today we have learnt how to find the frequency of each and every word of input line from the user using a dictionary in Python
Output :
Word Frequency
Today 1
we 1
have 1
learnt 1
how 1
to 1
find 1
the 2
frequency 1
of 2
each 1
and 1
every 1
word 1
input 1
line 1
from 1
user 1
using 1
a 1
dictionary 1
in 1
Python 1
Dictionaries are one of the most useful data types in Python. A dictionary holds data in the form of key:value pairs. This article presents a solution for counting the words in a file using a dictionary.
Text File
Pick the text file in which you want to count the repetitions of every word. For testing, create a file containing a favourite story or any other text.
Python File Word Count using Dictionary
Let's build this program step by step. We are going to create a function that accepts the file name as a parameter.
def word_count(f):
    # Create an empty dictionary to store the words and their counts
    d = dict()
    # Open the file for reading
    fl = open(f)
    # Read the file into a variable, then close the file
    fl1 = fl.read()
    fl.close()
    # Split into words on the ' ' (space) character
    for c in fl1.split(' '):
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
The function above returns a dictionary of words and their counts in the form key:value.
Call Function by Passing Parameter
Call the function, assign the result to a variable, and sort the items by count in descending order.
word_cnt = word_count('text.txt')
word_cnt_sorted = sorted(word_cnt.items(), key=lambda x: x[1], reverse=True)
print(word_cnt_sorted)
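One detail worth checking in a function like this is the separator passed to split. The sketch below uses a made-up string in place of real file contents to show how splitting on a single space differs from splitting on any whitespace:

```python
# Hypothetical file contents with a double space and a newline.
contents = "red  green\nred"

# split(' ') cuts only at single spaces, so it yields an empty string
# and keeps the newline inside a "word".
by_space = contents.split(' ')

# split() with no argument cuts at any run of whitespace.
by_whitespace = contents.split()

print(by_space)
print(by_whitespace)
```

For multi-line files, calling split() without an argument is therefore likely to give cleaner keys in the resulting dictionary.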
Tutorialsrack
08/05/2020
Python
In this Python program, we will learn how to count the words in a string and put them into a dictionary. We take the string input from the user, count the words and their frequencies, and store them in a dictionary as key-value pairs. We do this in two ways. The first uses the built-in dict() and zip() functions together with the split() and count() methods; the second uses a for loop with count() and split().
Here is the source code of the program to count words in a string and put it into a dictionary.
Program 1: Python Program to Count Words in a String using Dictionary Using count(), dict() and zip() function
In this program, we use split() to split the string into words, count() to count how many times each word occurs in the list, zip() to make an iterator that pairs each word with its frequency, and dict() to build the dictionary from those pairs.
# Python Program to Count Words in a String using Dictionary
# Using count(), dict() and zip() function
# Take the input from the user
string = input("Enter any String: ")
words = []
# To Avoid Case-sensitiveness of the string
words = string.lower().split()
frequency = [words.count(i) for i in words]
myDict = dict(zip(words, frequency))
print("Dictionary Items : ", myDict)
Enter any String: The first second was alright, but the second second was tough.
Dictionary Items : {'the': 2, 'first': 1, 'second': 3, 'was': 2, 'alright,': 1, 'but': 1, 'tough.': 1}
Program 2: Python Program to Count Words in a String using Dictionary Using For Loop and count() Function
In this program, we use the split() function to split the string into a list of words, a for loop to iterate over that list, and the count() function to count the frequency of each word in the list.
# Python Program to Count Words in a String using Dictionary
# Using For Loop and count() Function
# Take the Input from the User
string = input("Enter any String: ")
words = []
# To Avoid Case-sensitiveness of the string
words = string.lower().split()
myDict = {}
for key in words:
    myDict[key] = words.count(key)
# Print the result
print("Dictionary Items: ", myDict)
Enter any String: The first second was alright, but the second second was tough.
Dictionary Items: {'the': 2, 'first': 1, 'second': 3, 'was': 2, 'alright,': 1, 'but': 1, 'tough.': 1}
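Both programs call count() once per word, which rescans the whole list each time, and both keep punctuation attached to words such as alright, and tough. in the output above. A possible single-pass variant is sketched below; count_words is a hypothetical helper name, not part of the original programs:

```python
import string

def count_words(sentence):
    """Count words in one pass; count_words is a hypothetical helper name."""
    frequency = {}
    for raw in sentence.lower().split():
        # Strip leading and trailing punctuation such as ',' and '.'.
        word = raw.strip(string.punctuation)
        if word:
            frequency[word] = frequency.get(word, 0) + 1
    return frequency

result = count_words("The first second was alright, but the second second was tough.")
print(result)
```

Each word is visited once, so the work grows linearly with the length of the input instead of quadratically.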
It is often necessary to count the occurrences of each word in a text file. To do so, we use a dictionary that stores each word as a key and its count as the corresponding value. We iterate through each word in the file: if the word is not yet in the dictionary, we add it with a count of 1; if it is already present, we increment its count by 1.
File sample.txt
First, we create a text file in which we want to count the words in Python. Let this file be sample.txt with the following contents
Mango banana apple pear Banana grapes strawberry Apple pear mango banana Kiwi apple mango strawberry
Example 1: Count occurrences of each word in a given text file
Here, we use a Python loop to read each line. We convert each line to lowercase so that words differing only in case are counted together, then split the line into words and count each one.
Python3
text = open("sample.txt", "r")
d = dict()
for line in text:
    line = line.strip()
    line = line.lower()
    words = line.split(" ")
    for word in words:
        if word in d:
            d[word] = d[word] + 1
        else:
            d[word] = 1
for key in list(d.keys()):
    print(key, ":", d[key])
Output:
mango : 3
banana : 3
apple : 3
pear : 2
grapes : 1
strawberry : 2
kiwi : 1
Example 2: Count occurrences of specific words in a given text file
In this example, we will count how many times the word "apple" occurs in the text file. The comparison is case-sensitive, so "Apple" is not counted.
Python3
word = "apple"
count = 0
with open("temp.txt", 'r') as f:
    for line in f:
        words = line.split()
        for i in words:
            if (i == word):
                count = count + 1
print("Occurrences of the word", word, ":", count)
Output:
Occurrences of the word apple : 2
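If matches such as "Apple" should be counted too, one option is to lowercase both sides before comparing. The sketch below assumes a hypothetical helper count_word and uses a made-up list of lines in place of an open file:

```python
def count_word(lines, target):
    """Count case-insensitive occurrences of target; a hypothetical helper."""
    target = target.lower()
    # Each comparison is True or False, and sum() adds them up as 1 or 0.
    return sum(word.lower() == target for line in lines for word in line.split())

# Hypothetical lines standing in for the file; an open file object
# can be passed in the same way because it also yields lines.
lines = ["Mango banana apple pear", "Apple pear mango banana"]
total = count_word(lines, "apple")
print("Occurrences of the word apple :", total)
```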
Example 3: Count total occurrences of words in a given text file
In this example, we will count the total number of words present in a text file.
Python3
count = 0
f = open("sample.txt", "r")
for line in f:
    word = line.split(" ")
    count += len(word)
print("Total Number of Words: " + str(count))
f.close()
Output:
Total Number of Words: 15
Consider the files with punctuation
Sample.txt:
Mango! banana apple pear. Banana, grapes strawberry. Apple- pear mango banana. Kiwi "apple" mango strawberry.
Code:
Python3
import string
text = open("sample.txt", "r")
d = dict()
for line in text:
    line = line.strip()
    line = line.lower()
    # Remove every punctuation character from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    words = line.split(" ")
    for word in words:
        if word in d:
            d[word] = d[word] + 1
        else:
            d[word] = 1
for key in list(d.keys()):
    print(key, ":", d[key])
Output:
mango : 3
banana : 3
apple : 3
pear : 2
grapes : 1
strawberry : 2
kiwi : 1
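The punctuation removal step can be tried on a single line in isolation. This is a small sketch using one line from the sample file; the names table, cleaned, and words are only for illustration:

```python
import string

# Build a translation table that deletes every ASCII punctuation character.
# The first two arguments are empty because no characters are being replaced.
table = str.maketrans("", "", string.punctuation)

line = 'Kiwi "apple" mango strawberry.'
cleaned = line.lower().translate(table)
words = cleaned.split()

print(cleaned)
print(words)
```

Note that string.punctuation covers ASCII punctuation only, so typographic quotes or dashes would need to be added to the table separately.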