I would like to replace words in a string sentence such as:
What $noun$ is $verb$?
What’s the regular expression to replace the characters in ‘$ $’ (inclusive) with actual nouns/verbs?
asked Sep 21, 2012 at 21:07
You don’t need a regular expression for that. I would do
string = "What $noun$ is $verb$?"
print(string.replace("$noun$", "the heck"))
Only use regular expressions when needed. They’re generally slower.
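The same approach extends to several placeholders. A minimal sketch, assuming the replacement words are kept in a dictionary (the name words and its contents are made up for illustration):

```python
# Hypothetical mapping from placeholder to replacement text.
words = {"$noun$": "the heck", "$verb$": "going on"}

sentence = "What $noun$ is $verb$?"
for placeholder, replacement in words.items():
    # str.replace() returns a new string; the original is left untouched.
    sentence = sentence.replace(placeholder, replacement)

print(sentence)  # What the heck is going on?
```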
answered Sep 21, 2012 at 21:15
Ztyx
Given that you are free to modify $noun$ etc. to your liking, the best practice nowadays is probably to use the format function on a string:
"What {noun} is {verb}?".format(noun="XXX", verb="YYY")
answered Oct 19, 2015 at 8:42
Ztyx
In [1]: import re
In [2]: re.sub(r'\$noun\$', 'the heck', 'What $noun$ is $verb$?')
Out[2]: 'What the heck is $verb$?'
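A single pattern can also cover every placeholder at once. The following is only a sketch: the words table and the fill helper are hypothetical names, and unknown placeholders are deliberately left unchanged.

```python
import re

# Hypothetical lookup table for the placeholder names.
words = {"noun": "the heck", "verb": "going on"}

def fill(match):
    # group(1) is the name between the dollar signs, e.g. "noun".
    name = match.group(1)
    # Leave unknown placeholders as they are instead of raising KeyError.
    return words.get(name, match.group(0))

result = re.sub(r"\$(\w+)\$", fill, "What $noun$ is $verb$? $other$")
print(result)  # What the heck is going on? $other$
```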
answered Sep 21, 2012 at 21:13
Roland Smith
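Use a dictionary to hold the regular expression patterns and their replacement values, then use re.sub() to replace the tokens.
import re
text = "What $noun$ is $verb$?"
# "$" is a regex metacharacter, so it has to be escaped in the patterns.
replacements = {
    r"\$noun\$": "ABC",
    r"\$verb\$": "DEF",
}
new_str = text
for pattern, value in replacements.items():
    new_str = re.sub(pattern, value, new_str)
print(new_str)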
output:
What ABC is DEF?
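If the dictionary keys are meant as literal text rather than patterns, re.escape() can do the escaping. This is a sketch under that assumption, with made-up names:

```python
import re

# Hypothetical literal placeholders; "$" is a regex metacharacter.
replacements = {"$noun$": "ABC", "$verb$": "DEF"}

text = "What $noun$ is $verb$?"
for literal, value in replacements.items():
    # re.escape() builds a pattern that matches the literal text exactly.
    text = re.sub(re.escape(literal), value, text)

print(text)  # What ABC is DEF?
```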
answered Jul 2, 2021 at 20:09
Golden Lion
If you’re looking for ways to remove or replace all or part of a string in Python, then this tutorial is for you. You’ll be taking a fictional chat room transcript and sanitizing it using both the .replace() method and the re.sub() function.
In Python, the .replace() method and the re.sub() function are often used to clean up text by removing strings or substrings or replacing them. In this tutorial, you’ll be playing the role of a developer for a company that provides technical support through a one-to-one text chat. You’re tasked with creating a script that’ll sanitize the chat, removing any personal data and replacing any swear words with emoji.
You’re only given one very short chat transcript:
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
Even though this transcript is short, it’s typical of the type of chats that agents have all the time. It has user identifiers, ISO time stamps, and messages.
In this case, the client johndoe filed a complaint, and company policy is to sanitize and simplify the transcript, then pass it on for independent evaluation. Sanitizing the message is your job!
The first thing you’ll want to do is to take care of any swear words.
How to Remove or Replace a Python String or Substring
The most basic way to replace a string in Python is to use the .replace() string method:
>>> "Fake Python".replace("Fake", "Real")
'Real Python'
As you can see, you can chain .replace() onto any string and provide the method with two arguments. The first is the string that you want to replace, and the second is the replacement.
Now it’s time to apply this knowledge to the transcript:
>>> transcript = """
... [support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
... [johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
... [support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
... [johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""
>>> print(transcript.replace("BLASTED", "😤"))
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
Loading the transcript as a triple-quoted string and then using the .replace() method on one of the swear words works fine. But there’s another swear word that’s not getting replaced because in Python, the string needs to match exactly:
>>> "Fake Python".replace("fake", "Real")
'Fake Python'
As you can see, even if the casing of one letter doesn’t match, it’ll prevent any replacements. This means that if you’re using the .replace() method, you’ll need to call it several times, once for each variation. In this case, you can just chain on another call to .replace():
>>> print(transcript.replace("BLASTED", "😤").replace("Blast", "😤"))
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : 😤! You're right!
Success! But you’re probably thinking that this isn’t the best way to do this for something like a general-purpose transcription sanitizer. You’ll want to move toward some way of having a list of replacements, instead of having to type out .replace() each time.
Set Up Multiple Replacement Rules
There are a few more replacements that you need to make to the transcript to get it into a format acceptable for independent review:
- Shorten or remove the time stamps
- Replace the usernames with Agent and Client
Now that you’re starting to have more strings to replace, chaining on .replace() is going to get repetitive. One idea could be to keep a list of tuples, with two items in each tuple. The two items would correspond to the arguments that you need to pass into the .replace() method, namely the string to replace and the replacement string:
# transcript_multiple_replace.py
REPLACEMENTS = [
    ("BLASTED", "😤"),
    ("Blast", "😤"),
    ("2022-08-24T", ""),
    ("+00:00", ""),
    ("[support_tom]", "Agent "),
    ("[johndoe]", "Client"),
]
transcript = """
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
"""
for old, new in REPLACEMENTS:
    transcript = transcript.replace(old, new)
print(transcript)
In this version of your transcript-cleaning script, you created a list of replacement tuples, which gives you a quick way to add replacements. You could even create this list of tuples from an external CSV file if you had loads of replacements.
You then iterate over the list of replacement tuples. In each iteration, you call .replace() on the string, populating the arguments with the old and new variables that have been unpacked from each replacement tuple.
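If you did want to load the replacement pairs from a CSV file, it might look roughly like the sketch below. The two-column layout (old, new) is an assumption, and io.StringIO stands in for a real file so that the example runs on its own:

```python
import csv
import io

# Stand-in for an external file; in practice you might open a real CSV instead.
# The two-column layout (old, new) is an assumption made for this sketch.
csv_text = "BLASTED,😤\nBlast,😤\n[johndoe],Client\n"

with io.StringIO(csv_text) as csv_file:
    replacements = [tuple(row) for row in csv.reader(csv_file)]

line = "[johndoe] Blast! MY BLASTED ACCOUNT"
for old, new in replacements:
    line = line.replace(old, new)

print(line)  # Client 😤! MY 😤 ACCOUNT
```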
With this, you’ve made a big improvement in the overall readability of the transcript. It’s also easier to add replacements if you need to. Running this script reveals a much cleaner transcript:
$ python transcript_multiple_replace.py
Agent 10:02:23 : What can I help you with?
Client 10:03:15 : I CAN'T CONNECT TO MY 😤 ACCOUNT
Agent 10:03:30 : Are you sure it's not your caps lock?
Client 10:04:03 : 😤! You're right!
That’s a pretty clean transcript. Maybe that’s all you need. But if your inner automator isn’t happy, maybe it’s because there are still some things that may be bugging you:
- Replacing the swear words won’t work if there’s another variation using -ing or a different capitalization, like BLAst.
- Removing the date from the time stamp currently only works for August 24, 2022.
- Removing the full time stamp would involve setting up replacement pairs for every possible time—not something you’re too keen on doing.
- Adding the space after Agent in order to line up your columns works but isn’t very general.
If these are your concerns, then you may want to turn your attention to regular expressions.
Leverage re.sub() to Make Complex Rules
Whenever you’re looking to do any replacing that’s slightly more complex or needs some wildcards, you’ll usually want to turn your attention toward regular expressions, also known as regex.
Regex is a sort of mini-language made up of characters that define a pattern. These patterns, or regexes, are typically used to search for strings in find and find-and-replace operations. Many programming languages support regex, and it’s widely used. Regex will even give you superpowers.
In Python, leveraging regex means using the re module’s sub() function and building your own regex patterns:
# transcript_regex.py
import re
REGEX_REPLACEMENTS = [
    (r"blast\w*", "😤"),
    (r" [-T:+\d]{25}", ""),
    (r"\[support\w*\]", "Agent "),
    (r"\[johndoe\]", "Client"),
]
transcript = """
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
"""
for old, new in REGEX_REPLACEMENTS:
    transcript = re.sub(old, new, transcript, flags=re.IGNORECASE)
print(transcript)
While you can mix and match the sub() function with the .replace() method, this example only uses sub(), so you can see how it’s used. You’ll note that you can replace all variations of the swear word by using just one replacement tuple now. Similarly, you’re only using one regex for the full time stamp:
$ python transcript_regex.py
Agent : What can I help you with?
Client : I CAN'T CONNECT TO MY 😤 ACCOUNT
Agent : Are you sure it's not your caps lock?
Client : 😤! You're right!
Now your transcript has been completely sanitized, with all noise removed! How did that happen? That’s the magic of regex.
The first regex pattern, "blast\w*", makes use of the \w special character, which will match alphanumeric characters and underscores. Adding the * quantifier directly after it will match zero or more characters of \w.
Another vital part of the first pattern is that the re.IGNORECASE flag makes it a case-insensitive pattern. So now, blast and any word characters that follow it, regardless of capitalization, will be matched and replaced.
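To see what the first pattern does and doesn’t catch, you can try it on a few strings. The samples below are made up for illustration:

```python
import re

# Made-up samples covering different endings and capitalizations.
samples = ["BLASTED", "Blast!", "blasting away", "BLAst it", "a blustery day"]

# \w* extends each match over any letters, digits, or underscores after "blast".
results = [
    re.sub(r"blast\w*", "😤", sample, flags=re.IGNORECASE)
    for sample in samples
]

print(results)  # ['😤', '😤!', '😤 away', '😤 it', 'a blustery day']
```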
The second regex pattern uses character sets and quantifiers to replace the time stamp. You often use character sets and quantifiers together. A regex pattern of [abc], for example, will match one character of a, b, or c. Putting a * directly after it would match zero or more characters of a, b, or c.
There are more quantifiers, though. If you used [abc]{10}, it would match exactly ten characters of a, b, or c in any order and any combination. Also note that repeating characters is redundant, so [aa] is equivalent to [a].
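You can check these claims with re.fullmatch(), which only succeeds when the whole string fits the pattern. The test strings here are arbitrary:

```python
import re

# (pattern, candidate) pairs; fullmatch() succeeds only if the whole string fits.
checks = [
    (r"[abc]*", ""),               # zero characters is allowed by *
    (r"[abc]*", "cabbca"),         # any mix of a, b, and c
    (r"[abc]{10}", "aabbccabca"),  # exactly ten characters from the set
    (r"[abc]{10}", "abc"),         # too short for {10}
    (r"[aa]", "a"),                # [aa] behaves like [a]
]

results = [bool(re.fullmatch(pattern, text)) for pattern, text in checks]
print(results)  # [True, True, True, False, True]
```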
For the time stamp, you use an extended character set of [-T:+\d] to match all the possible characters that you might find in the time stamp. Paired with the quantifier {25}, this will match any possible time stamp, at least until the year 10,000.
The time stamp regex pattern allows you to select any possible date in the time stamp format. Seeing as the times aren’t important for the independent reviewer of these transcripts, you replace them with an empty string. It’s possible to write a more advanced regex that preserves the time information while removing the date.
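One way to keep the time while dropping the date is to capture the time in a group and refer back to it in the replacement. This is only a sketch that assumes every time stamp follows the exact layout used in this transcript:

```python
import re

line = "[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"

# Date, "T", captured time (HH:MM:SS), then the UTC offset.
# This layout mirrors the transcript's time stamps; other formats would need changes.
TIME_STAMP = r"\d{4}-\d{2}-\d{2}T(\d{2}:\d{2}:\d{2})[+-]\d{2}:\d{2}"

# \1 in the replacement refers to the first capture group, which is the time.
cleaned = re.sub(TIME_STAMP, r"\1", line)
print(cleaned)  # [johndoe] 10:04:03 : Blast! You're right!
```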
The third regex pattern is used to select any user string that starts with the keyword "support". Note that you escape (\) the square bracket ([) because otherwise the keyword would be interpreted as a character set.
Finally, the last regex pattern selects the client username string and replaces it with "Client".
With regex, you can drastically cut down the number of replacements that you have to write out. That said, you still may have to come up with many patterns. Seeing as regex isn’t the most readable of languages, having lots of patterns can quickly become hard to maintain.
Thankfully, there’s a neat trick with re.sub() that allows you to have a bit more control over how replacement works, and it offers a much more maintainable architecture.
Use a Callback With re.sub() for Even More Control
One trick that Python and sub() have up their sleeves is that you can pass in a callback function instead of the replacement string. This gives you total control over how to match and replace.
To get started building this version of the transcript-sanitizing script, you’ll use a basic regex pattern to see how using a callback with sub() works:
# transcript_regex_callback.py
import re
transcript = """
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
"""
def sanitize_message(match):
    print(match)
re.sub(r"[-T:+\d]{25}", sanitize_message, transcript)
The regex pattern that you’re using will match the time stamps, and instead of providing a replacement string, you’re passing in a reference to the sanitize_message() function. Now, when sub() finds a match, it’ll call sanitize_message() with a match object as an argument.
Since sanitize_message() just prints the object that it’s received as an argument, when running this, you’ll see the match objects being printed to the console:
$ python transcript_regex_callback.py
<re.Match object; span=(15, 40), match='2022-08-24T10:02:23+00:00'>
<re.Match object; span=(79, 104), match='2022-08-24T10:03:15+00:00'>
<re.Match object; span=(159, 184), match='2022-08-24T10:03:30+00:00'>
<re.Match object; span=(235, 260), match='2022-08-24T10:04:03+00:00'>
A match object is one of the building blocks of the re module. The more basic re.match() function returns a match object. sub() doesn’t return any match objects but uses them behind the scenes.
Because you get this match object in the callback, you can use any of the information contained within it to build the replacement string. Once it’s built, you return the new string, and sub() will replace the match with the returned string.
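As a small illustration of returning a string from the callback, the sketch below shortens each time stamp to hours and minutes. The shorten_time_stamp() name is made up, and parsing with datetime.fromisoformat() is just one possible choice:

```python
import re
from datetime import datetime

def shorten_time_stamp(match):
    # match.group(0) is the full matched text, here an ISO 8601 time stamp.
    moment = datetime.fromisoformat(match.group(0))
    # Whatever string the callback returns replaces the match.
    return moment.strftime("%H:%M")

line = "[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?"
shortened = re.sub(r"[-T:+\d]{25}", shorten_time_stamp, line)
print(shortened)  # [support_tom] 10:02 : What can I help you with?
```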
Apply the Callback to the Script
In your transcript-sanitizing script, you’ll make use of the .groups() method of the match object to return the contents of the two capture groups, and then you can sanitize each part in its own function or discard it:
# transcript_regex_callback.py
import re
ENTRY_PATTERN = (
    r"\[(.+)\] "  # User string, discarding square brackets
    r"[-T:+\d]{25} "  # Time stamp
    r": "  # Separator
    r"(.+)"  # Message
)
BAD_WORDS = ["blast", "dash", "beezlebub"]
CLIENTS = ["johndoe", "janedoe"]

def censor_bad_words(message):
    for word in BAD_WORDS:
        message = re.sub(rf"{word}\w*", "😤", message, flags=re.IGNORECASE)
    return message

def censor_users(user):
    if user.startswith("support"):
        return "Agent"
    elif user in CLIENTS:
        return "Client"
    else:
        raise ValueError(f"unknown client: '{user}'")

def sanitize_message(match):
    user, message = match.groups()
    return f"{censor_users(user):<6} : {censor_bad_words(message)}"

transcript = """
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
"""
print(re.sub(ENTRY_PATTERN, sanitize_message, transcript))
Instead of having lots of different regexes, you can have one top level regex that can match the whole line, dividing it up into capture groups with brackets (()). The capture groups have no effect on the actual matching process, but they do affect the match object that results from the match:
- \[(.+)\] matches any sequence of characters wrapped in square brackets. The capture group picks out the username string, for instance johndoe.
- [-T:+\d]{25} matches the time stamp, which you explored in the last section. Since you won’t be using the time stamp in the final transcript, it’s not captured with brackets.
- : matches a literal colon. The colon is used as a separator between the message metadata and the message itself.
- (.+) matches any sequence of characters until the end of the line, which will be the message.
The content of the capturing groups will be available as separate items in the match object by calling the .groups() method, which returns a tuple of the matched strings.
The two groups are the user string and the message. In the sanitize_message() function, you first use unpacking to assign the two strings to variables:
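To see what .groups() returns, you can run the pattern against a single transcript line. This sketch writes the same pattern on one line and uses re.search() purely for demonstration:

```python
import re

# Same pattern as the entry pattern in the script, written on one line.
ENTRY_PATTERN = r"\[(.+)\] [-T:+\d]{25} : (.+)"
line = "[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"

match = re.search(ENTRY_PATTERN, line)
print(match.groups())  # ('johndoe', "Blast! You're right!")

# Unpacking works because the pattern has exactly two capture groups.
user, message = match.groups()
print(user)     # johndoe
print(message)  # Blast! You're right!
```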
def sanitize_message(match):
    user, message = match.groups()
    return f"{censor_users(user):<6} : {censor_bad_words(message)}"
Note how this architecture allows a very broad and inclusive regex at the top level, and then lets you supplement it with more precise regexes within the replacement callback.
The sanitize_message() function makes use of two functions to clean up usernames and bad words. It additionally uses f-strings to justify the messages. Note how censor_bad_words() uses a dynamically created regex while censor_users() relies on more basic string processing.
This is now looking like a good first prototype for a transcript-sanitizing script! The output is squeaky clean:
$ python transcript_regex_callback.py
Agent : What can I help you with?
Client : I CAN'T CONNECT TO MY 😤 ACCOUNT
Agent : Are you sure it's not your caps lock?
Client : 😤! You're right!
Nice! Using sub() with a callback gives you far more flexibility to mix and match different methods and build regexes dynamically. This structure also gives you the most room to grow when your bosses or clients inevitably change their requirements on you!
Conclusion
In this tutorial, you’ve learned how to replace strings in Python. Along the way, you’ve gone from using the basic Python .replace() string method to using callbacks with re.sub() for absolute control. You’ve also explored some regex patterns and deconstructed them into a better architecture to manage a replacement script.
With all that knowledge, you’ve successfully cleaned a chat transcript, which is now ready for independent review. Not only that, but your transcript-sanitizing script has plenty of room to grow.
String replace() in Python returns a copy of the string where occurrences of a substring are replaced with another substring.
Syntax of String replace() method
The replace() method in Python strings has the following syntax:
Syntax: string.replace(old, new, count)
Parameters:
- old – old substring you want to replace.
- new – new substring which would replace the old substring.
- count – (Optional) the maximum number of occurrences of the old substring to replace. If it is omitted, all occurrences are replaced.
Return Value : It returns a copy of the string where all occurrences of a substring are replaced with another substring.
Examples of Python replace() Methods
Replace all Instances of a single character using replace() in Python
In this example, we are only replacing a single character from a given string. The Python replace() method is case-sensitive, and therefore it performs a case-sensitive substring substitution, i.e. R in FOR is unchanged.
string = "grrks FOR grrks"
new_string = string.replace("r", "e")
print(string)
print(new_string)
Output :
grrks FOR grrks
geeks FOR geeks
Replace all Instances of a String using replace() in Python
Here, we will replace all the geeks with GeeksforGeeks using replace() function.
string = "geeks for geeks \ngeeks for geeks"
print(string)
print(string.replace("geeks", "GeeksforGeeks"))
Output :
geeks for geeks
geeks for geeks
GeeksforGeeks for GeeksforGeeks
GeeksforGeeks for GeeksforGeeks
Replace only a certain number of Instances using replace() in Python
In this example, the first call replaces every “e” with “a”, while the second replaces only the first three occurrences of “ek” with “a” (count=3).
string = "geeks for geeks geeks geeks geeks"
print(string.replace("e", "a"))
print(string.replace("ek", "a", 3))
Output:
gaaks for gaaks gaaks gaaks gaaks
geas for geas geas geeks geeks
Using a list comprehension and the join() method:
Approach:
Split the original string into a list of substrings using the split() method.
Use a list comprehension to replace each occurrence of old_substring with new_substring.
Join the list of substrings back into a string using the join() method.
my_string = "geeks for geeks "
old_substring = "k"
new_substring = "x"
split_list = my_string.split(old_substring)
new_list = [new_substring if i < len(split_list) - 1 else '' for i in range(len(split_list) - 1)]
new_string = ''.join([split_list[i] + new_list[i] for i in range(len(split_list) - 1)] + [split_list[-1]])
print(new_string)
Time Complexity: O(n)
Space Complexity: O(n)
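The same split-and-join idea can be written more compactly by joining the pieces directly with the new substring. This is a shorter sketch of the technique, not the code from the example above:

```python
my_string = "geeks for geeks "
old_substring = "k"
new_substring = "x"

# Splitting on the old substring and joining with the new one
# puts new_substring wherever old_substring used to be.
joined = new_substring.join(my_string.split(old_substring))

print(repr(joined))  # 'geexs for geexs '
# The result matches what replace() produces for the same arguments.
print(joined == my_string.replace(old_substring, new_substring))  # True
```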
In this article you’ll see how to use Python’s .replace() method to perform substring substitution.
You’ll also see how to perform case-insensitive substring substitution.
Let’s get started!
What does the .replace() Python method do?
When using the .replace() Python method, you are able to replace every instance of one specific character with a new one. You can even replace a whole string of text with a new line of text that you specify.
The .replace() method returns a copy of a string. This means that the original string remains the same, but a new copy gets created, with all of the old text replaced by the new text.
How does the .replace() Python method work? A Syntax Breakdown
The syntax for the .replace() method looks like this:
string.replace(old_text, new_text, count)
Let’s break it down:
- old_text is the first required parameter that .replace() accepts. It’s the old character or text that you want to replace. Enclose this in quotation marks.
- new_text is the second required parameter that .replace() accepts. It’s the new character or text which you want to replace the old character/text with. This parameter also needs to be enclosed in quotation marks.
- count is the optional third parameter that .replace() accepts. By default, .replace() will replace all instances of the substring. However, you can use count to specify the number of occurrences you want to be replaced.
Python .replace() Method Code Examples
How to Replace All Instances of a Single Character
To change all instances of a single character, you would do the following:
phrase = "I like to learn coding on the go"
# replace all instances of 'o' with 'a'
substituted_phrase = phrase.replace("o", "a" )
print(phrase)
print(substituted_phrase)
#output
#I like to learn coding on the go
#I like ta learn cading an the ga
In the example above, every instance of the character o is replaced with the character a.
In that example there were four instances of the character o. Specifically, it was found in the words to, coding, on, and go.
What if you only wanted to change two words, like to and coding, to contain a instead of o?
How to Replace Only a Certain Number of Instances of a Single Character
To change only two instances of a single character, you would use the count parameter and set it to two:
phrase = "I like to learn coding on the go"
# replace only the first two instances of 'o' with 'a'
substituted_phrase = phrase.replace("o", "a", 2 )
print(phrase)
print(substituted_phrase)
#output
#I like to learn coding on the go
#I like ta learn cading on the go
If you only wanted to change the first instance of a single character, you would set the count parameter to one:
phrase = "I like to learn coding on the go"
# replace only the first instance of 'o' with 'a'
substituted_phrase = phrase.replace("o", "a", 1 )
print(phrase)
print(substituted_phrase)
#output
#I like to learn coding on the go
#I like ta learn coding on the go
How to Replace All Instances of a String
To change more than one character, the process looks similar.
phrase = "The sun is strong today. I don't really like sun."
#replace all instances of the word 'sun' with 'wind'
substituted_phrase = phrase.replace("sun", "wind")
print(phrase)
print(substituted_phrase)
#output
#The sun is strong today. I don't really like sun.
#The wind is strong today. I don't really like wind.
In the example above, the word sun was replaced with the word wind.
How to Replace Only a Certain Number of Instances of a String
If you wanted to change only the first instance of sun to wind, you would use the count parameter and set it to one.
phrase = "The sun is strong today. I don't really like sun."
#replace only the first instance of the word 'sun' with 'wind'
substituted_phrase = phrase.replace("sun", "wind", 1)
print(phrase)
print(substituted_phrase)
#output
#The sun is strong today. I don't really like sun.
#The wind is strong today. I don't really like sun.
How to Perform Case-Insensitive Substring Substitution in Python
Let’s take a look at another example.
phrase = "I am learning Ruby. I really enjoy the ruby programming language!"
#replace the text "Ruby" with "Python"
substituted_text = phrase.replace("Ruby", "Python")
print(substituted_text)
#output
#I am learning Python. I really enjoy the ruby programming language!
In this case, what I really wanted to do was to replace all instances of the word Ruby with Python.
However, there was the word ruby with a lowercase r, which I would also like to change.
Because the first letter was in lowercase, and not uppercase as I specified with Ruby, it remained the same and didn’t change to Python.
The .replace() method is case-sensitive, and therefore it performs a case-sensitive substring substitution.
In order to perform a case-insensitive substring substitution you would have to do something different.
You would need to use the re.sub() function and use the re.IGNORECASE flag.
To use re.sub() you need to:
- Use the re module, via import re.
- Specify a regular expression pattern.
- Specify what you want to replace the pattern with.
- Specify the string you want to perform this operation on.
- Optionally, specify the count parameter to set the maximum number of replacements you want to take place.
- Optionally, pass the re.IGNORECASE flag, which tells the regular expression to perform a case-insensitive match.
So, all together the syntax looks like this:
import re
re.sub(pattern, replace, string, count, flags)
Taking the example from earlier:
phrase = "I am learning Ruby. I really enjoy the ruby programming language!"
This is how I would replace both Ruby and ruby with Python:
import re
phrase = "I am learning Ruby. I really enjoy the ruby programming language!"
phrase = re.sub("Ruby","Python", phrase, flags=re.IGNORECASE)
print(phrase)
#output
#I am learning Python. I really enjoy the Python programming language!
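If you only wanted the first match replaced, you could combine count with the flag. The sketch below passes both by keyword, which helps because a flag passed as the fourth positional argument would be read as the count:

```python
import re

phrase = "I am learning Ruby. I really enjoy the ruby programming language!"

# Passing count and flags by keyword avoids mixing them up:
# re.sub(pattern, repl, string, re.IGNORECASE) would treat the flag as a count.
first_only = re.sub("ruby", "Python", phrase, count=1, flags=re.IGNORECASE)

print(first_only)
#output
#I am learning Python. I really enjoy the ruby programming language!
```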
Wrapping up
And there you have it — you now know the basics of substring substitution. Hopefully you found this guide helpful.
To learn more about Python, check out freeCodeCamp’s Scientific Computing with Python Certification.
You’ll start from the basics and learn in an interactive and beginner-friendly way. You’ll also build five projects at the end to put into practice and help reinforce what you learned.
Thanks for reading and happy coding!
In this article, we will discuss how to replace multiple words in a string based on a dictionary.
Table of Contents
- Using str.replace() function
- Using Regex
Suppose we have a string,
"This is the last rain of Season and Jack is here."
We want to replace multiple words in this string using a dictionary i.e.
{'is' : 'AA', 'the': 'BBB', 'and': 'CCC'}
Keys in the dictionary are the substrings that need to be replaced, and the corresponding values in the dictionary are the replacement strings. Like, in this case,
- “is” should be replaced by “AA”
- “the” should be replaced by “BBB”
- “and” should be replaced by “CCC”
The final string should be like,
ThAA AA BBB last rain of Season CCC Jack AA here.
There are different ways to do this. Let’s discuss them one by one.
Using str.replace() function
The string class has a member function replace(to_be_replaced, replacement), which replaces all the occurrences of the substring “to_be_replaced” with the “replacement” string.
To replace multiple words in a string based on a dictionary, we can iterate over all the key-value pairs in the dictionary and, for each pair, replace all the occurrences of the “key” substring with the “value” substring in the original string.
For example:
strValue = "This is the last rain of Season and Jack is here."
# Dictionary containing mapping of
# values to be replaced and replacement values
dictOfStrings = {'is' : 'AA', 'the': 'BBB', 'and': 'CCC'}
# Iterate over all key-value pairs in dict and
# replace each key by the value in the string
for word, replacement in dictOfStrings.items():
    strValue = strValue.replace(word, replacement)
print(strValue)
Output:
ThAA AA BBB last rain of Season CCC Jack AA here.
It replaced all the dictionary keys/words in a string with the corresponding values from the dictionary.
Using Regex
In Python, the re module provides a function sub(pattern, replacement_str, original_str) to replace the contents of a string based on a matching regex pattern.
This function returns a modified copy of the given string “original_str” after replacing all the substrings that match the given regex “pattern” with the substring “replacement_str”.
To replace multiple substrings in a string based on a dictionary, we can loop over all the key-value pairs in the dictionary and, for each key-value pair, replace all the occurrences of the “key” substring with the “value” substring in the original string using the re.sub() function.
For example:
import re
strValue = "This is the last rain of Season and Jack is here."
# Dictionary containing mapping of
# values to be replaced and replacement values
dictOfStrings = {'is' : 'AA', 'the': 'BBB', 'and': 'CCC'}
# Iterate over all key-value pairs in dict and
# replace each key by the value in the string
for word, replacement in dictOfStrings.items():
    strValue = re.sub(word, replacement, strValue)
print(strValue)
Output:
ThAA AA BBB last rain of Season CCC Jack AA here.
It replaced all the dictionary keys/words in a string with the corresponding values from the dictionary.
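As a variation, we can also combine the dictionary keys into a single pattern and replace everything in one pass. The sketch below is one possible way to do that; the names alternation and lookup are made up for illustration, and the second call shows how word boundaries avoid replacing the “is” inside “This”:

```python
import re

strValue = "This is the last rain of Season and Jack is here."
dictOfStrings = {'is': 'AA', 'the': 'BBB', 'and': 'CCC'}

# Build one alternation such as "is|the|and"; re.escape() guards against
# keys that contain regex metacharacters.
alternation = "|".join(re.escape(word) for word in dictOfStrings)

def lookup(match):
    # Replace each match with the value stored under the matched key.
    return dictOfStrings[match.group(0)]

# Same result as the loops above, but in a single pass over the string.
single_pass = re.sub(alternation, lookup, strValue)
print(single_pass)  # ThAA AA BBB last rain of Season CCC Jack AA here.

# Wrapping the alternation in \b...\b restricts it to whole words,
# so the "is" inside "This" is left alone.
whole_words = re.sub(rf"\b(?:{alternation})\b", lookup, strValue)
print(whole_words)  # This AA BBB last rain of Season CCC Jack AA here.
```

If one key is a prefix of another, it is probably safer to sort the keys by length, longest first, before joining them.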
Summary:
We learned to replace multiple words in a string based on a dictionary in Python.