I’m working with Python, and I’m trying to find out if you can tell if a word is in a string.
I have found some information about identifying whether the word is in the string using .find, but is there a way to do this with an if statement? I would like to have something like the following:
if string.find(word):
    print("success")
asked Mar 16, 2011 at 1:10
What is wrong with:
if word in mystring:
    print('success')
answered Mar 16, 2011 at 1:13
fabrizioM
if 'seek' in 'those who seek shall find':
    print('Success!')
but keep in mind that this matches a sequence of characters, not necessarily a whole word. For example, 'word' in 'swordsmith' is True. If you only want to match whole words, you ought to use regular expressions:
import re
def findWholeWord(w):
    # \b matches a word boundary on each side of w
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search
findWholeWord('seek')('those who seek shall find')  # -> <match object>
findWholeWord('word')('swordsmith')  # -> None
answered Mar 16, 2011 at 1:52
Hugh Bothwell
If you want to find out whether a whole word is in a space-separated list of words, simply use:
def contains_word(s, w):
    return (' ' + w + ' ') in (' ' + s + ' ')
contains_word('the quick brown fox', 'brown')  # True
contains_word('the quick brown fox', 'row')    # False
This method is also the fastest of the three in the timings below, compared with Hugh Bothwell’s and daSong’s approaches:
>python -m timeit -s "def contains_word(s, w): return (' ' + w + ' ') in (' ' + s + ' ')" "contains_word('the quick brown fox', 'brown')"
1000000 loops, best of 3: 0.351 usec per loop
>python -m timeit -s "import re" -s "def contains_word(s, w): return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search(s)" "contains_word('the quick brown fox', 'brown')"
100000 loops, best of 3: 2.38 usec per loop
>python -m timeit -s "def contains_word(s, w): return s.startswith(w + ' ') or s.endswith(' ' + w) or s.find(' ' + w + ' ') != -1" "contains_word('the quick brown fox', 'brown')"
1000000 loops, best of 3: 1.13 usec per loop
Edit: A slight variant on this idea for Python 3.6+, equally fast:
def contains_word(s, w):
    return f' {w} ' in f' {s} '
answered Apr 11, 2016 at 20:32
user200783
You can split the string into words and check the resulting list.
if word in string.split():
    print("success")
answered Dec 1, 2016 at 18:26
Corvax
find returns an integer representing the index of where the search item was found. If it isn’t found, it returns -1.
haystack = 'asdf'
haystack.find('a') # result: 0
haystack.find('s') # result: 1
haystack.find('g') # result: -1
if haystack.find(needle) >= 0:
    print('Needle found.')
else:
    print('Needle not found.')
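Because .find() returns an index rather than a boolean, a bare `if string.find(word):` misfires in both directions. The short sketch below illustrates this; the helper name `found_by_find` is made up for the example and is not part of any library.

```python
def found_by_find(haystack, needle):
    """Return True when needle occurs in haystack, using .find() explicitly."""
    return haystack.find(needle) != -1

haystack = 'asdf'

# A match at index 0 is falsy, so a bare `if haystack.find('a'):` skips the branch.
assert bool(haystack.find('a')) is False
# A miss returns -1, which is truthy, so the bare test would wrongly succeed.
assert bool(haystack.find('g')) is True

# Comparing against -1, or simply using `in`, gives the intended answer.
assert found_by_find(haystack, 'a') is True
assert found_by_find(haystack, 'g') is False
assert 'a' in haystack and 'g' not in haystack
```

In practice the `in` operator is usually the simpler choice when the index itself is not needed.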
answered Mar 16, 2011 at 1:13
Matt Howell
This small function checks whether every search word appears in the given text. If all search words are found, it returns the number of search words; otherwise it returns False.
It also supports Unicode string search.
def find_words(text, search):
    """Find exact words"""
    dText = text.split()
    dSearch = search.split()
    found_word = 0
    # note: a word repeated in the text is counted once per repetition
    for text_word in dText:
        for search_word in dSearch:
            if search_word == text_word:
                found_word += 1
    if found_word == len(dSearch):
        return len(dSearch)
    else:
        return False
Usage:
find_words('çelik güray ankara', 'güray ankara')
answered Jun 22, 2012 at 22:51
Guray Celik
If matching a sequence of characters is not sufficient and you need to match whole words, here is a simple function that gets the job done. It basically appends spaces where necessary and searches for that in the string:
def smart_find(haystack, needle):
    if haystack.startswith(needle+" "):
        return True
    if haystack.endswith(" "+needle):
        return True
    if haystack.find(" "+needle+" ") != -1:
        return True
    return False
This assumes that commas and other punctuation have already been stripped out.
answered Jun 15, 2012 at 7:23
daSong
Using regex is a solution, but it is too complicated for that case.
You can simply split the text into a list of words. Use the split(separator, num) method for that. It returns a list of all the words in the string, using separator as the separator. If separator is unspecified, it splits on all whitespace (optionally, you can limit the number of splits to num).
list_of_words = mystring.split()
if word in list_of_words:
    print('success')
This will not work for a string with commas etc. For example:
mystring = "One,two and three"
# will split into ["One,two", "and", "three"]
If you also want to split on commas and other punctuation, note that split() treats its separator as one literal string rather than a set of characters, so a character class with re.split is needed after all:
import re
# whitespace (space, tab, newline, return, formfeed) plus common punctuation
list_of_words = re.split(r"[ \t\n\r\f,.;!?'\"()]+", mystring)
if word in list_of_words:
    print('success')
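As one possible alternative for text with punctuation, the sketch below collects runs of word characters with re.findall instead of splitting. The function name `words_in` is hypothetical, and treating `\w+` as the definition of a word is an assumption: it keeps digits and underscores, and it splits contractions such as "don't" into two tokens.

```python
import re

def words_in(text):
    """Return the runs of word characters (letters, digits, underscore) in text."""
    return re.findall(r"\w+", text)

mystring = "One,two and three"
print(words_in(mystring))             # ['One', 'two', 'and', 'three']
print('two' in words_in(mystring))    # True
print('thre' in words_in(mystring))   # False
```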
answered Dec 18, 2017 at 11:44
tstempko
As you are asking for a word and not for a string, I would like to present a solution which is not sensitive to prefixes / suffixes and ignores case:
#!/usr/bin/env python
import re
def is_word_in_text(word, text):
    """
    Check if a word is in a text.
    Parameters
    ----------
    word : str
    text : str
    Returns
    -------
    bool : True if word is in text, otherwise False.
    Examples
    --------
    >>> is_word_in_text("Python", "python is awesome.")
    True
    >>> is_word_in_text("Python", "camelCase is pythonic.")
    False
    >>> is_word_in_text("Python", "At the end is Python")
    True
    """
    # [^\w] matches any non-word character, so the word must be delimited on both sides
    pattern = r'(^|[^\w]){}([^\w]|$)'.format(word)
    pattern = re.compile(pattern, re.IGNORECASE)
    matches = re.search(pattern, text)
    return bool(matches)
if __name__ == '__main__':
    import doctest
    doctest.testmod()
If your words might contain regex special chars (such as +), then you need re.escape(word).
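A minimal sketch of that re.escape variant follows. The function name `is_word_in_text_escaped` is made up for illustration; the pattern is the same delimiter-based one used above, with the word escaped so that it is matched as literal text.

```python
import re

def is_word_in_text_escaped(word, text):
    """Check for word as literal text delimited by non-word characters."""
    pattern = r'(^|[^\w]){}([^\w]|$)'.format(re.escape(word))
    return bool(re.search(pattern, text, re.IGNORECASE))

# Unescaped, 'a+b' would mean "one or more 'a' followed by 'b'"
# and would not match the literal text 'a+b'.
print(is_word_in_text_escaped('a+b', 'compute a+b now'))  # True
print(is_word_in_text_escaped('a+b', 'grab it'))          # False
```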
answered Aug 9, 2017 at 10:11
Martin Thoma
An advanced way to check for an exact word in a long string:
import re
text = "This text was of edited by Rock"
# try this string also
# text = "This text was officially edited by Rock"
# \b marks a word boundary, so "of" does not match inside "officially"
if re.search(r"\bof\b", text):
    print("Present")
else:
    print("Absent")
answered Nov 2, 2016 at 8:39
Rameez
What about splitting the string and stripping punctuation from each word?
w in [ws.strip(',.?!') for ws in p.split()]
If needed, pay attention to lower/upper case:
w.lower() in [ws.strip(',.?!') for ws in p.lower().split()]
Maybe like this:
def wcheck(word, phrase):
    # Mind the punctuation characters and the split characters
    punctuation = ',.?!'
    return word.lower() in [words.strip(punctuation) for words in phrase.lower().split()]
Sample:
print(wcheck('CAr', 'I own a caR.'))
I didn’t check performance…
answered Dec 26, 2020 at 5:18
marcio
You could just add a space before and after "word", and pad the input the same way so that a match at the very start or end still counts.
x = input("Type your word: ")
if " word " in " " + x + " ":
    print("Yes")
else:
    print("Nope")
This way it looks for a space before and after "word".
Type your word: Swordsmith
Nope
Type your word: word
Yes
answered Feb 26, 2015 at 14:23
PyGuy
I believe this answer is closer to what was initially asked: Find substring in string but only if whole words?
It uses a simple regex:
import re
if re.search(r"\b" + re.escape(word) + r"\b", string):
    print('success')
answered Aug 25, 2021 at 13:25
Milos Cuculovic
One of the solutions is to put a space at the beginning and end of the test word. This fails if the word is at the beginning or end of a sentence or is next to any punctuation. My solution is to write a function that replaces any punctuation in the test string with spaces, adds a space to the beginning and end of the test string and test word, and then returns the number of occurrences. This is a simple solution that removes the need for any complex regular expression.
def countWords(word, sentence):
    testWord = ' ' + word.lower() + ' '
    testSentence = ' '
    for char in sentence:
        if char.isalpha():
            testSentence = testSentence + char.lower()
        else:
            testSentence = testSentence + ' '
    testSentence = testSentence + ' '
    return testSentence.count(testWord)
To count the number of occurrences of a word in a string:
sentence = "A Frenchman ate an apple"
print(countWords('a', sentence))
returns 1
sentence = "Is Oporto a 'port' in Portugal?"
print(countWords('port', sentence))
returns 1
Use the function in an 'if' to test whether the word exists in a string.
answered Mar 18, 2022 at 9:37
iStuart
Given this string:
random_string = '4'
I want to determine whether it is an integer, a character, or a word.
I thought I could do this:
test = int(random_string)
isinstance(test, int) == True
but I realized that if random_string does not contain a number, I will get an error.
Basically, random_string can take any of these forms:
random_string = 'hello'
random_string = 'H'
random_string = 'r'
random_string = '56'
Does anyone know a way to do this? I am a bit confused. To determine whether it is a character, what I did was:
chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
random_string in chars == True
I used another string to check whether it was a lowercase letter.
To check whether it is a word, I took the length of the string; if the length is more than one, I conclude that it is a word or a number.
The issue is how to tell whether it is a word or a number.
Please help.
asked Jun 25, 2012 at 7:24
Strings have methods isalpha and isdigit to test if they consist of letters and digits, respectively:
>>> 'hello'.isalpha()
True
>>> '123'.isdigit()
True
Note that they only check for those characters, so a string with spaces or anything else will return false for both:
>>> 'hi there'.isalpha()
False
However, if you want the value as a number, you’re better off just using int. Note that there’s no point checking with isinstance whether the result is indeed an integer. If int(blah) succeeds, it will always return an integer; if the string doesn’t represent an integer, it will raise an exception.
answered Jun 25, 2012 at 7:27
BrenBarn
Take your pick.
>>> '4'.isdigit()
True
>>> '-4'.isdigit()
False
>>> int('-4')
-4
>>> 'foo'.isdigit()
False
>>> int('foo')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'foo'
answered Jun 25, 2012 at 7:27
To implement the logic you're asking for, I guess the most Pythonic way would be to use exception handling:
try:
    n = int(random_string)
except ValueError:
    if len(random_string) > 1:
        pass  # it's a word
    else:
        pass  # it's a character (or the empty string)
To check the case, you can use the string method islower().
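The exception-handling idea can be wrapped in a small function. This is only a sketch: the name `classify` and the four labels it returns are invented here for illustration, and the empty string is given its own label rather than being lumped in with single characters.

```python
def classify(random_string):
    """Classify a string as 'integer', 'character', 'word', or 'empty'."""
    try:
        int(random_string)  # raises ValueError if this is not an integer literal
        return 'integer'
    except ValueError:
        if len(random_string) > 1:
            return 'word'
        if len(random_string) == 1:
            return 'character'
        return 'empty'

for sample in ['hello', 'H', 'r', '56', '-4']:
    print(sample, '->', classify(sample))
```

Note that int() also accepts surrounding whitespace and a leading sign, so ' 7 ' and '-4' both count as integers here.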
answered Jun 25, 2012 at 7:32
wim
To gain more control, especially if you have kinds of strings that the built-in methods don't support, you can use the re module.
A regular expression like the one below checks whether s is a number:
re.findall(r'^[-]?[0-9]*[.]?[0-9]*$', s)
and the following regular expression checks whether it is a purely alphabetic string:
r'^[a-zA-Z]+$'
Please note that this is just a demo (the first pattern also accepts the empty string, for instance); you should modify the regular expressions as per your needs.
answered Jun 25, 2012 at 9:24
theharshest
Muhammad Maisam Abbas
Dec 21, 2022
Jun 07, 2021
This tutorial will introduce the method to find whether a specified word is inside a string variable or not in Python.
Check the String if It Contains a Word Through an if/in Statement in Python
If we want to check whether a given string contains a specified word, we can use the in operator inside an if statement. The expression word in string evaluates to True if the word is present in the string and False if it is not.
The following program snippet shows us how to use the if/in statement to determine whether a string contains a word or not:
string = "This contains a word"
if "word" in string:
    print("Found")
else:
    print("Not Found")
Output:
Found
We checked whether the string variable string contains the word "word" with the if/in statement in the program above. This approach compares both strings character-wise; this means that it doesn’t compare whole words and can give us wrong answers, as demonstrated in the following example:
string = "This contains a word"
if "is" in string:
    print("Found")
else:
    print("Not Found")
Output:
Found
The output shows that the word "is" is present inside the string variable string. But, in reality, this "is" is just a part of the first word "This" in the string variable.
This problem has a simple solution. We can surround the word and the string variable with white spaces to compare only whole words. The program below shows us how we can do that:
string = "This contains a word"
if " is " in (" " + string + " "):
    print("Found")
else:
    print("Not Found")
Output:
Not Found
In the code above, we used the same if/in statement, but we slightly altered it to compare only individual words. This time, the output shows that no such word as "is" is present inside the string variable.
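One limitation of the space-padding approach is that a word followed by punctuation (for example "word.") is no longer surrounded by spaces. The sketch below is one way to handle that, under the assumption that replacing ASCII punctuation with spaces is acceptable for the text at hand; the names `_PUNCT_TO_SPACE` and `contains_whole_word` are hypothetical.

```python
import string

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def contains_whole_word(text, word):
    """Check for word as a whole, space-delimited token after removing punctuation."""
    cleaned = text.translate(_PUNCT_TO_SPACE)
    return f' {word} ' in f' {cleaned} '

print(contains_whole_word("This contains a word.", "word"))  # True
print(contains_whole_word("This contains a word.", "is"))    # False
```

This sketch only treats the space character as a separator; tabs and newlines would need extra handling.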
In the tutorial on Basic Data Types in Python, you learned how to define strings: objects that contain sequences of character data. Processing character data is integral to programming. It is a rare application that doesn’t need to manipulate strings at least to some extent.
Here’s what you’ll learn in this tutorial: Python provides a rich set of operators, functions, and methods for working with strings. When you are finished with this tutorial, you will know how to access and extract portions of strings, and also be familiar with the methods that are available to manipulate and modify string data.
You will also be introduced to two other Python objects used to represent raw byte data, the bytes and bytearray types.
String Manipulation
The sections below highlight the operators, methods, and functions that are available for working with strings.
String Operators
You have already seen the operators + and * applied to numeric operands in the tutorial on Operators and Expressions in Python. These two operators can be applied to strings as well.
The + Operator
The + operator concatenates strings. It returns a string consisting of the operands joined together, as shown here:
>>> s = 'foo'
>>> t = 'bar'
>>> u = 'baz'
>>> s + t
'foobar'
>>> s + t + u
'foobarbaz'
>>> print('Go team' + '!!!')
Go team!!!
The * Operator
The * operator creates multiple copies of a string. If s is a string and n is an integer, either of the following expressions returns a string consisting of n concatenated copies of s:
s * n
n * s
Here are examples of both forms:
>>> s = 'foo.'
>>> s * 4
'foo.foo.foo.foo.'
>>> 4 * s
'foo.foo.foo.foo.'
The multiplier operand n must be an integer. You’d think it would be required to be a positive integer, but amusingly, it can be zero or negative, in which case the result is an empty string:
>>> 'foo' * -8
''
If you were to create a string variable and initialize it to the empty string by assigning it the value 'foo' * -8, anyone would rightly think you were a bit daft. But it would work.
The in Operator
Python also provides a membership operator that can be used with strings. The in operator returns True if the first operand is contained within the second, and False otherwise:
>>> s = 'foo'
>>> s in 'That\'s food for thought.'
True
>>> s in 'That\'s good for now.'
False
There is also a not in operator, which does the opposite:
>>> 'z' not in 'abc'
True
>>> 'z' not in 'xyz'
False
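Both in and not in compare characters exactly, so they are case-sensitive. If that is not what you want, one common approach is to normalize both operands first. The sketch below uses str.casefold() for that; the helper name `contains_ignore_case` is made up for this example.

```python
def contains_ignore_case(needle, haystack):
    """Case-insensitive version of `needle in haystack`."""
    return needle.casefold() in haystack.casefold()

print('FOO' in "That's food for thought.")                      # False
print(contains_ignore_case('FOO', "That's food for thought."))  # True
print(contains_ignore_case('foo', "That's good for now."))      # False
```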
Built-in String Functions
As you saw in the tutorial on Basic Data Types in Python, Python provides many functions that are built-in to the interpreter and always available. Here are a few that work with strings:
Function | Description |
---|---|
chr() | Converts an integer to a character |
ord() | Converts a character to an integer |
len() | Returns the length of a string |
str() | Returns a string representation of an object |
These are explored more fully below.
ord(c)
Returns an integer value for the given character.
At the most basic level, computers store all information as numbers. To represent character data, a translation scheme is used which maps each character to its representative number.
The simplest scheme in common use is called ASCII. It covers the common Latin characters you are probably most accustomed to working with. For these characters, ord(c) returns the ASCII value for character c:
>>> ord('a')
97
>>> ord('#')
35
ASCII is fine as far as it goes. But there are many different languages in use in the world and countless symbols and glyphs that appear in digital media. The full set of characters that potentially may need to be represented in computer code far surpasses the ordinary Latin letters, numbers, and symbols you usually see.
Unicode is an ambitious standard that attempts to provide a numeric code for every possible character, in every possible language, on every possible platform. Python 3 supports Unicode extensively, including allowing Unicode characters within strings.
As long as you stay in the domain of the common characters, there is little practical difference between ASCII and Unicode. But the ord() function will return numeric values for Unicode characters as well:
>>> ord('€')
8364
>>> ord('∑')
8721
chr(n)
Returns a character value for the given integer.
chr() does the reverse of ord(). Given a numeric value n, chr(n) returns a string representing the character that corresponds to n:
>>> chr(97)
'a'
>>> chr(35)
'#'
chr() handles Unicode characters as well:
>>> chr(8364)
'€'
>>> chr(8721)
'∑'
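Since ord() and chr() are inverses of each other, they can be combined to do simple arithmetic on characters. Here is a small sketch; the function name `shift_char` is invented for illustration.

```python
def shift_char(c, offset):
    """Return the character whose code point is offset places after that of c."""
    return chr(ord(c) + offset)

print(shift_char('a', 1))        # b
print(chr(ord('€')) == '€')      # True
print([ord(c) for c in 'foo'])   # [102, 111, 111]
```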
len(s)
Returns the length of a string.
With len(), you can check Python string length. len(s) returns the number of characters in s:
>>> s = 'I am a string.'
>>> len(s)
14
str(obj)
Returns a string representation of an object.
Virtually any object in Python can be rendered as a string. str(obj) returns the string representation of object obj:
>>> str(49.2)
'49.2'
>>> str(3+4j)
'(3+4j)'
>>> str(3 + 29)
'32'
>>> str('foo')
'foo'
String Indexing
Often in programming languages, individual items in an ordered set of data can be accessed directly using a numeric index or key value. This process is referred to as indexing.
In Python, strings are ordered sequences of character data, and thus can be indexed in this way. Individual characters in a string can be accessed by specifying the string name followed by a number in square brackets ([]).
String indexing in Python is zero-based: the first character in the string has index 0, the next has index 1, and so on. The index of the last character will be the length of the string minus one.
For example, a schematic diagram of the indices of the string 'foobar' would look like this:
  f   o   o   b   a   r
  0   1   2   3   4   5
The individual characters can be accessed by index as follows:
>>> s = 'foobar'
>>> s[0]
'f'
>>> s[1]
'o'
>>> s[3]
'b'
>>> len(s)
6
>>> s[len(s)-1]
'r'
Attempting to index beyond the end of the string results in an error:
>>> s[6]
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
s[6]
IndexError: string index out of range
String indices can also be specified with negative numbers, in which case indexing occurs from the end of the string backward: -1 refers to the last character, -2 the second-to-last character, and so on. Here is the same diagram showing both the positive and negative indices into the string 'foobar':
  f   o   o   b   a   r
  0   1   2   3   4   5
 -6  -5  -4  -3  -2  -1
Here are some examples of negative indexing:
>>> s = 'foobar'
>>> s[-1]
'r'
>>> s[-2]
'a'
>>> len(s)
6
>>> s[-len(s)]
'f'
Attempting to index with negative numbers beyond the start of the string results in an error:
>>> s[-7]
Traceback (most recent call last):
File "<pyshell#26>", line 1, in <module>
s[-7]
IndexError: string index out of range
For any non-empty string s, s[len(s)-1] and s[-1] both return the last character. There isn’t any index that makes sense for an empty string.
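The relationship between the two numbering schemes can be checked directly: for a string of length n, positive index i and negative index i - n refer to the same character. The following sketch lists both indices for every character; the helper name `index_pairs` is hypothetical.

```python
def index_pairs(s):
    """Return (positive index, negative index, character) for each character of s."""
    return [(i, i - len(s), ch) for i, ch in enumerate(s)]

s = 'foobar'
for positive, negative, ch in index_pairs(s):
    # Both indices select the same character.
    assert s[positive] == s[negative] == ch
    print(positive, negative, ch)
```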
String Slicing
Python also allows a form of indexing syntax that extracts substrings from a string, known as string slicing. If s is a string, an expression of the form s[m:n] returns the portion of s starting with position m, and up to but not including position n:
>>> s = 'foobar'
>>> s[2:5]
'oba'
Again, the second index specifies the first character that is not included in the result, which is the character 'r' (s[5]) in the example above. That may seem slightly unintuitive, but it produces a sensible result: the expression s[m:n] returns a substring that is n - m characters long, in this case 5 - 2 = 3.
If you omit the first index, the slice starts at the beginning of the string. Thus, s[:m] and s[0:m] are equivalent:
>>> s = 'foobar'
>>> s[:4]
'foob'
>>> s[0:4]
'foob'
Similarly, if you omit the second index as in s[n:], the slice extends from the first index through the end of the string. This is a nice, concise alternative to the more cumbersome s[n:len(s)]:
>>> s = 'foobar'
>>> s[2:]
'obar'
>>> s[2:len(s)]
'obar'
For any string s and any integer n (0 ≤ n ≤ len(s)), s[:n] + s[n:] will be equal to s:
>>> s = 'foobar'
>>> s[:4] + s[4:]
'foobar'
>>> s[:4] + s[4:] == s
True
Omitting both indices returns the original string, in its entirety. Literally. It’s not a copy, it’s a reference to the original string:
>>> s = 'foobar'
>>> t = s[:]
>>> id(s)
59598496
>>> id(t)
59598496
>>> s is t
True
If the first index in a slice is greater than or equal to the second index, Python returns an empty string. This is yet another obfuscated way to generate an empty string, in case you were looking for one:
>>> s[2:2]
''
>>> s[4:2]
''
Negative indices can be used with slicing as well. -1 refers to the last character, -2 the second-to-last, and so on, just as with simple indexing. The substring 'oob' can be sliced from the string 'foobar' using either positive or negative indices:
>>> s = 'foobar'
>>> s[-5:-2]
'oob'
>>> s[1:4]
'oob'
>>> s[-5:-2] == s[1:4]
True
Specifying a Stride in a String Slice
There is one more variant of the slicing syntax to discuss. Adding an additional : and a third index designates a stride (also called a step), which indicates how many characters to jump after retrieving each character in the slice.
For example, for the string 'foobar', the slice 0:6:2 starts with the first character and ends with the last character (the whole string), and every second character is skipped.
Similarly, 1:6:2 specifies a slice starting with the second character (index 1) and ending with the last character, and again the stride value 2 causes every other character to be skipped.
The illustrative REPL code is shown here:
>>> s = 'foobar'
>>> s[0:6:2]
'foa'
>>> s[1:6:2]
'obr'
As with any slicing, the first and second indices can be omitted, and default to the first and last characters respectively:
>>> s = '12345' * 5
>>> s
'1234512345123451234512345'
>>> s[::5]
'11111'
>>> s[4::5]
'55555'
You can specify a negative stride value as well, in which case Python steps backward through the string. In that case, the starting/first index should be greater than the ending/second index:
>>> s = 'foobar'
>>> s[5:0:-2]
'rbo'
In the above example, 5:0:-2 means “start at the last character and step backward by 2, up to but not including the first character.”
When you are stepping backward, if the first and second indices are omitted, the defaults are reversed in an intuitive way: the first index defaults to the end of the string, and the second index defaults to the beginning. Here is an example:
>>> s = '12345' * 5
>>> s
'1234512345123451234512345'
>>> s[::-5]
'55555'
This is a common paradigm for reversing a string:
>>> s = 'If Comrade Napoleon says it, it must be right.'
>>> s[::-1]
'.thgir eb tsum ti ,ti syas noelopaN edarmoC fI'
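One place the s[::-1] idiom comes in handy is checking for palindromes. The sketch below is just one way to do it; the function name `is_palindrome` and the choice to ignore case and non-alphanumeric characters are assumptions made for this example.

```python
def is_palindrome(text):
    """Return True if text reads the same both ways, ignoring case and non-alphanumerics."""
    cleaned = ''.join(ch.lower() for ch in text if ch.isalnum())
    # A string is a palindrome when it equals its own reversal.
    return cleaned == cleaned[::-1]

print(is_palindrome('A man, a plan, a canal: Panama'))  # True
print(is_palindrome('foobar'))                          # False
```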
Interpolating Variables Into a String
In Python version 3.6, a new string formatting mechanism was introduced. This feature is formally named the Formatted String Literal, but is more usually referred to by its nickname f-string.
The formatting capability provided by f-strings is extensive and won’t be covered in full detail here. If you want to learn more, you can check out the Real Python article Python 3’s f-Strings: An Improved String Formatting Syntax (Guide). There is also a tutorial on Formatted Output coming up later in this series that digs deeper into f-strings.
One simple feature of f-strings you can start using right away is variable interpolation. You can specify a variable name directly within an f-string literal, and Python will replace the name with the corresponding value.
For example, suppose you want to display the result of an arithmetic calculation. You can do this with a straightforward print() call, separating numeric values and string literals by commas:
>>> n = 20
>>> m = 25
>>> prod = n * m
>>> print('The product of', n, 'and', m, 'is', prod)
The product of 20 and 25 is 500
But this is cumbersome. To accomplish the same thing using an f-string:
- Specify either a lowercase f or uppercase F directly before the opening quote of the string literal. This tells Python it is an f-string instead of a standard string.
- Specify any variables to be interpolated in curly braces ({}).
Recast using an f-string, the above example looks much cleaner:
>>> n = 20
>>> m = 25
>>> prod = n * m
>>> print(f'The product of {n} and {m} is {prod}')
The product of 20 and 25 is 500
Any of Python’s three quoting mechanisms can be used to define an f-string:
>>> var = 'Bark'
>>> print(f'A dog says {var}!')
A dog says Bark!
>>> print(f"A dog says {var}!")
A dog says Bark!
>>> print(f'''A dog says {var}!''')
A dog says Bark!
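The curly braces are not limited to plain variable names: an f-string can also hold an expression, which Python evaluates before inserting the result. As a small sketch (the function name `describe_product` is made up here), the product from the earlier example can be computed inside the string itself:

```python
def describe_product(n, m):
    """Build the message with the multiplication done inside the f-string."""
    return f'The product of {n} and {m} is {n * m}'

print(describe_product(20, 25))  # The product of 20 and 25 is 500
```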
Modifying Strings
In a nutshell, you can’t. Strings are one of the data types Python considers immutable, meaning not able to be changed. In fact, all the data types you have seen so far are immutable. (Python does provide data types that are mutable, as you will soon see.)
A statement like this will cause an error:
>>> s = 'foobar'
>>> s[3] = 'x'
Traceback (most recent call last):
File "<pyshell#40>", line 1, in <module>
s[3] = 'x'
TypeError: 'str' object does not support item assignment
In truth, there really isn’t much need to modify strings. You can usually easily accomplish what you want by generating a copy of the original string that has the desired change in place. There are very many ways to do this in Python. Here is one possibility:
>>> s = s[:3] + 'x' + s[4:]
>>> s
'fooxar'
There is also a built-in string method to accomplish this:
>>> s = 'foobar'
>>> s = s.replace('b', 'x')
>>> s
'fooxar'
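The two approaches are not interchangeable, though: slicing changes one position, while .replace() changes every occurrence of the substring. The sketch below shows the difference; `replace_at` is a hypothetical helper, not a built-in.

```python
def replace_at(s, index, new_char):
    """Return a copy of s with the character at index replaced by new_char."""
    if not 0 <= index < len(s):
        raise IndexError('string index out of range')
    return s[:index] + new_char + s[index + 1:]

print(replace_at('foobar', 3, 'x'))     # fooxar
print('foobarbaz'.replace('b', 'x'))    # fooxarxaz (every 'b' is replaced)
print(replace_at('foobarbaz', 3, 'x'))  # fooxarbaz (only position 3 changes)
```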
Read on for more information about built-in string methods!
Built-in String Methods
You learned in the tutorial on Variables in Python that Python is a highly object-oriented language. Every item of data in a Python program is an object.
You are also familiar with functions: callable procedures that you can invoke to perform specific tasks.
Methods are similar to functions. A method is a specialized type of callable procedure that is tightly associated with an object. Like a function, a method is called to perform a distinct task, but it is invoked on a specific object and has knowledge of its target object during execution.
The syntax for invoking a method on an object is as follows:
obj.foo(<args>)
This invokes method .foo() on object obj. <args> specifies the arguments passed to the method (if any).
You will explore much more about defining and calling methods later in the discussion of object-oriented programming. For now, the goal is to present some of the more commonly used built-in methods Python supports for operating on string objects.
In the following method definitions, arguments specified in square brackets ([]) are optional.
Case Conversion
Methods in this group perform case conversion on the target string.
s.capitalize()
Capitalizes the target string.
s.capitalize() returns a copy of s with the first character converted to uppercase and all other characters converted to lowercase:
>>> s = 'foO BaR BAZ quX'
>>> s.capitalize()
'Foo bar baz qux'
Non-alphabetic characters are unchanged:
>>> s = 'foo123#BAR#.'
>>> s.capitalize()
'Foo123#bar#.'
s.lower()
Converts alphabetic characters to lowercase.
s.lower() returns a copy of s with all alphabetic characters converted to lowercase:
>>> 'FOO Bar 123 baz qUX'.lower()
'foo bar 123 baz qux'
s.swapcase()
Swaps case of alphabetic characters.
s.swapcase() returns a copy of s with uppercase alphabetic characters converted to lowercase and vice versa:
>>> 'FOO Bar 123 baz qUX'.swapcase()
'foo bAR 123 BAZ Qux'
s.title()
Converts the target string to “title case.”
s.title() returns a copy of s in which the first letter of each word is converted to uppercase and remaining letters are lowercase:
>>> 'the sun also rises'.title()
'The Sun Also Rises'
This method uses a fairly simple algorithm. It does not attempt to distinguish between important and unimportant words, and it does not handle apostrophes, possessives, or acronyms gracefully:
>>> "what's happened to ted's IBM stock?".title()
"What'S Happened To Ted'S Ibm Stock?"
s.upper()
Converts alphabetic characters to uppercase.
s.upper() returns a copy of s with all alphabetic characters converted to uppercase:
>>> 'FOO Bar 123 baz qUX'.upper()
'FOO BAR 123 BAZ QUX'
Find and Replace
These methods provide various means of searching the target string for a specified substring.
Each method in this group supports optional <start> and <end> arguments. These are interpreted as for string slicing: the action of the method is restricted to the portion of the target string starting at character position <start> and proceeding up to but not including character position <end>. If <start> is specified but <end> is not, the method applies to the portion of the target string from <start> through the end of the string.
s.count(<sub>[, <start>[, <end>]])
Counts occurrences of a substring in the target string.
s.count(<sub>) returns the number of non-overlapping occurrences of substring <sub> in s:
>>> 'foo goo moo'.count('oo')
3
The count is restricted to the number of occurrences within the substring indicated by <start> and <end>, if they are specified:
>>> 'foo goo moo'.count('oo', 0, 8)
2
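The word "non-overlapping" matters when the substring can overlap itself. If you ever need overlapping matches counted too, one possible approach is to step through the string with .find(). The helper `count_overlapping` below is a sketch written for this example, not a built-in method.

```python
def count_overlapping(s, sub):
    """Count occurrences of sub in s, allowing matches to overlap."""
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        # Move one character forward so that overlapping matches are seen.
        start = s.find(sub, start + 1)
    return count

print('aaaa'.count('aa'))               # 2
print(count_overlapping('aaaa', 'aa'))  # 3
```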
s.endswith(<suffix>[, <start>[, <end>]])
Determines whether the target string ends with a given substring.
s.endswith(<suffix>) returns True if s ends with the specified <suffix> and False otherwise:
>>> 'foobar'.endswith('bar')
True
>>> 'foobar'.endswith('baz')
False
The comparison is restricted to the substring indicated by <start>
and <end>
, if they are specified:
>>>
>>> 'foobar'.endswith('oob', 0, 4)
True
>>> 'foobar'.endswith('oob', 2, 4)
False
s.find(<sub>[, <start>[, <end>]])
Searches the target string for a given substring.
You can use .find()
to see if a Python string contains a particular substring. s.find(<sub>)
returns the lowest index in s
where substring <sub>
is found:
>>>
>>> 'foo bar foo baz foo qux'.find('foo')
0
This method returns -1
if the specified substring is not found:
>>>
>>> 'foo bar foo baz foo qux'.find('grault')
-1
The search is restricted to the substring indicated by <start>
and <end>
, if they are specified:
>>>
>>> 'foo bar foo baz foo qux'.find('foo', 4)
8
>>> 'foo bar foo baz foo qux'.find('foo', 4, 7)
-1
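One thing to watch for is using the result of .find() directly as a condition. Since a match at the start of the string yields 0 and a failed search yields -1, the truth value of the result is the opposite of what you might expect. The short sketch below illustrates the pitfall and two safer alternatives; the variable names are only for illustration:

```python
s = 'foo bar foo baz foo qux'

# Pitfall: 0 (match at index 0) is falsy, while -1 (no match) is truthy.
print(bool(s.find('foo')))     # False, although 'foo' is present
print(bool(s.find('grault')))  # True, although 'grault' is absent

# Safer: compare against -1 explicitly, or use the in operator.
found_with_find = s.find('foo') != -1
found_with_in = 'foo' in s
missing_with_in = 'grault' in s

print(found_with_find, found_with_in, missing_with_in)  # True True False
```

When you only need a yes-or-no answer, the in operator is usually the clearest choice; .find() is most useful when you need the position itself.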
s.index(<sub>[, <start>[, <end>]])
Searches the target string for a given substring.
This method is identical to .find()
, except that it raises an exception if <sub>
is not found rather than returning -1
:
>>>
>>> 'foo bar foo baz foo qux'.index('grault')
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    'foo bar foo baz foo qux'.index('grault')
ValueError: substring not found
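Since .index() signals a missing substring with an exception, it is typically paired with a try/except block. The following is a minimal sketch; the helper name position_or_default and its default parameter are assumptions made for this example:

```python
def position_or_default(s, sub, default=None):
    """Return the index of sub in s, or default if sub is absent."""
    try:
        return s.index(sub)
    except ValueError:
        # .index() raises ValueError when the substring is not found.
        return default


text = 'foo bar foo baz foo qux'
print(position_or_default(text, 'bar'))         # 4
print(position_or_default(text, 'grault'))      # None
print(position_or_default(text, 'grault', -1))  # -1
```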
s.rfind(<sub>[, <start>[, <end>]])
Searches the target string for a given substring starting at the end.
s.rfind(<sub>)
returns the highest index in s
where substring <sub>
is found:
>>>
>>> 'foo bar foo baz foo qux'.rfind('foo')
16
As with .find()
, if the substring is not found, -1
is returned:
>>>
>>> 'foo bar foo baz foo qux'.rfind('grault')
-1
The search is restricted to the substring indicated by <start>
and <end>
, if they are specified:
>>>
>>> 'foo bar foo baz foo qux'.rfind('foo', 0, 14)
8
>>> 'foo bar foo baz foo qux'.rfind('foo', 10, 14)
-1
s.rindex(<sub>[, <start>[, <end>]])
Searches the target string for a given substring starting at the end.
This method is identical to .rfind()
, except that it raises an exception if <sub>
is not found rather than returning -1
:
>>>
>>> 'foo bar foo baz foo qux'.rindex('grault')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    'foo bar foo baz foo qux'.rindex('grault')
ValueError: substring not found
s.startswith(<prefix>[, <start>[, <end>]])
Determines whether the target string starts with a given substring.
When you use the Python .startswith()
method, s.startswith(<prefix>)
returns True
if s
starts with the specified <prefix>
and False
otherwise:
>>>
>>> 'foobar'.startswith('foo')
True
>>> 'foobar'.startswith('bar')
False
The comparison is restricted to the substring indicated by <start>
and <end>
, if they are specified:
>>>
>>> 'foobar'.startswith('bar', 3)
True
>>> 'foobar'.startswith('bar', 3, 2)
False
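Both .startswith() and .endswith() also accept a tuple of candidate strings, which saves chaining several calls with or. The sketch below filters a list on that basis; the URLs and file names are made-up sample data:

```python
urls = ['http://example.com', 'https://example.com', 'ftp://example.com']

# A tuple argument matches if the string starts with any of its items.
web_urls = [u for u in urls if u.startswith(('http://', 'https://'))]
print(web_urls)  # ['http://example.com', 'https://example.com']

# The same works for suffixes with .endswith().
files = ['notes.txt', 'photo.png', 'draft.md']
text_files = [f for f in files if f.endswith(('.txt', '.md'))]
print(text_files)  # ['notes.txt', 'draft.md']
```

Note that the argument must be a tuple; passing a list raises a TypeError.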
Character Classification
Methods in this group classify a string based on the characters it contains.
s.isalnum()
Determines whether the target string consists of alphanumeric characters.
s.isalnum()
returns True
if s
is nonempty and all its characters are alphanumeric (either a letter or a number), and False
otherwise:
>>>
>>> 'abc123'.isalnum()
True
>>> 'abc$123'.isalnum()
False
>>> ''.isalnum()
False
s.isalpha()
Determines whether the target string consists of alphabetic characters.
s.isalpha()
returns True
if s
is nonempty and all its characters are alphabetic, and False
otherwise:
>>>
>>> 'ABCabc'.isalpha()
True
>>> 'abc123'.isalpha()
False
s.isdigit()
Determines whether the target string consists of digit characters.
You can use the .isdigit()
Python method to check if your string is made of only digits. s.isdigit()
returns True
if s
is nonempty and all its characters are numeric digits, and False
otherwise:
>>>
>>> '123'.isdigit()
True
>>> '123abc'.isdigit()
False
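Keep in mind that .isdigit() tests individual characters, so it does not tell you whether a string can be converted to a number. Signs and decimal points are not digits, while some Unicode characters count as digits even though int() rejects them. A rough sketch of the difference follows; the helper is_integer_string is a hypothetical name used only here:

```python
def is_integer_string(s):
    """Return True if int(s) accepts the string (hypothetical helper)."""
    try:
        int(s)
    except ValueError:
        return False
    return True


print('-42'.isdigit())     # False: the sign is not a digit
print('3.14'.isdigit())    # False: the decimal point is not a digit
print('\u00b2'.isdigit())  # True: superscript two counts as a digit

print(is_integer_string('-42'))     # True
print(is_integer_string('\u00b2'))  # False: int() rejects superscripts
```

If the goal is to validate numeric input, attempting the conversion and catching ValueError tends to be more reliable than a character test.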
s.isidentifier()
Determines whether the target string is a valid Python identifier.
s.isidentifier()
returns True
if s
is a valid Python identifier according to the language definition, and False
otherwise:
>>>
>>> 'foo32'.isidentifier()
True
>>> '32foo'.isidentifier()
False
>>> 'foo$32'.isidentifier()
False
s.islower()
Determines whether the target string’s alphabetic characters are lowercase.
s.islower()
returns True
if s
is nonempty and all the alphabetic characters it contains are lowercase, and False
otherwise. Non-alphabetic characters are ignored:
>>>
>>> 'abc'.islower()
True
>>> 'abc1$d'.islower()
True
>>> 'Abc1$D'.islower()
False
s.isprintable()
Determines whether the target string consists entirely of printable characters.
s.isprintable()
returns True
if s
is empty or all the characters it contains are printable. It returns False
if s
contains at least one non-printable character, such as a tab or a newline:
>>>
>>> 'a\tb'.isprintable()
False
>>> 'a b'.isprintable()
True
>>> ''.isprintable()
True
>>> 'a\nb'.isprintable()
False
s.isspace()
Determines whether the target string consists of whitespace characters.
s.isspace()
returns True
if s
is nonempty and all characters are whitespace characters, and False
otherwise.
The most commonly encountered whitespace characters are space ' '
, tab '\t'
, and newline '\n'
:
>>>
>>> ' \t \n '.isspace()
True
>>> ' a '.isspace()
False
However, there are a few other ASCII characters that qualify as whitespace, and if you account for Unicode characters, there are quite a few beyond that:
>>>
>>> '\f\u2005\r'.isspace()
True
('\f'
and '\r'
are the escape sequences for the ASCII Form Feed and Carriage Return characters; '\u2005'
is the escape sequence for the Unicode Four-Per-Em Space.)
s.istitle()
Determines whether the target string is title cased.
s.istitle()
returns True
if s
is nonempty, the first alphabetic character of each word is uppercase, and all other alphabetic characters in each word are lowercase. It returns False
otherwise:
>>>
>>> 'This Is A Title'.istitle()
True
>>> 'This is a title'.istitle()
False
>>> 'Give Me The #$#@ Ball!'.istitle()
True
s.isupper()
Determines whether the target string’s alphabetic characters are uppercase.
s.isupper()
returns True
if s
is nonempty and all the alphabetic characters it contains are uppercase, and False
otherwise. Non-alphabetic characters are ignored:
>>>
>>> 'ABC'.isupper()
True
>>> 'ABC1$D'.isupper()
True
>>> 'Abc1$D'.isupper()
False
String Formatting
Methods in this group modify or enhance the format of a string.
s.center(<width>[, <fill>])
Centers a string in a field.
s.center(<width>)
returns a string consisting of s
centered in a field of width <width>
. By default, padding consists of the ASCII space character:
>>>
>>> 'foo'.center(10)
'   foo    '
If the optional <fill>
argument is specified, it is used as the padding character:
>>>
>>> 'bar'.center(10, '-')
'---bar----'
If s
is already at least as long as <width>
, it is returned unchanged:
>>>
>>> 'foo'.center(2)
'foo'
s.expandtabs(tabsize=8)
Expands tabs in a string.
s.expandtabs()
replaces each tab character ('\t'
) with spaces. By default, spaces are filled in assuming a tab stop at every eighth column:
>>>
>>> 'a\tb\tc'.expandtabs()
'a       b       c'
>>> 'aaa\tbbb\tc'.expandtabs()
'aaa     bbb     c'
tabsize
is an optional keyword parameter specifying alternate tab stop columns:
>>>
>>> 'a\tb\tc'.expandtabs(4)
'a   b   c'
>>> 'aaa\tbbb\tc'.expandtabs(tabsize=4)
'aaa bbb c'
s.ljust(<width>[, <fill>])
Left-justifies a string in a field.
s.ljust(<width>)
returns a string consisting of s
left-justified in a field of width <width>
. By default, padding consists of the ASCII space character:
>>>
>>> 'foo'.ljust(10)
'foo       '
If the optional <fill>
argument is specified, it is used as the padding character:
>>>
>>> 'foo'.ljust(10, '-')
'foo-------'
If s
is already at least as long as <width>
, it is returned unchanged:
>>>
>>> 'foo'.ljust(2)
'foo'
s.lstrip([<chars>])
Trims leading characters from a string.
s.lstrip()
returns a copy of s
with any whitespace characters removed from the left end:
>>>
>>> '   foo bar baz   '.lstrip()
'foo bar baz   '
>>> '\t\nfoo\t\nbar\t\nbaz'.lstrip()
'foo\t\nbar\t\nbaz'
If the optional <chars>
argument is specified, it is a string that specifies the set of characters to be removed:
>>>
>>> 'http://www.realpython.com'.lstrip('/:pth')
'www.realpython.com'
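Because <chars> is treated as a set of characters rather than a literal prefix, .lstrip() can remove more than intended when the text after the prefix happens to start with one of those characters. The sketch below shows the effect along with one possible workaround; strip_prefix is a hypothetical helper written for this example (Python 3.9 and later also provide the built-in str.removeprefix() method for the same purpose):

```python
def strip_prefix(s, prefix):
    """Remove prefix from the start of s only if it matches exactly."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


url = 'http://python.org'

# 'p' is in the character set, so the first letter of the host goes too.
print(url.lstrip('http://'))         # ython.org

# Matching the prefix as a whole string avoids that.
print(strip_prefix(url, 'http://'))  # python.org
```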
s.replace(<old>, <new>[, <count>])
Replaces occurrences of a substring within a string.
In Python, to remove a character from a string, you can use the Python string .replace()
method. s.replace(<old>, <new>)
returns a copy of s
with all occurrences of substring <old>
replaced by <new>
:
>>>
>>> 'foo bar foo baz foo qux'.replace('foo', 'grault')
'grault bar grault baz grault qux'
If the optional <count>
argument is specified, a maximum of <count>
replacements are performed, starting at the left end of s
:
>>>
>>> 'foo bar foo baz foo qux'.replace('foo', 'grault', 2)
'grault bar grault baz foo qux'
s.rjust(<width>[, <fill>])
Right-justifies a string in a field.
s.rjust(<width>)
returns a string consisting of s
right-justified in a field of width <width>
. By default, padding consists of the ASCII space character:
>>>
>>> 'foo'.rjust(10)
'       foo'
If the optional <fill>
argument is specified, it is used as the padding character:
>>>
>>> 'foo'.rjust(10, '-')
'-------foo'
If s
is already at least as long as <width>
, it is returned unchanged:
>>>
>>> 'foo'.rjust(2)
'foo'
s.rstrip([<chars>])
Trims trailing characters from a string.
s.rstrip()
returns a copy of s
with any whitespace characters removed from the right end:
>>>
>>> '   foo bar baz   '.rstrip()
'   foo bar baz'
>>> 'foo\t\nbar\t\nbaz\t\n'.rstrip()
'foo\t\nbar\t\nbaz'
If the optional <chars>
argument is specified, it is a string that specifies the set of characters to be removed:
>>>
>>> 'foo.$$$;'.rstrip(';$.')
'foo'
s.strip([<chars>])
Strips characters from the left and right ends of a string.
s.strip()
is essentially equivalent to invoking s.lstrip()
and s.rstrip()
in succession. Without the <chars>
argument, it removes leading and trailing whitespace:
>>>
>>> s = '   foo bar baz\t\t\t'
>>> s = s.lstrip()
>>> s = s.rstrip()
>>> s
'foo bar baz'
As with .lstrip()
and .rstrip()
, the optional <chars>
argument specifies the set of characters to be removed:
>>>
>>> 'www.realpython.com'.strip('w.moc')
'realpython'
s.zfill(<width>)
Pads a string on the left with zeros.
s.zfill(<width>)
returns a copy of s
left-padded with '0'
characters to the specified <width>
:
>>>
>>> '42'.zfill(5)
'00042'
If s
contains a leading sign, it remains at the left edge of the result string after zeros are inserted:
>>>
>>> '+42'.zfill(8)
'+0000042'
>>> '-42'.zfill(8)
'-0000042'
If s
is already at least as long as <width>
, it is returned unchanged:
>>>
>>> '-42'.zfill(3)
'-42'
.zfill()
is most useful for string representations of numbers, but Python will still happily zero-pad a string that isn’t:
>>>
>>> 'foo'.zfill(6)
'000foo'
Converting Between Strings and Lists
Methods in this group convert between a string and some composite data type by either pasting objects together to make a string, or by breaking a string up into pieces.
These methods operate on or return iterables, the general Python term for a sequential collection of objects. You will explore the inner workings of iterables in much more detail in the upcoming tutorial on definite iteration.
Many of these methods return either a list or a tuple. These are two similar composite data types that are prototypical examples of iterables in Python. They are covered in the next tutorial, so you’re about to learn about them soon! Until then, simply think of them as sequences of values. A list is enclosed in square brackets ([]
), and a tuple is enclosed in parentheses (()
).
With that introduction, let’s take a look at this last group of string methods.
s.join(<iterable>)
Concatenates strings from an iterable.
s.join(<iterable>)
returns the string that results from concatenating the objects in <iterable>
separated by s
.
Note that .join()
is invoked on s
, the separator string. <iterable>
must be a sequence of string objects as well.
Some sample code should help clarify. In the following example, the separator s
is the string ', '
, and <iterable>
is a list of string values:
>>>
>>> ', '.join(['foo', 'bar', 'baz', 'qux'])
'foo, bar, baz, qux'
The result is a single string consisting of the list objects separated by commas.
In the next example, <iterable>
is specified as a single string value. When a string value is used as an iterable, it is interpreted as a list of the string’s individual characters:
>>>
>>> list('corge')
['c', 'o', 'r', 'g', 'e']
>>> ':'.join('corge')
'c:o:r:g:e'
Thus, the result of ':'.join('corge')
is a string consisting of each character in 'corge'
separated by ':'
.
This example fails because one of the objects in <iterable>
is not a string:
>>>
>>> '---'.join(['foo', 23, 'bar'])
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    '---'.join(['foo', 23, 'bar'])
TypeError: sequence item 1: expected str instance, int found
That can be remedied, though:
>>>
>>> '---'.join(['foo', str(23), 'bar'])
'foo---23---bar'
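When the iterable holds several non-string values, converting each one by hand becomes tedious. A generator expression that applies str() to every item is one common way to handle this. The list contents below are arbitrary sample values:

```python
items = ['foo', 23, 'bar', 4.5, None]

# Convert every item to its string form before joining.
joined = '---'.join(str(item) for item in items)
print(joined)  # foo---23---bar---4.5---None
```

Passing map(str, items) to .join() would have the same effect.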
As you will soon see, many composite objects in Python can be construed as iterables, and .join()
is especially useful for creating strings from them.
s.partition(<sep>)
Divides a string based on a separator.
s.partition(<sep>)
splits s
at the first occurrence of string <sep>
. The return value is a three-part tuple consisting of:
- The portion of s preceding <sep>
- <sep> itself
- The portion of s following <sep>
Here are a couple examples of .partition()
in action:
>>>
>>> 'foo.bar'.partition('.')
('foo', '.', 'bar')
>>> 'foo@@bar@@baz'.partition('@@')
('foo', '@@', 'bar@@baz')
If <sep>
is not found in s
, the returned tuple contains s
followed by two empty strings:
>>>
>>> 'foo.bar'.partition('@@')
('foo.bar', '', '')
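Since .partition() always returns exactly three items, its result can be unpacked directly, and the middle item reveals whether the separator was present. The sketch below uses this to parse a simple name=value line; the function parse_setting and the sample inputs are assumptions made for illustration:

```python
def parse_setting(line):
    """Split 'name=value' at the first '='; return None if there is no '='."""
    key, sep, value = line.partition('=')
    if not sep:
        # An empty separator in the result means '=' was not found.
        return None
    return key.strip(), value.strip()


print(parse_setting('name = value'))      # ('name', 'value')
print(parse_setting('path=/usr/bin=x'))   # ('path', '/usr/bin=x')
print(parse_setting('no separator'))      # None
```

Only the first occurrence of the separator is used, so any later '=' characters remain part of the value.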
s.rpartition(<sep>)
Divides a string based on a separator.
s.rpartition(<sep>)
functions exactly like s.partition(<sep>)
, except that s
is split at the last occurrence of <sep>
instead of the first occurrence:
>>>
>>> 'foo@@bar@@baz'.partition('@@')
('foo', '@@', 'bar@@baz')
>>> 'foo@@bar@@baz'.rpartition('@@')
('foo@@bar', '@@', 'baz')
s.rsplit(sep=None, maxsplit=-1)
Splits a string into a list of substrings.
Without arguments, s.rsplit()
splits s
into substrings delimited by any sequence of whitespace and returns the substrings as a list:
>>>
>>> 'foo bar baz qux'.rsplit()
['foo', 'bar', 'baz', 'qux']
>>> 'foo\n\tbar   baz\r\fqux'.rsplit()
['foo', 'bar', 'baz', 'qux']
If <sep>
is specified, it is used as the delimiter for splitting:
>>>
>>> 'foo.bar.baz.qux'.rsplit(sep='.')
['foo', 'bar', 'baz', 'qux']
(If <sep>
is specified with a value of None
, the string is split delimited by whitespace, just as though <sep>
had not been specified at all.)
When <sep>
is explicitly given as a delimiter, consecutive delimiters in s
are assumed to delimit empty strings, which will be returned:
>>>
>>> 'foo...bar'.rsplit(sep='.')
['foo', '', '', 'bar']
This is not the case when <sep>
is omitted, however. In that case, consecutive whitespace characters are combined into a single delimiter, and the resulting list will never contain empty strings:
>>>
>>> 'foo\t\t\tbar'.rsplit()
['foo', 'bar']
If the optional keyword parameter <maxsplit>
is specified, a maximum of that many splits are performed, starting from the right end of s
:
>>>
>>> 'www.realpython.com'.rsplit(sep='.', maxsplit=1)
['www.realpython', 'com']
The default value for <maxsplit>
is -1
, which means all possible splits should be performed—the same as if <maxsplit>
is omitted entirely:
>>>
>>> 'www.realpython.com'.rsplit(sep='.', maxsplit=-1)
['www', 'realpython', 'com']
>>> 'www.realpython.com'.rsplit(sep='.')
['www', 'realpython', 'com']
s.split(sep=None, maxsplit=-1)
Splits a string into a list of substrings.
s.split()
behaves exactly like s.rsplit()
, except that if <maxsplit>
is specified, splits are counted from the left end of s
rather than the right end:
>>>
>>> 'www.realpython.com'.split('.', maxsplit=1)
['www', 'realpython.com']
>>> 'www.realpython.com'.rsplit('.', maxsplit=1)
['www.realpython', 'com']
If <maxsplit>
is not specified, .split()
and .rsplit()
are indistinguishable.
s.splitlines([<keepends>])
Breaks a string at line boundaries.
s.splitlines()
splits s
up into lines and returns them in a list. Any of the following characters or character sequences is considered to constitute a line boundary:
Escape Sequence | Character |
---|---|
\n | Newline |
\r | Carriage Return |
\r\n | Carriage Return + Line Feed |
\v or \x0b | Line Tabulation |
\f or \x0c | Form Feed |
\x1c | File Separator |
\x1d | Group Separator |
\x1e | Record Separator |
\x85 | Next Line (C1 Control Code) |
\u2028 | Unicode Line Separator |
\u2029 | Unicode Paragraph Separator |
Here is an example using several different line separators:
>>>
>>> 'foo\nbar\r\nbaz\fqux\u2028quux'.splitlines()
['foo', 'bar', 'baz', 'qux', 'quux']
If consecutive line boundary characters are present in the string, they are assumed to delimit blank lines, which will appear in the result list:
>>>
>>> 'foo\f\f\fbar'.splitlines()
['foo', '', '', 'bar']
If the optional <keepends>
argument is specified and is truthy, then the line boundaries are retained in the result strings:
>>>
>>> 'foo\nbar\nbaz\nqux'.splitlines(True)
['foo\n', 'bar\n', 'baz\n', 'qux']
>>> 'foo\nbar\nbaz\nqux'.splitlines(1)
['foo\n', 'bar\n', 'baz\n', 'qux']
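It may help to compare .splitlines() with .split('\n'), since the two differ when the text ends with a newline or is empty. The short sketch below uses made-up sample strings to show the difference:

```python
text = 'foo\nbar\n'

# .splitlines() does not produce a trailing empty string.
lines_a = text.splitlines()
# .split('\n') treats the final newline as a delimiter before ''.
lines_b = text.split('\n')

print(lines_a)  # ['foo', 'bar']
print(lines_b)  # ['foo', 'bar', '']

# The empty string behaves differently as well.
print(''.splitlines())  # []
print(''.split('\n'))   # ['']
```

For text read from files, which usually ends with a newline, .splitlines() therefore tends to give the more convenient result.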
bytes
Objects
The bytes
object is one of the core built-in types for manipulating binary data. A bytes
object is an immutable sequence of single byte values. Each element in a bytes
object is a small integer in the range 0
to 255
.
Defining a Literal bytes
Object
A bytes
literal is defined in the same way as a string literal with the addition of a 'b'
prefix:
>>>
>>> b = b'foo bar baz'
>>> b
b'foo bar baz'
>>> type(b)
<class 'bytes'>
As with strings, you can use any of the single, double, or triple quoting mechanisms:
>>>
>>> b'Contains embedded "double" quotes'
b'Contains embedded "double" quotes'
>>> b"Contains embedded 'single' quotes"
b"Contains embedded 'single' quotes"
>>> b'''Contains embedded "double" and 'single' quotes'''
b'Contains embedded "double" and \'single\' quotes'
>>> b"""Contains embedded "double" and 'single' quotes"""
b'Contains embedded "double" and \'single\' quotes'
Only ASCII characters are allowed in a bytes
literal. Any character value greater than 127
must be specified using an appropriate escape sequence:
>>>
>>> b = b'foo\xddbar'
>>> b
b'foo\xddbar'
>>> b[3]
221
>>> int(0xdd)
221
The 'r'
prefix may be used on a bytes
literal to disable processing of escape sequences, as with strings:
>>>
>>> b = rb'foo\xddbar'
>>> b
b'foo\\xddbar'
>>> b[3]
92
>>> chr(92)
'\\'
Defining a bytes
Object With the Built-in bytes()
Function
The bytes()
function also creates a bytes
object. What sort of bytes
object gets returned depends on the argument(s) passed to the function. The possible forms are shown below.
bytes(<s>, <encoding>)
Creates a
bytes
object from a string.
bytes(<s>, <encoding>)
converts string <s>
to a bytes
object, using str.encode()
according to the specified <encoding>
:
>>>
>>> b = bytes('foo.bar', 'utf8')
>>> b
b'foo.bar'
>>> type(b)
<class 'bytes'>
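For non-ASCII text, the number of bytes can differ from the number of characters, and the original string can be recovered with .decode(). The sketch below assumes UTF-8 and an arbitrary sample word:

```python
text = 'caf\u00e9'  # 'café', written with an escape for the accented letter

# bytes(text, 'utf-8') gives the same result as text.encode('utf-8').
data = bytes(text, 'utf-8')
print(data)                  # b'caf\xc3\xa9'
print(len(text), len(data))  # 4 5

# Decoding with the same encoding restores the original string.
restored = data.decode('utf-8')
print(restored == text)      # True
```

The accented character occupies two bytes in UTF-8, which is why the bytes object is one element longer than the string.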
bytes(<size>)
Creates a
bytes
object consisting of null (0x00
) bytes.
bytes(<size>)
defines a bytes
object of the specified <size>
, which must be a non-negative integer. The resulting bytes
object is initialized to null (0x00
) bytes:
>>>
>>> b = bytes(8)
>>> b
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> type(b)
<class 'bytes'>
bytes(<iterable>)
Creates a
bytes
object from an iterable.
bytes(<iterable>)
defines a bytes
object from the sequence of integers generated by <iterable>
. <iterable>
must be an iterable that generates a sequence of integers n
in the range 0 ≤ n ≤ 255
:
>>>
>>> b = bytes([100, 102, 104, 106, 108])
>>> b
b'dfhjl'
>>> type(b)
<class 'bytes'>
>>> b[2]
104
Operations on bytes
Objects
Like strings, bytes
objects support the common sequence operations:
- The in and not in operators:
>>>
>>> b = b'abcde'
>>> b'cd' in b
True
>>> b'foo' not in b
True
- The concatenation (+) and replication (*) operators:
>>>
>>> b = b'abcde'
>>> b + b'fghi'
b'abcdefghi'
>>> b * 3
b'abcdeabcdeabcde'
- Indexing and slicing:
>>>
>>> b = b'abcde'
>>> b[2]
99
>>> b[1:3]
b'bc'
- Built-in functions:
>>>
>>> len(b)
5
>>> min(b)
97
>>> max(b)
101
Many of the methods defined for string objects are valid for bytes
objects as well:
>>>
>>> b = b'foo,bar,foo,baz,foo,qux'
>>> b.count(b'foo')
3
>>> b.endswith(b'qux')
True
>>> b.find(b'baz')
12
>>> b.split(sep=b',')
[b'foo', b'bar', b'foo', b'baz', b'foo', b'qux']
>>> b.center(30, b'-')
b'---foo,bar,foo,baz,foo,qux----'
Notice, however, that when these operators and methods are invoked on a bytes
object, the operand and arguments must be bytes
objects as well:
>>>
>>> b = b'foo.bar'
>>> b + '.baz'
Traceback (most recent call last):
  File "<pyshell#72>", line 1, in <module>
    b + '.baz'
TypeError: can't concat bytes to str
>>> b + b'.baz'
b'foo.bar.baz'
>>> b.split(sep='.')
Traceback (most recent call last):
  File "<pyshell#74>", line 1, in <module>
    b.split(sep='.')
TypeError: a bytes-like object is required, not 'str'
>>> b.split(sep=b'.')
[b'foo', b'bar']
Although a bytes
object definition and representation is based on ASCII text, it actually behaves like an immutable sequence of small integers in the range 0
to 255
, inclusive. That is why a single element from a bytes
object is displayed as an integer:
>>>
>>> b = b'foo\xddbar'
>>> b[3]
221
>>> hex(b[3])
'0xdd'
>>> min(b)
97
>>> max(b)
221
A slice is displayed as a bytes
object though, even if it is only one byte long:
>>>
>>> b = b'abcde'
>>> b[2:3]
b'c'
You can convert a bytes
object into a list of integers with the built-in list()
function:
>>>
>>> list(b)
[97, 98, 99, 100, 101]
Hexadecimal numbers are often used to specify binary data because two hexadecimal digits correspond directly to a single byte. The bytes
class supports two additional methods that facilitate conversion to and from a string of hexadecimal digits.
bytes.fromhex(<s>)
Returns a
bytes
object constructed from a string of hexadecimal values.
bytes.fromhex(<s>)
returns the bytes
object that results from converting each pair of hexadecimal digits in <s>
to the corresponding byte value. The hexadecimal digit pairs in <s>
may optionally be separated by whitespace, which is ignored:
>>>
>>> b = bytes.fromhex(' aa 68 4682cc ')
>>> b
b'\xaahF\x82\xcc'
>>> list(b)
[170, 104, 70, 130, 204]
b.hex()
Returns a string of hexadecimal values from a
bytes
object.
b.hex()
returns the result of converting bytes
object b
into a string of hexadecimal digit pairs. That is, it does the reverse of .fromhex()
:
>>>
>>> b = bytes.fromhex(' aa 68 4682cc ')
>>> b
b'\xaahF\x82\xcc'
>>> b.hex()
'aa684682cc'
>>> type(b.hex())
<class 'str'>
bytearray
Objects
Python supports another binary sequence type called the bytearray
. bytearray
objects are very similar to bytes
objects, with some differences:
- There is no dedicated syntax built into Python for defining a bytearray literal, like the 'b' prefix that may be used to define a bytes object. A bytearray object is always created using the bytearray() built-in function:
>>>
>>> ba = bytearray('foo.bar.baz', 'UTF-8')
>>> ba
bytearray(b'foo.bar.baz')
>>> bytearray(6)
bytearray(b'\x00\x00\x00\x00\x00\x00')
>>> bytearray([100, 102, 104, 106, 108])
bytearray(b'dfhjl')
- bytearray objects are mutable. You can modify the contents of a bytearray object using indexing and slicing:
>>>
>>> ba = bytearray('foo.bar.baz', 'UTF-8')
>>> ba
bytearray(b'foo.bar.baz')
>>> ba[5] = 0xee
>>> ba
bytearray(b'foo.b\xeer.baz')
>>> ba[8:11] = b'qux'
>>> ba
bytearray(b'foo.b\xeer.qux')
A bytearray
object may be constructed directly from a bytes
object as well:
>>>
>>> ba = bytearray(b'foo')
>>> ba
bytearray(b'foo')
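Mutability also means a bytearray can grow in place, which a bytes object cannot do. The following sketch contrasts the two types and shows how to take an immutable snapshot once editing is finished; the variable names are chosen only for illustration:

```python
ba = bytearray(b'foo')

ba.append(0x21)    # append a single integer byte, here '!'
ba.extend(b'bar')  # extend with any bytes-like object
ba[0] = ord('F')   # item assignment takes an integer from 0 to 255
print(ba)          # bytearray(b'Foo!bar')

# Converting to bytes yields an immutable copy.
frozen = bytes(ba)
try:
    frozen[0] = ord('f')
except TypeError:
    # bytes objects do not support item assignment.
    assignment_failed = True

print(frozen, assignment_failed)  # b'Foo!bar' True
```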
Conclusion
This tutorial provided an in-depth look at the many different mechanisms Python provides for string handling, including string operators, built-in functions, indexing, slicing, and built-in methods. You also were introduced to the bytes
and bytearray
types.
These types are the first types you have examined that are composite—built from a collection of smaller parts. Python provides several composite built-in types. In the next tutorial, you will explore two of the most frequently used: lists and tuples.
Question :
Python – Check If Word Is In A String
I’m working with Python v2, and I’m trying to find out if you can tell if a word is in a string.
I have found some information about identifying if the word is in the string – using .find, but is there a way to do an IF statement. I would like to have something like the following:
if string.find(word):
    print 'success'
Thanks for any help.
Answer #1:
What is wrong with:
if word in mystring:
    print 'success'
Answer #2:
if 'seek' in 'those who seek shall find':
    print('Success!')
but keep in mind that this matches a sequence of characters, not necessarily a whole word – for example, 'word' in 'swordsmith'
is True. If you only want to match whole words, you ought to use regular expressions:
import re
def findWholeWord(w):
    # \b marks a word boundary, so only whole words match
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search
findWholeWord('seek')('those who seek shall find')    # -> <match object>
findWholeWord('word')('swordsmith')                   # -> None
Answer #3:
If you want to find out whether a whole word is in a space-separated list of words, simply use:
def contains_word(s, w):
    return (' ' + w + ' ') in (' ' + s + ' ')
contains_word('the quick brown fox', 'brown') # True
contains_word('the quick brown fox', 'row') # False
In the timings below, this method is also the fastest of the three, compared to Hugh Bothwell’s and daSong’s approaches:
>python -m timeit -s "def contains_word(s, w): return (' ' + w + ' ') in (' ' + s + ' ')" "contains_word('the quick brown fox', 'brown')"
1000000 loops, best of 3: 0.351 usec per loop
>python -m timeit -s "import re" -s "def contains_word(s, w): return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search(s)" "contains_word('the quick brown fox', 'brown')"
100000 loops, best of 3: 2.38 usec per loop
>python -m timeit -s "def contains_word(s, w): return s.startswith(w + ' ') or s.endswith(' ' + w) or s.find(' ' + w + ' ') != -1" "contains_word('the quick brown fox', 'brown')"
1000000 loops, best of 3: 1.13 usec per loop
Edit: A slight variant on this idea for Python 3.6+, equally fast:
def contains_word(s, w):
    return f' {w} ' in f' {s} '
Answer #4:
find returns an integer representing the index of where the search item was found. If it isn’t found, it returns -1.
haystack = 'asdf'
haystack.find('a') # result: 0
haystack.find('s') # result: 1
haystack.find('g') # result: -1
if haystack.find(needle) >= 0:
    print 'Needle found.'
else:
    print 'Needle not found.'
Answer #5:
You can split the string into words and check the resulting list.
if word in string.split():
    print 'success'
Answer #6:
This small function compares all search words against the given text. If all search words are found in the text, it returns the number of search words, or False
otherwise.
It also supports Unicode string search.
def find_words(text, search):
    """Find exact words"""
    dText = text.split()
    dSearch = search.split()
    found_word = 0
    for text_word in dText:
        for search_word in dSearch:
            if search_word == text_word:
                found_word += 1
    if found_word == len(dSearch):
        return len(dSearch)  # number of search words
    else:
        return False
Usage:
find_words('çelik güray ankara', 'güray ankara')
Answer #7:
If matching a sequence of characters is not sufficient and you need to match whole words, here is a simple function that gets the job done. It basically appends spaces where necessary and searches for that in the string:
def smart_find(haystack, needle):
    if haystack.startswith(needle+" "):
        return True
    if haystack.endswith(" "+needle):
        return True
    if haystack.find(" "+needle+" ") != -1:
        return True
    return False
This assumes that commas and other punctuation have already been stripped out.
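If the text still contains punctuation, a regular expression with word boundaries may be a simpler route. Here is a rough sketch; the name contains_whole_word is just for illustration, and re.escape() is used so that the word is matched literally. Note that \b relies on word characters, so words that begin or end with symbols (for example 'c++') will not behave as expected:

```python
import re


def contains_whole_word(text, word):
    """Return True if word occurs in text as a whole word (case-insensitive)."""
    # re.escape() keeps any regex metacharacters in word from being interpreted.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_whole_word('Those who seek, shall find.', 'seek'))  # True
print(contains_whole_word('swordsmith', 'word'))                   # False
print(contains_whole_word('the quick brown fox', 'row'))           # False
```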
Answer #8:
Using regex is a solution, but it is too complicated for that case.
You can simply split the text into a list of words with the split(separator, num) method. It returns a list of all the words in the string, using separator as the delimiter. If separator is unspecified, it splits on all whitespace (optionally, you can limit the number of splits to num).
list_of_words = mystring.split()
if word in list_of_words:
    print 'success'
This will not work for strings that contain commas or other punctuation. For example:
mystring = "One,two and three"
# will split into ["One,two", "and", "three"]
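split() takes a single separator string rather than a set of characters, so to also split on commas and other punctuation, use re.split() with a character class instead:
import re
# \s covers whitespace: space, tab, newline, return, formfeed
# leading or trailing punctuation can leave empty strings in the list
list_of_words = re.split(r"[\s,.;!?'\"()]+", mystring)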