If you’re looking for ways to remove or replace all or part of a string in Python, then this tutorial is for you. You’ll be taking a fictional chat room transcript and sanitizing it using both the .replace() method and the re.sub() function.
In Python, the .replace() method and the re.sub() function are often used to clean up text by removing strings or substrings or replacing them. In this tutorial, you’ll be playing the role of a developer for a company that provides technical support through a one-to-one text chat. You’re tasked with creating a script that’ll sanitize the chat, removing any personal data and replacing any swear words with emoji.
You’re only given one very short chat transcript:
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
Even though this transcript is short, it’s typical of the chats that agents handle all the time. It has user identifiers, ISO 8601 timestamps, and messages.
In this case, the client johndoe filed a complaint, and company policy is to sanitize and simplify the transcript, then pass it on for independent evaluation. Sanitizing the message is your job!
The first thing you’ll want to do is to take care of any swear words.
How to Remove or Replace a Python String or Substring
The most basic way to replace a string in Python is to use the .replace() string method:
>>> "Fake Python".replace("Fake", "Real")
'Real Python'
As you can see, you can chain .replace() onto any string and provide the method with two arguments. The first is the string that you want to replace, and the second is the replacement.
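Because the replacement can be any string, you can also remove a substring by replacing it with an empty string. The short sketch below shows that, along with the optional third argument of .replace(), which limits how many occurrences get replaced. The sample strings are made up for illustration:

```python
# Remove a substring by replacing it with the empty string.
text = "Fake Python is fake"
removed = text.replace("Fake ", "")

# Limit the number of replacements with the optional third argument.
limited = "ha ha ha".replace("ha", "ho", 2)

print(removed)  # Python is fake
print(limited)  # ho ho ha
```

Note that the lowercase "fake" at the end of the first string is left alone, since only the exact text "Fake " is matched.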
Note: Although the Python shell displays the result of .replace(), the string itself stays unchanged. You can see this more clearly by assigning your string to a variable:
>>> name = "Fake Python"
>>> name.replace("Fake", "Real")
'Real Python'
>>> name
'Fake Python'
>>> name = name.replace("Fake", "Real")
>>> name
'Real Python'
Notice that when you simply call .replace(), the value of name doesn’t change. But when you assign the result of name.replace() to the name variable, 'Fake Python' becomes 'Real Python'.
Now it’s time to apply this knowledge to the transcript:
>>> transcript = """\
... [support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
... [johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
... [support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
... [johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""
>>> print(transcript.replace("BLASTED", "😤"))
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!
Loading the transcript as a triple-quoted string and then using the .replace() method on one of the swear words works fine. But there’s another swear word that’s not getting replaced because .replace() is case-sensitive, so the string needs to match exactly:
>>> "Fake Python".replace("fake", "Real")
'Fake Python'
As you can see, a mismatch in the casing of even one letter prevents any replacement. This means that if you’re using the .replace() method, you’ll need to call it several times, once for each variation. In this case, you can just chain on another call to .replace():
>>> print(transcript.replace("BLASTED", "😤").replace("Blast", "😤"))
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : 😤! You're right!
Success! But you’re probably thinking that this isn’t the best way to do this for something like a general-purpose transcript sanitizer. You’ll want to move toward some way of having a list of replacements, instead of having to type out .replace() each time.
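One possible way to get there is to keep the old and new strings as pairs in a list and loop over them. The following is only a minimal sketch: the names REPLACEMENTS and sanitize() are hypothetical and chosen for illustration, and the sample text is a shortened version of the transcript:

```python
# Hypothetical list of (old, new) pairs; extend it with more variations as needed.
REPLACEMENTS = [
    ("BLASTED", "😤"),
    ("Blast", "😤"),
]


def sanitize(text, replacements=REPLACEMENTS):
    """Apply each (old, new) replacement to text, one after another."""
    for old, new in replacements:
        # .replace() returns a new string, so reassign on every pass.
        text = text.replace(old, new)
    return text


# Shortened sample based on the transcript messages.
sample = "I CAN'T CONNECT TO MY BLASTED ACCOUNT\nBlast! You're right!"
print(sanitize(sample))
```

With this approach, covering another variation only means adding another pair to the list. The matching is still case-sensitive, so every casing that you want to catch needs its own entry.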




