The pandas apply method is very versatile: it can run complex logic on each row of a DataFrame. Take the following data as an example, where column C is derived from columns A and B:
A | B | C |
Alex | Dick | False |
Alexandra | Babylon | True |
Alexis | Baby | True |
Boris | Charlie | False |
Michael | Custom | False |
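To follow along, the sample data above can be built as a DataFrame. This is a minimal sketch: the name df matches the variable used later in this post, and column C is filled with the expected values from the table so the result can be checked.

```python
import pandas as pd

# Sample data from the table above; column C holds the expected result
df = pd.DataFrame({
    "A": ["Alex", "Alexandra", "Alexis", "Boris", "Michael"],
    "B": ["Dick", "Babylon", "Baby", "Charlie", "Custom"],
    "C": [False, True, True, False, False],
})
print(df)
```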
Column C is derived by the following rule:
If column A contains "Alex" and column B contains "Baby", C is True; otherwise it is False.
The function passed to apply could look something like this:
import re

def regexApply(s):
    # The values on a single row are passed in as the Series s;
    # individual columns are accessed by their labels
    varA = s['A']
    varB = s['B']
    # re.search returns a match object (truthy) or None (falsy)
    if re.search("Alex", varA) and re.search("Baby", varB):
        return True
    else:
        return False
If the data is stored in a DataFrame named df, call the function with axis=1 so that it is applied to each row rather than to each column:
df['C'] = df.apply(regexApply, axis=1)
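Putting the pieces together, here is a self-contained sketch that builds the sample data, applies the row function and fills column C. The function body is a condensed version of regexApply above; wrapping the result in bool() turns the match object or None into True or False.

```python
import re

import pandas as pd

def regexApply(s):
    # s is one row of the DataFrame, passed in as a pandas Series
    return bool(re.search("Alex", s["A"]) and re.search("Baby", s["B"]))

df = pd.DataFrame({
    "A": ["Alex", "Alexandra", "Alexis", "Boris", "Michael"],
    "B": ["Dick", "Babylon", "Baby", "Charlie", "Custom"],
})

# axis=1 makes apply call regexApply once per row rather than once per column
df["C"] = df.apply(regexApply, axis=1)
print(df)
```

The resulting column C should match the C column in the table at the top of this post.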
The example may look simple, and some may argue that the same result could be achieved with a lambda function. The benefit of a named function, however, is that it can contain as many lines of code as needed, and I have used this approach for relatively complex operations.
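For comparison, the sketch below writes the same rule as a one-line lambda and then as a multi-statement function. The extra steps in classify (stripping whitespace and matching case-insensitively) are hypothetical additions, chosen only to show the kind of logic that would be awkward to cram into a lambda.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": ["Alex", "Alexandra", "Alexis", "Boris", "Michael"],
    "B": ["Dick", "Babylon", "Baby", "Charlie", "Custom"],
})

# The same rule as a lambda: fine while it stays a single expression
df["C_lambda"] = df.apply(
    lambda s: bool(re.search("Alex", s["A"]) and re.search("Baby", s["B"])),
    axis=1,
)

# A named function can hold several statements; the whitespace stripping
# and case-insensitive matching here are hypothetical extra steps
def classify(s):
    a = s["A"].strip()
    b = s["B"].strip()
    has_alex = re.search("alex", a, flags=re.IGNORECASE) is not None
    has_baby = re.search("baby", b, flags=re.IGNORECASE) is not None
    return has_alex and has_baby

df["C_func"] = df.apply(classify, axis=1)
print(df)
```

On this sample data both columns come out the same; the named function simply leaves room for more steps as the rules grow.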
I hope you can benefit from this as well.