Using the pandas DataFrame apply function

The pandas apply function is very versatile. It can run complex instructions on each row of a data frame. Take columns A and B below as an example, with C as the desired result:

A          B        C
Alex       Dick     False
Alexandra  Babylon  True
Alexis     Baby     True
Boris      Charlie  False
Michael    Custom   False

The rule for computing C is:

If column A contains "Alex" and column B contains "Baby", return True; otherwise return False.

The function passed to apply can look something like this:

import re

def regexApply(s):
  # Each row is passed in as a Series s, and its values
  # can be accessed by column name
  varA = s['A']
  varB = s['B']

  # re.search matches anywhere in the string, so a trailing ".*" is not needed
  if re.search("Alex", varA) and re.search("Baby", varB):
    return True
  else:
    return False

If the data is stored in df, call the function like this:

df['C'] = df.apply(regexApply, axis=1)
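To try this end to end, here is a minimal, self-contained sketch. It assumes the data frame is built by hand from the five rows in the table above, and it condenses the body of regexApply into a single return statement, so treat it as one possible variant rather than the only way to write it.

```python
import re
import pandas as pd

# Sample data mirroring columns A and B from the table above
df = pd.DataFrame({
    "A": ["Alex", "Alexandra", "Alexis", "Boris", "Michael"],
    "B": ["Dick", "Babylon", "Baby", "Charlie", "Custom"],
})

def regexApply(s):
    # s is a pandas Series holding one row; index it by column name.
    # bool() turns the match object (or None) into True/False.
    return bool(re.search("Alex", s["A"]) and re.search("Baby", s["B"]))

# axis=1 passes each row (rather than each column) to the function
df["C"] = df.apply(regexApply, axis=1)
print(df)
```

Without axis=1, apply would hand the function one column at a time, and s['A'] would no longer refer to a single cell.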

The example may look simple, and some may argue that a lambda function could do the same job. The advantage of a named function is that it can contain as many lines of code as needed, and I have used this approach for relatively complex operations.
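To make the comparison concrete, the sketch below shows the same rule written as a lambda, followed by a multi-branch function that would be awkward to squeeze into one. The classify function, its labels, and the column name D are made up for illustration and are not part of the original example.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "A": ["Alex", "Alexandra", "Alexis", "Boris", "Michael"],
    "B": ["Dick", "Babylon", "Baby", "Charlie", "Custom"],
})

# The same rule as a one-line lambda
df["C"] = df.apply(
    lambda s: bool(re.search("Alex", s["A"]) and re.search("Baby", s["B"])),
    axis=1,
)

def classify(s):
    # Hypothetical multi-branch rule, invented for this sketch:
    # report which of the two patterns matched on this row
    hasAlex = bool(re.search("Alex", s["A"]))
    hasBaby = bool(re.search("Baby", s["B"]))
    if hasAlex and hasBaby:
        return "both"
    if hasAlex:
        return "A only"
    if hasBaby:
        return "B only"
    return "neither"

# Column D is an invented name for this sketch
df["D"] = df.apply(classify, axis=1)
print(df)
```

The lambda works well for a single expression, but once the logic needs intermediate variables or several branches, a named function is easier to read and test.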

I hope you can benefit from this as well.
