Writing Functions
Last updated on 2025-02-14
Overview
Questions
- How can I create my own functions?
Objectives
- Explain and identify the difference between function definition and function call.
- Write a function that takes a small, fixed number of arguments and produces a single result.
Break programs down into functions to make them easier to understand.
- Human beings can only keep a few items in working memory at a time.
- Understand larger/more complicated ideas by understanding and combining pieces.
- Components in a machine.
- Lemmas when proving theorems.
- Functions serve the same purpose in programs.
- Encapsulate complexity so that we can treat it as a single “thing”.
- Also enables re-use.
- Write one time, use many times.
Define a function using def with a name, parameters, and a block of code.
- Begin the definition of a new function with def.
- Followed by the name of the function.
- Must obey the same rules as variable names.
- Then parameters in parentheses.
- Empty parentheses if the function doesn’t take any inputs.
- We will discuss this in detail in a moment.
- Then a colon.
- Then an indented block of code.
Defining a function does not run it.
- Defining a function does not run it.
- Like assigning a value to a variable.
- Must call the function to execute the code it contains.
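As a minimal sketch of the points above (the function name print_greeting is an illustrative assumption, chosen to match the output shown next), the definition only creates the function; nothing is printed until the call on the last line:

```python
# Definition: `def`, a name, empty parentheses (no inputs), a colon,
# then an indented block. Running these two lines prints nothing.
def print_greeting():
    print('Hello!')

# Call: the parentheses make Python execute the function body.
print_greeting()
```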
OUTPUT
Hello!
Arguments in a function call are matched to its defined parameters.
- Functions are most useful when they can operate on different data.
- Specify parameters when defining a function.
- These become variables when the function is executed.
- Are assigned the arguments in the call (i.e., the values passed to the function).
- If you don’t name the arguments when using them in the call, the arguments will be matched to parameters in the order the parameters are defined in the function.
PYTHON
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(1871, 3, 19)
OUTPUT
1871/3/19
Alternatively, we can name the arguments when we call the function. This lets us pass them in any order and makes the call easier to read; without names, a reader might forget whether the second argument is the month or the day.
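For instance, a call with named arguments could look like the sketch below. print_date is repeated from the earlier example so the snippet runs on its own, and the argument order is deliberately shuffled:

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

# Each value is matched to a parameter by name, so the order is free.
print_date(month=3, day=19, year=1871)
```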
OUTPUT
1871/3/19
- Via Twitter: () contains the ingredients for the function while the body contains the recipe.
Functions may return a result to their caller using return.
- Use return ... to give a value back to the caller.
- May occur anywhere in the function.
- But functions are easier to understand if return occurs:
- At the start to handle special cases.
- At the very end, with a final result.
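A minimal sketch of this pattern, assuming a hypothetical helper named average (the name and inputs are chosen to be consistent with the outputs shown next): an early return handles the empty-list special case, and a final return gives the result.

```python
def average(values):
    # Special case first: an empty list has no average.
    if len(values) == 0:
        return None
    # Final result at the very end.
    return sum(values) / len(values)

a = average([1, 3, 4])
print('average of actual values:', a)
print('average of empty list:', average([]))
```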
OUTPUT
average of actual values: 2.6666666666666665
OUTPUT
average of empty list: None
- Remember: every function returns something.
- A function that doesn’t explicitly return a value automatically returns None.
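To see this in action, the sketch below captures the return value of print_date, redefined here from the earlier example so the snippet is self-contained. Since the function has no return statement, the captured value is None:

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)  # prints, but does not return anything

result = print_date(1871, 3, 19)
print('result of call is:', result)
```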
OUTPUT
1871/3/19
result of call is: None
Identifying Syntax Errors
1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. Is it a SyntaxError or an IndentationError?
3. Fix the error.
4. Repeat steps 2 and 3 until you have fixed all the errors.
OUTPUT
calling <function report at 0x7fd128ff1bf8> 22.5
A function call always needs parentheses; without them, the name refers to the function object itself, and printing it shows the object's name and memory address. So, if we wanted to call the function named report and give it the value 22.5 to report on, we could write the function call as follows:
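A sketch of such a call is given here, assuming report simply prints its argument after the text 'pressure is' (its definition is not shown in this section and is inferred from the output below):

```python
def report(pressure):
    # Assumed body, reconstructed from the expected output.
    print('pressure is', pressure)

print('calling')
report(22.5)  # parentheses plus an argument: the function actually runs
```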
OUTPUT
calling
pressure is 22.5
Order of Operations
- What’s wrong in this example?
PYTHON
result = print_time(11, 37, 59)

def print_time(hour, minute, second):
    time_string = str(hour) + ':' + str(minute) + ':' + str(second)
    print(time_string)
- After fixing the problem above, explain why running this example code:
gives this output:
OUTPUT
11:37:59
result of call is: None
- Why is the result of the call None?
The problem with the example is that the function print_time() is defined after the call to the function is made. Python doesn’t know how to resolve the name print_time since it hasn’t been defined yet and will raise a NameError, e.g., NameError: name 'print_time' is not defined
The first line of output, 11:37:59, is printed by the first line of code, result = print_time(11, 37, 59), which also binds the value returned by invoking print_time to the variable result. The second line is from the second print call, which prints the contents of the result variable.
print_time() does not explicitly return a value, so it automatically returns None.
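Putting the fix together, a corrected version might look like the following sketch. The second print call is not shown in the exercise text and is reconstructed here from the expected output, so treat it as an assumption:

```python
# Define the function first so the name exists when it is called.
def print_time(hour, minute, second):
    time_string = str(hour) + ':' + str(minute) + ':' + str(second)
    print(time_string)

result = print_time(11, 37, 59)      # prints 11:37:59, returns None
print('result of call is:', result)  # prints: result of call is: None
```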
Find the First
Calling by Name
Earlier we saw this function:
PYTHON
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
We saw that we can call the function using named arguments, like this: print_date(day=1, month=2, year=2003)
- What does print_date(day=1, month=2, year=2003) print?
- When have you seen a function call like this before?
- When and why is it useful to call functions this way?
- 2003/2/1
- We saw examples of using named arguments when working with the pandas library. For example, when reading in a dataset using data = pd.read_csv('data/data-penguins-named.csv', index_col='species'), the last argument index_col is a named argument.
- Using named arguments can make code more readable since one can see from the function call what name the different arguments have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments the order doesn’t matter.
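The last point can be illustrated with a small sketch. format_date below is a hypothetical variant of print_date that returns the string instead of printing it, which makes the two results easy to compare:

```python
def format_date(year, month, day):
    # Hypothetical helper: same joining logic as print_date, but returns the text.
    return str(year) + '/' + str(month) + '/' + str(day)

# Positional arguments in the wrong order run without error but give a wrong date.
swapped = format_date(1, 2, 2003)
# Named arguments give the intended result regardless of order.
named = format_date(day=1, month=2, year=2003)

print(swapped)
print(named)
```

The first call silently treats 1 as the year, which is the kind of mistake named arguments help to avoid.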
Encapsulation of an If/Print Block
The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams) to the computer and then the computer will print a label.
PYTHON
import random

for i in range(10):
    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
    print(mass)

    # egg sizing machinery prints a label
    if mass >= 85:
        print("jumbo")
    elif mass >= 70:
        print("large")
    elif mass < 70 and mass >= 55:
        print("medium")
    else:
        print("small")
The if-block that classifies the eggs might be useful in other situations, so to avoid repeating it, we could fold it into a function, get_egg_label(). Revising the program to use the function would give us this:
PYTHON
# revised version
import random

for i in range(10):
    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
    print(mass, get_egg_label(mass))
- Create a function definition for get_egg_label() that will work with the revised program above. Note that the get_egg_label() function’s return value will be important. Sample output from the above program would be 71.23 large.
- A dirty egg might have a mass of more than 90 grams, and a spoiled or broken egg will probably have a mass that’s less than 50 grams. Modify your get_egg_label() function to account for these error conditions. Sample output could be 25 too light, probably spoiled.
PYTHON
def get_egg_label(mass):
    # egg sizing machinery prints a label
    egg_label = "Unlabelled"
    if mass >= 90:
        egg_label = "warning: egg might be dirty"
    elif mass >= 85:
        egg_label = "jumbo"
    elif mass >= 70:
        egg_label = "large"
    elif mass < 70 and mass >= 55:
        egg_label = "medium"
    elif mass < 50:
        egg_label = "too light, probably spoiled"
    else:
        egg_label = "small"
    return egg_label
Encapsulating Data Analysis
Assume that the following code has been executed:
PYTHON
import pandas as pd
data_penguins = pd.read_csv('data/data-penguins-named.csv')
data_penguins_adelie = data_penguins[data_penguins['species'] == 'Adelie']
- Complete the statements below to obtain the average body mass of Adelie penguins.
- Abstract the code above into a single function which can calculate the average body mass of any penguin species.
PYTHON
def avg_body_mass_for_species(species):
    data_penguins = pd.read_csv('data/data-penguins-named.csv')
    ____
    ____
    return ____
- How would you generalize this function if you did not know beforehand whether the data contain any empty values? Or if you wanted to calculate an average value of some other feature in the dataset?
- The average body mass of Adelie penguins is computed by taking the mean of the body_mass_g column, for example with data_penguins_adelie['body_mass_g'].mean().
- That code as a function is:
PYTHON
def avg_body_mass_for_species(species):
    data_penguins = pd.read_csv('data/data-penguins-named.csv')
    species_data = data_penguins[data_penguins['species'] == species]
    avg_body_mass = species_data['body_mass_g'].dropna().mean()
    return avg_body_mass
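- To generalize the function, pass the dataframe and the column name as arguments, and drop empty values before averaging:
PYTHON
def avg_column_for_species(data, species, column='body_mass_g'):
    # Fail loudly rather than returning an error message in place of a number.
    if column not in data.columns:
        raise ValueError(f"Column '{column}' not found in the dataset.")
    species_data = data[data['species'] == species]
    # dropna() discards empty values before the mean is computed.
    avg_value = species_data[column].dropna().mean()
    return avg_value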
The function can now be called by:
PYTHON
avg_adelie_body_mass = avg_column_for_species(data_penguins, 'Adelie', 'body_mass_g')
print(f"Average body mass for Adelie: {avg_adelie_body_mass} grams")
OUTPUT
Average body mass for Adelie: 3706.1643835616437 grams
Simulating a dynamical system
In mathematics, a dynamical system is a system in which a function describes the time dependence of a point in a geometrical space. A canonical example of a dynamical system is the logistic map, a growth model that computes a new population density (between 0 and 1) based on the current density. In the model, time takes discrete values 0, 1, 2, …
1. Define a function called logistic_map that takes two inputs: x, representing the current population (at time t), and a parameter r = 1. This function should return a value representing the state of the system (population) at time t + 1, using the mapping function:
f(t+1) = r * f(t) * [1 - f(t)]
2. Using a for or while loop, iterate the logistic_map function defined in part 1, starting from an initial population of 0.5, for a period of time t_final = 10. Store the intermediate results in a list so that after the loop terminates you have accumulated a sequence of values representing the state of the logistic map at times t = [0,1,...,t_final] (11 values in total). Print this list to see the evolution of the population.
3. Encapsulate the logic of your loop into a function called iterate that takes the initial population as its first input, the parameter t_final as its second input and the parameter r as its third input. The function should return the list of values representing the state of the logistic map at times t = [0,1,...,t_final]. Run this function for periods t_final = 100 and 1000 and print some of the values. Is the population trending toward a steady state?
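The solution code that follows calls logistic_map but its definition is not shown. A minimal sketch of parts 1 and 2 is given here, reading "a parameter r = 1" as a default value for r (that reading is an assumption):

```python
def logistic_map(x, r=1):
    # State at time t + 1, computed from the state x at time t.
    return r * x * (1 - x)

initial_population = 0.5
t_final = 10

# population[t] holds the state at time t, starting with t = 0.
population = [initial_population]
for t in range(t_final):
    population.append(logistic_map(population[t], 1))

print(population)
```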
PYTHON
def iterate(initial_population, t_final, r):
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

for period in (10, 100, 1000):
    population = iterate(0.5, period, 1)
    print(population[-1])
OUTPUT
0.06945089389714401
0.009395779870614648
0.0009913908614406382
Using Functions With Conditionals in Pandas
Functions will often contain conditionals. Here is a short example that will indicate how heavy the penguin is based on hand-coded values.
PYTHON
def how_heavy(weight):
    if weight < 3500:
        return "Not heavy at all, this penguin is clearly hungry!"
    elif weight >= 3500 and weight < 4500:
        return "Normal weight penguin, he is eating well!"
    elif weight >= 4500:
        return "Heavy penguin, it's eating way too much!"
    else:
        # This observation has bad data
        return None

how_heavy(5000)
OUTPUT
"Heavy penguin, it's eating way too much!"
That function would typically be used within a for loop, but Pandas has a different, more efficient way of doing the same thing, and that is by applying a function to a dataframe or a portion of a dataframe. Here is an example, using the definition above.
PYTHON
data_penguin = pd.read_csv("data/data-penguins-named.csv")
data_penguin['how_heavy'] = data_penguin['body_mass_g'].apply(how_heavy)
There is a lot in that second line, so let’s take it piece by piece. On the right side of the = we start with data_penguin['body_mass_g'], which is the column labeled body_mass_g in the dataframe called data_penguin. We use apply() to do what it says: apply the function how_heavy to the value of this column for every row in the dataframe. The results are stored as new values for every row, under the column how_heavy.
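To try this without the CSV file, here is a small sketch that uses a hand-made dataframe. The body masses are invented for illustration, and the labels in this how_heavy are shortened stand-ins for the ones defined above:

```python
import pandas as pd

def how_heavy(weight):
    # Shortened labels; the thresholds match the function defined earlier.
    if weight < 3500:
        return "light"
    elif weight < 4500:
        return "normal"
    elif weight >= 4500:
        return "heavy"
    else:
        # Reached only when every comparison is False, e.g. for a missing value (NaN).
        return None

# Invented sample data; the last row has a missing body mass.
data_sample = pd.DataFrame({'body_mass_g': [3000.0, 4000.0, 5000.0, float('nan')]})

# apply() calls how_heavy once per value and collects the results in a new column.
data_sample['how_heavy'] = data_sample['body_mass_g'].apply(how_heavy)
print(data_sample)
```

Because every comparison with a missing value is False, the last row falls through to the else branch and gets no label.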
Key Points
- Break programs down into functions to make them easier to understand.
- Define a function using def with a name, parameters, and a block of code.
- Defining a function does not run it.
- Arguments in a function call are matched to its defined parameters.
- Functions may return a result to their caller using return.