Functional Aspects of Python

This post covers some of the lesser-used (but useful) functions in Python. Most of the examples were tried on Python 2.7; the 3.x versions differ in subtle ways.

Generators and iterators

To see what generators are, consider the following example:

   $ ulimit -v 131072
   $ python
   >>> a = (range(50000000))

This will most likely fail with a `MemoryError`. The `ulimit -v` command limited the shell's virtual memory to 131072 kB (128 MB), and a list of 50000000 items cannot fit in that. Now, try the following code.

   >>> a = (xrange(50000000))

This time, you won't see an error. Why? This is the fundamental difference between a list and a generator. Although `xrange` is not a generator by definition, its Python docstring says the following:

| Like range(), but instead of returning a list, returns an object that
| generates the numbers in the range on demand. For looping, this is
| slightly faster than range() and more memory efficient.

Because they produce items on demand instead of building everything up front, iterators and generators are far more memory efficient than lists, and often faster when you only loop over the data once. That is the general idea of what generators and iterators are. This is not a tutorial, but there is plenty of excellent material to read online.

Note: In Python 3.x, `xrange` is gone and range(<num>) returns a lazy range object instead of a list.
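
To make the memory difference concrete, here is a small sketch written for Python 3 (where `range` is already lazy). The names `squares_list` and `squares_gen` are made up for illustration, and exact byte counts vary by platform, so the code only compares them.

```python
import sys

n = 100000

# A list comprehension builds every element up front.
squares_list = [i * i for i in range(n)]

# A generator expression only remembers how to produce the next element.
squares_gen = (i * i for i in range(n))

# getsizeof reports the size of the container object itself (not the
# integers it refers to), which is enough to show the difference.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)

# Both give the same values when consumed once.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
```

The generator stays small no matter how large `n` gets, because it never holds more than one value at a time.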

Some useful functions

While programming a tool, application, or whatever in Python, you might be tempted to write your own "inner code" that uses (or abuses) the language. But chances are that the utility functions you are building are already part of the standard library. The standard library has a huge collection of functions and modules, so at least skim through it the next time you build something. I'm going to discuss a few that were useful to me; each can solve many problems with (almost) a single line of code.

map

In simple terms, this built-in function applies a function to every element of a list/iterator and collects the results. This makes code much simpler to read and more compact. For example, consider the following code.

   >>> a = map(lambda x: x+1, xrange(5))
   >>> a
   [1, 2, 3, 4, 5]

Chances are you know what a lambda function is. If you do not, a quick Google search will help. This code is certainly more compact than the for loop you would have written to get this done. In Python 3.x, map returns an iterator; to get an iterator in Python 2.7, use `imap` from the `itertools` module:

   >>> from itertools import imap
   >>> a = imap(lambda x: x+1, xrange(5))
   >>> a
   <itertools.imap at 0x7fba9e9cca10>

For me, this function is a way to make code easier to read and more compact. That said, heavily nested map calls can make code tricky to read.
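
For comparison, here is a minimal Python 3 sketch of the same increment written three ways: the explicit loop, `map`, and a list comprehension. The variable names are made up for illustration, and the comprehension is only one possible alternative when a lambda starts to feel awkward.

```python
numbers = range(5)

# The explicit loop that map replaces.
incremented_loop = []
for x in numbers:
    incremented_loop.append(x + 1)

# In Python 3, map returns a lazy iterator, so wrap it in list() to see the values.
incremented_map = list(map(lambda x: x + 1, numbers))

# A list comprehension is a common alternative to map plus lambda.
incremented_comp = [x + 1 for x in numbers]

print(incremented_loop == incremented_map == incremented_comp)
```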

reduce

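This built-in function applies a two-argument function cumulatively to the elements of a list/iterator, from left to right, reducing them to a single value. In Python 3.x, it has moved to the `functools` module. To understand it better, consider the following code.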

   >>> a = reduce(lambda x,y: x+y, xrange(5))
   >>> a
   10

This is useful in combination with map when you apply the same operation to every element independently and then combine the results. For example, say you want to count the occurrences of a letter in a list of words. Consider the following.

   >>> a = ["Ram", "gives", "Sita", "a", "kissy", "on", "her", "cheeks"]
   >>> reduce(lambda x,y:x+y, map(lambda x:x.count('e'), a))
   4
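
The same count can be split into its two steps to show what each one contributes. This is a sketch for Python 3, where `reduce` has to be imported from `functools`; the names `words`, `counts`, and `total` are made up for illustration.

```python
from functools import reduce  # in Python 3, reduce lives in functools

words = ["Ram", "gives", "Sita", "a", "kissy", "on", "her", "cheeks"]

# Map step: count the letter 'e' in each word independently.
counts = list(map(lambda word: word.count('e'), words))

# Reduce step: fold the per-word counts into a single total.
# reduce computes ((((c0 + c1) + c2) + c3) + ...).
total = reduce(lambda x, y: x + y, counts)

print(counts)
# For plain addition, the built-in sum does the same job.
print(total == sum(counts))
```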

As a fun fact, Google described a similar model, MapReduce, for large-scale data processing such as building its search index: a map step runs over many pieces of data in parallel across machines, and a reduce step combines the partial results. Apache Hadoop is an open-source implementation of this model.

filter

As the name suggests, this function creates another list containing only the elements for which a given function returns `True` (or any truthy value). A simple example to demonstrate this:

   >>> filter(lambda x: x%2, xrange(10))
   [1, 3, 5, 7, 9]

To get an iterator in Python 2.7, you can use the `ifilter` function from the `itertools` module. In Python 3.x, filter itself returns an iterator.

The point here is to understand the importance of iterators and the situations where they can be used, leading to faster code and lower memory use. In other cases, where you need random access to particular elements, lists or tuples are the better choice. The bottom line is that iterators and generators are useful, but you must know their limitations.
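
Two of those limitations are easy to demonstrate. The following is a minimal Python 3 sketch (where `filter` returns an iterator); the variable names are made up for illustration.

```python
odds = filter(lambda x: x % 2, range(10))  # lazy iterator in Python 3

first_pass = list(odds)   # consumes the iterator
second_pass = list(odds)  # nothing left to yield

# Iterators do not support indexing; a list does.
try:
    filter(lambda x: x % 2, range(10))[0]
    indexable = True
except TypeError:
    indexable = False

print(first_pass)   # [1, 3, 5, 7, 9]
print(second_pass)  # []
print(indexable)    # False
```

If you need to walk the data more than once or jump to a particular position, convert the iterator to a list first and accept the memory cost.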

any, all

These two functions return `True` or `False` depending on whether any (or all) elements of an iterator/list/tuple are truthy, i.e. evaluate to `True`. Here are some examples:

   >>> any(xrange(10))
   True
   >>> all(xrange(10))
   False

Now, how do we use this? Instead of writing a loop to check whether any (or all) elements satisfy some condition, we can simply use a one-liner:

   >>> def some_condition(element):
   ...     pass  # some code here
   ...
   >>> any(map(some_condition, mylist))
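
One detail worth knowing: in Python 2.7, `map` builds the whole list before `any` sees it, so every element is checked. A generator expression hands over one result at a time, which lets `any` stop early. The sketch below assumes a hypothetical condition `is_negative` and records which elements were actually checked.

```python
checked = []

def is_negative(element):
    # Hypothetical condition, used only for this illustration.
    checked.append(element)
    return element < 0

mylist = [3, -1, 4, 1, 5]

# A generator expression feeds any() one result at a time,
# so evaluation stops at the first element that satisfies the condition.
found = any(is_negative(x) for x in mylist)

print(found)    # True
print(checked)  # [3, -1]
```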

Again, the point is to write concise and more 'Pythonic' code. The last function I want to mention here is slightly tricky for newbies, but is important nevertheless.

Memoization and LRU

Memoization is a quick win when the solution to your main problem is built from overlapping subproblems that get computed again and again (competitive coding folks can relate ;) ). If you want to save time by 'memoizing' results rather than recalculating them, you can use the `lru_cache` decorator from the `functools` module. If you don't know what a decorator is, stay hooked, I will discuss it in a while. LRU stands for 'least recently used': when the cache is full, the entry that was used least recently is discarded first.

This works for Python 3.2+ only. I didn't find it in Python 2.7. If you find any such function as part of the 2.7 standard library, please let me know.

We use it in the following way:

   >>> from functools import lru_cache
   >>> @lru_cache(maxsize=10)
   ... def complex_calculation_function(x):
   ...     # some code
   ... 
   >>>

The first time you call the function with a given argument, it calculates the value and returns it. The next time, if the result is still in the cache, it returns the value without calculating it again. Hence, you should be careful about which functions you use it with, if at all (do not use functions which have side effects, or which produce different answers based on the history of inputs). Arguments must also be hashable, since they serve as cache keys. The `maxsize` parameter sets the maximum number of recently used results that can be stored. Cache statistics (hits, misses, current cache size) are available through the decorated function's `cache_info()` method. Look it up for more details.
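
As a concrete case, here is a minimal Python 3 sketch using the classic Fibonacci recursion, which is not from the example above but is a standard instance of overlapping subproblems. The function name `fib` is made up for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=10)
def fib(n):
    # Naive recursion; the cache avoids recomputing overlapping subproblems.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

result = fib(30)
info = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize

print(result)  # 832040
print(info.hits > 0, info.currsize <= info.maxsize)
```

Without the decorator, `fib(30)` makes over a million recursive calls; with it, each value is computed once and the rest are cache hits.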

In the above example, if you aren't familiar with decorators, you are probably wondering what a decorator is and what the weird `@` symbol means. Let's have a brief introduction to decorators.

Decorators

To understand decorators, let us use an example that remembers (registers) functions in a `dict`, keyed by name:

   >>> _names = {}
   >>> def memoize_function(f):
   ...     # Store the function under its name, then return it unchanged.
   ...     _names[f.__name__] = f
   ...     return f
   ...
   >>> @memoize_function
   ... def someFunc():
   ...     pass  # code here
   ...
   >>>

The decorated definition above is equivalent to:

   >>> def someFunc():
   ...     pass  # code here
   ...
   >>> someFunc = memoize_function(someFunc)

In either case, the name of the function and the function itself are stored in the dict `_names`. Decorators have many more details to look into, such as preserving the docstring and function name after the decorator is applied, but that is beyond the scope of this post. Decorators make code look cleaner, and the distinct `@` syntax makes their purpose stand out from a regular function call.
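
For the curious, the docstring and name detail comes up as soon as a decorator returns a new wrapper function instead of the original. The sketch below uses a hypothetical decorator `log_calls` and relies on `functools.wraps` from the standard library to copy the metadata across; treat it as one common approach rather than the only one.

```python
import functools

def log_calls(f):
    # Hypothetical decorator for illustration: records each call's arguments.
    @functools.wraps(f)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls.append(args)
        return f(*args, **kwargs)
    wrapper.calls = []
    return wrapper

@log_calls
def add(x, y):
    """Return the sum of x and y."""
    return x + y

result = add(2, 3)

print(result)        # 5
print(add.__name__)  # add
print(add.__doc__)   # Return the sum of x and y.
print(add.calls)     # [(2, 3)]
```

Without `functools.wraps`, `add.__name__` would report `wrapper` and the docstring would be lost.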

Thanks a lot for reading! Happy coding!! :)
