Go to: Na-Rae Han's home page  

Python 3 Notes


List Comprehension

On this page: list comprehension [f(x) for x in li if ...].

Filtering Items In a List

Suppose we have a list. Often, we want to gather only the items that meet certain criteria. Below, we have a list of words, and we want to extract from it only the ones that contain 'wo'. For this, we first need to make a new empty list, and then iterate through the original list, appending each matching item to the new list:
 
>>> wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()
>>> wood
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if', 'a', 
'woodchuck', 'could', 'chuck', 'wood?'] 
>>> wolist = []
>>> for x in wood:
        if 'wo' in x:
            wolist.append(x)
>>> wolist
['wood', 'would', 'woodchuck', 'woodchuck', 'wood?'] 
>>> 
OK, that works, but that's a lot of lines of code. What if I told you you can accomplish it all with one line of Python code? Well you can! Behold the superpower of list comprehension:
 
>>> [x for x in wood if 'wo' in x]
['wood', 'would', 'woodchuck', 'woodchuck', 'wood?'] 
>>> 
You want a list of words that are 5+ characters? That too can be done with list comprehension:
 
>>> [x for x in wood if len(x) >= 5]
['would', 'woodchuck', 'chuck', 'woodchuck', 'could', 'chuck', 'wood?'] 
>>> 
Words that are 5+ characters AND end with 'ck':
 
>>> [x for x in wood if len(x) >= 5 and x.endswith('ck')]
['woodchuck', 'chuck', 'woodchuck', 'chuck'] 
>>> 
You get the idea. Basically, list comprehension for filtering starts with [x for x in li], which on its own creates a new list with the same items as li, and then tacks on an if ... clause at the end, which acts as the filtering criterion.
 
>>> [x for x in wood if len(x) <= 4]    # if ... clause for filtering
['How', 'much', 'wood', 'a', 'if', 'a'] 
>>> 
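One detail worth checking for yourself: [x for x in li] builds a separate list with the same items, not just another name for li. The short sketch below illustrates this with a made-up list; the names li and li_copy are chosen here for illustration only.

```python
li = ['wood', 'chuck', 'could']

li_copy = [x for x in li]   # no if clause, so every item is kept

print(li_copy == li)        # True: same items in the same order
print(li_copy is li)        # False: a separate list object

li_copy.append('would')     # changing the new list...
print(li)                   # ...leaves the original untouched
```

This is why filtering with list comprehension is safe: the original list is never modified, and you always get a fresh list back.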

Transforming Items in a List

Another popular type of task with a list is to transform each item. For example, suppose I want to create a new list where each 'o' is replaced by 'oo' in every word. As before, the usual for-loop process gets the job done but is tedious:
 
>>> wood
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if', 'a', 
'woodchuck', 'could', 'chuck', 'wood?'] 
>>> doubleo = []
>>> for x in wood:
        doubleo.append(x.replace('o', 'oo'))
>>> doubleo
['Hoow', 'much', 'wooood', 'woould', 'a', 'woooodchuck', 'chuck', 'if', 
'a', 'woooodchuck', 'coould', 'chuck', 'wooood?'] 
>>> 
Again, with list comprehension, all you need is one line of code:
 
>>> [x.replace('o', 'oo') for x in wood]
['Hoow', 'much', 'wooood', 'woould', 'a', 'woooodchuck', 'chuck', 'if', 
'a', 'woooodchuck', 'coould', 'chuck', 'wooood?'] 
>>> 
Another example -- capitalizing every word:
 
>>> [x.capitalize() for x in wood]
['How', 'Much', 'Wood', 'Would', 'A', 'Woodchuck', 'Chuck', 'If', 'A', 
'Woodchuck', 'Could', 'Chuck', 'Wood?'] 
>>> 
A list of word lengths, one for every word in wood:
 
>>> [len(x) for x in wood]         # f(x) for transformation
[3, 4, 4, 5, 1, 9, 5, 2, 1, 9, 5, 5, 5] 
>>> 
So you can see how handy this is. The syntax works like this: start with [x for x in li], which creates a new list with the same items as li, and then replace the initial x with f(x), some function or expression that takes x as its input. The result is a new list in which each x is transformed to f(x).
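The f(x) slot is not limited to built-in functions and string methods; a function you define yourself should work just as well. As a minimal sketch, strip_punct below is a made-up helper (it is not part of Python or NLTK) that removes punctuation from the end of a word:

```python
wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()

def strip_punct(word):
    """Hypothetical helper: remove punctuation marks from the end of a word."""
    return word.rstrip('?!.,;:')

# f(x) here is our own function, applied to every word in the list
cleaned = [strip_punct(x) for x in wood]

print(cleaned[-1])     # the last word, 'wood?', comes out as 'wood'
```

Wrapping the transformation in a named function keeps the comprehension itself short and readable, which helps once f(x) grows beyond a single method call.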

Filtering and Transformation, Applied Together

You might ask: can we filter AND transform at the same time? Sure we can. Below, we are filtering in only those words with 'wo' and then uppercasing them:
 
>>> [x.upper() for x in wood if 'wo' in x]
['WOOD', 'WOULD', 'WOODCHUCK', 'WOODCHUCK', 'WOOD?'] 
>>> 
What we have here is this syntax: [f(x) for x in li if ...]. Here's another example:
 
>>> [x+'-away' for x in wood if len(x) <= 4]    # f(x) and if ...
['How-away', 'much-away', 'wood-away', 'a-away', 'if-away', 'a-away'] 
>>> 
The transformation operation f(x) can be more complex. Below, you are filtering in words that are 5+ characters long, and outputting the words and their lengths as tuples.
 
>>> [(x, len(x)) for x in wood if len(x) >= 5]     # f(x) and if ...
[('would', 5), ('woodchuck', 9), ('chuck', 5), ('woodchuck', 9), 
('could', 5), ('chuck', 5), ('wood?', 5)] 
>>> 
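If the combined syntax feels dense, it may help to see it unpacked. The sketch below writes the same filter-and-transform step twice, once as a for loop and once as a list comprehension, and confirms the two give the same result. The variable names long_form and short_form are chosen here for illustration.

```python
wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()

# Long form: the for loop that the comprehension abbreviates
long_form = []
for x in wood:
    if len(x) >= 5:                      # the if ... clause (filter)
        long_form.append((x, len(x)))    # f(x) (transformation)

# Short form: the same steps as a list comprehension
short_form = [(x, len(x)) for x in wood if len(x) >= 5]

print(long_form == short_form)           # True
```

Note the order of the parts: in the loop the if test comes before the append, while in the comprehension f(x) is written first and the if ... clause last.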
In the NLTK book, you will see a lot of examples of list comprehension in action, performing exciting operations on gigantic lists of words and other linguistic data. You should get comfortable with list comprehension: it will super-charge your text processing.