Go to: Na-Rae Han's home page  

Python 3 Notes


Pickling

On this page: pickle module, pickle.dump(), pickle.load(), cPickle module

Pickling: the Concept

Suppose you just spent the better part of your afternoon working in Python, processing many data sources to build an elaborate, highly structured data object. Say it is a dictionary of English words with their frequency counts, translations into other languages, etc. And now it's time to close your Python program and go eat dinner. Obviously you want to save this object for future use, but how?

You *could* write the data object out to a text file, but that's not optimal. Once written as a text file, it is just plain text, meaning that the next time you read it in you will have to parse the text and process it back into your original data structure.

What you want, then, is a way to save your Python data object as itself, so that next time you need it you can simply load it up and get your original object back. Pickling and unpickling let you do that. A Python data object can be "pickled" as itself, which then can be directly loaded ("unpickled") as such at a later point; the process is also known as "object serialization".
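
As a minimal sketch of the idea, the snippet below serializes a nested dictionary to bytes in memory with pickle.dumps() and restores it with pickle.loads(). These are the in-memory counterparts of the file-based pickle.dump() and pickle.load(). The lexicon data is made up for illustration.

```python
import pickle

# Hypothetical sample data: word -> frequency count and translations
lexicon = {
    'cat': {'freq': 120, 'translations': {'fr': 'chat', 'es': 'gato'}},
    'dog': {'freq': 95, 'translations': {'fr': 'chien', 'es': 'perro'}},
}

data = pickle.dumps(lexicon)     # serialize the whole object to bytes
restored = pickle.loads(data)    # rebuild an equal object from those bytes

print(type(data))                # <class 'bytes'>
print(restored == lexicon)       # True: same content, no parsing needed
print(restored is lexicon)       # False: it is a new copy
```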

How to Pickle/Unpickle

Pickling functions are part of the pickle module. You will first need to import it. And, pickling/unpickling obviously involves file IO, so you will have to use the file writing/reading routines you learned in the previous tutorial.

Below, grades, a small dictionary data object, is being pickled. pickle.dump() is the function for saving the data out to the designated pickle file, usually with the .p or .pkl extension. Because pickle writes bytes in Python 3, the file must be opened in binary mode with 'wb'; opening it with plain 'w' raises a TypeError.
grades = {'Bart':75, 'Lisa':98, 'Milhouse':80, 'Nelson':65}

import pickle              # import module first

f = open('gradesdict.pkl', 'wb')  # 'wb' for writing binary; file is newly created where foo1.py is
pickle.dump(grades, f)          # dump data to f
f.close()                 
foo1.py 
Unpickling works as follows. Again you start by importing pickle. You then open the pickle file for reading in binary mode ('rb'), load the content into a new variable, and close the file. Loading is done through the pickle.load() function. The dictionary now has a different name, mydict, but the content is the same.
import pickle              # import module first

f = open('gradesdict.pkl', 'rb')  # 'rb' for reading binary; the 'b' cannot be omitted
mydict = pickle.load(f)         # load file content as mydict
f.close()                       

print(mydict)
# prints {'Bart': 75, 'Lisa': 98, 'Milhouse': 80, 'Nelson': 65}
foo2.py 
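
The same save-and-load cycle can also be written with the with statement, which closes the file automatically, even if an error occurs partway through. The following is only a sketch, not a replacement for the foo1.py and foo2.py scripts: it writes to a temporary directory (an assumption made here so that no existing file is overwritten) and opens the file in binary mode, since pickle reads and writes bytes in Python 3.

```python
import os
import pickle
import tempfile

grades = {'Bart': 75, 'Lisa': 98, 'Milhouse': 80, 'Nelson': 65}

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'gradesdict.pkl')

with open(path, 'wb') as f:      # file is closed when the block ends
    pickle.dump(grades, f)

with open(path, 'rb') as f:
    mydict = pickle.load(f)

print(mydict == grades)          # True
```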

Pickling in the Binary

In Python 3, pickle data is always a stream of bytes, so file IO has to be done in binary mode: you need to use 'wb' ('b' for binary) when writing the file and 'rb' when reading it. What you can choose is the pickling "protocol", specified through a third, optional argument while dumping, e.g., pickle.dump(grades, f, -1). Protocol 0 is the original, human-readable format, which is ASCII-based and tends to produce large pickle files; the higher protocols are more compact binary formats. "-1" means the highest protocol available in your Python version, which can also be written as pickle.HIGHEST_PROTOCOL. If no protocol is given, Python 3 already uses a binary protocol by default, so specifying -1 mainly makes sure you get the newest one. Keep in mind that a file pickled with a newer protocol may not be readable by an older Python version.
grades = {'Bart':75, 'Lisa':98, 'Milhouse':80, 'Nelson':65}

import pickle

f = open('gradesdict.pkl', 'wb')   # 'wb' for writing binary file
pickle.dump(grades, f, -1)       # -1 specifies highest binary protocol
f.close()                 
foo1.py 
import pickle

f = open('gradesdict.pkl', 'rb')   # 'rb' for reading binary file
mydict = pickle.load(f)     
f.close()                       

print(mydict)
# prints {'Bart': 75, 'Lisa': 98, 'Milhouse': 80, 'Nelson': 65}
foo2.py 
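
To get a feel for how much the protocol level affects size, the sketch below pickles the same object in memory with protocol 0 (the ASCII-based format) and with the highest available protocol, then compares the lengths. The counts dictionary is hypothetical data, made larger than grades so the difference is easy to see; exact sizes depend on your Python version, so none are shown.

```python
import pickle

# Hypothetical data: 1000 made-up word keys with integer counts
counts = {'word%d' % i: i for i in range(1000)}

text_like = pickle.dumps(counts, 0)                       # protocol 0: ASCII-based
compact = pickle.dumps(counts, pickle.HIGHEST_PROTOCOL)   # same effect as passing -1

print(len(text_like), len(compact))                       # sizes in bytes
print(len(compact) < len(text_like))                      # True
print(pickle.loads(text_like) == pickle.loads(compact))   # True: same content
```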
With several protocols available, you might worry about not remembering which one a particular pickle file was written with. There is no need to: pickle.load() detects the protocol automatically. What you do need to remember is to open pickle files in binary mode, for both writing and reading. So pick one routine and stick with it: 'wb' with the highest protocol for dumping, and 'rb' for loading.
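
On the question of remembering how a file was pickled, the sketch below illustrates that pickle.load() takes no protocol argument and works out the protocol from the data itself. It dumps the same dictionary with every protocol your Python version supports and loads each one back the same way. An io.BytesIO buffer stands in for a file opened in binary mode; that substitution is an assumption made here to keep the example free of disk files.

```python
import io
import pickle

grades = {'Bart': 75, 'Lisa': 98, 'Milhouse': 80, 'Nelson': 65}

loaded = []
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    buffer = io.BytesIO()                  # in-memory stand-in for open(..., 'wb')
    pickle.dump(grades, buffer, protocol)
    buffer.seek(0)                         # rewind, as if reopening with 'rb'
    loaded.append(pickle.load(buffer))     # no protocol given when loading

print(all(d == grades for d in loaded))    # True
```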