Python Persistence Management

By Patrick O'Brien
2005-04-05


Pickle power

So far we've covered the basics of pickling. In this section, we'll cover some advanced issues that arise when you start to pickle complex objects, including instances of custom classes. Fortunately, you'll see that Python handles these situations quite readily.

Portability
Pickles are portable over space and time. In other words, the pickle file format is independent of machine architecture, which means you can create a pickle under Linux, for example, and send it to a Python program running under Windows or the Mac OS. And when you upgrade to a newer version of Python, you don't have to worry that you might be abandoning existing pickles. The Python developers have guaranteed that the pickle format will be backwards compatible across Python versions. In fact, details about current and supported formats are provided with the pickle module:

Listing 3. Retrieving supported formats
>>> pickle.format_version
'1.3'
>>> pickle.compatible_formats
['1.0', '1.1', '1.2']
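To make the compatibility claim concrete, here is a minimal sketch, assuming a Python 3 interpreter rather than the Python 2 sessions shown in this article. It serializes the same object with every protocol the running interpreter supports and reads each result back with a plain loads() call. The sample dictionary and the name roundtrips are ours, chosen only for illustration.

```python
import pickle

# Illustrative data; any picklable object would do.
data = {"name": "pickle", "versions": [1, 2, 3]}

roundtrips = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol=protocol)
    # loads() takes no protocol argument: it detects the format
    # from the payload itself, including the oldest protocol 0.
    roundtrips[protocol] = pickle.loads(payload)

for protocol, value in sorted(roundtrips.items()):
    print(protocol, value == data)
```

Each protocol should restore an object equal to the original, which is what lets a newer Python read pickles written in an older format.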
Multiple references, same object
In Python, a variable is a reference to an object. And you can have multiple variables referencing the same object. It turns out that Python has no trouble at all maintaining this behavior with pickled objects, as Listing 4 demonstrates:

Listing 4. Maintenance of object references

>>> a = [1, 2, 3]
>>> b = a
>>> a
[1, 2, 3]
>>> b
[1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> c = pickle.dumps((a, b))
>>> d, e = pickle.loads(c)
>>> d
[1, 2, 3, 4]
>>> e
[1, 2, 3, 4]
>>> d.append(5)
>>> d
[1, 2, 3, 4, 5]
>>> e
[1, 2, 3, 4, 5]
Circular and recursive references
The support for object references that we just demonstrated extends to circular references, where two objects contain references to each other, and recursive references, where an object contains a reference to itself. The following two listings highlight this capability. Let's look at a recursive reference first:

Listing 5. Recursive reference

>>> l = [1, 2, 3]
>>> l.append(l)
>>> l
[1, 2, 3, [...]]
>>> l[3]
[1, 2, 3, [...]]
>>> l[3][3]
[1, 2, 3, [...]]
>>> p = pickle.dumps(l)
>>> l2 = pickle.loads(p)
>>> l2
[1, 2, 3, [...]]
>>> l2[3]
[1, 2, 3, [...]]
>>> l2[3][3]
[1, 2, 3, [...]]
Now let's look at an example of a circular reference:

Listing 6. Circular reference

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.append(b)
>>> a
[1, 2, [3, 4]]
>>> b.append(a)
>>> a
[1, 2, [3, 4, [...]]]
>>> b
[3, 4, [1, 2, [...]]]
>>> a[2]
[3, 4, [1, 2, [...]]]
>>> b[2]
[1, 2, [3, 4, [...]]]
>>> a[2] is b
1
>>> b[2] is a
1
>>> f = file('temp.pkl', 'w')
>>> pickle.dump((a, b), f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c, d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
Notice that we get slightly, but significantly, different results when we pickle each object separately, as shown in Listing 7, rather than pickling them together inside a tuple:

Listing 7. Pickling separately versus together inside a tuple

>>> f = file('temp.pkl', 'w')
>>> pickle.dump(a, f)
>>> pickle.dump(b, f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c = pickle.load(f)
>>> d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
0
>>> d[2] is c
0
Equal, but not always identical
As we hinted in our last example, two references are identical only if they refer to the same object in memory. When a pickle is loaded, it is restored to a new object that is equal to the original, but not identical to it. In other words, each unpickled object is a copy of the original:

Listing 8. Restored objects as copies of originals

>>> j = [1, 2, 3]
>>> k = j
>>> k is j
1
>>> x = pickle.dumps(k)
>>> y = pickle.loads(x)
>>> y
[1, 2, 3]
>>> y == k
1
>>> y is k
0
>>> y is j
0
>>> k is j
1
At the same time, we saw that Python is able to maintain references between objects that are pickled as a unit. However, we also saw that separate calls to dump() take away Python's ability to maintain references to objects outside of the unit being pickled. Instead, Python makes a copy of the referenced object and stores it with the item being pickled. This isn't a problem for an application that pickles and restores a single object hierarchy. But it is something to be aware of for other situations.

It's also worth pointing out that there is an option that does allow separately pickled objects to maintain references to each other as long as they are all pickled to the same file. The pickle and cPickle modules provide a Pickler (and corresponding Unpickler) that is able to keep track of objects that have already been pickled. By using this Pickler, shared and circular references will be pickled by reference, rather than by value:

Listing 9. Maintenance of references among separately pickled objects

>>> f = file('temp.pkl', 'w')
>>> pickler = pickle.Pickler(f)
>>> pickler.dump(a)
<cPickle.Pickler object at 0x89b0bb8>
>>> pickler.dump(b)
<cPickle.Pickler object at 0x89b0bb8>
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> unpickler = pickle.Unpickler(f)
>>> c = unpickler.load()
>>> d = unpickler.load()
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
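For readers on a current interpreter, the contrast between separate dump() calls and a shared Pickler can be sketched as follows. This assumes Python 3 and uses an in-memory io.BytesIO buffer in place of temp.pkl. The variable names are ours.

```python
import io
import pickle

# Rebuild the circular pair from Listing 6.
a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

# Module-level dump() and load(): every call starts with an empty memo,
# so the second pickle contains its own full copy of both lists.
buf = io.BytesIO()
pickle.dump(a, buf)
pickle.dump(b, buf)
buf.seek(0)
c = pickle.load(buf)
d = pickle.load(buf)
separate_shared = c[2] is d

# One Pickler and one Unpickler: the memo survives between calls,
# so the second dump() only records a reference to the earlier object.
buf = io.BytesIO()
pickler = pickle.Pickler(buf)
pickler.dump(a)
pickler.dump(b)
buf.seek(0)
unpickler = pickle.Unpickler(buf)
c2 = unpickler.load()
d2 = unpickler.load()
shared = (c2[2] is d2) and (d2[2] is c2)

print(separate_shared, shared)
```

The same Unpickler instance has to read both pickles. A second Unpickler would not have the memo entries that the stored references point to.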
Nonpicklable objects
A few object types cannot be pickled. For example, Python cannot pickle a file object (or any object with a reference to a file object), because Python cannot guarantee that it can recreate the state of the file upon unpickling. (The other examples are so obscure that they aren't worth mentioning in an article of this nature.) Attempting to pickle a file object results in the following error:

Listing 10. Result of trying to pickle a file object

>>> f = file('temp.pkl', 'w')
>>> p = pickle.dumps(f)
Traceback (most recent call last):
  File "<input>", line 1, in ?
  File "/usr/lib/python2.2/copy_reg.py", line 57, in _reduce
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
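The same restriction can be checked on a current interpreter. The sketch below assumes Python 3, where the exact wording of the message differs from the Python 2.2 traceback above but the exception is still a TypeError. It also shows that a container holding a file object fails in the same way. The file name demo.txt is hypothetical.

```python
import os
import pickle
import tempfile

direct_error = None
nested_error = None

with tempfile.TemporaryDirectory() as tmp:
    # 'demo.txt' is a throwaway file used only for this illustration.
    with open(os.path.join(tmp, "demo.txt"), "w") as f:
        try:
            pickle.dumps(f)
        except TypeError as exc:
            direct_error = exc

        # An object that merely refers to a file cannot be pickled either.
        try:
            pickle.dumps({"log": f})
        except TypeError as exc:
            nested_error = exc

print(type(direct_error).__name__, type(nested_error).__name__)
```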
Class instances
The pickling of class instances requires a bit more attention than the pickling of simple object types. This is primarily due to the fact that Python pickles the instance data (usually the __dict__ attribute) and the name of the class, but not the code for the class. When Python unpickles a class instance, it attempts to import the module containing the class definition using the exact class name and module name (including any package path prefixes) as they were at the time the instance was pickled. Also note that class definitions must appear at the top level of a module, meaning they cannot be nested classes (classes defined inside other classes or functions).
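The top-level requirement is easy to see in a short experiment. The sketch below assumes Python 3, with hypothetical class names TopLevel and Local. One caveat for modern readers: since pickle protocol 4, Python 3 can pickle classes nested inside other classes by their qualified name, but classes defined inside functions still cannot be pickled. The exception type raised for a local class varies between Python 3 releases, so the sketch catches both candidates.

```python
import pickle


class TopLevel:
    """Defined at module level, so pickle can find it by name."""

    def __init__(self, value):
        self.value = value


def make_local_instance():
    # This class exists only inside the function body, so there is
    # no importable name that pickle could record for it.
    class Local:
        pass

    return Local()


top_level_ok = pickle.loads(pickle.dumps(TopLevel(1))).value == 1

local_error = None
try:
    pickle.dumps(make_local_instance())
except (pickle.PicklingError, AttributeError) as exc:
    local_error = exc

print(top_level_ok, type(local_error).__name__)
```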

When class instances are unpickled, their __init__() method isn't normally called again. Instead, Python creates a generic class instance, applies the instance attributes that were pickled, and sets the instance's __class__ attribute to point to the original class.
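A small counter makes this behavior visible. This is a minimal sketch, assuming Python 3 and a hypothetical class named Tracked that records how many times its __init__() method has run.

```python
import pickle


class Tracked:
    """Hypothetical class that counts calls to __init__()."""

    init_calls = 0

    def __init__(self, value):
        type(self).init_calls += 1
        self.value = value


original = Tracked(42)
restored = pickle.loads(pickle.dumps(original))

# The attribute came back, but __init__() ran only for the original.
print(restored.value, Tracked.init_calls, restored.__class__ is Tracked)
```

Any setup work that lives only in __init__(), such as opening files or connections, is therefore skipped when an instance is restored from a pickle.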

New-style classes, introduced in Python 2.2, rely on a slightly different unpickling mechanism. While the result of the process is essentially the same as with old-style classes, Python uses the copy_reg module's _reconstructor() function to restore new-style class instances.

If you want to modify the default pickling behavior for either new-style or old-style class instances, you can define the special methods __getstate__() and __setstate__(), which Python calls when saving and restoring state information for instances of the class. We'll see some examples that make use of these special methods in the following sections.

For now, let's take a look at a simple class instance. To begin, we created a Python module named persist.py, containing the following new-style class definition:

Listing 11. New-style class definition

class Foo(object):

    def __init__(self, value):
        self.value = value
Now we can pickle a Foo instance and take a look at its representation:

Listing 12. Pickling a Foo instance

>>> import cPickle as pickle
>>> from Orbtech.examples.persist import Foo
>>> foo = Foo('What is a Foo?')
>>> p = pickle.dumps(foo)
>>> print p
ccopy_reg
_reconstructor
p1
(cOrbtech.examples.persist
Foo
p2
c__builtin__
object
p3
NtRp4
(dp5
S'value'
p6
S'What is a Foo?'
sb.
>>>
You can see that the class name, Foo, and the fully qualified module name, Orbtech.examples.persist, are both stored in the pickle. If we had pickled this instance to a file, and unpickled it later or on another machine, Python would attempt to import the Orbtech.examples.persist module and would raise an exception if it could not. Similar errors would occur if we renamed the class, renamed the module, or moved the module to another directory.

Here is the error Python gives when we rename the Foo class and then try to load a previously pickled Foo instance:

Listing 13. Trying to load a pickled instance of a renamed Foo class

>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "<input>", line 1, in ?
AttributeError: 'module' object has no attribute 'Foo'
A similar error occurs when we rename the persist.py module:

Listing 14. Trying to load a pickled instance of a renamed persist.py module

>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "<input>", line 1, in ?
ImportError: No module named persist
We'll provide techniques for managing these kinds of changes, without breaking existing pickles, in the Schema evolution section below.
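Both failures can be reproduced in a single script. The sketch below assumes Python 3 and writes a throwaway module, hypothetically named persist_demo, into a temporary directory. It simulates the class rename by rebinding the class under a new name (Bar, also hypothetical), and simulates the module rename by making the module unimportable. The error messages differ from the Python 2 tracebacks above, but the exception types match (Python 3 raises ModuleNotFoundError, a subclass of ImportError).

```python
import importlib
import os
import pickle
import sys
import tempfile

# Source for a stand-in module, modeled on persist.py.
SOURCE = (
    "class Foo(object):\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n"
)

class_error = None
module_error = None

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "persist_demo.py"), "w") as f:
        f.write(SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    persist_demo = importlib.import_module("persist_demo")
    payload = pickle.dumps(persist_demo.Foo("What is a Foo?"))

    # Simulate renaming the class: the name stored in the pickle is gone.
    persist_demo.Bar = persist_demo.Foo
    del persist_demo.Foo
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        class_error = exc

    # Simulate renaming or moving the module: it can no longer be imported.
    del sys.modules["persist_demo"]
    sys.path.remove(tmp)
    try:
        pickle.loads(payload)
    except ImportError as exc:
        module_error = exc

print(type(class_error).__name__, type(module_error).__name__)
```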

Special state methods
Earlier we mentioned that a few object types, such as file objects, cannot be pickled. One way to handle instance attributes that are not picklable objects is to use the special methods available for modifying a class instance's state: __getstate__() and __setstate__(). Here is an example of our Foo class, which we've modified to handle a file object attribute:

Listing 15. Handling unpicklable instance attributes

class Foo(object):

    def __init__(self, value, filename):
        self.value = value
        self.logfile = file(filename, 'w')

    def __getstate__(self):
        """Return state values to be pickled."""
        f = self.logfile
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        """Restore state from the unpickled state values."""
        self.value, name, position = state
        # Reopen with 'r+' rather than 'w', which would truncate the log.
        f = file(name, 'r+')
        f.seek(position)
        self.logfile = f
When an instance of Foo is pickled, Python will pickle only the values returned to it when it calls the instance's __getstate__() method. Likewise, during unpickling, Python will supply the unpickled values as an argument to the instance's __setstate__() method. Inside the __setstate__() method we are able to recreate the file object based on the name and position information we pickled, and assign the file object to the instance's logfile attribute.
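Here is a round trip through these two methods as a runnable sketch. It assumes Python 3 and simplifies the approach: the hypothetical LoggedValue class stands in for Foo, it stores only the file name (not the position), and it reopens the log in append mode so existing contents are preserved. The file name demo.log is also an assumption.

```python
import os
import pickle
import tempfile


class LoggedValue:
    """Hypothetical stand-in for the article's Foo class."""

    def __init__(self, value, filename):
        self.value = value
        self.logfile = open(filename, "a")

    def __getstate__(self):
        # Pickle the file's name in place of the unpicklable file object.
        return (self.value, self.logfile.name)

    def __setstate__(self, state):
        self.value, name = state
        # Append mode keeps whatever the log already contains.
        self.logfile = open(name, "a")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.log")
    original = LoggedValue("What is a Foo?", path)
    original.logfile.write("before pickling\n")
    original.logfile.flush()

    restored = pickle.loads(pickle.dumps(original))
    restored.logfile.write("after unpickling\n")
    restored_value = restored.value

    original.logfile.close()
    restored.logfile.close()
    with open(path) as f:
        log_lines = f.read().splitlines()

print(restored_value, log_lines)
```

Storing the name instead of the open file means the restored instance depends on that path still being valid wherever the pickle is loaded.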



First published by IBM DeveloperWorks

