Python Persistence Management
By Patrick O'Brien, 2005-04-05
Object persistence
If you want to transparently store Python objects without losing their identity, type, etc., then you need some form of object serialization: a process that turns arbitrarily complex objects into textual or binary representations of those objects. Likewise, you must be able to restore the serialized form of an object back into an object that is the same as the original. In Python the serialization process is called pickling, and you can pickle/unpickle your objects to/from a string, a file on disk, or any file-like object. We'll look at pickling in detail later in this article.
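As a quick preview, the three targets mentioned above (a string, a file on disk, and a file-like object) can be sketched with the standard `pickle` module. This is a minimal sketch assuming Python 3, where the "string" form is a `bytes` object; the sample data and the temporary file name are made up for illustration.

```python
import io
import os
import pickle
import tempfile

# An arbitrary nested structure to persist (illustrative data only).
original = {"name": "example", "values": [1, 2.5, (3, 4)], "flag": None}

# 1. Pickle to and from an in-memory byte string.
data = pickle.dumps(original)
from_bytes = pickle.loads(data)

# 2. Pickle to and from a file on disk (a temporary path, for illustration).
path = os.path.join(tempfile.mkdtemp(), "example.pickle")
with open(path, "wb") as f:
    pickle.dump(original, f)
with open(path, "rb") as f:
    from_file = pickle.load(f)

# 3. Pickle to and from any file-like object, here an in-memory buffer.
buffer = io.BytesIO()
pickle.dump(original, buffer)
buffer.seek(0)  # rewind so the unpickler starts at the beginning
from_buffer = pickle.load(buffer)
```

In each case the restored object is a new object that compares equal to the original and keeps its types (the tuple comes back as a tuple, `None` as `None`). Only unpickle data from sources you trust, because unpickling can execute arbitrary code.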
Let's say you like the idea of keeping everything as an object and avoiding the overhead of translating objects into some kind of non-object based storage. Pickle files provide those benefits, but sometimes you need something more robust and scalable than simple pickle files. For example, pickling alone doesn't solve the problem of naming and locating the pickle files, nor does it support concurrent access to persistent objects. For those features you need to turn to something like ZODB, the Z object database for Python. ZODB is a robust, multi-user, object-oriented database system capable of storing and managing arbitrarily complex Python objects with transaction support and concurrency control. (See Resources to download ZODB.) Interestingly enough, even ZODB relies upon Python's native serialization capability, and to use ZODB effectively you must have a solid understanding of pickling.
Another interesting approach to the persistence problem, originally implemented in Java, is called Prevayler. (See Resources for a developerWorks article on Prevayler.) A group of Python programmers recently ported Prevayler to Python, and the result, called PyPerSyst, is hosted on SourceForge. (See Resources for a link to the PyPerSyst project.) The Prevayler/PyPerSyst concept also builds upon the native serialization capabilities of the Java and Python languages. PyPerSyst keeps an entire object system in memory and provides disaster recovery by occasionally pickling a snapshot of the system to disk and by maintaining a log of commands that can be reapplied to the latest snapshot. Applications that use PyPerSyst are therefore limited by available RAM, but in return a native object system loaded completely in memory is extremely fast and is much simpler to implement than one, such as ZODB, that allows for more objects than can be held in memory at once.
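The snapshot-plus-command-log idea can be illustrated with a small sketch built only on `pickle`. This is not the PyPerSyst or Prevayler API: the `TinyStore` class, the `HANDLERS` table, the `deposit` command, and the file names are all hypothetical, and commands are stored as plain tuples to keep the example short.

```python
import os
import pickle
import tempfile


def deposit(system, account, amount):
    """Hypothetical command handler: add an amount to an account balance."""
    system[account] = system.get(account, 0) + amount


# Maps command names stored in the log to the functions that apply them.
HANDLERS = {"deposit": deposit}


class TinyStore:
    """Illustrative in-memory system with snapshot and command-log recovery."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.pickle")
        self.log_path = os.path.join(directory, "commands.log")
        self.system = self._recover()

    def _recover(self):
        # Start from the latest snapshot, or an empty system if none exists.
        system = {}
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, "rb") as f:
                system = pickle.load(f)
        # Reapply every command logged since that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path, "rb") as f:
                while True:
                    try:
                        name, args = pickle.load(f)
                    except EOFError:
                        break
                    HANDLERS[name](system, *args)
        return system

    def execute(self, name, *args):
        # Log the command first, then apply it to the in-memory system.
        with open(self.log_path, "ab") as f:
            pickle.dump((name, args), f)
        HANDLERS[name](self.system, *args)

    def take_snapshot(self):
        with open(self.snapshot_path, "wb") as f:
            pickle.dump(self.system, f)
        # The snapshot now covers all logged commands, so empty the log.
        open(self.log_path, "wb").close()


directory = tempfile.mkdtemp()
store = TinyStore(directory)
store.execute("deposit", "alice", 100)
store.take_snapshot()
store.execute("deposit", "alice", 50)

# Simulate a restart: a new store rebuilds its state from disk.
recovered = TinyStore(directory)
```

After the simulated restart, `recovered.system` holds the snapshot state with the one later command reapplied. A real implementation would also need to handle details this sketch ignores, such as flushing the log safely to disk and replacing the snapshot atomically.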
Now that we've briefly touched upon the various ways to store our persistent objects, it's time to examine the pickling process in detail. While our main interest is in exploring ways to persist Python objects without having to translate them into some other format, we are still left with various concerns, such as: how to effectively pickle and unpickle both simple and complex objects, including instances of custom classes; how to maintain object references, including circular and recursive references; and how to handle changes to class definitions without running into problems with previously pickled instances. We'll cover all of these issues in the following examination of Python's pickling capabilities.
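To make the concern about object references concrete before the detailed examination, here is a minimal sketch, assuming Python 3 and using only built-in containers, that round-trips recursive, circular, and shared references through `pickle`. The variable names are illustrative.

```python
import pickle

# A recursive reference: a list that contains itself.
recursive = [1, 2]
recursive.append(recursive)

# A circular reference: two dictionaries that point at each other.
parent = {"name": "parent"}
child = {"name": "child", "parent": parent}
parent["child"] = child

# A shared reference: the same list object appears twice in a container.
shared = ["shared"]
container = [shared, shared]

# Round-trip each structure through a pickle byte string.
restored_recursive = pickle.loads(pickle.dumps(recursive))
restored_parent = pickle.loads(pickle.dumps(parent))
restored_container = pickle.loads(pickle.dumps(container))

# The restored structures keep the same shape of references.
print(restored_recursive[2] is restored_recursive)
print(restored_parent["child"]["parent"] is restored_parent)
print(restored_container[0] is restored_container[1])
```

Within a single pickle, an object that is referenced more than once is stored once and restored once, so the restored objects refer to each other the same way the originals did. The restored objects are still new objects, distinct from the originals.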
First published by IBM developerWorks
