Serialization is the process of transforming objects of complex data types (custom-defined classes, object-relational mappers, datetime, etc.) to native data types so that they can then be easily converted to JSON notation.
In this blog post, we will use a neat little library called serpy to see how such transformations work. We will then integrate this code in a tornado-based web server for a quick demo of how we can write APIs returning JSON data.
Step 1: Define the Data Type
Let’s assume that we are working on an API which returns details of people, like an ID, their name, and their birthdate. For the scope of this blog post, we could say that the API has access to a database of people and that requesting /person/42 will return a JSON representation of the person with ID 42. Before we begin, let’s quickly define the data type.

class Person(object):
    def __init__(self, id, name, birth_date):
        self.id = id
        self.name = name
        self.birth_date = birth_date

    @property
    def sidekick(self):
        return sidekick_for(self)

Assume for now that the function sidekick_for returns the sidekick for the given person (also a Person object). We’ll add a proper function definition later.
Now, JSON only supports a handful of native data types: numbers, strings, booleans, null, arrays, and objects. Calling Python’s json.dumps function directly on a Person object therefore won’t work; it raises a TypeError. Instead, we need a representation that only uses native data types before we can pass it to a JSON encoding function.
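To see the failure concretely, here is a quick sketch using a stripped-down Person (the sidekick property is left out so the snippet runs on its own). The exact error message depends on the Python version, so only the exception type is checked:

```python
import json
from datetime import datetime


class Person(object):
    def __init__(self, id, name, birth_date):
        self.id = id
        self.name = name
        self.birth_date = birth_date


batman = Person(1, 'Batman', datetime(year=1980, month=1, day=1))

try:
    json.dumps(batman)
    error = None
except TypeError as exc:
    # json.dumps has no idea how to encode an arbitrary Person instance
    error = exc

print(type(error).__name__)
```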
Approach #1: Straightforward
We could add a to_json method to the Person class that returns a dictionary of the Person details. That would look something like the following:

class Person(object):
    # __init__ and sidekick stay as defined above

    def to_json(self):
        return {
            'id': self.id,
            'name': self.name,
            'birth_date': self.birth_date.isoformat(),
            'sidekick': self.sidekick.to_json() if self.sidekick else None,
        }

We return the values of the attributes which make up a Person object, and since they’re all native types, we can pass this dictionary into an encoding function to finally get the JSON notation. Looks good!
Note that we need to call isoformat on self.birth_date to get a string back since a Python datetime object is not a native data type. Also note that we’re recursively calling the to_json method on self.sidekick to get its JSON representation. If we don’t, that value will end up being a Person object which can’t be converted directly to JSON.
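Here is a small end-to-end sketch of this approach that can be run as is. The sidekick_for function below is only a stand-in for a real lookup, and the two sample people are made up for the demonstration:

```python
import json
from datetime import datetime


class Person(object):
    def __init__(self, id, name, birth_date):
        self.id = id
        self.name = name
        self.birth_date = birth_date

    @property
    def sidekick(self):
        return sidekick_for(self)

    def to_json(self):
        return {
            'id': self.id,
            'name': self.name,
            'birth_date': self.birth_date.isoformat(),
            'sidekick': self.sidekick.to_json() if self.sidekick else None,
        }


batman = Person(1, 'Batman', datetime(year=1980, month=1, day=1))
robin = Person(2, 'Robin', datetime(year=1980, month=1, day=1))


def sidekick_for(person):
    # Stand-in lookup for this sketch: only Batman has a sidekick
    return robin if person is batman else None


# The dictionary only holds native types, so json.dumps accepts it
encoded = json.dumps(batman.to_json(), sort_keys=True)
print(encoded)
```

The nested sidekick entry is itself a plain dictionary, which is why the recursive to_json call matters.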
While this works, there are a few issues here. For one, we can’t define the field types. If some code is consuming this JSON representation and it encounters a boolean value for id, that calling code would be confused. Ideally we would like to catch such cases at serialization time. Additionally, some use cases might require that the returned value be different based on some context. As an example, consider a web application with chat rooms where two or more users can talk to each other. The number of unread messages for the same chat room would be different based on which user is requesting the value.
The simplest thing to do here would be to separate the serializer definition from the original class definition. This is where serpy comes in. If serpy is not yet installed, type pip install serpy==0.1.1 on the command line, and let’s see how we can use it!
Approach #2: Define a Serializer

from serpy import Serializer, IntField, StrField, MethodField


class PersonSerializer(Serializer):
    id = IntField(required=True)
    name = StrField(required=True)
    birth_date = MethodField('serialize_birth_date')
    sidekick = MethodField('serialize_sidekick')

    def serialize_birth_date(self, person):
        return person.birth_date.isoformat()

    def serialize_sidekick(self, person):
        if not person.sidekick:
            return None
        return PersonSerializer(person.sidekick).data

What just happened? We defined a PersonSerializer, which is a class that defines how Person objects should be serialized. This is good because the field types are now declared in one place. IntField converts the value with int(), so an id always comes out as an integer (a boolean would be coerced rather than passed through), and a value that can’t be converted raises an error at serialization time. Since id is marked as a required field, a missing value (None) is also an error.
That’s not all. We also achieved separation of concerns by moving the to_json logic into a separate serializer class. In case we have to perform some code surgery in the future, this is fantastic.
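To get a feel for what typed fields buy us, here is a rough sketch of the idea in plain Python. This is not how serpy is implemented; Field, SimpleSerializer, and the trimmed-down Person are hypothetical names used only for illustration:

```python
class Field(object):
    """Pairs a field with the function used to convert its value."""

    def __init__(self, convert):
        self.convert = convert


class SimpleSerializer(object):
    fields = {}

    def __init__(self, instance):
        self.instance = instance

    @property
    def data(self):
        # Read each declared attribute and run it through its converter
        return {
            name: field.convert(getattr(self.instance, name))
            for name, field in self.fields.items()
        }


class PersonSerializer(SimpleSerializer):
    fields = {'id': Field(int), 'name': Field(str)}


class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name


# A string id is coerced to an integer
good = PersonSerializer(Person('42', 'Batman')).data

try:
    PersonSerializer(Person(None, 'Batman')).data
    failure = None
except TypeError as exc:
    # int(None) fails, so the bad value is caught while serializing
    failure = exc

print(good)
```

The point of the design is that the conversion rules live next to the field declarations, away from the Person class itself.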
What’s also cool is that with a few more lines of code, we can add context to the serializer as well, which would then enable us to change the value of a given field depending on what the context is. Alas, that’s a topic for a separate blog post, or perhaps an exercise for you, the reader. :)
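As a starting point for that exercise, here is one way the idea could look, sketched in plain Python rather than with serpy’s API. ChatRoom, ChatRoomSerializer, the context dictionary, and the unread_by_user mapping are all hypothetical:

```python
class ChatRoom(object):
    def __init__(self, id, unread_by_user):
        self.id = id
        # Maps a user name to that user's unread message count
        self.unread_by_user = unread_by_user


class ChatRoomSerializer(object):
    def __init__(self, room, context=None):
        self.room = room
        self.context = context or {}

    @property
    def data(self):
        # The same room serializes differently depending on the requester
        user = self.context.get('user')
        return {
            'id': self.room.id,
            'unread': self.room.unread_by_user.get(user, 0),
        }


room = ChatRoom(7, {'batman': 3, 'robin': 0})
for_batman = ChatRoomSerializer(room, context={'user': 'batman'}).data
for_robin = ChatRoomSerializer(room, context={'user': 'robin'}).data
print(for_batman)
print(for_robin)
```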
Putting it all together
Let’s write a small API server in tornado (run pip install tornado==4.5.1 on the command line) that combines all the code we wrote in this post. Save the following code in a file called server.py in the current directory.

from datetime import datetime

from serpy import Serializer, IntField, StrField, MethodField
from tornado.escape import json_encode
from tornado.ioloop import IOLoop
from tornado.web import RequestHandler, Application


class Person(object):
    def __init__(self, id, name, birth_date):
        self.id = id
        self.name = name
        self.birth_date = birth_date

    @property
    def sidekick(self):
        return sidekick_for(self)


class PersonSerializer(Serializer):
    id = IntField(required=True)
    name = StrField(required=True)
    birth_date = MethodField('serialize_birth_date')
    sidekick = MethodField('serialize_sidekick')

    def serialize_birth_date(self, person):
        return person.birth_date.isoformat()

    def serialize_sidekick(self, person):
        if not person.sidekick:
            return None
        return PersonSerializer(person.sidekick).data


batman = Person(1, 'Batman', datetime(year=1980, month=1, day=1))
robin = Person(2, 'Robin', datetime(year=1980, month=1, day=1))


def sidekick_for(person):
    return robin if person == batman else None


class BatmanHandler(RequestHandler):
    def get(self):
        self.write(json_encode(PersonSerializer(batman).data))


if __name__ == '__main__':
    Application([(r'/batman', BatmanHandler)]).listen(8888)
    print('Listening on port 8888')
    IOLoop.current().start()

Run it by executing python server.py on the command line in the current working directory. Python should start a web server on port 8888, and visiting http://localhost:8888/batman should show you the JSON representation of the Person object. It works!
Conclusion
In this post we explored how to serialize Python objects to JSON. An important thing to keep in mind is that serialization is not limited to JSON. There are plenty of other data formats (XML, for instance) which could use some help. Either way, the basic concept remains the same.
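As a small illustration of that point, the same kind of native dictionary we produced for JSON can be turned into XML with the standard library. This is only a sketch: dict_to_xml is a hypothetical helper, the sample dictionary is hard-coded, and None values are simply skipped:

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, data):
    """Build an XML element from a dictionary of native values."""
    element = ET.Element(tag)
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dictionaries become nested elements
            element.append(dict_to_xml(key, value))
        elif value is not None:
            child = ET.SubElement(element, key)
            child.text = str(value)
    return element


person = {
    'id': 1,
    'name': 'Batman',
    'birth_date': '1980-01-01T00:00:00',
    'sidekick': None,
}

xml_text = ET.tostring(dict_to_xml('person', person), encoding='unicode')
print(xml_text)
```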
An interesting follow-up exercise would be to try dumping data into different data formats and trying out other libraries (like marshmallow). Happy serializing!
If you have any questions, suggestions, or feedback, feel free to find me online. I’d love to hear from you if this post helped you build something cool!
- Email: siddhantgoel@gmail.com
- Web: https://sgoel.org
- Twitter: @siddhantgoel
- GitHub: @siddhantgoel