JSON Serialization in Python using serpy

August 30, 2017
Written by
Siddhant Goel
Contributor
Opinions expressed by Twilio contributors are their own


Serialization is the process of transforming objects of complex data types (instances of custom classes, ORM models, datetime objects, and so on) into native data types so that they can then be easily converted to JSON notation.

In this blog post, we will use a neat little library called serpy to see how such transformations work. We will then integrate this code in a tornado-based web server for a quick demo of how we can write APIs returning JSON data.

Step 1: Define the Data Type

Let’s assume that we are working on an API which returns details of people, like an ID, their name, and their birthdate. For the scope of this blog post, we could say that the API has access to a database of people and that requesting /person/42 will return a JSON representation of the person with ID 42. Before we begin, let’s quickly define the data type.

class Person(object):
    def __init__(self, id, name, birth_date):
        self.id = id
        self.name = name
        self.birth_date = birth_date

    @property
    def sidekick(self):
        return sidekick_for(self)

Assume for now that the function sidekick_for returns the sidekick for the given person (also a Person object). We’ll add a proper function definition later.

Now, JSON only accepts native data types like integers, strings, booleans, and so on. Calling the Python json.dumps function on a Person object won’t work: it raises a TypeError because the encoder doesn’t know how to represent a Person. Instead, we need a representation that only uses native data types before we can pass it to a JSON encoding function.
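To see the failure for ourselves, here is a minimal sketch using only the standard library. The Person class is a trimmed-down copy of the one above (without the sidekick property), and error_message is just a name chosen for this illustration.

```python
import json
from datetime import datetime


class Person(object):
    def __init__(self, id, name, birth_date):
        self.id = id
        self.name = name
        self.birth_date = birth_date


batman = Person(1, 'Batman', datetime(year=1980, month=1, day=1))

try:
    json.dumps(batman)
    error_message = None
except TypeError as exc:
    # json.dumps can encode dicts, lists, strings, numbers, booleans, and
    # None out of the box; a Person instance is none of those.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message differs between Python versions, but in each case it says that the object is not JSON serializable.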

Approach #1: Straightforward

We could add a to_json function to the Person class that returns a dictionary of the Person details. That would look something like the following:

def to_json(self):
    return {
        'id': self.id,
        'name': self.name,
        'birth_date': self.birth_date.isoformat(),
        'sidekick': self.sidekick.to_json() if self.sidekick else None,
    }

We return the values of the attributes which make up a Person object, and since they’re all native types, we can pass this dictionary into an encoding function to finally get the JSON notation. Looks good!


Note that we need to call isoformat on self.birth_date to get a string back since a Python datetime object is not a native datatype. Also note that we’re recursively calling the to_json function on self.sidekick to get its JSON representation. If we don’t, that variable will end up being a Person object which can’t be converted directly to JSON.
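Here is a small, self-contained sketch of this approach in action. To keep it short, the sidekick is passed in through the constructor instead of being looked up with sidekick_for; that is a simplification made for this example only.

```python
import json
from datetime import datetime


class Person(object):
    def __init__(self, id, name, birth_date, sidekick=None):
        self.id = id
        self.name = name
        self.birth_date = birth_date
        # Simplification for this sketch: store the sidekick directly.
        self.sidekick = sidekick

    def to_json(self):
        return {
            'id': self.id,
            'name': self.name,
            # datetime -> ISO 8601 string, which JSON can encode.
            'birth_date': self.birth_date.isoformat(),
            # Recurse so the nested Person also becomes a plain dict.
            'sidekick': self.sidekick.to_json() if self.sidekick else None,
        }


robin = Person(2, 'Robin', datetime(year=1980, month=1, day=1))
batman = Person(1, 'Batman', datetime(year=1980, month=1, day=1), sidekick=robin)

encoded = json.dumps(batman.to_json(), sort_keys=True)
print(encoded)
```

Despite its name, to_json returns a dictionary rather than a JSON string; json.dumps does the actual encoding.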

While this works, there are a few issues here. For one, we can’t define the field types. If some code is consuming this JSON representation and it encounters a boolean value for id, that code would be confused. Ideally, we would like to catch such cases at serialization time, before the data leaves our API.

Additionally, some use cases might require that the returned value be different based on some context. As an example, consider a web application with chat rooms where two or more users can talk to each other. The number of unread messages for the same chat room would be different depending on which user is requesting the value.
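To make the chat room example concrete, here is a rough sketch in plain Python. ChatRoom, serialize_room, and the shape of the context dictionary are all hypothetical names invented for this illustration; they are not part of any library.

```python
class ChatRoom(object):
    def __init__(self, id, name, unread_by_user):
        self.id = id
        self.name = name
        # Maps a user ID to that user's unread message count.
        self.unread_by_user = unread_by_user


def serialize_room(room, context):
    """Serialize a room from the point of view of context['user_id']."""
    return {
        'id': room.id,
        'name': room.name,
        # Same room, different value depending on who is asking.
        'unread_count': room.unread_by_user.get(context['user_id'], 0),
    }


room = ChatRoom(7, 'Batcave', {1: 3, 2: 0})
for_batman = serialize_room(room, {'user_id': 1})
for_robin = serialize_room(room, {'user_id': 2})

print(for_batman)
print(for_robin)
```

A to_json method on the class has no natural place to receive this kind of context, which is one more reason to move serialization out of the class itself.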

The simplest thing to do here would be to separate the serializer definition from the original class definition. This is where serpy comes in. If serpy is not yet installed, type pip install serpy==0.1.1 on the command line, and let’s see how we can use it!

Approach #2: Define a Serializer

from serpy import Serializer, IntField, StrField, MethodField


class PersonSerializer(Serializer):
    id = IntField(required=True)
    name = StrField(required=True)
    birth_date = MethodField('serialize_birth_date')
    sidekick = MethodField('serialize_sidekick')

    def serialize_birth_date(self, person):
        return person.birth_date.isoformat()

    def serialize_sidekick(self, person):
        if not person.sidekick:
            return None
        return PersonSerializer(person.sidekick).data

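What just happened? We defined a PersonSerializer, which is a class that describes how Person objects should be serialized. This is good because each field now declares its type. IntField converts the value with int, so an id that cannot be converted to an integer (the string 'abc', for example) raises an error at serialization time instead of reaching the client. Keep in mind that this is coercion rather than strict validation: Python’s int accepts booleans, so True would come out as 1. A missing value (None) is also an error because id is marked as a required field.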
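If the idea of declaring field types feels abstract, the following toy sketch shows the general mechanism using only the standard library. It is not serpy’s implementation; Field, serialize, Hero, and HERO_FIELDS are hypothetical names made up for this illustration.

```python
class Field(object):
    """Hypothetical field: coerces an attribute value with `to_value`."""

    def __init__(self, to_value):
        self.to_value = to_value


def serialize(instance, fields):
    """Build a dict by reading each named attribute and coercing it."""
    data = {}
    for name, field in fields.items():
        # A missing attribute raises AttributeError; a value that cannot
        # be coerced raises TypeError or ValueError.
        data[name] = field.to_value(getattr(instance, name))
    return data


class Hero(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name


HERO_FIELDS = {'id': Field(int), 'name': Field(str)}

# The string '1' is coerced to the integer 1.
ok = serialize(Hero('1', 'Batman'), HERO_FIELDS)

try:
    # int(None) raises TypeError, so a missing id fails loudly.
    serialize(Hero(None, 'Robin'), HERO_FIELDS)
    failed = False
except TypeError:
    failed = True

print(ok, failed)
```

The point is that the description of the output lives next to the field names, so a bad value is caught while serializing rather than by whoever consumes the JSON.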

That’s not all. We also achieved separation of concerns by moving the to_json function into a separate serializer class. In case we have to perform some code surgery in the future, this is fantastic.


What’s also cool is that with a few more lines of code, we can add context to the serializer as well, which would then enable us to change the value of a given field depending on what the context is. Alas, that’s a topic for a separate blog post, or perhaps an exercise for you, the reader. :)

Putting it all together

Let’s write a small API server in tornado (run pip install tornado==4.5.1 on the command line) that combines all the code we wrote in this post. Save the following code in a file called server.py in the current directory.

from datetime import datetime

from serpy import Serializer, IntField, StrField, MethodField
from tornado.escape import json_encode
from tornado.ioloop import IOLoop
from tornado.web import RequestHandler, Application

class Person(object):
    def __init__(self, id, name, birth_date):
        self.id = id
        self.name = name
        self.birth_date = birth_date

    @property
    def sidekick(self):
        return sidekick_for(self)

class PersonSerializer(Serializer):
    id = IntField(required=True)
    name = StrField(required=True)
    birth_date = MethodField('serialize_birth_date')
    sidekick = MethodField('serialize_sidekick')

    def serialize_birth_date(self, person):
        return person.birth_date.isoformat()

    def serialize_sidekick(self, person):
        if not person.sidekick:
            return None
        return PersonSerializer(person.sidekick).data

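# Sample data standing in for a real database of people.
batman = Person(1, 'Batman', datetime(year=1980, month=1, day=1))
robin = Person(2, 'Robin', datetime(year=1980, month=1, day=1))

def sidekick_for(person):
    # Compare by identity: only the batman object itself has a sidekick.
    return robin if person is batman else None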

class BatmanHandler(RequestHandler):
    def get(self):
        self.write(json_encode(PersonSerializer(batman).data))

if __name__ == '__main__':
    Application([(r'/batman', BatmanHandler)]).listen(8888)
    print('Listening on port 8888')

    IOLoop.current().start()

Run it by executing python server.py on the command line in the current working directory. Python should start a web server on port 8888, and visiting http://localhost:8888/batman should show you the JSON representation of the Person object. It works!


Conclusion

In this post we explored how to serialize Python objects to JSON. An important thing to keep in mind is that serialization is not limited to JSON. There are plenty of other data formats (XML, for instance) which could use some help. Either way, the basic concept remains the same.
An interesting follow-up exercise would be to try dumping data into different data formats and trying out other libraries (like marshmallow). Happy serializing!
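
As a starting point for that exercise, here is a minimal sketch that turns an already-serialized dictionary into XML with the standard library’s xml.etree.ElementTree. The dict_to_xml helper and the element names are assumptions made for this example, and it only handles flat values and nested dictionaries.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, data):
    """Convert a dict of native values (and nested dicts) to an Element."""
    element = ET.Element(tag)
    for key, value in data.items():
        if isinstance(value, dict):
            element.append(dict_to_xml(key, value))
        else:
            child = ET.SubElement(element, key)
            # Represent None as an empty element.
            child.text = '' if value is None else str(value)
    return element


person = {
    'id': 1,
    'name': 'Batman',
    'birth_date': '1980-01-01T00:00:00',
    'sidekick': None,
}

xml_text = ET.tostring(dict_to_xml('person', person), encoding='unicode')
print(xml_text)
```

Because the serializer already reduced everything to native types, switching the output format only means swapping the final encoding step.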

If you have any questions, suggestions, or feedback, feel free to find me online. I’d love to hear from you if this post helped you build something cool!