Validating Content of JSONFields in Django Models

Learn how to validate content stored in a JSONField with JSONSchema in Django.

Posted by Pierre on Feb. 10, 2024, 1:44 p.m.

Storing data in a relational database by using JSON offers many benefits:

  • Flexibility to change the content without changing the database schema. For example, it is common to use a JSONField to store metadata from a third-party service.
  • Large, complex models can be stored in the database with cheap reads and writes, avoiding costly joins.

Since Django 3.1, JSONField is available for all supported database backends. Previously, it was part of the django.contrib.postgres.fields package and could only be used with a PostgreSQL database.

One potential disadvantage of storing data in a JSONField is that the data it contains can become unpredictable over time. As a developer, it can be challenging to understand precisely how the content of a JSONField is structured and what it contains (or it may require reading through a substantial amount of code).

In this article, we will attempt to address this problem by:

  • Using JSON Schema to validate the content of a dict.
  • Creating a new JSONSchemaField to be able to declare a JSON Schema and to use it to validate the content of a model field.

The sources of the project are available on GitHub.

Introduction to JSON Schema

JSON Schema is a vocabulary that you can use to annotate and validate JSON documents.

In essence, a JSON schema makes it easy to specify the following aspects of your JSON object:

  • The list of properties it should contain.
  • The type (and optional format) of each property.
  • Whether each property is required or optional.
  • Whether the object accepts additional properties beyond those specified in the schema.

Here is a straightforward example of a Product object, which has an identifier, a name, a price, and optional tags:

{
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "productId": {
      "description": "The unique identifier for a product",
      "type": "integer"
    },
    "productName": {
      "description": "Name of the product",
      "type": "string"
    },
    "price": {
      "description": "The price of the product",
      "type": "number",
      "exclusiveMinimum": 0
    },
    "tags": {
      "description": "Tags for the product",
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "uniqueItems": true
    }
  },
  "required": [ "productId", "productName", "price" ]
}
  • Both the title and description properties serve as pure documentation and don’t impose any constraints on the data being validated. Although they aren’t required to be declared in the schema, you can use them for natural documentation in your code.
  • The type keyword enforces a validation rule: the input data must match the type specified in the schema.
  • A JSON object that provides the three required properties (productId, productName, and price) is considered valid. The tags property is optional because it isn’t listed in the required keyword of the schema.
  • To disallow additional properties, set the additionalProperties keyword to false.

More examples can be found in the miscellaneous-examples page of the documentation.

Exploring the python-jsonschema library

We will use python-jsonschema to validate the content of a Python dict. This library provides an implementation of the JSON Schema specification for Python.

pip install jsonschema

Here is an example taken directly from the documentation:

>>> from jsonschema import validate

>>> # A sample schema, like what we'd get from json.load()
>>> schema = {
...     "type" : "object",
...     "properties" : {
...         "price" : {"type" : "number"},
...         "name" : {"type" : "string"},
...     },
... }

>>> # If no exception is raised by validate(), the instance is valid.
>>> validate(instance={"name" : "Eggs", "price" : 34.99}, schema=schema)

>>> validate(
...     instance={"name" : "Eggs", "price" : "Invalid"}, schema=schema,
... )
Traceback (most recent call last):
    ...
ValidationError: 'Invalid' is not of type 'number'

Validating JSON Fields in Django Models

In this section, we will use the python-jsonschema library to validate the content of a JSONField before saving it to our database. To accomplish this, we will create a custom Django model field named JSONSchemaField. This field inherits from JSONField, keeping its original behavior while adding a layer of validation based on python-jsonschema.

Our goal is to define a user information model that stores complementary information about users. Here’s an example of the resulting model:

from django.db import models
from django.contrib.auth.models import User
from django.conf import settings

from .fields import JSONSchemaField

class UserInformation(models.Model):
    user = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
    )

    information = JSONSchemaField(
        schema={
            "type": "object",
            "properties": {
                "name": {"type": "string", "maxLength": 255},
                "email": {"type": "string", "format": "email"},
                "centers_of_interest": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "maxLength": 255,
                    },
                },
            },
            "required": ["name", "email"],
            "additionalProperties": False,
        },
    )

This simple model declares:

  • A one-to-one relationship with the User model.
  • A field to store extra information as JSON data, validated against a schema.

Of course, it is just a simple use case for demonstration. You may have a more interesting one!


First, create a new file (which I will name fields.py) and place it in a Django app.

from django.db import models
from django.core import checks, exceptions

from jsonschema import SchemaError
from jsonschema import exceptions as jsonschema_exceptions
from jsonschema import validate
from jsonschema.validators import validator_for


class JSONSchemaField(models.JSONField):
    """
    JSONField with a schema validation by using `python-jsonschema`.
    Cf: https://python-jsonschema.readthedocs.io/en/stable/validate/
    """

    def __init__(self, *args, **kwargs):
        self.schema = kwargs.pop("schema", None)
        super().__init__(*args, **kwargs)

    def validate(self, value, model_instance):
        return super().validate(value, model_instance)

    def check(self, **kwargs):
        return super().check(**kwargs)

Our new field declares:

  • A schema argument, used to declare the JSON schema at field level.
  • A validate() method, responsible for validating the content to be stored in the field. Django calls it when validating forms, admin forms, DRF serializers, and so on. If you are unfamiliar with it, I recommend reading the Form and field validation documentation.
  • A check() method, responsible for performing checks and asserting that the field is correctly configured on your model. We use it to validate the schema associated with our field. Django’s system checks run when the server starts (and when you run python manage.py check) and inform you about errors in your project. For this implementation, I took inspiration from Django’s CharField, which does the same.

The checks

It’s recommended to start by implementing the checks: doing so saves you time by ensuring that your JSON schema is valid before any data is validated against it.

The check() method works like this:

  • It takes **kwargs, which carries options passed by the system check framework (for example, databases). The parameters declared on the field itself, such as our schema, are not passed here: we read them from the attributes set in __init__().
  • It returns a list of CheckMessage objects (an empty list meaning no errors) that may have different levels of severity (Debug, Info, Warning, Error, Critical). If check() returns messages with a level greater than or equal to Error, Django prevents management commands from executing. Less severe messages are reported to the console but won’t prevent the server from starting.

Here’s an example implementation of our check function:

from django.core import checks
from django.db import models
from jsonschema import SchemaError
from jsonschema.validators import validator_for

class JSONSchemaField(models.JSONField):
    def __init__(self, *args, **kwargs):
        self.schema = kwargs.pop("schema", None)
        super().__init__(*args, **kwargs)

    @property
    def _has_valid_schema(self) -> bool:
        if not isinstance(self.schema, dict):
            return False

        # Determine validator class for the schema.
        schema_cls = validator_for(self.schema)
        try:
            # Check the schema.
            schema_cls.check_schema(self.schema)
        except SchemaError:
            return False
        return True

    def _check_schema_attribute(self):
        if self.schema is None:
            return [
                checks.Error(
                    "JSONSchemaField must define a 'schema' attribute.", obj=self
                )
            ]
        elif not self._has_valid_schema:
            return [
                checks.Error("Given 'schema' is not a valid json schema.", obj=self)
            ]
        else:
            return []

    def check(self, **kwargs):
        return [*super().check(**kwargs), *self._check_schema_attribute()]

We perform the following checks on the schema attribute:

  • Ensure that it is a dict.
  • Ensure that it is a valid JSON schema.

During the second step, we call the validator_for() function, which retrieves the validator class appropriate for the given schema. For example, if you declare a $schema property in your JSON schema, it is used to select the class that validates the format of your schema.

You can test your implementation by running:

$ python manage.py check
System check identified no issues (0 silenced).

Now, you can try to break your JSON schema and rerun the command:

    information = JSONSchemaField(
        schema={
            "type": "foo",
            "properties": {
            },
        },
    )

$ python manage.py check
SystemCheckError: System check identified some issues:

ERRORS:
jsonschemafield.UserInformation.information: Given 'schema' is not a valid json schema.

System check identified 1 issue (0 silenced).

The validation process

It is now time to validate the input to be stored in our JSON field.

To do this, we will override the JSONSchemaField.validate() method and use the validate() function from the jsonschema library.

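from django.core import exceptions
from django.db import models
from jsonschema import exceptions as jsonschema_exceptions
from jsonschema import validate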

class JSONSchemaField(models.JSONField):

    def validate(self, value, model_instance):
        """Validate the content of the json field."""
        super().validate(value, model_instance)
        try:
            validate(instance=value, schema=self.schema)
        except jsonschema_exceptions.ValidationError as e:
            raise exceptions.ValidationError(
                "Invalid json content: %(value)s",
                code="invalid_content",
                params={"value": e.message},
            )

You can test your implementation through Django’s admin or by opening a shell:

>>> from myapp.models import UserInformation
>>> from django.contrib.auth import get_user_model
>>> User = get_user_model()
>>> user = User.objects.get(pk=42)
>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'joe@fasterthan.fr'})
>>> user_information.full_clean()
>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'joe@fasterthan.fr', 'foo': 'bar'})
>>> user_information.full_clean()
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[9], line 1
----> 1 user_information.full_clean()

File ~/.venvs/blog-entry-jsonschemafield/lib/python3.10/site-packages/django/db/models/base.py:1552, in Model.full_clean(self, exclude, validate_unique, validate_constraints)
   1549         errors = e.update_error_dict(errors)
   1551 if errors:
-> 1552     raise ValidationError(errors)

ValidationError: {'information': ["Invalid json content: Additional properties are not allowed ('foo' was unexpected)"]}

(Optional) Enforcing format consistency for properties

JSON Schema exposes a format keyword that you can use to validate the content of a property. For example, it is possible to validate that a property is a valid email. Let’s have a look at the schema we declared earlier:

    information = JSONSchemaField(
        schema={
            "type": "object",
            "properties": {
                "name": {"type": "string", "maxLength": 255},
                "email": {"type": "string", "format": "email"},
                "centers_of_interest": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "maxLength": 255,
                    },
                },
            },
            "required": ["name", "email"],
            "additionalProperties": False,
        },
    )

You can see that the email property declares the email format. However, if we validate our content again with a malformed email:

>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'bar'})
>>> user_information.full_clean()

No ValidationError is raised this time.

This is because the JSON Schema specification does not require implementations to enforce the format keyword, and the Python library follows this behavior: https://python-jsonschema.readthedocs.io/en/stable/validate/#validating-formats. To validate the format of our properties, we update our code to pass the format_checker argument to the validate() function:

from django.core import exceptions
from django.db import models
from jsonschema import exceptions as jsonschema_exceptions
from jsonschema import validate
from jsonschema.validators import validator_for


class JSONSchemaField(models.JSONField):

    def validate(self, value, model_instance):
        super().validate(value, model_instance)

        # Determine the validator class for the schema ...
        schema_cls = validator_for(self.schema)
        try:
            validate(
                instance=value,
                schema=self.schema,
                # ... and use its format checker to enforce "format" keywords.
                format_checker=schema_cls.FORMAT_CHECKER,
            )
        except jsonschema_exceptions.ValidationError as e:
            raise exceptions.ValidationError(
                "Invalid json content: %(value)s",
                code="invalid_content",
                params={"value": e.message},
            )

You can now run the same code again:

>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'bar'})
>>> user_information.full_clean()
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[7], line 1
----> 1 user_information.full_clean()

File ~/.venvs/blog-entry-jsonschemafield/lib/python3.10/site-packages/django/db/models/base.py:1552, in Model.full_clean(self, exclude, validate_unique, validate_constraints)
   1549         errors = e.update_error_dict(errors)
   1551 if errors:
-> 1552     raise ValidationError(errors)

ValidationError: {'information': ["Invalid json content: 'bar' is not a 'email'"]}

Please note that, as specified in the documentation of python-jsonschema:

  • Some formats require additional dependencies to be installed.
  • The email format we used in this example doesn’t (nor do other formats such as date or ipv4).
  • If the required dependencies are not installed, validation succeeds without raising an error.

If you want to install extra dependencies in order to be able to use any format, then you can do it this way:

pip install jsonschema'[format]'

or

pip install jsonschema'[format-nongpl]'

Please note that if your app uses software under the GPL license, it must itself be distributed under the GPL license. If you want to avoid that, use the format-nongpl extra.

Last thoughts

We have reached the end of this article. Here is the final version of the code:

fields.py

from django.db import models
from django.core import checks, exceptions

from jsonschema import SchemaError
from jsonschema import exceptions as jsonschema_exceptions
from jsonschema import validate
from jsonschema.validators import validator_for


class JSONSchemaField(models.JSONField):
    """
    JSONField with a schema validation by using `python-jsonschema`.
    Cf: https://python-jsonschema.readthedocs.io/en/stable/validate/
    """

    def __init__(self, *args, **kwargs):
        self.schema = kwargs.pop("schema", None)
        super().__init__(*args, **kwargs)

    @property
    def _has_valid_schema(self):
        """Check that the given `schema` is a valid json schema."""
        if not isinstance(self.schema, dict):
            return False
        schema_cls = validator_for(self.schema)
        try:
            schema_cls.check_schema(self.schema)
        except SchemaError:
            return False
        return True

    def check(self, **kwargs):
        return [*super().check(**kwargs), *self._check_schema_attribute()]

    def _check_schema_attribute(self):
        """Ensure that the given schema is a valid json schema during Django's checks."""
        if self.schema is None:
            return [
                checks.Error(
                    "JSONSchemaField must define a 'schema' attribute.", obj=self
                )
            ]
        elif not self._has_valid_schema:
            return [
                checks.Error("Given 'schema' is not a valid json schema.", obj=self)
            ]
        else:
            return []

    def validate(self, value, model_instance):
        """Validate the content of the json field."""
        super().validate(value, model_instance)
        schema_cls = validator_for(self.schema)
        try:
            validate(instance=value, schema=self.schema, format_checker=schema_cls.FORMAT_CHECKER)
        except jsonschema_exceptions.ValidationError as e:
            raise exceptions.ValidationError(
                "Invalid json content: %(value)s",
                code="invalid_content",
                params={"value": e.message},
            )

models.py

from django.db import models
from django.contrib.auth.models import User
from django.conf import settings

from .fields import JSONSchemaField


class UserInformation(models.Model):
    user = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
    )

    information = JSONSchemaField(
        schema={
            "$schema": "https://json-schema.org/draft/2020-12/schema",
            "type": "object",
            "properties": {
                "name": {"type": "string", "maxLength": 255},
                "email": {"type": "string", "format": "email"},
                "centers_of_interest": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "maxLength": 255,
                    },
                },
            },
            "required": ["name", "email"],
            "additionalProperties": False,
        },
    )

The sources of the project are available on GitHub.

The code we produced will validate the content of your JSONField in forms, admin forms, DRF serializers, and so on. Depending on your project, you may want to enforce model validation on save. This can be done by calling full_clean() in the save() method of your model.

class UserInformation(models.Model):

    def save(self, *args, **kwargs):
        self.full_clean()
        super().save(*args, **kwargs)

Please note that the validation of JSONSchemaField is done on the application side (i.e. in Python code) and not on the database side. This means it won’t run for operations that bypass save(), such as bulk create and bulk update.
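
If you still need bulk creation, one possible workaround is to validate each instance yourself before handing the batch to the database. The bulk_create_validated helper below is hypothetical and not part of Django; it trades some speed for safety, since full_clean() runs once per instance.

```python
def bulk_create_validated(model, instances, **kwargs):
    """Hypothetical helper: run full_clean() on each instance, then bulk_create().

    Nothing is written if any instance fails validation, because the
    exception is raised before bulk_create() is called.
    """
    instances = list(instances)
    for instance in instances:
        instance.full_clean()
    return model.objects.bulk_create(instances, **kwargs)


# Usage sketch, assuming the UserInformation model from this article:
# bulk_create_validated(UserInformation, [UserInformation(user=u, information=i), ...])
```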

It is also possible to perform more complex validation than what we’ve seen today. For example, the following schema can be used to assert that a URL starts with https://:

"url": {"type": "string", "format": "uri", "pattern": "^https://"},