Storing data in a relational database by using JSON offers many benefits:
- Flexibility to change the stored content without changing the database schema. For example, it is common to use a JSONField to store metadata from a third-party service.
- Large, complex models can be stored in a single column and read or written cheaply, avoiding costly joins.
Since Django 3.1, JSONField is available for all supported database backends. It was previously part of the django.contrib.postgres.fields package and limited to PostgreSQL.
One potential disadvantage of storing data in a JSONField is that its content can become unpredictable over time. As a developer, it can be hard to know precisely how the content of a JSONField is structured and what it contains without reading through a substantial amount of code.
In this article, we will attempt to address this problem by:
- Using JSON Schema to validate the content of a dict.
- Creating a new JSONSchemaField to declare a JSON Schema and use it to validate the content of a model field.
The sources of the project are available on GitHub.
Introduction to JSON Schema
JSON Schema is a vocabulary that you can use to annotate and validate JSON documents.
In essence, a JSON Schema makes it easy to specify the following aspects of your JSON object:
- The list of properties it should contain.
- The type (and optional format) of each property.
- Whether each property is required or optional.
- Whether the object accepts additional properties beyond those specified in the schema.
Here is a straightforward example of a Product object, which has a name and a price:
{
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "productId": {
      "description": "The unique identifier for a product",
      "type": "integer"
    },
    "productName": {
      "description": "Name of the product",
      "type": "string"
    },
    "price": {
      "description": "The price of the product",
      "type": "number",
      "exclusiveMinimum": 0
    },
    "tags": {
      "description": "Tags for the product",
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "uniqueItems": true
    }
  },
  "required": [ "productId", "productName", "price" ]
}
- Both the title and description properties serve as pure documentation and don’t impose any constraints on the data being validated. They aren’t required, but they are a natural place to document the schema in your code.
- The type keyword enforces a validation rule: the input data must match the type specified in the schema.
- A JSON object with the three required properties (productId, productName, and price) is considered valid. The tags property is optional because it isn’t listed in the required keyword of the schema.
- To disallow additional properties, set the additionalProperties keyword to false.
More examples can be found in the miscellaneous-examples page of the documentation.
Exploring the python-jsonschema library
We will use python-jsonschema to validate the content of a Python dict. This library provides an implementation of the JSON Schema specification for Python.
pip install jsonschema
Here is an example taken directly from the documentation:
>>> from jsonschema import validate
>>> # A sample schema, like what we'd get from json.load()
>>> schema = {
...     "type" : "object",
...     "properties" : {
...         "price" : {"type" : "number"},
...         "name" : {"type" : "string"},
...     },
... }
>>> # If no exception is raised by validate(), the instance is valid.
>>> validate(instance={"name" : "Eggs", "price" : 34.99}, schema=schema)
>>> validate(
...     instance={"name" : "Eggs", "price" : "Invalid"}, schema=schema,
... )
Traceback (most recent call last):
    ...
ValidationError: 'Invalid' is not of type 'number'
Validating JSON Fields in Django Models
In this section, we will use python-jsonschema to validate the content of a JSONField before saving it to the database. To accomplish this, we will create a custom Django model field named JSONSchemaField. This field inherits from JSONField, keeping its original behavior while adding a layer of validation based on python-jsonschema.
Our goal is to define a user information model that stores complementary information about users. Here’s an example of the resulting model:
from django.conf import settings
from django.db import models

from .fields import JSONSchemaField


class UserInformation(models.Model):
    user = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
    )
    information = JSONSchemaField(
        schema={
            "type": "object",
            "properties": {
                "name": {"type": "string", "maxLength": 255},
                "email": {"type": "string", "format": "email"},
                "centers_of_interest": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "maxLength": 255,
                    },
                },
            },
            "required": ["name", "email"],
            "additionalProperties": False,
        },
    )
This simple model declares:
- A one-to-one relationship with the User model.
- A field that stores extra information as JSON data.
Of course, it is just a simple use case for demonstration. You may have a more interesting one!
First, create a new file (which I will name fields.py) and place it in a Django app.
from django.db import models
from django.core import checks, exceptions
from jsonschema import SchemaError
from jsonschema import exceptions as jsonschema_exceptions
from jsonschema import validate
from jsonschema.validators import validator_for


class JSONSchemaField(models.JSONField):
    """
    JSONField with a schema validation by using `python-jsonschema`.
    Cf: https://python-jsonschema.readthedocs.io/en/stable/validate/
    """

    def __init__(self, *args, **kwargs):
        self.schema = kwargs.pop("schema", None)
        super().__init__(*args, **kwargs)

    def validate(self, value, model_instance):
        return super().validate(value, model_instance)

    def check(self, **kwargs):
        return super().check(**kwargs)
Our new field declares:
- a schema argument, used to declare the JSON Schema at field level.
- a validate() method, responsible for validating the content to be stored in the field. Django calls it from Model.full_clean(), which model forms and the admin run for you. If you are unfamiliar with it, I recommend reading the Form and field validation documentation.
- a check() method, responsible for performing checks and asserting that the field is correctly declared on your model. We use it to validate the schema associated with our field. Django’s system checks run when the server starts (and when you run python manage.py check) and inform you about errors in your project. For this implementation, I took inspiration from Django’s CharField, which does the same.
The checks
It’s recommended to start by implementing the checks. Implementing checks first can save you time by ensuring that your JSON schema is valid.
The check() method works as follows:
- It takes **kwargs holding options passed by the check framework (for example, databases). The parameters you declared for your field are stored on the instance itself, so we read our schema from self.schema.
- It returns a list of CheckMessage objects (an empty list means no errors), each with a level of severity (Debug, Info, Warning, Error, Critical). If check() returns messages with a level greater than or equal to Error, Django prevents management commands from executing. Less severe messages are reported to the console but don’t prevent the server from starting.
Here’s an example implementation of our check function:
from django.core import checks
from django.db import models
from jsonschema import SchemaError
from jsonschema.validators import validator_for


class JSONSchemaField(models.JSONField):
    def __init__(self, *args, **kwargs):
        self.schema = kwargs.pop("schema", None)
        super().__init__(*args, **kwargs)

    @property
    def _has_valid_schema(self) -> bool:
        if not isinstance(self.schema, dict):
            return False
        # Determine validator class for the schema.
        schema_cls = validator_for(self.schema)
        try:
            # Check the schema.
            schema_cls.check_schema(self.schema)
        except SchemaError:
            return False
        return True

    def _check_schema_attribute(self):
        if self.schema is None:
            return [
                checks.Error(
                    "JSONSchemaField must define a 'schema' attribute.", obj=self
                )
            ]
        elif not self._has_valid_schema:
            return [
                checks.Error("Given 'schema' is not a valid json schema.", obj=self)
            ]
        else:
            return []

    def check(self, **kwargs):
        return [*super().check(**kwargs), *self._check_schema_attribute()]
We perform the following checks on the schema attribute:
- Ensure that it is a dict.
- Ensure that it is a valid JSON schema.
During this second step, we call the validator_for() function, which retrieves the validator class appropriate for the given schema. For example, if you declare a $schema property in your JSON schema, validator_for() uses it to pick the class that validates the format of your schema.
You can validate your implementation by running:
$ python manage.py check
System check identified no issues (0 silenced).
Now, you can deliberately break your JSON schema and rerun the command:
information = JSONSchemaField(
    schema={
        "type": "foo",
        "properties": {},
    },
)
$ python manage.py check
SystemCheckError: System check identified some issues:
ERRORS:
jsonschemafield.UserInformation.information: Given 'schema' is not a valid json schema.
System check identified 1 issue (0 silenced).
The validation process
It is now time to validate the input to be stored in our JSON field.
To do this, we will override the JSONSchemaField.validate() method and call the validate() function from the jsonschema library.
from django.core import exceptions
from django.db import models
from jsonschema import exceptions as jsonschema_exceptions
from jsonschema import validate


class JSONSchemaField(models.JSONField):
    def validate(self, value, model_instance):
        """Validate the content of the json field."""
        super().validate(value, model_instance)
        try:
            validate(instance=value, schema=self.schema)
        except jsonschema_exceptions.ValidationError as e:
            # Translate the jsonschema error into a Django validation error.
            raise exceptions.ValidationError(
                "Invalid json content: %(value)s",
                code="invalid_content",
                params={"value": e.message},
            ) from e
You can test your implementation by playing with Django’s admin or by opening a shell:
>>> from myapp.models import UserInformation
>>> from django.contrib.auth import get_user_model
>>> User = get_user_model()
>>> user = User.objects.get(pk=42)
>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'joe@fasterthan.fr'})
>>> user_information.full_clean()
>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'joe@fasterthan.fr', 'foo': 'bar'})
>>> user_information.full_clean()
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[9], line 1
----> 1 user_information.full_clean()
File ~/.venvs/blog-entry-jsonschemafield/lib/python3.10/site-packages/django/db/models/base.py:1552, in Model.full_clean(self, exclude, validate_unique, validate_constraints)
1549 errors = e.update_error_dict(errors)
1551 if errors:
-> 1552 raise ValidationError(errors)
ValidationError: {'information': ["Invalid json content: Additional properties are not allowed ('foo' was unexpected)"]}
(Optional) Enforcing format consistency for properties
JSON Schema exposes a format keyword that you can use to validate the content of a property. For example, it is possible to check that a property is a valid email. Let’s have a look at the schema we declared earlier:
information = JSONSchemaField(
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string", "maxLength": 255},
            "email": {"type": "string", "format": "email"},
            "centers_of_interest": {
                "type": "array",
                "items": {
                    "type": "string",
                    "maxLength": 255,
                },
            },
        },
        "required": ["name", "email"],
        "additionalProperties": False,
    },
)
You can see that the email property declares a format of type email. But if we validate our content again with a malformed email:
>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'bar'})
>>> user_information.full_clean()
No ValidationError is raised this time…
This is because the JSON Schema specification does not require the format keyword to be enforced, and the Python library follows this behavior: https://python-jsonschema.readthedocs.io/en/stable/validate/#validating-formats. We can still validate the format of our properties by passing the format_checker argument to the validate() function:
class JSONSchemaField(models.JSONField):
    def validate(self, value, model_instance):
        super().validate(value, model_instance)
        # Determine validator class for the schema ...
        schema_cls = validator_for(self.schema)
        try:
            validate(
                instance=value,
                schema=self.schema,
                # ... use it to validate the content of the json.
                format_checker=schema_cls.FORMAT_CHECKER,
            )
        except jsonschema_exceptions.ValidationError as e:
            raise exceptions.ValidationError(
                "Invalid json content: %(value)s",
                code="invalid_content",
                params={"value": e.message},
            )
You can now run the same code again:
>>> user_information = UserInformation(user=user, information={'name': 'Joe', 'email': 'bar'})
>>> user_information.full_clean()
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[7], line 1
----> 1 user_information.full_clean()
File ~/.venvs/blog-entry-jsonschemafield/lib/python3.10/site-packages/django/db/models/base.py:1552, in Model.full_clean(self, exclude, validate_unique, validate_constraints)
1549 errors = e.update_error_dict(errors)
1551 if errors:
-> 1552 raise ValidationError(errors)
ValidationError: {'information': ["Invalid json content: 'bar' is not a 'email'"]}
Please note that, as specified in the documentation of python-jsonschema:
- some formats require additional dependencies to be installed.
- the email format we used in this example doesn’t (nor do other formats such as date or ipv4).
- if the required dependencies are not installed, validation succeeds without throwing an error.
If you want to install extra dependencies in order to be able to use any format, then you can do it this way:
pip install jsonschema'[format]'
or
pip install jsonschema'[format-nongpl]'
Please note that the format extra depends on packages distributed under the GPL license, which may require your application to be distributed under the GPL as well. If you want to avoid that, use the format-nongpl extra.
Last thoughts
We have reached the end of this article. Here is the final version of the code:
fields.py
from django.db import models
from django.core import checks, exceptions
from jsonschema import SchemaError
from jsonschema import exceptions as jsonschema_exceptions
from jsonschema import validate
from jsonschema.validators import validator_for


class JSONSchemaField(models.JSONField):
    """
    JSONField with a schema validation by using `python-jsonschema`.
    Cf: https://python-jsonschema.readthedocs.io/en/stable/validate/
    """

    def __init__(self, *args, **kwargs):
        self.schema = kwargs.pop("schema", None)
        super().__init__(*args, **kwargs)

    @property
    def _has_valid_schema(self):
        """Check that the given `schema` is a valid json schema."""
        # validator_for() expects a dict, so reject anything else first.
        if not isinstance(self.schema, dict):
            return False
        schema_cls = validator_for(self.schema)
        try:
            schema_cls.check_schema(self.schema)
        except SchemaError:
            return False
        return True

    def check(self, **kwargs):
        return [*super().check(**kwargs), *self._check_schema_attribute()]

    def _check_schema_attribute(self):
        """Ensure that the given schema is a valid json schema during Django's checks."""
        if self.schema is None:
            return [
                checks.Error(
                    "JSONSchemaField must define a 'schema' attribute.", obj=self
                )
            ]
        elif not self._has_valid_schema:
            return [
                checks.Error("Given 'schema' is not a valid json schema.", obj=self)
            ]
        else:
            return []

    def validate(self, value, model_instance):
        """Validate the content of the json field."""
        super().validate(value, model_instance)
        schema_cls = validator_for(self.schema)
        try:
            validate(
                instance=value,
                schema=self.schema,
                format_checker=schema_cls.FORMAT_CHECKER,
            )
        except jsonschema_exceptions.ValidationError as e:
            raise exceptions.ValidationError(
                "Invalid json content: %(value)s",
                code="invalid_content",
                params={"value": e.message},
            ) from e
models.py
from django.conf import settings
from django.db import models

from .fields import JSONSchemaField


class UserInformation(models.Model):
    user = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
    )
    information = JSONSchemaField(
        schema={
            "$schema": "https://json-schema.org/draft/2020-12/schema",
            "type": "object",
            "properties": {
                "name": {"type": "string", "maxLength": 255},
                "email": {"type": "string", "format": "email"},
                "centers_of_interest": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "maxLength": 255,
                    },
                },
            },
            "required": ["name", "email"],
            "additionalProperties": False,
        },
    )
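The sources of the project are available on GitHub.
The code we produced validates the content of your JSONField wherever Model.full_clean() runs, which includes model forms and admin forms. Other code paths, such as a direct call to save() or Django REST framework serializers, do not call full_clean() on their own. Depending on your project, you may want to enforce model validation at save. This can be done by calling full_clean() in the save() method of your model: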
class UserInformation(models.Model):
    def save(self, *args, **kwargs):
        # Run all model validation, including JSONSchemaField.validate().
        self.full_clean()
        super().save(*args, **kwargs)
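Please note that the validation of JSONSchemaField is done on the application side (i.e. in Python code) and not on the database side. This means that it is not applied to bulk operations such as bulk_create(), bulk_update(), or QuerySet.update(), which bypass both full_clean() and save().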
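If you still need validation before a bulk insert, one possible workaround is to clean every instance yourself before handing the list to the bulk operation. The helper below is a minimal sketch under that assumption: clean_all is a hypothetical name, and it only relies on each object exposing a full_clean() method, as Django model instances do.

```python
def clean_all(instances):
    """Run full_clean() on every instance and return them as a list."""
    cleaned = []
    for instance in instances:
        # full_clean() runs each field's validate(), which for a
        # JSONSchemaField includes the JSON Schema validation.
        instance.full_clean()
        cleaned.append(instance)
    return cleaned


# Hypothetical usage with the model from this article:
# UserInformation.objects.bulk_create(clean_all(rows))
```

Keep in mind that full_clean() may also run uniqueness checks against the database, so this trades some of the speed of bulk operations for safety.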
It is also possible to do more complex validation than what we’ve seen today. For example, the following schema can be used to assert that a URL starts with https://:
"url": {"type": "string", "format": "uri", "pattern": "^https://"},