
Blog

From Scratch - Building a Pydantic Clone

A couple of months ago, I attended a Python talk featuring some of the most amazing open-source contributors in the entire Python ecosystem, including Sebastián Ramírez, the creator of FastAPI, Armin Ronacher, the creator of Flask, and Samuel Colvin, the creator of Pydantic. It was utterly inspiring to hear them talk about technology and their approach to development, but one moment in particular caught my attention. Sebastián Ramírez was praising Pydantic and recalled his first impression of using it: he thought it surely must be doing some crazy Python magic to be able to perform run-time type validation with built-in typing syntax. That made me realize how I took all sorts of Python libraries for granted and assumed they just "worked". I was immediately curious about how one might implement the functionality behind Pydantic and decided to recreate an MVP clone. In this post, I'll provide an overview of my solution and explore three relatively obscure Python topics in a practical context: descriptors, the __init_subclass__ special method, and metaclasses.

Here is the full project on GitHub if you just want to see the code.
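To give a flavour of how two of those topics fit together, here is a minimal sketch of run-time type validation driven by ordinary type hints. It is not the code from the project: the names `TypedField`, `MiniModel`, and `User` are hypothetical, and the check is a plain `isinstance`, so it only handles simple classes such as `str` and `int`, not generics like `list[int]`.

```python
from typing import Any, get_type_hints


class TypedField:
    """Descriptor (hypothetical name) that type-checks every assignment."""

    def __init__(self, name: str, expected_type: type) -> None:
        self.name = name
        self.expected_type = expected_type

    def __get__(self, instance: Any, owner: Any = None) -> Any:
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance: Any, value: Any) -> None:
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        instance.__dict__[self.name] = value


class MiniModel:
    """Base class (hypothetical name) that wires up one descriptor per hint."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Runs once per subclass, right after the subclass body is executed.
        for name, expected_type in get_type_hints(cls).items():
            setattr(cls, name, TypedField(name, expected_type))

    def __init__(self, **data: Any) -> None:
        for name in get_type_hints(type(self)):
            if name not in data:
                raise TypeError(f"missing field: {name}")
            # Assignment goes through TypedField.__set__, which validates.
            setattr(self, name, data[name])


class User(MiniModel):
    name: str
    age: int


user = User(name="Ada", age=36)  # passes validation
# User(name="Ada", age="36") would raise TypeError
```

The descriptor owns the validation, while `__init_subclass__` saves each model from having to declare its descriptors by hand.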

Python Metaclasses - For Curiosity's Sake

Python metaclasses might seem arcane and not worth the time it takes to learn them. You might say: "surely I'll never need to use this". And to be frank, you almost certainly won't, and neither will I. Nonetheless, they are worth knowing if you want to make sense of code you see in the wild, whether in open-source libraries such as Pydantic, which makes heavy use of the concept, or in the CPython source code. And one day you might come across a situation where metaclasses are the perfect solution; you can never have too many tools in your tool belt (although I hope you don't start thinking everything looks like a nail, because there will almost always be a simpler, more readable solution).

With that demoralising intro out of the way, let's begin with a deeper look into how classes operate within Python.
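As a small taste of what that looks like, the sketch below shows that a class is itself an object whose type is `type`, and that a metaclass is simply a subclass of `type` that hooks into class creation. The names `Plain`, `Dynamic`, `RegistryMeta`, `Base`, and `Child` are made up for illustration.

```python
from typing import Any, Dict


class Plain:
    pass


# A class is an object too; by default its type is the built-in `type`.
plain_metaclass = type(Plain)

# A class statement is roughly a call to type(name, bases, namespace).
Dynamic = type("Dynamic", (), {"greet": lambda self: "hello"})


class RegistryMeta(type):
    """Metaclass (hypothetical) that records every class it creates."""

    registry: Dict[str, type] = {}

    def __new__(mcls, name: str, bases: tuple, namespace: dict, **kwargs: Any):
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        mcls.registry[name] = cls
        return cls


class Base(metaclass=RegistryMeta):
    pass


class Child(Base):
    # The metaclass is inherited, so Child is registered as well.
    pass
```

Here `plain_metaclass` is `type`, `Dynamic().greet()` returns `"hello"`, and `RegistryMeta.registry` ends up holding both `Base` and `Child`.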

Proofs of the Four Fundamental Equations Behind Back Propagation

When learning about how gradient descent and back propagation work, it can be tempting to take the underlying mathematics as a given and skip over the details. This is especially true with the level of abstraction that modern machine learning libraries like TensorFlow and PyTorch work at. However, a quote that I have observed to be true over and over again applies here:

"All non-trivial abstractions, to some degree, are leaky." - Joel Spolsky

I believe it always pays to understand the behind-the-scenes workings of the technologies we work with because at some point things will go wrong and we will need understanding at a deeper level (you can read more about the Law of Leaky Abstractions here).

Therefore, since back propagation is a foundational concept in machine learning, it is worth attaining a deep understanding of its constituent equations. In mathematics, this typically requires understanding the proofs of the equations.

All four proofs rely on the chain rule from multivariate calculus. If you are comfortable with the chain rule I recommend first attempting the proofs yourself.
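For reference, here are the four equations as they are commonly stated, followed by the chain-rule step that drives the second one. The notation is an assumption on my part, since this excerpt does not fix one: $C$ is the cost, $z^l_j$ the weighted input to neuron $j$ in layer $l$, $a^l_j = \sigma(z^l_j)$ its activation, $L$ the output layer, and $\odot$ the element-wise product.

```latex
% Assumed notation: C cost, z^l weighted inputs, a^l = \sigma(z^l) activations,
% w^l and b^l the weights and biases of layer l, L the output layer.
\begin{align*}
\delta^l_j &\equiv \frac{\partial C}{\partial z^l_j}
  && \text{(definition of the error)} \\
\delta^L &= \nabla_a C \odot \sigma'(z^L) \tag{BP1} \\
\delta^l &= \left( (w^{l+1})^{T} \delta^{l+1} \right) \odot \sigma'(z^l) \tag{BP2} \\
\frac{\partial C}{\partial b^l_j} &= \delta^l_j \tag{BP3} \\
\frac{\partial C}{\partial w^l_{jk}} &= a^{l-1}_k \, \delta^l_j \tag{BP4}
\end{align*}

% Chain-rule step behind BP2, using
% z^{l+1}_k = \sum_j w^{l+1}_{kj} \sigma(z^l_j) + b^{l+1}_k :
\begin{align*}
\delta^l_j
  = \frac{\partial C}{\partial z^l_j}
  = \sum_k \frac{\partial C}{\partial z^{l+1}_k}
           \frac{\partial z^{l+1}_k}{\partial z^l_j}
  = \sum_k \delta^{l+1}_k \, w^{l+1}_{kj} \, \sigma'(z^l_j)
\end{align*}
```

The last line is the component form of BP2; the other three follow from the same pattern of applying the chain rule through $z^l_j$.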

The Single Underlying Principle of Clean Code

I didn't realise how many different opinions there were on what constitutes clean code until I recently asked a series of interview candidates "What do you consider to be clean code?". Each response was unique and often enlightening, and made me curious to reflect on my own definition.

Typically, when learning about clean code, we're presented with a checklist of principles to follow or pitfalls to avoid. Occasionally, we delve into fundamental concepts like "readability," though often without understanding their significance.

I'd like to share my perspective on what I believe is the core principle underlying all other clean code principles, particularly in Agile development, and illustrate this with a few examples.

Static Duck Typing in Python

At HeyJobs, we make heavy use of type hints in our Python codebases and include mypy as a blocking check in our CI/CD builds. Static type checking can be invaluable for preventing bugs and improving code readability; however, it also has its own drawbacks, including removing some of the flexibility offered by duck typing. In this article, I walk through a workaround for this limitation: static duck typing.
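This excerpt does not name the mechanism, so the sketch below assumes it is `typing.Protocol` (structural subtyping, PEP 544), the standard-library way to let a static checker accept any object with the right methods. The names `SupportsQuack`, `Duck`, `Robot`, and `make_noise` are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsQuack(Protocol):
    """Anything with a matching quack() method satisfies this protocol."""

    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # No inheritance from SupportsQuack: the match is purely structural.
    def quack(self) -> str:
        return "Beep-quack"


def make_noise(thing: SupportsQuack) -> str:
    # A static checker accepts Duck and Robot here, and rejects objects
    # that lack a compatible quack() method.
    return thing.quack()


noises = [make_noise(Duck()), make_noise(Robot())]
```

The `@runtime_checkable` decorator is optional; it only adds `isinstance` support, which checks for the presence of the method and not its signature.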