A Repository Pattern for SQLAlchemy

There are certain idioms, ideas or patterns that I’ve encountered which have stuck with me, that I cart about as I muddle through the process of piecing together software systems.

One of those is the Clean Architecture proposed by Uncle Bob - which I stumbled across towards the end of my very first professional software project, and which helped me to rationalise a lot of what I didn’t like about that code: the intermingling of database access logic and business logic. Since then, I’ve taken it as something of a core belief that this is a big no-no and that Repository patterns are our route out of the coupling quagmire.

The Confusion

I was casting about for ideas on use of Repository patterns using the SQLAlchemy ORM, when I stumbled over this post by Mike Bayer, the creator of SQLAlchemy. There’s a claim in there as follows:

Repository - An interface that serves as the gateway to the database, in terms of object-relational mappings. This is the SQLAlchemy Session.

Which didn’t add up to me - if the system is to be independent of any particular framework for persistence and its business rules testable without requiring a database, it seems that code such as this could never be described as a Repository:

def get_user_by_email(db: Session, email: str) -> User:
    stmt = (
      select(User)
      .where(User.email == email)
    )
    users = db.execute(stmt).scalars().all()

    if not users:
        raise UserNotFoundError("No user found for email")

    if len(users) > 1:
        raise DuplicateEmailError("Multiple users found sharing an email")

    return users[0]

I think this becomes really clear when we look at what a test for such code would look like:

def test_get_user_by_email():
    # Set up the test database and add a test user
    db = create_test_db()
    test_email = "[email protected]"
    db.add(User(email=test_email))
    db.commit()

    # Call the function being tested
    user = get_user_by_email(db, test_email)

    # Assert the expected result
    assert user.email == test_email

where you can see that the test function is relying on database records existing in order that the behaviour of the business logic can be verified. As the complexity of the data model (inevitably) expands, authoring these tests becomes a considerable task unto itself.

How could it possibly be that the maintainer of the project doesn’t respect my deeply-held ideas on backend patterns?

Now, you might reasonably ask: who cares? Just whack the DB logic into the unit test and be done with it. It’s a fair question: after all, it’s the same direction in which Django pushes developers. For applications that are heavy on CRUD and light on business rules, removing an entire layer of indirection means less developer friction.

But it does leave me wondering: what could an injected Repository layer look like? Could we make SQLAlchemy more closely align with Uncle Bob’s Clean Architecture ideas? Or will arguing against the implementer of the library itself have me baking some proverbial humble pie?

Want the TL;DR? Skip to the final iteration to see how the Repository pattern can be applied.

A Repository Pattern for SQLAlchemy

With the foundations of my Engineering ideas on the line, I decided to go tinker with some options here. Thankfully, I’ve been working on a side project of late¹ which gave the chance to test drive my assertions!

It consists of a few, core components:

A base SQLAlchemy Model
A generic Repository class providing common data access methods
A set of Filter classes that can be applied to queries

Taking each of these in turn, I’ll outline the essence of the pattern (omitting certain details for the sake of brevity).

The BaseModel provides little by way of shared attributes - just a uid field - but is crucial as a type that we can refer to later on:

from sqlalchemy.orm import DeclarativeBase

class BaseModel(DeclarativeBase):
    __abstract__ = True

    uid: Mapped[str] = mapped_column(
        String(36), primary_key=True, default=lambda: str(uuid.uuid7()), index=True
    )

So far, so SQLAlchemy. Now is where we start to diverge: rather than passing around Session instances, we are instead going to introduce the Repository:

class Repository[T: BaseModel]:
    def __init__(self, db: Session, entity: type[T]) -> None:
        """Initialize repository with a database session and entity type.

        Args:
            db: SQLAlchemy session
            entity: The SQLAlchemy model class
        """
        self.db = db
        self.entity = entity

    def get_by_uid(self, uid: str) -> T:
        stmt = select(self.entity).where(self.entity.uid == uid)
        return self.db.execute(stmt).scalar_one()

    def get_filtered(
        self,
        filters: Sequence[Filter] | None = None,
        sort_field: InstrumentedAttribute[Any] | None = None,
        sort_order: SortOrder = SortOrder.ASC,
    ) -> List[T]:
        ...

And, finally, the Filter class:

from abc import ABC, abstractmethod

class Filter(ABC):

    field: str

    @abstractmethod
    def apply(
        self, stmt: Select[tuple[Any, ...]], entity: type[BaseModel]
    ) -> Select[tuple[Any, ...]]:
        """Apply this filter to a SQLAlchemy select statement.

        Args:
            stmt: The select statement to modify.
            entity: The entity class being queried.

        Returns:
            Modified select statement with filter applied.
        """

Now, we have all the ingredients to start writing our testable, framework-agnostic business logic. So what does this look like when it all comes together?

Let’s take the same example as before, querying for a user by email. For that, we’ll need our User table and a EqualsFilter filter that we can apply:

class User(BaseModel):
    __tablename__ = "user"

    email: Mapped[str] = mapped_column(String, unique=True, nullable=False, index=True)


class EqualsFilter[T](Filter):
    """Filter for exact equality comparison.

    Attributes:
            field: The field name to filter on.
            value: The value to compare for equality.

        Example:
            EqualsFilter(field="archived", value=False)
            # Generates: WHERE archived = False
    """

    value: T = Field(..., description="Value to compare for equality")

    def apply(
        self, stmt: Select[tuple[Any, ...]], entity: type[BaseModel]
    ) -> Select[tuple[Any, ...]]:
        """Apply equality filter to the statement."""
        column = getattr(entity, self.field)
        return stmt.where(column == self.value)

giving us the following source code:

def get_user_by_email(user_repo: Repository[User], email: str) -> User:
    users = user_repo.get_filtered(
        filters=[EqualsFilter(field="email", value=email)]
    )

    if not users:
        raise UserNotFoundError("No user found for email")

    if len(users) > 1:
        raise DuplicateEmailError("Multiple users found sharing an email")

    return users[0]

Neat! We’ve made the business logic independent² of the database access logic! So, what do our tests look like?

def test_get_user_by_email():
    # Set up the mock repository
    user_repo = MagicMock(spec=Repository[User])
    user_repo.get_filtered.return_value = [User(email="[email protected]")]

    # Call the function being tested
    user = get_user_by_email(user_repo, "[email protected]")

    # Assert the expected result
    assert user.email == "[email protected]"

Evolving the API

Now let’s add in an additional requirement: a user can only see their own or their Friend’s emails - a piece of permissions logic.

Implemented in pure SQLAlchemy, that might look as follows:

class Friend(BaseModel):
    __tablename__ = "friend"

    user_id: Mapped[str] = mapped_column(String(36), ForeignKey("user.uid"), nullable=False)
    friend_id: Mapped[str] = mapped_column(String(36), ForeignKey("user.uid"), nullable=False)


def get_user_by_email(db: Session, email: str, current_user_id: str) -> User:
    stmt = (
      select(User)
      .where(User.email == email)
      .where( 
          or_( 
              User.uid == current_user_id, 
              User.uid.in_( 
                  select(Friend.friend_id).where(Friend.user_id == current_user_id) 
              ), 
          ) 
      ) 
    )

    users = db.execute(stmt).scalars().all()

    if not users:
        raise UserNotFoundError("No user found for email")

    if len(users) > 1:
        raise DuplicateEmailError("Multiple users found sharing an email")

    return users[0]

Which isn’t so bad - but let’s suppose that this is one function among tens or hundreds that require this same permissions logic. We would have to either:

repeat the same join and where clauses in each of those functions, or
introduce implicit behaviour in the Session that applies these filters to all queries for the User model

My general observation is that permissions systems are amongst the hardest to understand and so benefit from clarity and explicitness in their implementation.

Contrasting this with the Repository pattern, we can simply add an additional filter³:

def get_user_by_email(user_repo: Repository[User], email: str, current_user_id: str) -> User:
    users = user_repo.get_filtered(
        filters=[
            EqualsFilter(field="email", value=email),
            OrFilter( 
                EqualsFilter(field="uid", value=current_user_id), 
                ExistsFilter( 
                    field="friends", 
                    filters=[EqualsFilter(field="user_id", value=current_user_id)], 
                ), 
            ), 
        ]
    )

    if not users:
        raise UserNotFoundError("No user found for email")

    if len(users) > 1:
        raise DuplicateEmailError("Multiple users found sharing an email")

    return users[0]

We might even go a step further and introduce a permissions_filters function that generates the necessary filters for us:

def permissions_filters(current_user_id: str) -> List[Filter]:
    return [
        OrFilter(
            EqualsFilter(field="uid", value=current_user_id),
            ExistsFilter(
                field="friends",
                filters=[EqualsFilter(field="user_id", value=current_user_id)],
            ),
        )
    ]

def get_user_by_email(user_repo: Repository[User], email: str, current_user_id: str) -> User:
    users = user_repo.get_filtered(
        filters=[
            EqualsFilter(field="email", value=email),
            *permissions_filters(current_user_id),  
        ]
    )
    ...

which gives us a pure permissions_filters function and a get_user_by_email function that are both completely independent of the database access logic!

Closing remarks

If we accept the idea that a great API is one that flexible to changes in requirements, then I’m inclined to argue for the extra layer of abstraction that this combination of Repository + Filters provides.

I’ve not yet pushed this idea far enough to be fully confident on this. It’s quite possible that it will run out of road in short order - who knows what additional limbs might sprout when I need several layers of JOINs or aggregations? What if the entities exposed by the API start to diverge wildly from the database models?

For now, I’m happy enough to be writing something that is vaguely PEP 20-compliant: explicit and readable - maybe even beautiful.

More on that to come soon! ↩
Ok, so it’s not completely independent: you’ll notice that I’m using the User model, rather than a “Plain Old Python Object” - since the SQLAlchemy model behaves in the same manner as a plain object, I’m taking the shortcut of not introducing a “mapping” class right now. ↩
The OrFilter and ExistsFilter implementations are left as an exercise for the reader 📝 ↩

A Repository Pattern for SQLAlchemy

The Confusion

A Repository Pattern for SQLAlchemy

Evolving the API

Closing remarks

Footnotes