Multi-tenancy implementation with PostgreSQL: Learn through a simple real-world example
Learn how to implement multi-tenant architecture with PostgreSQL Row-Level Security (RLS) and database roles through a real-world example for secure data isolation between tenants.
In some of our previous articles, we delved into the concept of multi-tenancy and its applications in products and real-world business scenarios.
In this article, we'll explore how to implement a multi-tenant architecture for your application using PostgreSQL from a technical standpoint.
What is single-tenant architecture?
Single-tenant architecture refers to a software architecture where each customer has their own dedicated instance of the application and database.
In this architecture, each tenant's data and resources are completely isolated from other tenants.
What is multi-tenant architecture?
Multi-tenant architecture is a software architecture where multiple customers (tenants) share the same application instance and infrastructure while maintaining data isolation. In this architecture, a single instance of the software serves multiple tenants, with each tenant's data kept separate from others through various isolation mechanisms.
Single-tenant architecture vs multi-tenant architecture
Single-tenant architecture and multi-tenant architecture differ in aspects such as data isolation, resource utilization, scalability, management and maintenance, and security.
In single-tenant architecture, each customer has an independent data space, leading to lower resource utilization but relatively simpler for customization. Typically, single-tenant software tailored to specific customer needs, such as inventory systems for a particular fabric supplier or a personal blog web app. The commonality among them is that each customer occupies a separate instance of the application service, facilitating customization to meet specific requirements.
In a multi-tenant architecture, multiple tenants share the same underlying resources, resulting in higher resource utilization. However, it's crucial to ensure data isolation and security.
Multi-tenant architecture is often the preferred software architecture when service providers offer standardized services to different customers. These services typically have low levels of customization, and all customers share the same application instance. When an application requires an update, updating one application instance is equivalent to updating the application for all customers. For instance, CRM (Customer Relationship Management) is a standardized requirement. These systems typically use a multi-tenant architecture to provide the same service to all tenants.
Tenant data isolation strategies in multi-tenant architecture
In a multi-tenant architecture, all tenants share the same underlying resources, making the isolation of resources between tenants crucial. This isolation doesn't necessarily need to be physical; it simply requires ensuring that resources between tenants are not visible to each other.
In the design of the architecture, various degrees of resource isolation between tenants can be achieved:
In general, the more resources shared among tenants, the lower the cost of system iteration and maintenance. Conversely, the fewer shared resources, the higher the cost.
Starting multi-tenant implementation with a real-world example
In this article, we will use a CRM system as an example to introduce a simple yet practical multi-tenant architecture.
We recognize that all tenants use the same standard services, so we decided to have all tenants share the same basic resources, and we will implemented data isolation between different tenants at the database level using PostgreSQL's Row-Level Security.
Additionally, we will create a separate data connection for each tenant to facilitate better management of tenant permissions.
Next, we will introduce how to implement this multi-tenant architecture.
How to implement multi-tenant architecture with PostgreSQL
Add tenant identifier for all resources
In a CRM system, we will have a lot of resources and they're stored in different tables. For instance, customer information is stored in the customers
table.
Before implementing multi-tenancy, these resources are not associated with any tenant:
To differentiate the tenants owning different resources, we introduce a tenants
table to store tenant information (where db_user
and db_user_password
are used to store the database connection information for each tenant, will be detailed below). Additionally, we add a tenant_id
field to each resource to identify which tenant it belongs to:
Now, each resource is associated with a tenant_id
, theoretically enabling us to add a where
clause to all queries to restrict access to resources for each tenant:
At first glance, this seems simple and feasible. However, it will have the following issues:
- Almost every query will include this
where
clause, cluttering the code and making it harder to maintain, especially when writing complex join statements. - Newcomers to the codebase may easily forget to add this
where
clause. - Data between different tenants isn't truly isolated, as each tenant still has permissions to access data belonging to other tenants.
Therefore, we will not adopt this approach. Instead, we will use PostgreSQL' Row Level Security to address these concerns. However, before proceeding, we will create a dedicated database account for each tenant to access this shared database.
Setup DB roles for tenants
It’s a good practice to assign a database role to each user who can connect to the database. This allows for better control over each user's access to the database, facilitating isolation of operations between different users and improving system stability and security.
Since all tenants have the same database operation permissions, we can create a base role to manage these permissions:
Then, to differentiate each tenant role, a role inherited from the base role is assigned to each tenant upon creation:
Next, the database connection information for each tenant will be stored in the tenants
table:
id | db_user | db_user_password |
---|---|---|
x2euic | crm_tenant_x2euic | pa55w0rd |
This mechanism provides each tenant with its own database role, and these roles share the permissions granted to the crm_tenant
role.
We can then define the permission scope for tenants using the crm_tenant
role:
- Tenants should have CRUD access to all CRM system resource tables.
- Tables not related to CRM system resources should be invisible to tenants (assuming only
systems
table). - Tenants should not be able to modify the
tenants
table, and only theid
anddb_user
fields should be visible to them for querying their own tenant id when performing database operations.
Once the roles for tenants are set up, when a tenant requests access to the service, we can interact with the database using the database role representing that tenant:
Secure tenant data using PostgreSQL Row-Level Security
So far, we've established corresponding database roles for tenants, but this doesn't restrict data access between tenants. Next, we'll leverage PostgreSQL' Row-Level Security feature to limit each tenant's access to their own data.
In PostgreSQL, tables can have row security policies that control which rows can be accessed by queries or modified by data manipulation commands. This feature is also known as RLS (Row-Level Security).
By default, tables have no row security policies. To utilize RLS, you need to enable it for the table and create security policies that execute every time the table is accessed.
Taking the customers
table in the CRM system as an example, we'll enable RLS and create a security policy to restrict each tenant only have access to their own customers' data:
In the statement creating the security policy:
for all
(optional) indicates that this access policy will be used forselect
,insert
,update
, anddelete
operations on the table. You can specify an access policy for specific operations usingfor
followed by the command keyword.to crm_tenant
indicates that this policy applies to users with the database rolecrm_tenant
, meaning all tenants.as restrictive
specifies the enforcement mode of the policy, indicating that access should be strictly limited. By default, a table can have multiple policies, multiplepermissive
policies will be combined with anOR
relationship. In this scenario, we declare this policy asrestrictive
because we want this policy check to be mandatory for users belonging to CRM system tenants.using
expression defines the conditions for actual access, restricting the current querying database user to only view data belonging to their respective tenant. This constraint applies to rows selected by a command (select
,update
, ordelete
).with check
expression defines the constraint necessary when modifying data rows (insert
orupdate
), guaranteeing that tenants can only add or update records for themselves.
Using RLS to constrain tenant access to our resource tables offers several benefits:
- This policy effectively adds
where tenant_id = (select id from tenants where db_user = current_user)
to all query operations (select
,update
, ordelete
). For instance, when you executeselect * from customers
, it's equivalent to executingselect * from customers where tenant_id = (select id from tenants where db_user = current_user)
. This eliminates the need to explicitly addwhere
conditions in the application code, simplifying it and reducing the likelihood of errors. - It centrally controls data access permissions between different tenants at the database level, mitigating the risk of vulnerabilities or inconsistencies in the application, thus enhancing system security.
However, there are some points to note:
- RLS policies are executed for every row of data. If the query conditions within the RLS policy are too complex, it could significantly impact system performance. Fortunately, our tenant data check query is simple enough and won't affect performance. If you plan to implement other functionalities using RLS later, you can follow the Row-Level Security performance recommendations from Supabase to optimize RLS performance.
- RLS policies don't automatically populate
tenant_id
duringinsert
operations. They only restrict tenants to inserting their own data. This means that when inserting data, we still need to provide the tenant ID, which is inconsistent with the querying process and can lead to confusion during development, increasing the likelihood of errors (this will be addressed in subsequent steps).
In addition to the customers
table, we need to apply the same operations to all CRM system resource tables (this process may be a bit tedious, but we can write a program to configure it during table initialization), thus isolating data from different tenants.
Create trigger function for data inserting
As mentioned earlier, RLS (Row-Level Security) allows us to execute queries without worrying about the existence of tenant_id
, as the database handles it automatically. However, for insert
operations, we still need to manually specify the corresponding tenant_id
.
To achieve a similar convenience as RLS for data insertion, we need the database be able to handle tenant_id
automatically during data insertion.
This has a clear benefit: at the application development level, we no longer need to consider which tenant the data belongs to, reducing the likelihood of errors and easing our mental burden when developing multi-tenant applications.
Fortunately, PostgreSQL provides powerful trigger functionality.
Triggers are special functions associated with tables that automatically execute specific actions (such as insert, update, or delete) when performed on the table. These actions can be triggered at the row level (for each row) or statement level (for the entire statement). With triggers, we can execute custom logic before or after specific database operations, allowing us to easily achieve our goal.
First, let's create a trigger function set_tenant_id
to be executed before each data insertion:
Next, associate this trigger function with the customers
table for insert operations (similar to enabling RLS for a table, this trigger function needs to be associated with all relevant tables):
This trigger ensures that inserted data contains the correct tenant_id
. If the new data already includes a tenant_id
, the trigger function does nothing. Otherwise, it automatically populates the tenant_id
field based on the information of the current user in the tenants
table.
This way, we achieve automatic handling of tenant_id
at the database level during data insertion by tenants.
Summary
In this article, we delve into the practical application of multi-tenant architecture, using a CRM system as an example to demonstrate a practical solution utilizing PostgreSQL database.
We discuss database role management, access control, and PostgreSQL's Row-Level Security feature to ensure data isolation between tenants. Additionally, we utilize trigger functions to reduce the cognitive burden of developers in managing different tenants.
That's all for this article. If you want to further enhance your multi-tenant application with user access management, you can refer to An easy guide to begin with Logto organizations - for building a multi-tenant app for more insights.