Async Database Connection Pool Optimization Guide
This guide explains how to configure and optimize the asynchronous read-only database connection pool for high-performance scenarios. These optimizations are particularly important for production deployments expecting high traffic volumes.
Overview
Fides uses SQLAlchemy 1.4 with asyncpg for asynchronous database operations. The async read-only connection pool can be pre-warmed and configured to skip expensive rollback operations, resulting in significant performance improvements under load.
Related SQLAlchemy Documentation
- Connection Pooling - Comprehensive guide to SQLAlchemy connection pooling
- QueuePool - The pool implementation used by Fides (wrapped for async)
- Engine Configuration - create_engine() and create_async_engine() parameters
Important: External Connection Pooling Strongly Recommended
For production deployments, it is strongly recommended to use an external database connection pooling technology such as:
- PgBouncer - Lightweight connection pooler for PostgreSQL
- AWS RDS Proxy - Managed connection pooler for AWS RDS/Aurora
- Azure Database for PostgreSQL built-in pooling
- Google Cloud SQL Auth Proxy with connection pooling
- Odyssey - Advanced multi-threaded PostgreSQL connection pooler
Why External Pooling?
PostgreSQL uses a process-per-connection model, where each connection spawns a separate backend process. This architecture has important implications:
- High Memory Overhead: Each PostgreSQL backend process consumes significant memory (typically 10-20 MB base plus working memory). With hundreds of connections across multiple Fides workers, this can quickly exhaust database server memory.
- Connection Establishment Cost: Creating new PostgreSQL connections involves process forking, authentication, and initialization. While pre-warming helps on the application side, it doesn't solve the database-side resource consumption.
- Scaling Challenges: Modern cloud deployments often run multiple pods/containers/workers. If each worker maintains a pool of 300 connections and you have 10 workers, that's 3,000 database connections - far exceeding typical PostgreSQL limits.
- Connection Multiplexing: External poolers maintain a small pool of persistent database connections (e.g., 50-100) and multiplex hundreds or thousands of application connections onto them. This dramatically reduces database server load while allowing applications to maintain large pools.
Architecture with External Pooling
┌─────────────────┐
│ Fides Worker 1 │──┐
│ (300 conns) │ │
└─────────────────┘ │
│ ┌──────────────┐ ┌────────────────┐
┌─────────────────┐ ├───▶│ PgBouncer │────────▶│ PostgreSQL │
│ Fides Worker 2 │──┤ │ (50 conns) │ │ (50 backends) │
│ (300 conns) │ │ └──────────────┘ └────────────────┘
└─────────────────┘ │
│
┌─────────────────┐ │
│ Fides Worker 3 │──┘
│ (300 conns) │
└─────────────────┘
Total app connections: 900
Total database backends: 50 (multiplexed)

Configuration with External Pooling
When using an external connection pooler, you can safely configure larger application-side pools:
# Point to PgBouncer instead of directly to PostgreSQL
FIDES__DATABASE__SERVER=pgbouncer.internal.example.com
FIDES__DATABASE__PORT=6432
# Configure larger application pools (safe with external pooling)
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SIZE=300
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PREWARM=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SKIP_ROLLBACK=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_AUTOCOMMIT=true

Important PgBouncer Configuration:
- Use transaction pooling mode (not session mode) for best multiplexing
- With transaction pooling, ensure you set pool_skip_rollback=true in Fides (already recommended)
- Be aware that transaction pooling has limitations with certain PostgreSQL features (prepared statements, LISTEN/NOTIFY, etc.)
When You Might Not Need External Pooling
External pooling may not be necessary if:
- You have a single Fides instance with modest traffic (fewer than 100 req/s)
- Your PostgreSQL server has abundant resources (high memory, low connection limit concerns)
- You're running a development/staging environment
- Total connections across all workers remain well under PostgreSQL's max_connections limit
However, for production deployments expecting high traffic, external pooling is considered a best practice and will significantly improve both application and database performance.
Why These Optimizations Matter
Connection Establishment Cost
Establishing new database connections is an expensive operation, especially under high load. Each new connection requires:
- TCP handshake
- SSL/TLS negotiation (if enabled)
- Database authentication
- Session initialization
Under heavy traffic, these costs compound and can lead to:
- Increased latency for user requests
- Database connection exhaustion
- Cascading failures under load spikes
Pre-warming the connection pool solves this by establishing all connections at startup, ensuring they're immediately available when requests arrive.
Rollback Overhead
By default, SQLAlchemy performs a rollback operation when returning connections to the pool. For read-only queries, this rollback is unnecessary overhead that adds latency to every database operation.
Skipping rollback on return eliminates this unnecessary work for read-only operations, improving throughput and reducing latency.
Configuration for Higher-Throughput Scenarios
All configuration is done via environment variables with the FIDES__DATABASE__ prefix.
Required Configuration
First, ensure you have a read-only database replica configured:
# Read-only database server (required for all settings below)
FIDES__DATABASE__READONLY_SERVER=readonly-db.example.com
FIDES__DATABASE__READONLY_PORT=5432
FIDES__DATABASE__READONLY_USER=fides_readonly
FIDES__DATABASE__READONLY_PASSWORD=your_password
FIDES__DATABASE__READONLY_DB=fides

Recommended Performance Settings
For optimal performance under load, configure these settings together:
# Enable connection pool pre-warming (RECOMMENDED)
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PREWARM=true
# Set pool size based on expected peak concurrent requests (start with 300)
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SIZE=300
# Disable rollback on connection return (RECOMMENDED for read-only)
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SKIP_ROLLBACK=true
# Enable autocommit for read-only operations (RECOMMENDED)
FIDES__DATABASE__ASYNC_READONLY_DATABASE_AUTOCOMMIT=true
# Allow overflow connections for traffic spikes
FIDES__DATABASE__ASYNC_READONLY_DATABASE_MAX_OVERFLOW=50
# Enable pre-ping to verify connection health
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PRE_PING=true

Environment Variables Explained
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PREWARM
Type: Boolean
Default: false
Recommended: true for production
What it does:
When enabled, Fides will establish all connections in the pool during application startup rather than lazily creating them on-demand.
SQLAlchemy Reference: While SQLAlchemy doesn't have built-in pre-warming, Fides implements this by checking out all connections and returning them to the pool. See Pool.connect() for the underlying mechanism.
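The checkout-and-return mechanism can be sketched as follows. This is an illustrative simplification, not Fides' actual code - the function name and engine interface are assumptions:

```python
import asyncio

async def prewarm_pool(engine, pool_size: int) -> None:
    """Illustrative pre-warm sketch (not Fides' actual implementation).

    Checking out pool_size connections concurrently forces the pool to
    establish every connection eagerly; returning them all leaves the
    pool fully populated with warm, idle connections.
    """
    connections = await asyncio.gather(
        *(engine.connect() for _ in range(pool_size))
    )
    for conn in connections:
        await conn.close()  # returns to the pool, not a real disconnect
```

With SQLAlchemy's AsyncEngine this pattern works because conn.close() hands the connection back to the pool rather than dropping it.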
Why it's important:
- Eliminates connection establishment latency for initial requests
- Provides predictable performance from the moment the application starts
- Prevents connection storms during traffic spikes
- Allows you to detect connection issues at startup rather than during request handling
Trade-offs:
- Increases application startup time proportional to pool size
- Requires your database to handle all connections immediately
- All connections count against your database's max connection limit from startup
When to use:
Enable this for production environments with predictable traffic patterns where consistent low latency is critical.
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SIZE
Type: Integer
Default: 5
Recommended: Start with 300, adjust based on performance testing
What it does:
Sets the maximum number of concurrent database connections maintained in the pool. These connections are reused for multiple requests.
SQLAlchemy Reference: pool_size parameter in create_engine() documentation. Note that SQLAlchemy pools start with zero connections and grow lazily to this size.
Why it's important:
- Determines how many concurrent read-only database operations can execute simultaneously
- Must be sized to handle peak concurrent request volume
- Too small: requests queue waiting for available connections, increasing latency
- Too large: wastes database resources and may exceed database connection limits
How to size it:
- Start with 300 connections as a baseline for high-traffic applications
- Monitor these metrics during load testing:
- Connection pool utilization (should not regularly reach 100%)
- Request latency (p95, p99 percentiles)
- Database connection count
- Adjust based on observations:
- Increase if: Pool exhaustion occurs, high latency during peak traffic
- Decrease if: Database connection limits approached, pool rarely exceeds 50% utilization
Formula for estimation:
pool_size ≈ (peak_requests_per_second × average_query_duration_seconds) × 1.5

For example:
- 1000 req/s with 200ms average query time:
1000 × 0.2 × 1.5 = 300 connections
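The estimate can be wrapped in a small helper. The function name and the 1.5 headroom default are just this guide's convention, not a Fides API:

```python
import math

def estimate_pool_size(peak_rps: float, avg_query_seconds: float,
                       headroom: float = 1.5) -> int:
    """Estimate pool_size from peak request rate and mean query time.

    Little's law gives concurrent in-flight queries ≈ rate × duration;
    the headroom factor leaves slack for variance in both.
    """
    return math.ceil(peak_rps * avg_query_seconds * headroom)

estimate_pool_size(1000, 0.2)  # 1000 req/s × 200 ms × 1.5 = 300
```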
Important considerations:
- Each worker/pod needs its own pool, so total connections = pool_size × number_of_workers
- Your database must support total concurrent connections from all sources
- PostgreSQL's default max_connections is 100 (managed services vary; it often needs increasing)
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SKIP_ROLLBACK
Type: Boolean
Default: false
Recommended: true for read-only databases
What it does:
When enabled, SQLAlchemy skips executing ROLLBACK when returning connections to the pool by setting pool_reset_on_return=None.
SQLAlchemy Reference: Reset On Return and the pool_reset_on_return parameter documentation.
Why it's important:
- Massive performance gain: Eliminates unnecessary round-trip to database for every query
- Read-only queries never modify data, so rollback serves no purpose
- Under high load, this can reduce database CPU usage by 20-30%
- Reduces connection return latency, improving overall throughput
Technical details:
By default, SQLAlchemy sets pool_reset_on_return='rollback' which executes:
ROLLBACK; -- Unnecessary for read-only operations

With this setting enabled, connections are returned to the pool without any cleanup, trusting that read-only operations leave no state to clean up.
Safety:
This is safe for read-only replicas because:
- Read-only queries cannot modify data
- No transaction state needs to be cleaned up
- Subsequent queries will start fresh transactions anyway
When to use:
Always enable this for read-only database connections. Only disable if you're using the same connection pool for both read and write operations (not recommended).
FIDES__DATABASE__ASYNC_READONLY_DATABASE_AUTOCOMMIT
Type: Boolean
Default: false
Recommended: true for read-only databases
What it does:
Enables PostgreSQL autocommit mode by setting isolation_level='AUTOCOMMIT'.
SQLAlchemy Reference: Setting Transaction Isolation Levels including DBAPI Autocommit and the isolation_level parameter.
Why it's important:
- Eliminates explicit BEGIN and COMMIT statements for read queries
- Reduces overhead for simple SELECT queries
- Prevents long-running read transactions from holding resources
- Works synergistically with pool_skip_rollback for maximum performance
Technical details:
Without autocommit, PostgreSQL wraps queries in transactions:
BEGIN;
SELECT * FROM users WHERE id = 1;
COMMIT;

With autocommit enabled:
-- No BEGIN/COMMIT overhead
SELECT * FROM users WHERE id = 1;

Trade-offs:
- Cannot use explicit transaction blocks in read-only queries
- May see slight inconsistencies if multiple queries in a request read data that changes between calls
When to use:
Enable for read-only replicas where you don't need transaction isolation across multiple queries in a single request.
FIDES__DATABASE__ASYNC_READONLY_DATABASE_MAX_OVERFLOW
Type: Integer
Default: 10
Recommended: 50-100 for production
What it does:
Allows the pool to create additional temporary connections beyond pool_size when all pool connections are in use. These overflow connections are closed (not pooled) after use.
SQLAlchemy Reference: max_overflow parameter in create_engine() documentation.
Why it's important:
- Provides safety valve for unexpected traffic spikes
- Prevents request failures during brief bursts above normal capacity
- Overflow connections are temporary and released after use
How to size it:
Set to 15-20% of your pool size to handle bursts without excessive resource consumption.
Trade-offs:
- Overflow connections are not pre-warmed (must be established on-demand)
- Creating overflow connections is expensive (connection establishment cost)
- Too many overflow connections may exceed database limits
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PRE_PING
Type: Boolean
Default: true
Recommended: true
What it does:
Before using a connection from the pool, SQLAlchemy executes a lightweight query to verify the connection is still valid. This corresponds to SQLAlchemy's pool_pre_ping option (the same concept is used for the API async engine via api_async_engine_pool_pre_ping).
SQLAlchemy Reference: Disconnect Handling - Pessimistic and the pool_pre_ping parameter.
Why it's important:
- Detects stale connections (closed by database, network issues, timeouts)
- Prevents query failures from dead connections
- Automatically establishes new connection if pre-ping fails
Technical details:
Executes a simple query like SELECT 1 before each connection use.
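Conceptually, the checkout path behaves like this sketch. The names here are illustrative only; SQLAlchemy's real implementation works at the DBAPI level with connection invalidation:

```python
def checkout_with_pre_ping(pool):
    """Illustrative pre-ping: validate a pooled connection before use."""
    conn = pool.get()
    try:
        conn.execute("SELECT 1")  # lightweight liveness check
    except Exception:
        conn.invalidate()         # discard the stale connection...
        conn = pool.get()         # ...and check out a fresh one
    return conn
```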
Trade-offs:
- Adds minimal overhead (microseconds) per query
- Worth it to prevent errors from stale connections
- Disabling for performance: Pre-ping can be disabled to improve performance slightly by avoiding the extra check before each connection use. However, disabling it comes with a risk: connections that have been closed (e.g. by the database or due to network issues) may not be detected, so the pool may hand out stale connections. Those connections can silently fail or be used when already closed, which may lead to occasional errors. Only disable if you have very aggressive connection keepalives and can guarantee connections never go stale.
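Taken together, the settings above map onto SQLAlchemy engine keyword arguments roughly as follows. This is a hedged sketch of the mapping this guide implies - the helper name is hypothetical, and the actual wiring lives in src/fides/api/db/ctl_session.py:

```python
def readonly_engine_kwargs(pool_size: int, max_overflow: int,
                           skip_rollback: bool, autocommit: bool,
                           pre_ping: bool) -> dict:
    """Translate the FIDES__DATABASE__ASYNC_READONLY_* flags into
    create_async_engine() keyword arguments (illustrative only).

    Note: pre-warming is a separate startup step, not an engine argument.
    """
    kwargs = {
        "pool_size": pool_size,
        "max_overflow": max_overflow,
        "pool_pre_ping": pre_ping,
    }
    if skip_rollback:
        kwargs["pool_reset_on_return"] = None   # skip ROLLBACK on return
    if autocommit:
        kwargs["isolation_level"] = "AUTOCOMMIT"
    return kwargs  # pass as: create_async_engine(url, **kwargs)
```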
Example Configurations
High-Traffic Production (Recommended)
For applications with 500+ requests per second:
# Read-only replica configuration
FIDES__DATABASE__READONLY_SERVER=readonly.db.prod.internal
FIDES__DATABASE__READONLY_PORT=5432
FIDES__DATABASE__READONLY_USER=fides_readonly
FIDES__DATABASE__READONLY_PASSWORD=secure_password
# Optimized pool settings
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PREWARM=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SIZE=300
FIDES__DATABASE__ASYNC_READONLY_DATABASE_MAX_OVERFLOW=50
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SKIP_ROLLBACK=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_AUTOCOMMIT=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PRE_PING=true

Medium-Traffic Production
For applications with 100-500 requests per second:
FIDES__DATABASE__READONLY_SERVER=readonly.db.prod.internal
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PREWARM=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SIZE=100
FIDES__DATABASE__ASYNC_READONLY_DATABASE_MAX_OVERFLOW=25
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SKIP_ROLLBACK=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_AUTOCOMMIT=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PRE_PING=true

Development/Staging
Smaller pool for non-production environments:
FIDES__DATABASE__READONLY_SERVER=readonly.db.staging.internal
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PREWARM=false
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SIZE=10
FIDES__DATABASE__ASYNC_READONLY_DATABASE_MAX_OVERFLOW=5
FIDES__DATABASE__ASYNC_READONLY_DATABASE_POOL_SKIP_ROLLBACK=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_AUTOCOMMIT=true
FIDES__DATABASE__ASYNC_READONLY_DATABASE_PRE_PING=true

Performance Testing & Tuning
Metrics to Monitor
Monitor these metrics to validate your configuration:
- Connection Pool Utilization
  - Should stay below 80% during normal operation
  - Brief spikes to 100% are acceptable during traffic bursts
  - Consistently at 100% means the pool is undersized
- Request Latency
  - Track p50, p95, p99 response times
  - Compare before/after optimization changes
  - Look for latency spikes during high load
- Database Connections
  - Total active connections: pool_size × number_of_workers
  - Verify the database connection count doesn't exceed limits
- Database CPU/Memory
  - Should decrease with pool_skip_rollback enabled
  - Monitor for resource exhaustion
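The utilization thresholds above can be codified into a simple tuning heuristic. The threshold values are the ones this guide recommends; the function itself is illustrative, not part of Fides:

```python
def pool_tuning_hint(utilization_samples: list) -> str:
    """Suggest a pool_size adjustment from sampled utilization (0.0-1.0)."""
    avg = sum(utilization_samples) / len(utilization_samples)
    if avg > 0.8:
        return "increase"   # consistently near saturation: undersized
    if avg < 0.4:
        return "decrease"   # rarely above 30-40%: oversized
    return "ok"             # brief spikes to 100% are acceptable
```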
Load Testing Procedure
- Baseline: Test with default settings (small pool, no optimizations)
- Enable skip_rollback: Measure improvement (typically 20-40% latency reduction)
- Increase pool_size: Increment by 50, test until latency stabilizes
- Enable prewarm: Verify startup time acceptable, measure cold-start elimination
- Validate under burst traffic: Test with traffic spikes to validate overflow settings
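The steps above can be driven with a minimal asyncio harness like this (illustrative; `query` stands in for whatever async read path you are exercising, and the percentile indexing is deliberately simple):

```python
import asyncio
import time

async def measure_latencies(query, concurrency: int, total: int) -> dict:
    """Run `total` calls of an async `query` with bounded concurrency
    and report p50/p95 latencies in seconds."""
    latencies = []
    sem = asyncio.Semaphore(concurrency)

    async def one():
        async with sem:
            start = time.perf_counter()
            await query()
            latencies.append(time.perf_counter() - start)

    await asyncio.gather(*(one() for _ in range(total)))
    latencies.sort()
    return {
        "p50": latencies[int(total * 0.50)],
        "p95": latencies[min(int(total * 0.95), total - 1)],
    }
```

Varying `concurrency` between runs approximates burst traffic and shows when the pool (or its overflow budget) saturates.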
Signs You Need to Adjust
Increase pool_size if:
- Frequent "TimeoutError: Connection pool exhausted" errors
- High p95/p99 latency during peak traffic
- Connection pool utilization consistently above 80%
Decrease pool_size if:
- Database connection limits are being approached
- Pool utilization rarely exceeds 30-40%
- Database resource costs are a concern
Disable prewarm if:
- Application startup time is unacceptably long
- Connection issues during startup
- Running in environments with very short-lived containers
Database Configuration Requirements
PostgreSQL Configuration
Ensure your PostgreSQL database can handle the connection load:
-- Check current max_connections setting
SHOW max_connections;
-- Typical production setting for high-traffic deployments
-- File: postgresql.conf
max_connections = 1000
-- Monitor current connection usage
SELECT count(*) FROM pg_stat_activity;

Sizing max_connections:
max_connections ≥ (pool_size × number_of_fides_workers) + connections_for_other_services + 50

Read Replica Configuration
For read-only replicas, ensure:
- Replication lag is acceptable for your use case
- Hot standby is enabled (PostgreSQL)
- User has appropriate read-only permissions
-- Create read-only user
CREATE USER fides_readonly WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE fides TO fides_readonly;
GRANT USAGE ON SCHEMA public TO fides_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fides_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO fides_readonly;

Troubleshooting
Connection Pool Exhaustion
Symptoms:
asyncio.TimeoutError: Connection pool exhausted

Solutions:
- Increase ASYNC_READONLY_DATABASE_POOL_SIZE
- Increase ASYNC_READONLY_DATABASE_MAX_OVERFLOW
- Check for connection leaks (unclosed connections)
- Verify queries are not taking excessively long
Slow Startup
Symptoms:
Application takes minutes to start when prewarm is enabled.
Solutions:
- Reduce ASYNC_READONLY_DATABASE_POOL_SIZE
- Disable prewarm for development environments
- Check database connection latency/network issues
- Verify database can handle concurrent connection requests
Database Connection Limit Exceeded
Symptoms:
FATAL: remaining connection slots are reserved

Solutions:
- Increase the PostgreSQL max_connections setting
- Reduce pool_size across all workers
- Calculate total connections: pool_size × workers + other_services
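The capacity check in the last bullet can be made explicit. The helper name is this guide's, not a Fides API, and unlike the earlier sizing formula this sketch also counts max_overflow, since overflow connections hit the database too:

```python
def required_max_connections(pool_size: int, max_overflow: int,
                             workers: int, other_services: int = 0,
                             reserve: int = 50) -> int:
    """Lower bound for PostgreSQL max_connections: every worker can
    hold pool_size + max_overflow connections at peak, plus whatever
    other services use, plus a superuser/maintenance reserve."""
    return (pool_size + max_overflow) * workers + other_services + reserve

required_max_connections(300, 50, workers=3)  # 3 workers at full burst = 1100
```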
High Latency Despite Optimizations
Potential causes:
- Queries themselves are slow (check query performance)
- Database replica has high replication lag
- Network latency between Fides and database
- Database CPU/memory exhaustion
- Pool size still too small for traffic volume
Best Practices
- Always enable skip_rollback for read-only pools - Free performance win with no downsides
- Start with prewarm disabled, enable once pool size is properly tuned
- Size pool based on actual load testing, not guesswork
- Monitor pool utilization continuously in production
- Plan for 2x capacity to handle traffic spikes gracefully
- Configure database connection limits before enabling large pools
- Use separate pools for read-only and read-write operations (Fides does this by default)
- Test disaster recovery - what happens when database is unreachable?
Additional Resources
SQLAlchemy Documentation (Version 1.4)
- Connection Pooling - Complete pooling guide
- Reset On Return - Detailed explanation of reset behavior
- Dealing with Disconnects - Pessimistic and optimistic disconnect handling
- Pool Events - Hooks for pool lifecycle events
- Engine Configuration - All create_engine() parameters
- Transaction Isolation Levels - AUTOCOMMIT and other isolation levels
- Asynchronous I/O Support - Async engine documentation
Database-Specific Documentation
- asyncpg Connection Pools - asyncpg performance tuning
- PostgreSQL Connection Management - PostgreSQL connection settings
- PostgreSQL max_connections - Configuring connection limits
Related Fides Documentation
- Check src/fides/config/database_settings.py for all available database configuration options
- See src/fides/api/db/ctl_session.py for the async connection pool implementation
Summary
For most production deployments expecting high traffic:
- Always enable: pool_skip_rollback=true and autocommit=true
- Start with: pool_size=300 and adjust based on load testing
- Enable in production: prewarm=true after pool size is tuned
- Monitor: Connection pool utilization, request latency, database connections
- Test thoroughly: Validate configuration under realistic load before deploying
These optimizations can reduce latency by 30-50% and significantly improve throughput under load.