PostgreSQL & database concepts in general#
Architecture#
- PostgreSQL uses a process-based architecture (one OS process per connection).
- Uses WAL (Write-Ahead Log) to guarantee durability.
- Data stored in pages (8 KB) inside tablespaces.
- MVCC (Multi-Version Concurrency Control) enables high concurrency without read locks.
- Uses shared buffers + OS page cache together.
MVCC#
- Each row has xmin/xmax transaction IDs.
- No locks for reads → avoids read contention.
- Dead tuples accumulate → require VACUUM to clean them.
- MVCC ensures consistent snapshot isolation.
Indexes#
- Most common: B-Tree (default).
- Others: GIN (full-text search, JSONB), GiST (geospatial), BRIN (very large, sequential tables), HASH (rarely used).
- Too many indexes slow down writes.
- Reindex needed if index bloats.
Replication#
Native replication is physical WAL streaming.
Modes:
- Asynchronous (default, may lose data).
- Synchronous (strong durability, adds latency).
- Quorum synchronous (majority ACK).
Hot standby supports read-only queries.
Backup & Restore#
Logical backup: pg_dump / pg_restore → portable, slow, not point-in-time.
Physical backup: pg_basebackup → replication-compatible.
PITR (Point-In-Time Recovery) requires:
- Base backup
- Continuous WAL archiving
Tools: pgBackRest, wal-g, barman.
Performance Optimization#
Check slow queries with EXPLAIN / EXPLAIN ANALYZE.
Key parameters:
shared_buffers(25% RAM recommended)work_mem(per sort)maintenance_work_mem(VACUUM, CREATE INDEX)effective_cache_size(OS cache estimate)max_connections(keep low; use PG bouncers)
Enable autovacuum aggressively to prevent bloat.
Query Optimization#
- Avoid
SELECT *. - Use appropriate indexes for joins and WHERE.
- Watch sequential scans vs index scans.
- Ensure statistics are up-to-date (
ANALYZE).
Connection Management#
PostgreSQL processes are expensive → use pooling.
Tools:
- PgBouncer (lightweight pooling, session/transaction pooling)
- PgPool-II (pooling + load balancing + HA)
Avoid thousands of active connections → use poolers.
Locking#
- Row-level locks:
FOR UPDATE,FOR NO KEY UPDATE. - Table-level locks:
ACCESS EXCLUSIVEduring operations like ALTER TABLE. - Long transactions → MVCC bloat → autovacuum freeze issues.
High Availability#
Combine replication + failover tool:
- Patroni
- repmgr
- Stolon
Use etcd/Consul or Kubernetes for leader election.
Security#
Auth methods:
md5(deprecated)scram-sha-256(recommended)peer(local)- TLS support for encrypted transport
Row-level security (RLS) available.
Use separate users with least privilege.
Logging & Monitoring#
Enable slow query log (
log_min_duration_statement).Monitor:
- Cache hits
- Deadlocks
- Autovacuum activity
- Replication lag
- Checkpoint frequency
Tools:
- pg_stat_statements
- Prometheus exporters
- pganalyze
- pgBadger
Common Failure Patterns#
- Autovacuum disabled → massive bloat → downtime.
- Too many connections → OOM / CPU 100%.
- Missing WAL archiving → PITR impossible.
- Long-running transactions prevent VACUUM.
- Large DELETE → bloat → prefer partitioning or soft deletes.
General Database Concepts (Production-grade)#
ACID#
- Atomicity: All or nothing.
- Consistency: Rules must hold before/after transaction.
- Isolation: Transaction visibility control.
- Durability: Data survives crashes (WAL).
Normalization#
- Reduce redundancy.
- Common forms: 1NF, 2NF, 3NF.
- Helps integrity; sometimes denormalize for performance.
Transactions & Isolation Levels#
PostgreSQL isolation levels:
- Read Uncommitted → treated as Read Committed
- Read Committed (default)
- Repeatable Read
- Serializable (strongest)
Higher isolation = fewer anomalies = more locks.
Sharding / Partitioning#
Horizontal sharding → scale-out (complex).
PostgreSQL native partitioning types:
- RANGE, LIST, HASH.
Local indexes per partition improve performance.
Consistency Models#
- Strong consistency → synchronous replication.
- Eventual consistency → async replication.
- Trade-off often related to CAP theorem.
Types of Backups#
- Full / incremental.
- Logical / physical.
- PITR with WAL.
- Verify backups with regular restore tests.
Query Patterns#
- OLTP: small, frequent transactions.
- OLAP: heavy analytical queries.
- Use correct indexing + tuning depending on workload.
Caching Layers#
- Redis / Memcached for expensive queries.
- PostgreSQL relies heavily on OS page cache.
Common Troubleshooting#
- Slow query → check
pg_stat_activity, locks, EXPLAIN. - Replication lag → check network/I/O/WAL generation.
- Autovacuum not running → ensure tuning.
- High load → check index usage, connection spikes, I/O waits.