Skip to main content

PostgreSQL Engine Configuration

PostgreSQL is a realtime database that stores the materialized views and tables generated by your DataSQRL pipeline for low-latency querying.

Configuration Optionsโ€‹

No mandatory configuration keys are required. Physical DDL (tables, indexes, views) is produced automatically by the DataSQRL compiler.

Basic Configurationโ€‹

{
"engines": {
"postgres": {
"config": {
// Optional PostgreSQL-specific settings
}
}
}
}

Deployment Configurationโ€‹

PostgreSQL supports deployment-specific configuration for database scaling and high availability:

KeyTypeDefaultDescription
instance-sizestring-Database instance size for compute and memory
replica-countinteger-Number of read replicas (minimum: 0, maximum varies)

Instance Size Optionsโ€‹

Available instance-size options:

  • dev - Development/testing size with minimal resources
  • small - Small production workloads
  • medium - Medium production workloads
  • large - Large production workloads
  • xlarge - Extra large production workloads

Deployment Exampleโ€‹

{
"engines": {
"postgres": {
"deployment": {
"instance-size": "large",
"replica-count": 2
}
}
}
}

Conflict Handlingโ€‹

When a generated PostgreSQL table receives a row that conflicts with an existing row on a primary key or unique constraint, DataSQRL chooses one of the following conflict handling strategies:

StrategyPostgreSQL behaviorDescription
UpdateON CONFLICT (...) DO UPDATE SET ...The existing row is overwritten with the incoming row.
Timestamp updateON CONFLICT (...) DO UPDATE SET ... WHERE incoming timestamp > existing timestampThe existing row is overwritten only when the incoming row has a newer timestamp than the existing row. Otherwise, the conflicting insert is silently skipped, which prevents out-of-order events from rolling back newer state.
IgnoreON CONFLICT (...) DO NOTHINGThe existing row is kept, and the conflicting insert is silently skipped.

DataSQRL applies these strategies automatically based on the generated table type:

  • STATE tables use timestamp update with the appropriate timestamp column.
  • STREAM tables use ignore.
  • Other PostgreSQL tables use update.

Usage Notesโ€‹

  • Database schema is automatically generated from your SQRL script
  • Tables, indexes, and views are created based on the compiled data pipeline
  • Connection parameters are typically provided via environment variables
  • The engine handles both real-time data ingestion and query serving
  • Optimized for low-latency reads of materialized data
  • Read replicas improve query performance and provide redundancy

Internal Environment Variablesโ€‹

When running pipelines with the DataSQRL run command, the following environment variables are used in the configuration:

  • POSTGRES_VERSION
  • POSTGRES_HOST
  • POSTGRES_PORT
  • POSTGRES_DATABASE
  • POSTGRES_AUTHORITY
  • POSTGRES_JDBC_URL
  • POSTGRES_USERNAME
  • POSTGRES_PASSWORD