PostgreSQL to Snowflake Using OpenFlow 

Pooja Peddapally


1. Executive Summary 

  • Problem: Organizations often struggle to move operational PostgreSQL data into Snowflake efficiently while maintaining freshness and reliability. 
  • Recommended Approach: Use Snowflake OpenFlow with the PostgreSQL connector to capture database changes and ingest them into Snowflake. 
  • Where it fits: Operational analytics, reporting pipelines, and real-time data synchronization use cases. 
  • Key outcomes: Simplified ingestion architecture, low-latency replication, improved pipeline reliability, and reduced operational overhead. 
  • What the reader can implement: A practical PostgreSQL → Snowflake ingestion pipeline using OpenFlow with logical replication. 

2. Background 

Modern organizations typically separate operational databases from analytical systems. PostgreSQL is commonly used to support transactional workloads, while Snowflake is designed for large-scale analytics and reporting. 

To enable analytics, operational data must be continuously ingested into the data warehouse. Traditional ETL pipelines often require multiple tools, custom scripts, and orchestration frameworks. These pipelines can become difficult to maintain as data volume grows. 

Snowflake OpenFlow simplifies this process by providing a native data ingestion framework capable of capturing changes from external systems and loading them directly into Snowflake. 

3. Problem 

3.1 Symptoms 

  • Data pipelines require multiple ETL tools and complex orchestration. 
  • Operational data arrives in Snowflake with significant delays. 
  • Pipeline failures require manual intervention and debugging. 

3.2 Impact 

Delayed data ingestion reduces the effectiveness of dashboards, reporting, and analytics. Data engineering teams also spend significant time maintaining ETL pipelines instead of building new data products. A scalable and reliable ingestion framework becomes necessary as data systems grow. 

4. Requirements & Assumptions 

4.1 Data & SLA 

  • Data Volume: Millions of rows per month depending on application workload 
  • Freshness/SLA: Near real-time or frequent ingestion 
  • Environments: Development, staging, and production 

4.2 Security & Compliance 

  • Data Sensitivity: Typically non-PII operational data, though sensitive fields may exist
  • Access Model: Snowflake role-based access control (RBAC)

4.3 Tooling & Constraints 

  • Ingestion tool: Snowflake OpenFlow
  • Source database: PostgreSQL hosted on AWS RDS
  • Constraints: Network connectivity, logical replication configuration, and schema evolution handling 

5. Recommended Architecture 

5.1 High-Level Flow 

  1. PostgreSQL stores operational data. 
  2. Logical replication captures database changes. 
  3. OpenFlow runtime connects to PostgreSQL. 
  4. PostgreSQL connector reads table changes. 
  5. Processors transform and stream the data. 
  6. Data is ingested into Snowflake tables. 
  7. Snowflake enables analytics and reporting. 

5.2 Architecture Diagram 


PostgreSQL (RDS) → OpenFlow Runtime → Processors → Snowflake Tables → BI / Analytics

5.3 Options 

Option A: Snapshot Ingestion 

  • Full table extraction 
  • Used for initial data loads 

Option B: CDC Ingestion 

  • Change Data Capture using logical replication 
  • Used for continuous updates 

Selection Guide 

  • Snapshot ingestion is suitable for initial migrations 
  • CDC ingestion is preferred for ongoing data synchronization 

6. Implementation 

6.1 Setup 

Snowflake environment setup includes: 

  • Creating database and schema 
  • Configuring warehouse for processing 
  • Creating roles for OpenFlow access 
  • Configuring network rules and external access integrations 

External configuration includes: 

  • PostgreSQL database running on AWS RDS 
  • Logical replication enabled 
  • Network connectivity between Snowflake and PostgreSQL 
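
The Snowflake-side setup above can be sketched in SQL as follows. All object names (OPENFLOW_DB, OPENFLOW_WH, OPENFLOW_ROLE, the network rule, and the RDS hostname) are illustrative assumptions, not names prescribed by OpenFlow:

```sql
-- Illustrative Snowflake setup; all object names are assumptions.
CREATE DATABASE IF NOT EXISTS OPENFLOW_DB;
CREATE SCHEMA IF NOT EXISTS OPENFLOW_DB.RAW;

-- Warehouse sized small to start; adjust for ingestion volume.
CREATE WAREHOUSE IF NOT EXISTS OPENFLOW_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

-- Role for the OpenFlow runtime.
CREATE ROLE IF NOT EXISTS OPENFLOW_ROLE;
GRANT USAGE ON DATABASE OPENFLOW_DB TO ROLE OPENFLOW_ROLE;
GRANT USAGE, CREATE TABLE ON SCHEMA OPENFLOW_DB.RAW TO ROLE OPENFLOW_ROLE;
GRANT USAGE ON WAREHOUSE OPENFLOW_WH TO ROLE OPENFLOW_ROLE;

-- Allow outbound connectivity to the RDS endpoint (hostname is a placeholder).
CREATE NETWORK RULE IF NOT EXISTS OPENFLOW_DB.RAW.pg_rds_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('my-rds-host.us-east-1.rds.amazonaws.com:5432');

CREATE EXTERNAL ACCESS INTEGRATION IF NOT EXISTS pg_rds_integration
  ALLOWED_NETWORK_RULES = (OPENFLOW_DB.RAW.pg_rds_rule)
  ENABLED = TRUE;
```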

6.2 Core Build Steps 

  1. Configure PostgreSQL logical replication settings. 
  2. Deploy OpenFlow runtime and PostgreSQL connector. 
  3. Configure connection parameters (host, port, credentials). 
  4. Select tables to replicate. 
  5. Enable processors and validate the data pipeline. 
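
Step 1 above typically involves settings like the following. On AWS RDS, wal_level is controlled through the DB parameter group (rds.logical_replication = 1) rather than ALTER SYSTEM, and the user, publication, and table names below are illustrative assumptions:

```sql
-- On self-managed PostgreSQL (on RDS, set rds.logical_replication = 1
-- in the DB parameter group instead; it implies wal_level = logical):
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
-- A restart is required for wal_level to take effect.

-- Verify the setting:
SHOW wal_level;

-- Create a replication user and a publication for the tables to replicate
-- (on RDS, grant the rds_replication role instead of the REPLICATION attribute):
CREATE ROLE openflow_user WITH LOGIN REPLICATION PASSWORD '********';
CREATE PUBLICATION openflow_pub FOR TABLE public.orders, public.customers;
GRANT SELECT ON public.orders, public.customers TO openflow_user;
```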

6.3 Configuration Defaults 

  • Watermark Strategy: CDC via PostgreSQL replication logs 
  • Deduplication Key: Primary key of source tables 
  • Load Frequency: Continuous ingestion
  • Error Handling: Connector retries and runtime monitoring  

7. Validation & Testing 

7.1 Data Validation 

Row Count Validation
Compare row counts between PostgreSQL and Snowflake tables. 

Example (run each query on its own system):

-- On PostgreSQL (source):
SELECT COUNT(*) FROM postgres_table;

-- On Snowflake (target):
SELECT COUNT(*) FROM snowflake_table;

Duplicate Checks

SELECT id, COUNT(*)
FROM snowflake_table
GROUP BY id
HAVING COUNT(*) > 1;

Freshness Checks 
Verify the latest ingestion timestamp matches the source database. 
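
A freshness check can be sketched as follows, assuming the replicated table carries an update timestamp column (updated_at here is an assumption about the source schema):

```sql
-- Latest change visible in Snowflake (column name is an assumption):
SELECT MAX(updated_at) AS latest_in_snowflake
FROM snowflake_table;

-- Compare against the equivalent query on the PostgreSQL source:
--   SELECT MAX(updated_at) FROM postgres_table;
-- The gap between the two values approximates replication lag.
```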

7.2 Reconciliation 

Periodic reconciliation can be performed by comparing: 

  • Row counts 
  • Primary key distributions 
  • Update timestamps 
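
One way to sketch such a reconciliation, assuming both tables expose an integer primary key named id (an assumption), is to run the same bucketed query on each system and diff the results; a mismatch then points to a specific key range rather than the whole table:

```sql
-- Run on both PostgreSQL and Snowflake, then compare output.
-- Bucketing by primary key range localizes any discrepancy.
SELECT
  FLOOR(id / 100000) AS id_bucket,
  COUNT(*)           AS row_count,
  MIN(id)            AS min_id,
  MAX(id)            AS max_id
FROM snowflake_table
GROUP BY 1
ORDER BY 1;
```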

8. Security & Access 

  • Required Snowflake permissions: Database usage, schema access, and table creation privileges 
  • Secret management: Credentials stored securely within Snowflake integrations 
  • Separation of duties: Data engineers configure pipelines while analysts consume data 
  • Auditability: Snowflake query history and runtime logs provide monitoring and traceability 
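
The separation of duties above can be expressed with standard Snowflake RBAC. The role and object names below are illustrative:

```sql
-- Pipeline role: may create and write tables in the landing schema.
GRANT USAGE ON DATABASE OPENFLOW_DB TO ROLE PIPELINE_ENGINEER;
GRANT USAGE, CREATE TABLE ON SCHEMA OPENFLOW_DB.RAW TO ROLE PIPELINE_ENGINEER;

-- Analyst role: read-only access to replicated data.
GRANT USAGE ON DATABASE OPENFLOW_DB TO ROLE ANALYST;
GRANT USAGE ON SCHEMA OPENFLOW_DB.RAW TO ROLE ANALYST;
GRANT SELECT ON ALL TABLES IN SCHEMA OPENFLOW_DB.RAW TO ROLE ANALYST;
GRANT SELECT ON FUTURE TABLES IN SCHEMA OPENFLOW_DB.RAW TO ROLE ANALYST;
```

Granting on FUTURE TABLES ensures newly replicated tables are readable without reapplying grants.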

9. Performance & Cost 

9.1 Performance Considerations 

  • Use appropriate warehouse sizes depending on ingestion volume 
  • Monitor runtime performance and connector throughput 

9.2 Cost Drivers 

  • Compute cost from Snowflake warehouses 
  • Storage cost for ingested data 
  • Data transfer costs if cross-region connections are used 

9.3 Cost Controls 

  • Configure auto-suspend warehouses 
  • Use resource monitors 
  • Optimize ingestion batch sizes 
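
The first two controls can be sketched as follows; the warehouse name, credit quota, and thresholds are assumptions to be tuned per environment:

```sql
-- Suspend the ingestion warehouse after 60 seconds of inactivity.
ALTER WAREHOUSE OPENFLOW_WH SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- Cap monthly credit consumption and suspend when the quota is reached.
CREATE RESOURCE MONITOR IF NOT EXISTS OPENFLOW_MONITOR
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE OPENFLOW_WH SET RESOURCE_MONITOR = OPENFLOW_MONITOR;
```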

10. Operations & Monitoring 

10.1 What to Monitor 

  • Pipeline success or failure rate 
  • Data freshness 
  • Unexpected changes in data volume 
  • Runtime health and connector status 

10.2 Alerting 

Alerts can notify administrators when pipelines fail or ingestion latency increases. 
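
As one concrete form, a Snowflake alert can watch ingestion freshness and send an email when replication stalls. This is a sketch only: the notification integration, warehouse, table, updated_at column, and 30-minute threshold are all assumptions:

```sql
-- Sketch of a freshness alert; names and thresholds are assumptions.
CREATE OR REPLACE ALERT freshness_alert
  WAREHOUSE = OPENFLOW_WH
  SCHEDULE = '10 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM snowflake_table
    HAVING MAX(updated_at) < DATEADD('minute', -30, CURRENT_TIMESTAMP())
  ))
  THEN CALL SYSTEM$SEND_EMAIL(
    'my_email_integration',
    'oncall@example.com',
    'Ingestion freshness alert',
    'No new rows replicated in the last 30 minutes.'
  );

-- Alerts are created in a suspended state and must be resumed.
ALTER ALERT freshness_alert RESUME;
```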

10.3 Runbook 

Common issues: 

  • Driver error: upload the correct PostgreSQL driver to the runtime 
  • Permission error: reapply Snowflake role privileges 
  • Connection failure: verify network rules and security groups 


11. Common Pitfalls 

  • Missing Snowflake privileges for OpenFlow runtime
  • Incorrect PostgreSQL replication settings
  • Network connectivity issues
  • Large initial snapshot loads
  • Schema changes breaking ingestion pipelines

12. Variations / Use Cases 

  • PostgreSQL CDC pipelines for operational analytics
  • API ingestion pipelines using OpenFlow
  • Real-time analytics dashboards
  • Machine learning pipelines requiring fresh operational data

13. Next Steps 

Organizations implementing Snowflake ingestion pipelines should: 

  • Evaluate CDC strategies for operational databases
  • Implement monitoring and alerting
  • Optimize warehouse sizing for ingestion workloads

Pooja Peddapally

Associate Data Engineer

Boolean Data Systems


Aspiring Associate Data Engineer with hands-on experience in designing and supporting cloud-based data pipelines and analytics workflows. Skilled in building scalable data solutions, streamlining data processing, and enabling meaningful business insights. Passionate about modern data engineering, efficient data systems, and scalable cloud architectures. 

About Boolean Data Systems

Boolean Data Systems is a Snowflake Select Services partner that implements solutions on cloud platforms. We help enterprises make better business decisions with data and solve real-world business analytics and data problems.

Global Headquarters

USA - Atlanta
3970 Old Milton Parkway,
Suite #200, Alpharetta, GA 30005
Ph. : 770-410-7770
Fax : 855-414-2865

Boolean Data is SOC 2 Type 1 compliant
All rights reserved – Boolean Data Systems