Automating Snowflake Data Ingestion from AWS S3 Using Event-Driven Lambda Triggers
Dara Bindara
1. Executive Summary
Modern data platforms increasingly require near real-time ingestion pipelines that can automatically process incoming data without manual orchestration. Traditional batch pipelines that rely on scheduled jobs introduce latency, operational complexity, and unnecessary compute costs.
Recommended approach / pattern
Implement an event-driven ingestion architecture where new files arriving in AWS S3 automatically trigger an AWS Lambda function, which then loads the data into Snowflake using the COPY INTO command.
Where it fits (best use cases)
This architecture is particularly effective for:
- Data platforms receiving frequent file uploads into S3
- Systems requiring near real-time ingestion
- Organizations wanting to eliminate manual ingestion orchestration
- Workloads where ingestion must scale automatically with file arrival volume
Key outcomes
- Automated Snowflake ingestion without polling or scheduled jobs
- Near real-time data availability in Snowflake
- Reduced operational overhead
- Scalable ingestion architecture using serverless infrastructure
What the reader can implement
After reading this article, data engineers can implement:
- An event-driven ingestion pipeline
- Automated Snowflake loading from S3
- Lambda-based ingestion orchestration
- A scalable serverless ingestion architecture
2. Background
Enterprise data pipelines have evolved significantly with the adoption of cloud-native data platforms. Traditional ingestion pipelines often follow a scheduled ETL pattern, where jobs run every hour or every day to check for new files and process them. While this approach works, it introduces several inefficiencies:
- Compute resources run even when no data arrives
- Data freshness depends on schedule frequency
- Operations teams must maintain orchestration workflows
With cloud infrastructure, event-driven architectures provide a more efficient model. Instead of continuously checking for new files, systems react only when events occur.
In AWS environments, S3 events can trigger Lambda functions automatically whenever a new file is uploaded. When integrated with Snowflake, this enables a powerful pattern:
- A file arrives in S3
- S3 generates an event notification
- AWS Lambda is triggered
- Lambda executes Snowflake loading commands
- Data becomes available in Snowflake as soon as the load completes
This architecture eliminates the need for traditional schedulers and enables low-latency data ingestion.
3. Problem
3.1 Symptoms
Organizations that rely on scheduled ingestion pipelines typically experience several recurring challenges.
Symptom 1 — Delayed Data Availability
Batch pipelines introduce delays between when data arrives and when it becomes available in Snowflake.
Symptom 2 — Operational Complexity
Teams maintain complex orchestration frameworks such as Airflow DAGs, cron jobs, or custom ingestion scripts.
Symptom 3 — Inefficient Resource Usage
Scheduled jobs often run even when no new data is available, wasting compute resources.
3.2 Impact
These limitations create both technical and business challenges:
- Slower analytics due to delayed data ingestion
- Increased operational overhead for data teams
- Higher infrastructure costs
- Reduced agility for real-time analytics use cases
Event-driven ingestion addresses these issues by running ingestion only when new data arrives.
4. Requirements & Assumptions
4.1 Data Characteristics & Operational Context
Typical ingestion environments using this architecture exhibit the following characteristics:
Data scale
- Files ranging from a few megabytes to several gigabytes
- Thousands of files arriving daily
Refresh frequency
- Data may arrive continuously or in bursts
Environment structure
Most organizations deploy separate environments:
- Development
- UAT
- Production
Each environment may use separate S3 buckets and Snowflake databases.
4.2 Security & Access Control
Security considerations include:
- AWS IAM roles controlling Lambda permissions
- Snowflake roles managing data access
- Secure credential storage using AWS Secrets Manager
Lambda functions should authenticate to Snowflake using secure credentials rather than hardcoded passwords.
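As a rough sketch of the credential lookup described above, the function below reads a JSON secret and checks that it holds the expected connection settings. The secret layout (keys such as `account` and `warehouse`) and the function name are assumptions made for illustration. The client is passed in rather than created inside the function; in Lambda it would typically be `boto3.client("secretsmanager")`.

```python
import json


def load_snowflake_credentials(secrets_client, secret_id):
    """Fetch and parse a JSON secret holding Snowflake connection settings.

    secrets_client is expected to behave like boto3.client("secretsmanager").
    It is injected so the function can be exercised with a stub.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])

    # Assumed secret layout for this sketch; adjust to your own convention.
    required = ("account", "user", "password", "warehouse", "database", "schema")
    missing = [name for name in required if name not in secret]
    if missing:
        raise KeyError(f"secret {secret_id!r} is missing keys: {missing}")

    # Return only the expected keys so stray fields never reach the connector.
    return {name: secret[name] for name in required}
```

Creating the client once outside the handler, and caching the parsed secret for the lifetime of the execution environment, avoids a Secrets Manager call on every invocation.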
4.3 Tooling & Constraints
This architecture leverages several AWS and Snowflake services.
Key technologies include:
- AWS S3 for file storage
- AWS Lambda for serverless event processing
- Snowflake Cloud Data Platform
- Snowflake External Stage
- Snowflake COPY INTO command
This combination enables a fully automated ingestion pipeline.
Several practical constraints must be considered.
Lambda execution limits
- Maximum execution time: 15 minutes
- Memory: configurable per function, from 128 MB to 10,240 MB
File size considerations
Very large files may need to be split into smaller chunks before upload so that each COPY INTO finishes within the Lambda timeout.
Snowflake warehouse availability
A virtual warehouse must be available to process ingestion commands.
5. Recommended Architecture
5.1 High-Level Flow
The event-driven ingestion pipeline follows this workflow:
- A file is uploaded to an AWS S3 bucket
- S3 generates an event notification
- The event triggers an AWS Lambda function
- Lambda retrieves file metadata
- Lambda connects to Snowflake
- Lambda executes a COPY INTO command
- Snowflake loads the data into the target table
This approach ensures ingestion starts shortly after file arrival.
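The "Lambda retrieves file metadata" step can be sketched as follows. The function name is hypothetical; the record fields used (`eventName`, `s3.bucket.name`, `s3.object.key`, `s3.object.size`) follow the S3 event notification message structure. Object keys arrive URL-encoded in the notification, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key, size) tuples for each object-created record."""
    objects = []
    for record in event.get("Records", []):
        # Skip anything that is not an upload (for example, delete events).
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes keys in notifications; spaces arrive as '+'.
        key = unquote_plus(s3["object"]["key"])
        size = s3["object"].get("size", 0)
        objects.append((bucket, key, size))
    return objects
```

A single invocation can carry more than one record, which is why the function returns a list rather than a single object.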
5.2 Architecture Diagram
5.3 Options
Option A — Scheduled Ingestion
Many pipelines use schedulers such as Airflow to periodically ingest files.
Advantages
- Easy to implement
- Widely used
Disadvantages
- Higher latency
- Unnecessary compute usage
- Operational overhead
Option B — Event-Driven Ingestion (Recommended)
S3 events trigger Lambda functions automatically.
Advantages
- Near real-time ingestion
- Reduced operational overhead
- Scales automatically with data arrival
Selection Guide
Organizations requiring real-time or near real-time ingestion should strongly prefer event-driven pipelines.
6. Implementation
6.1 Setup
Core resources required:
AWS components
- S3 bucket
- Lambda function
- IAM role
- S3 event notifications
Snowflake components
- Database and schema
- Target table
- External stage
- Virtual warehouse
6.2 Core Build Steps
Step 1 — Create S3 Bucket
Create an S3 bucket to store incoming data files.
Step 2 — Create Snowflake External Stage
Define an external stage pointing to the S3 bucket.
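One way to keep the stage definition repeatable across Development, UAT, and Production is to generate the SQL from a small helper. This is a sketch only: the object names (`RAW_STAGE`, `CSV_FORMAT`), the CSV options, and the use of a storage integration are assumptions, and the identifiers are interpolated directly, so they should come from trusted configuration rather than user input.

```python
def stage_setup_statements(bucket, prefix, integration,
                           stage="RAW_STAGE", file_format="CSV_FORMAT"):
    """Return the SQL statements that create a file format and an external stage."""
    clean_prefix = prefix.strip("/")
    url = f"s3://{bucket}/{clean_prefix}/" if clean_prefix else f"s3://{bucket}/"

    # Assumed CSV layout: one header row, optionally double-quoted fields.
    create_format = (
        f"CREATE FILE FORMAT IF NOT EXISTS {file_format} "
        "TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '\"'"
    )
    # The storage integration is assumed to exist already and to grant
    # Snowflake read access to the bucket.
    create_stage = (
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = '{url}' STORAGE_INTEGRATION = {integration} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')"
    )
    return [create_format, create_stage]
```

Each returned statement can be run once per environment with a Snowflake cursor or pasted into a worksheet.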
Step 3 — Configure Target Table
Create a Snowflake table to store ingested data.
Step 4 — Create AWS Lambda Function
The Lambda function will:
- Receive S3 event notifications
- Extract file path
- Connect to Snowflake
- Execute COPY command
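The last two responsibilities can be sketched as a pair of small functions. The names, the `stage_prefix` convention, and the default `ON_ERROR` value are assumptions for illustration. The cursor is expected to come from `snowflake.connector.connect(...).cursor()`, but any object with `execute()` and `fetchall()` works, which keeps the logic testable without a live connection.

```python
def build_copy_statement(table, stage, relative_path, file_format,
                         on_error="CONTINUE"):
    """Build a COPY INTO statement that loads exactly one staged file."""
    # Double any single quotes so the path is a valid SQL string literal.
    escaped = relative_path.replace("'", "''")
    return (
        f"COPY INTO {table} FROM @{stage} "
        f"FILES = ('{escaped}') "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}') "
        f"ON_ERROR = '{on_error}'"
    )


def load_file(cursor, key, stage_prefix, table, stage, file_format):
    """Run COPY INTO for one S3 key and return the result rows."""
    if not key.startswith(stage_prefix):
        raise ValueError(f"{key!r} is outside the stage prefix {stage_prefix!r}")
    # FILES paths are relative to the stage URL, so strip the shared prefix.
    relative_path = key[len(stage_prefix):]
    cursor.execute(build_copy_statement(table, stage, relative_path, file_format))
    return cursor.fetchall()
```

Naming the file explicitly in `FILES` limits each invocation to the object that triggered it, rather than rescanning everything under the stage. Table, stage, and file format names are interpolated as identifiers and should be taken from configuration, never from the event payload.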
Step 5 — Configure S3 Event Notification
Configure the S3 bucket to trigger Lambda when a new file is uploaded.
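The notification can also be configured programmatically. The sketch below builds the configuration document and applies it through an S3 client, assumed to be `boto3.client("s3")`; the helper names and the prefix and suffix filters are illustrative choices.

```python
def build_notification_config(lambda_arn, prefix="", suffix=""):
    """Build an S3 notification configuration that invokes one Lambda function."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    config = {
        "LambdaFunctionArn": lambda_arn,
        "Events": ["s3:ObjectCreated:*"],  # fire on every kind of upload
    }
    if rules:
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [config]}


def attach_trigger(s3_client, bucket, lambda_arn, prefix="", suffix=""):
    """Apply the configuration; s3_client is expected to be boto3.client('s3')."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(
            lambda_arn, prefix, suffix
        ),
    )
```

Two caveats apply. This call replaces the bucket's existing notification configuration, so any other triggers must be included in the same document. The Lambda function also needs a resource-based permission that allows S3 to invoke it before the configuration is accepted.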
6.3 Configuration Defaults
Recommended defaults include:
File format definition
Define file formats explicitly in Snowflake.
Error handling
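Set the COPY option ON_ERROR = 'CONTINUE' so that rows that fail to parse are skipped while the rest of the file loads, and review the rejected rows afterwards.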
Logging
Lambda should log ingestion events for monitoring.
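A minimal sketch of such logging is shown below. It assumes the rows returned by COPY INTO begin with the columns file, status, rows_parsed, rows_loaded, error_limit, and errors_seen, in that order; verify this against the output in your own account before relying on it. The logger name and the log field names are arbitrary.

```python
import json
import logging

logger = logging.getLogger("ingestion")


def summarize_copy_result(rows):
    """Aggregate COPY INTO result rows into one summary dictionary."""
    summary = {"files": 0, "rows_parsed": 0, "rows_loaded": 0,
               "errors_seen": 0, "failed_files": []}
    for row in rows:
        # A short row is a status message (for example, when no files
        # were processed), not a per-file result.
        if len(row) < 6:
            continue
        file_name, status, parsed, loaded, _error_limit, errors = row[:6]
        summary["files"] += 1
        summary["rows_parsed"] += parsed or 0
        summary["rows_loaded"] += loaded or 0
        summary["errors_seen"] += errors or 0
        if status != "LOADED":
            summary["failed_files"].append(file_name)
    return summary


def log_summary(key, summary):
    """Emit one structured log line per load so it can be queried later."""
    logger.info(json.dumps({"event": "copy_complete", "key": key, **summary}))
```

With ON_ERROR = 'CONTINUE', a file can load partially without raising an error, so checking `errors_seen` and `failed_files` is what surfaces rejected rows.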
7. Validation & Testing
Testing ensures ingestion works reliably and safely.
Validation focuses on:
- Data ingestion correctness
- File detection reliability
- Snowflake load success
7.1 Ingestion Validation
Test cases include:
- Upload a file to S3
- Verify Lambda execution
- Verify Snowflake table ingestion
7.2 Data Validation
Validate:
- Row counts
- Column mappings
- Data format consistency
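For CSV files, a column-mapping check can run against the header row before or after a load. The sketch below compares a file's header with the expected column list. The function name and the case-insensitive comparison are assumptions, and the order check reflects the fact that CSV columns are loaded by position by default.

```python
import csv
import io


def validate_csv_header(sample_text, expected_columns):
    """Compare the first CSV row with the expected column names."""
    reader = csv.reader(io.StringIO(sample_text))
    header = next(reader, [])
    found = [name.strip().lower() for name in header]
    expected = [name.lower() for name in expected_columns]
    return {
        # Order matters because CSV loads map columns by position.
        "ok": found == expected,
        "missing": [name for name in expected if name not in found],
        "unexpected": [name for name in found if name not in expected],
    }
```

Only the first few kilobytes of a file are needed for this check, so a ranged S3 read is enough and the function never has to hold a multi-gigabyte object in memory.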
7.3 Failure Testing
Test failure scenarios such as:
- Invalid file formats
- Missing columns
- Snowflake connection failures
8. Security & Access
Required permissions include:
AWS permissions
- Lambda execution role
- S3 read permissions
Snowflake permissions
- USAGE on database and schema
- USAGE on stage
- USAGE on the warehouse that runs the load
- INSERT privileges on target table
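The Snowflake side of these permissions can be expressed as a short list of GRANT statements for a dedicated loader role. This is a sketch with hypothetical object names supplied by the caller. It also grants USAGE on the schema and the warehouse, which the role needs in order to resolve the objects and run the COPY.

```python
def ingestion_grants(role, database, schema, table, stage, warehouse):
    """Return the GRANT statements for a least-privilege loader role."""
    fq_schema = f"{database}.{schema}"
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role}",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        f"GRANT USAGE ON STAGE {fq_schema}.{stage} TO ROLE {role}",
        f"GRANT INSERT ON TABLE {fq_schema}.{table} TO ROLE {role}",
    ]
```

Keeping the loader role limited to INSERT on the target table means a compromised Lambda credential cannot read or modify other data.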
9. Performance & Cost
9.1 Performance Considerations
Performance depends on:
- Snowflake warehouse size
- File size
- Number of concurrent files
Best practices include:
- Use compressed files
- Batch small files
- Enable multi-cluster (auto-scaling) warehouses where many files load concurrently
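Batching small files can be sketched as grouping stage-relative paths into chunks and issuing one COPY per chunk. The sketch assumes the paths have already been collected upstream (for example, from a queue that buffers S3 events), which is not part of the architecture described here. The default chunk size of 1,000 matches the documented limit on file names in a single `FILES` list.

```python
def chunk_files(paths, max_files=1000):
    """Split stage-relative paths into de-duplicated chunks for COPY INTO."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    # dict.fromkeys removes duplicates while preserving arrival order.
    unique = list(dict.fromkeys(paths))
    return [unique[i:i + max_files] for i in range(0, len(unique), max_files)]


def build_batch_copy(table, stage, paths, file_format):
    """Build one COPY INTO statement that loads every path in the chunk."""
    quoted = ", ".join("'" + path.replace("'", "''") + "'" for path in paths)
    return (
        f"COPY INTO {table} FROM @{stage} "
        f"FILES = ({quoted}) "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')"
    )
```

Loading many small files in one statement reduces per-statement overhead and the number of times the warehouse has to resume.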
9.2 Cost Drivers
Primary cost components include:
Compute
Snowflake virtual warehouse usage
Serverless compute
AWS Lambda execution time
Storage
S3 file storage
9.3 Cost Controls
Recommended controls include:
- Warehouse auto-suspend
- Lambda memory optimization
- File batching strategies
10. Operations & Monitoring
10.1 What to Monitor
Key operational metrics include:
- Lambda execution failures
- Snowflake load errors
- Data ingestion latency
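Ingestion latency can be derived from the `eventTime` field that S3 includes in each notification record. The sketch below measures the gap between that timestamp and the moment the load finished; the function name is hypothetical.

```python
from datetime import datetime, timezone


def ingestion_latency_seconds(event_time, loaded_at=None):
    """Seconds between the S3 event timestamp and load completion."""
    # S3 uses ISO-8601 with a trailing 'Z'; normalise it to an explicit
    # UTC offset so datetime.fromisoformat accepts it on older Pythons.
    arrived = datetime.fromisoformat(event_time.replace("Z", "+00:00"))
    if loaded_at is None:
        loaded_at = datetime.now(timezone.utc)
    return (loaded_at - arrived).total_seconds()
```

Logging this value once per file, or publishing it as a custom metric, gives a direct measure of how close the pipeline is to near real time.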
10.2 Alerting
Recommended alerts include:
- Lambda failure notifications
- Snowflake COPY errors
- S3 event delivery failures
10.3 Runbook (Top Issues)
Issue: Lambda fails to connect to Snowflake
Fix: Verify credentials and network configuration
Issue: Data not loading
Fix: Check stage configuration and file format
Issue: Duplicate ingestion
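Fix: Implement idempotent load logic, for example by tracking which files have been loaded, and avoid FORCE = TRUE so that Snowflake's load metadata can skip files it has already loaded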
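Because S3 can deliver the same notification more than once, one approach to the duplicate ingestion issue is to fingerprint each object and claim it before loading. The sketch below uses an in-memory set purely for illustration. Lambda execution environments do not share memory, so a real pipeline would need a durable store with an atomic "insert if absent" operation. The class and function names are hypothetical.

```python
def object_fingerprint(record):
    """Identify one version of an object from an S3 event record."""
    s3 = record["s3"]
    bucket = s3["bucket"]["name"]
    key = s3["object"]["key"]
    # The eTag changes when the content changes, so a re-upload of
    # different data under the same key is treated as new work.
    etag = s3["object"].get("eTag", "")
    return f"{bucket}/{key}#{etag}"


class LoadLedger:
    """In-memory stand-in for a durable record of processed files."""

    def __init__(self):
        self._seen = set()

    def claim(self, fingerprint):
        """Return True the first time a fingerprint is seen, else False."""
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True
```

In the handler, a record is loaded only when `ledger.claim(object_fingerprint(record))` returns True; otherwise it is logged and skipped.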
11. Common Pitfalls
Pitfall 1
Triggering Lambda for every tiny file.
Pitfall 2
Not handling duplicate file ingestion.
Pitfall 3
Allocating more Lambda memory than the function needs. The function only issues SQL, and the load itself runs in Snowflake.
Pitfall 4
Ignoring Snowflake warehouse scaling.
Pitfall 5
Not validating file formats before ingestion.
12. Variations / Use Cases
Variation 1 — Snowpipe Integration
Use Snowpipe with S3 notifications for fully managed ingestion.
Variation 2 — Streaming Pipelines
Combine with Kafka or Kinesis for real-time event streaming.
Variation 3 — Metadata Tracking
Maintain ingestion logs in Snowflake for auditability.
Variation 4 — Data Quality Integration
Add validation frameworks like Great Expectations before ingestion.

Dara Bindara
Associate Data Engineer
Boolean Data Systems

Dara Bindara is an Associate Data Engineer specializing in building and optimizing cloud-based data pipelines. Experienced in Python, SQL, PySpark, Snowflake Cortex, and AI/ML workflows, with a focus on ETL automation, large-scale data transformation, and scalable data warehousing.
About Boolean Data Systems
Boolean Data Systems is a Snowflake Premier Partner that implements solutions on cloud platforms. We help enterprises make better business decisions with data and solve real-world business analytics and data challenges.