From Node.js to Go: Supercharging S3 Downloads of Thousands of Files as a Single Zip

Published on 2024-08-24

As developers, we often face challenges when dealing with large-scale data processing and delivery. At Kamero, we recently tackled a significant bottleneck in our file delivery pipeline. Our application allows users to download thousands of files associated with a particular event as a single zip file. This feature, powered by a Node.js-based Lambda function responsible for fetching and zipping files from S3 buckets, was struggling with memory constraints and long execution times as our user base grew.

This post details our journey from a resource-hungry Node.js implementation to a lean and lightning-fast Go solution that efficiently handles massive S3 downloads. We'll explore how we optimized our system to provide users with a seamless experience when requesting large numbers of files from specific events, all packaged into a convenient single zip download.

The Challenge

Our original Lambda function faced several critical issues when processing large event-based file sets:

Memory Consumption: Even with 10GB of allocated memory, the function would fail when processing 20,000 files for larger events.
Execution Time: Zip operations for events with numerous files were taking too long, sometimes timing out before completion.
Scalability: The function couldn't handle the increasing load efficiently, limiting our ability to serve users with large file sets from popular events.
User Experience: Slow download preparation times were impacting user satisfaction, especially for events with substantial file counts.

The Node.js Implementation: A Quick Look

Our original implementation used the s3-zip library to create zip files from S3 objects. Here's a simplified snippet of how we were processing files:

const s3Zip = require("s3-zip");

// ... other code ...

const body = s3Zip.archive(
  { bucket: bucketName },
  eventId,
  files,
  entryData
);

await uploadZipFile(Upload_Bucket, zipfileKey, body);

While this approach worked, it loaded all files into memory before creating the zip, leading to high memory usage and potential out-of-memory errors for large file sets.

Enter Go: A Game-Changing Rewrite

We decided to rewrite our Lambda function in Go, leveraging its efficiency and built-in concurrency features. The results were astounding:

Memory Usage: Dropped from 10GB to a mere 100MB for the same workload.
Speed: The function became approximately 10 times faster.
Reliability: Successfully processes 20,000 files without issues.

Key Optimizations in the Go Implementation

1. Efficient S3 Operations

We used the AWS SDK for Go v2, which offers better performance and lower memory usage compared to v1:

cfg, err := config.LoadDefaultConfig(context.TODO())
s3Client = s3.NewFromConfig(cfg)

2. Concurrent Processing

Go's goroutines allowed us to process multiple files concurrently:

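var wg sync.WaitGroup
sem := make(chan struct{}, 10) // Limit concurrent operations

for _, photo := range photos {
    wg.Add(1)
    go func(photo Photo) {
        defer wg.Done()
        sem <- struct{}{}        // Acquire a slot; blocks while all 10 are in use
        defer func() { <-sem }() // Release the slot when this goroutine returns

        // Fetch and process the file here
    }(photo)
}

wg.Wait() // Block until every goroutine has finished
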
This approach allows us to process multiple files simultaneously while controlling the level of concurrency to prevent overwhelming the system.
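To look at the buffered-channel semaphore on its own, here is a small, self-contained sketch that uses only the standard library. `processAll` and the stand-in `Photo` type are hypothetical names, and the `process` callback marks where the real function would fetch an object from S3. The sketch also records the peak number of goroutines doing work at once, which makes it easy to confirm that the channel's capacity really caps concurrency.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Photo is a hypothetical stand-in for the article's Photo type.
type Photo struct {
	Key string
}

// processAll calls process for every photo, with at most limit calls running
// at the same time. It returns the peak number of concurrent calls observed.
func processAll(photos []Photo, limit int, process func(Photo)) int {
	var wg sync.WaitGroup
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var inFlight, peak int64

	for _, photo := range photos {
		wg.Add(1)
		go func(photo Photo) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while all slots are taken
			defer func() { <-sem }() // release the slot when this goroutine returns

			n := atomic.AddInt64(&inFlight, 1)
			for { // raise peak to n unless another goroutine already set it higher
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			process(photo) // in the Lambda: fetch the object from S3
			atomic.AddInt64(&inFlight, -1)
		}(photo)
	}
	wg.Wait() // wait for every goroutine to finish
	return int(atomic.LoadInt64(&peak))
}

func main() {
	photos := make([]Photo, 50)
	for i := range photos {
		photos[i] = Photo{Key: fmt.Sprintf("event/photo-%03d.jpg", i)}
	}
	var done int64
	peak := processAll(photos, 10, func(Photo) {
		time.Sleep(5 * time.Millisecond) // simulate a slow download
		atomic.AddInt64(&done, 1)
	})
	fmt.Printf("processed %d photos, peak concurrency %d (limit 10)\n", atomic.LoadInt64(&done), peak)
}
```

A buffered channel of capacity N acts as a counting semaphore: a send blocks once N slots are taken, and each receive frees one. Every goroutine is still started up front, so with 20,000 files there are 20,000 mostly idle goroutines; that is cheap in Go, though a fixed pool of workers reading from a channel is a common alternative.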

3. Streaming Zip Creation

Instead of loading all files into memory, we stream the zip content directly to S3:



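pipeReader, pipeWriter := io.Pipe()

go func() {
    zipWriter := zip.NewWriter(pipeWriter)
    // Add files to zip
    // zipWriter.Close writes the central directory; CloseWithError(nil)
    // behaves like Close, and a non-nil error is passed on to the reader
    pipeWriter.CloseWithError(zipWriter.Close())
}()

// Upload streaming content to S3
// (uploader is assumed to come from manager.NewUploader(s3Client))
if _, err := uploader.Upload(ctx, &s3.PutObjectInput{
    Bucket: &destBucket,
    Key:    &zipFileKey,
    Body:   pipeReader,
}); err != nil {
    pipeReader.CloseWithError(err) // Unblock the zip goroutine if the upload fails
    return err
}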




This streaming approach significantly reduces memory usage and allows us to handle much larger file sets.
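The streaming step can be sketched end to end without AWS. A goroutine writes zip entries into the write side of an `io.Pipe`, and whatever reads the other side (the S3 uploader in the article; an in-memory reader here, only so the result can be checked) pulls bytes as they are produced. `streamZip`, `addEntry`, `memOpener`, `readArchive`, and the `opener` type are hypothetical helpers, and `opener` stands in for fetching an S3 object body. Errors are handed to the reader with `CloseWithError`, so a failed fetch fails the consumer instead of yielding a silently truncated archive.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// opener is a hypothetical stand-in for fetching an object body (for example,
// an S3 GetObject response body) by name.
type opener func(name string) (io.ReadCloser, error)

// streamZip returns the read side of a pipe; a goroutine fills the write side
// with a zip archive of the named entries as the reader consumes it.
func streamZip(names []string, open opener) *io.PipeReader {
	pr, pw := io.Pipe()
	go func() {
		zw := zip.NewWriter(pw)
		for _, name := range names {
			if err := addEntry(zw, name, open); err != nil {
				pw.CloseWithError(err) // the reader sees err instead of a truncated zip
				return
			}
		}
		// zw.Close writes the central directory; CloseWithError(nil) acts like Close.
		pw.CloseWithError(zw.Close())
	}()
	return pr
}

// addEntry copies one source into a new zip entry without buffering it whole.
func addEntry(zw *zip.Writer, name string, open opener) error {
	src, err := open(name)
	if err != nil {
		return err
	}
	defer src.Close()
	dst, err := zw.Create(name) // Deflate compression by default
	if err != nil {
		return err
	}
	_, err = io.Copy(dst, src)
	return err
}

// memOpener serves entries from an in-memory map, for the demo and tests only.
func memOpener(files map[string]string) opener {
	return func(name string) (io.ReadCloser, error) {
		s, ok := files[name]
		if !ok {
			return nil, fmt.Errorf("no such object: %s", name)
		}
		return io.NopCloser(strings.NewReader(s)), nil
	}
}

// readArchive drains r and returns each entry's contents. It buffers the whole
// archive, which is fine for checking the demo but is what the Lambda avoids.
func readArchive(r io.Reader) (map[string]string, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := make(map[string]string, len(zr.File))
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = string(b)
	}
	return out, nil
}

func main() {
	files := map[string]string{"event/a.jpg": "first photo", "event/b.jpg": "second photo"}
	names := []string{"event/a.jpg", "event/b.jpg"}
	// In the Lambda, the pipe reader would be the upload Body instead.
	got, err := readArchive(streamZip(names, memOpener(files)))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Printf("%s: %q\n", n, got[n])
	}
}
```

Two points carry over to the real function. `zip.Writer` is not designed for concurrent use, so if objects are fetched in parallel as in step 2, the writes into the archive still have to happen one at a time, for example from the single goroutine that owns the writer. And if the consumer gives up early, closing the read side with `CloseWithError` makes the writer goroutine's next write fail instead of blocking forever.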

The Results

The rewrite to Go delivered impressive improvements:



Memory Usage: Reduced by 99% (from 10GB to 100MB)

Processing Speed: Approximately 10 times faster

Reliability: Successfully handles 20,000 files without issues

Cost Efficiency: Lower memory usage and faster execution time result in reduced AWS Lambda costs
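The cost point follows from how Lambda bills compute: duration charges are proportional to configured memory multiplied by execution time (GB-seconds), plus a flat per-request fee that is the same for both versions. The sketch below works through that arithmetic under stated assumptions. The post gives no absolute durations or memory setting for the Go function, so it assumes a 100-second run for the old function and the 128 MB minimum Lambda setting for the new one; `gbSeconds` and `relativeCost` are hypothetical helper names.

```go
package main

import "fmt"

// gbSeconds returns the compute usage of one invocation: configured memory
// in GB multiplied by billed duration in seconds.
func gbSeconds(memoryMB, durationSec float64) float64 {
	return memoryMB / 1024 * durationSec
}

// relativeCost is the ratio of new to old compute charges. The per-request
// fee is left out because both versions pay it once per invocation.
func relativeCost(oldMB, oldSec, newMB, newSec float64) float64 {
	return gbSeconds(newMB, newSec) / gbSeconds(oldMB, oldSec)
}

func main() {
	// Assumed figures: 10,240 MB for 100 s versus 128 MB at 10x the speed.
	r := relativeCost(10240, 100, 128, 10)
	fmt.Printf("new compute cost is %.5f of the old (about 1/%.0f)\n", r, 1/r)
}
```

Under those assumptions, compute charges fall to roughly 1/800 of the original. Lambda bills configured rather than used memory, so the saving depends on actually lowering the memory setting, and because CPU share scales with that setting, the duration at 128 MB would need to be measured rather than assumed.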

Lessons Learned

Language Choice Matters: Go's efficiency and concurrency model made a massive difference in our use case.

Understand Your Bottlenecks: Profiling our Node.js function helped us identify key areas for improvement.

Leverage Cloud-Native Solutions: Using AWS SDK for Go v2 and understanding S3's capabilities allowed for better integration and performance.

Think in Streams: Processing data as streams rather than loading everything into memory is crucial for large-scale operations.

Conclusion

Rewriting our Lambda function in Go not only solved our immediate scaling issues but also provided a more robust and efficient solution for our file processing needs. While Node.js served us well initially, this experience highlighted the importance of choosing the right tool for the job, especially when dealing with resource-intensive tasks at scale.

Remember, the best language or framework depends on your specific use case. In our scenario, Go's performance characteristics aligned perfectly with our needs, resulting in a significantly improved user experience and reduced operational costs.

Have you faced similar challenges with serverless functions? How did you overcome them? We'd love to hear about your experiences in the comments below!

This article is reproduced from: https://dev.to/hiteshsisara/from-nodejs-to-go-supercharging-s3-downloads-of-thousands-of-files-as-a-single-zip-474b?1
