Module 9: Map/Reduce Scripts

Week 5 • NetSuite SuiteScript 2.0 Training • ~90 minutes

🎯 Learning Objectives

  • Understand the four Map/Reduce stages and what each one is for
  • Write getInputData, map, reduce, and summarize entry points
  • Explain how work is distributed across processing queues in parallel
  • Choose between Scheduled and Map/Reduce scripts for a given job

1. Map/Reduce Overview

Map/Reduce is the most powerful script type for processing thousands of records. It breaks work into stages that can run in parallel across multiple queues.

💡 Key Advantages
  • Automatic Governance - System handles checkpoints
  • Distributed Processing - Uses all available queues
  • Built-in Error Handling - Errors don't stop the whole script
  • Yield Control - Can pause to let other scripts run

2. The Four Stages

getInputData → map → reduce → summarize
     ↓           ↓       ↓          ↓
   Get data   Process  Aggregate  Report
   to work    each     by key     results
   with       record

Stage 1: getInputData

Returns the data to process. Can return a search, array, or object.

function getInputData() {
    // Return a search - most common
    return search.create({
        type: search.Type.CUSTOMER,
        filters: [['email', 'isnotempty', '']],
        columns: ['entityid', 'email']
    });
    
    // Or return an array
    // return [1, 2, 3, 4, 5];
    
    // Or return an object
    // return { key1: 'value1', key2: 'value2' };
}
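Whatever getInputData returns, the framework normalizes it into key/value pairs before calling map. The sketch below is plain JavaScript (runnable outside NetSuite) that simulates this normalization for the array and object cases; search results are streamed by the framework itself and are not modeled here.

```javascript
// Simulation (plain JS, not SuiteScript): how the framework turns an
// array or object returned by getInputData into key/value pairs for map.
function toPairs(input) {
    if (Array.isArray(input)) {
        // Arrays: the element index becomes the key
        return input.map(function (v, i) {
            return { key: String(i), value: v };
        });
    }
    // Plain objects: each property name becomes a key
    return Object.keys(input).map(function (k) {
        return { key: k, value: input[k] };
    });
}

console.log(toPairs([10, 20]));       // keys '0' and '1'
console.log(toPairs({ a: 1, b: 2 })); // keys 'a' and 'b'
```

Each pair produced here corresponds to one map invocation, with `context.key` and `context.value` set accordingly.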

Stage 2: map

Called once per input. Process individual items here.

function map(context) {
    // context.key = index or key
    // context.value = the data (JSON string if from search)
    
    var data = JSON.parse(context.value);
    var customerId = data.id;
    var email = data.values.email;
    
    // Do processing...
    
    // Optionally write to reduce stage
    context.write({
        key: 'processed',
        value: customerId
    });
}
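When getInputData returns a search, `context.value` in map is a JSON string describing one search result. The snippet below is plain JavaScript illustrating a typical shape for a customer search with `entityid` and `email` columns; the exact fields depend on your search definition, and the sample values are invented for illustration.

```javascript
// Illustration (plain JS): a typical context.value for one search result.
// Field values here are made up; the shape (recordType / id / values)
// is what a search-driven map stage receives.
var sampleValue = JSON.stringify({
    recordType: 'customer',
    id: '123',
    values: { entityid: 'CUST-0042', email: 'jane@example.com' }
});

// This mirrors the parsing done at the top of the map function above
var data = JSON.parse(sampleValue);
console.log(data.id);           // '123'
console.log(data.values.email); // 'jane@example.com'
```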

Stage 3: reduce

Called once per unique key written in the map stage; context.values holds every value written with that key. Use it for aggregations.

function reduce(context) {
    // context.key = the key written in map
    // context.values = array of all values with that key
    
    var count = context.values.length;
    log.debug('Reduce', 'Key: ' + context.key + ', Count: ' + count);
    
    // Write to summarize
    context.write({
        key: context.key,
        value: count
    });
}
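The shuffle between map and reduce can be pictured as a group-by on the written keys. This plain-JavaScript sketch simulates it: each unique key produces one reduce invocation whose `context.values` is the array built here.

```javascript
// Simulation (plain JS): how context.write() pairs from map are grouped
// by key before reduce runs. One reduce call happens per unique key.
function groupByKey(pairs) {
    var groups = {};
    pairs.forEach(function (p) {
        if (!groups[p.key]) {
            groups[p.key] = [];
        }
        groups[p.key].push(p.value);
    });
    return groups;
}

var written = [
    { key: 'processed', value: '101' },
    { key: 'processed', value: '102' },
    { key: 'skipped',   value: '103' }
];
console.log(groupByKey(written));
// → { processed: ['101', '102'], skipped: ['103'] }
```

Note that written values arrive in reduce as strings, which is why the examples in this module call parseFloat/parseInt before doing arithmetic.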

Stage 4: summarize

Final stage for reporting and error handling.

function summarize(summary) {
    // Log any errors
    summary.mapSummary.errors.iterator().each(function(key, error) {
        log.error('Map Error', 'Key: ' + key + ', Error: ' + error);
        return true;
    });
    
    // Log results
    var totalProcessed = 0;
    summary.output.iterator().each(function(key, value) {
        totalProcessed += parseInt(value, 10);
        return true;
    });
    
    log.audit('Complete', 'Total processed: ' + totalProcessed);
    log.audit('Usage', 'Units used: ' + summary.usage);
}

3. Distributed Processing

Map/Reduce automatically distributes work across available queues:

⚠️ How It Works
  • getInputData: Runs once on one queue
  • map: Each record can run on ANY available queue in parallel
  • reduce: Each key group runs on any available queue
  • summarize: Runs once after all map/reduce complete

With 5 queues and 1000 records, up to 5 map executions run simultaneously!
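A rough way to reason about this: with Q queues running map in parallel, N records take on the order of ceil(N / Q) "waves" of map executions. The sketch below is a back-of-envelope estimate only; real throughput also depends on per-record work, yielding, and queue contention.

```javascript
// Back-of-envelope (plain JS): approximate number of sequential "waves"
// of map executions when N records are spread across Q parallel queues.
function mapWaves(records, queues) {
    return Math.ceil(records / queues);
}

console.log(mapWaves(1000, 5)); // 200 waves instead of 1000 sequential runs
```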

4. Complete Example

/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/search', 'N/record', 'N/runtime'], function(search, record, runtime) {
    
    function getInputData() {
        return search.create({
            type: search.Type.SALES_ORDER,
            filters: [
                ['status', 'anyof', 'SalesOrd:B'], // Pending Fulfillment
                ['mainline', 'is', 'T']
            ],
            columns: ['entity', 'total']
        });
    }
    
    function map(context) {
        var searchResult = JSON.parse(context.value);
        var orderId = searchResult.id;
        var customerId = searchResult.values.entity.value;
        var total = parseFloat(searchResult.values.total) || 0;
        
        log.debug('Mapping Order', 'Order: ' + orderId + ', Customer: ' + customerId);
        
        // Write customer ID as key for grouping
        context.write({
            key: customerId,
            value: total
        });
    }
    
    function reduce(context) {
        // context.values = array of order totals for this customer
        var customerId = context.key;
        var totalAmount = 0;
        
        context.values.forEach(function(value) {
            totalAmount += parseFloat(value);
        });
        
        log.debug('Customer Total', 
            'Customer: ' + customerId + ', Total: $' + totalAmount);
        
        context.write({
            key: customerId,
            value: totalAmount
        });
    }
    
    function summarize(summary) {
        // Handle errors
        if (summary.mapSummary.errors) {
            summary.mapSummary.errors.iterator().each(function(key, error) {
                log.error('Map Error', 'Key: ' + key + ', Error: ' + error);
                return true;
            });
        }
        
        // Count results
        var customerCount = 0;
        summary.output.iterator().each(function(key, value) {
            customerCount++;
            return true;
        });
        
        log.audit('Script Complete', 
            customerCount + ' customers processed. Units: ' + summary.usage);
    }
    
    return {
        getInputData: getInputData,
        map: map,
        reduce: reduce,
        summarize: summarize
    };
});

5. Map/Reduce vs Scheduled

| Feature        | Scheduled                        | Map/Reduce                     |
|----------------|----------------------------------|--------------------------------|
| Governance     | 10,000 units, manual checkpoints | Automatic handling             |
| Parallelism    | Single queue                     | Multiple queues simultaneously |
| Error handling | One error stops script           | Errors isolated per record     |
| Best for       | Simple batch jobs                | Massive data processing        |
| Complexity     | Simple                           | More stages to manage          |
✅ Use Map/Reduce When:
  • Processing thousands of records
  • Need parallel processing
  • Individual record errors shouldn't stop everything
  • Need automatic governance handling

🏋️ Practice Exercises

Exercise 1: Basic Map/Reduce

Create a Map/Reduce script that finds all Invoices and logs each invoice number in the map stage.

Exercise 2: Aggregation

Modify the script to group invoices by customer (reduce) and calculate total amount owed per customer.

🎯 Key Takeaways

  • Map/Reduce has 4 stages: getInputData, map, reduce, summarize
  • getInputData provides data; map processes each item; reduce aggregates; summarize reports
  • Automatic governance handling - no manual checkpoints needed
  • Distributed processing uses all available queues in parallel
  • Errors are isolated - one failure doesn't stop other records
  • Use for massive datasets (thousands of records)