Module 9: Map/Reduce Scripts

📑 In This Module

Learning Objectives
1. Map/Reduce Overview
2. The Four Stages
3. Distributed Processing
4. Complete Example
5. Map/Reduce vs Scheduled
Practice Exercises

🎯 Learning Objectives

Understand Map/Reduce architecture
Implement the four stages: getInputData, map, reduce, summarize
Leverage distributed processing for massive datasets
Handle errors with the summarize stage
Choose between Map/Reduce and Scheduled scripts

1. Map/Reduce Overview

Map/Reduce is the script type built for processing large volumes of records, from thousands upward. It splits the work into stages, and the map and reduce stages can run in parallel across multiple queues (SuiteCloud processors).

💡 Key Advantages

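Automatic Governance - The framework yields and resumes the script between invocations, so there are no manual checkpoints
Distributed Processing - Map and reduce invocations run in parallel across queues, up to the deployment's concurrency limit
Built-in Error Handling - An error in one map or reduce invocation is recorded for that key; the other keys keep processing
Yield Control - Long-running jobs yield periodically so other scripts can run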
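The error-isolation advantage can be illustrated outside NetSuite. The following is a minimal plain-JavaScript sketch, not the N/mapReduce runtime: `runMapStage` and `mapFn` are hypothetical names. Each input key is processed in its own try/catch, so one failure is recorded without stopping the rest.

```javascript
// Hypothetical local sketch of per-key error isolation in the map stage.
// This is NOT the NetSuite API; it only illustrates the behavior.
function runMapStage(inputs, mapFn) {
  const written = [];       // key/value pairs passed to write()
  const errors = new Map(); // input key -> error message

  for (const [key, value] of Object.entries(inputs)) {
    try {
      // Each input key gets its own invocation and its own write() callback.
      mapFn(key, value, (k, v) => written.push({ key: k, value: v }));
    } catch (e) {
      // The failure is recorded for this key only; the loop moves on.
      errors.set(key, e.message);
    }
  }
  return { written, errors };
}

// Usage: the second input throws, the other two are still processed.
const result = runMapStage({ a: '1', b: 'oops', c: '3' }, (key, value, write) => {
  const n = Number(value);
  if (Number.isNaN(n)) throw new Error('not a number: ' + value);
  write('total', n);
});

console.log(result.written.length);  // 2
console.log(result.errors.get('b')); // not a number: oops
```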

2. The Four Stages

getInputData → map → reduce → summarize
     ↓           ↓       ↓          ↓
   Get data   Process  Aggregate  Report
   to work    each     by key     results
   with       record
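To see how data flows through the stages in the diagram, here is a hypothetical in-memory simulation in plain JavaScript. It is not NetSuite's implementation: `runMapReduce` is an invented name, everything runs sequentially, only array input is accepted, and it assumes written values are stored as strings (JSON for non-strings). It also includes the system's shuffle step, which groups map output by key before reduce.

```javascript
// Hypothetical in-memory model of the Map/Reduce stages, including the
// system shuffle step. Not the NetSuite runtime; everything runs sequentially.
function runMapReduce(script) {
  // Assumption: written values are stored as strings (JSON for non-strings).
  const toStr = (v) => (typeof v === 'string' ? v : JSON.stringify(v));

  // Stage 1: getInputData (this sketch only accepts an array; key = index).
  const input = script.getInputData();

  // Stage 2: map runs once per input and may write key/value pairs.
  const mapped = [];
  input.forEach((value, index) => {
    script.map({
      key: String(index),
      value: toStr(value),
      write: (pair) => mapped.push([toStr(pair.key), toStr(pair.value)]),
    });
  });

  // Shuffle: group all mapped values by key.
  const groups = new Map();
  for (const [key, value] of mapped) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Stage 3: reduce runs once per unique key with all of that key's values.
  const output = [];
  for (const [key, values] of groups) {
    script.reduce({
      key,
      values,
      write: (pair) => output.push([toStr(pair.key), toStr(pair.value)]),
    });
  }

  // Stage 4: summarize runs once, after all map and reduce work.
  return script.summarize({ output });
}

// Usage: count fruit names by first letter.
const counts = runMapReduce({
  getInputData: () => ['apple', 'avocado', 'banana', 'cherry', 'blueberry'],
  map: (context) => context.write({ key: context.value[0], value: 1 }),
  reduce: (context) => context.write({ key: context.key, value: context.values.length }),
  summarize: (summary) => Object.fromEntries(summary.output),
});

console.log(JSON.stringify(counts)); // {"a":"2","b":"2","c":"1"}
```

A local model like this only helps reason about keys and grouping; the real script must be deployed in NetSuite to run.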

Stage 1: getInputData

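Returns the data to process: a search (most common), an array, or a plain object. Each search result, array element, or object property becomes one input to the map stage.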

function getInputData() {
    // Return a search (most common): each result becomes one map() input
    return search.create({
        type: search.Type.CUSTOMER,
        filters: [['email', 'isnotempty', '']],
        columns: ['entityid', 'email']
    });
    
    // Or return an array: key = element index, value = element
    // return [1, 2, 3, 4, 5];
    
    // Or return an object: key = property name, value = property value
    // return { key1: 'value1', key2: 'value2' };
}
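The comments above describe how each return type turns into map inputs. A hypothetical helper, `toMapInputs`, approximates that conversion under two assumptions: keys are strings, and non-string values are JSON-serialized while strings pass through unchanged.

```javascript
// Hypothetical helper approximating how getInputData's return value becomes
// the (key, value) pairs that map() receives.
// Assumptions: keys are strings; strings pass through, other values are JSON-serialized.
function toMapInputs(input) {
  const serialize = (v) => (typeof v === 'string' ? v : JSON.stringify(v));
  const entries = Array.isArray(input)
    ? input.map((value, index) => [String(index), value]) // array: key = index
    : Object.entries(input);                              // object: key = property name
  return entries.map(([key, value]) => ({ key, value: serialize(value) }));
}

console.log(JSON.stringify(toMapInputs([1, 2])));
// [{"key":"0","value":"1"},{"key":"1","value":"2"}]
console.log(JSON.stringify(toMapInputs({ key1: 'value1', key2: { n: 1 } })));
// [{"key":"key1","value":"value1"},{"key":"key2","value":"{\"n\":1}"}]
```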

Stage 2: map

Called once for each input (each search result, array element, or object property). Process individual items here.

function map(context) {
    // context.key = search result ID, array index, or object property name
    // context.value = the data, always a string (a JSON string for search results)
    
    var data = JSON.parse(context.value);
    var customerId = data.id;
    var email = data.values.email;
    
    // Do processing...
    
    // Optionally write to reduce stage
    context.write({
        key: 'processed',
        value: customerId
    });
}
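Because context.value arrives as a JSON string for search results, map must parse it before reading columns. The sketch below uses a made-up payload whose shape (`recordType`, `id`, `values`) is an assumption based on the fields this module reads; `parseCustomerResult` and `columnValue` are hypothetical helpers.

```javascript
// Hypothetical payload shaped like one serialized customer search result.
// The { recordType, id, values } structure is an assumption for illustration.
const sampleValue = JSON.stringify({
  recordType: 'customer',
  id: '1042',
  values: { entityid: 'ACME Corp', email: 'billing@acme.example' },
});

// Select/list columns serialize as { value, text }; plain columns as strings.
// This hypothetical helper returns the raw value in either case.
function columnValue(column) {
  return column !== null && typeof column === 'object' ? column.value : column;
}

// Mirrors the parsing done in map() above.
function parseCustomerResult(value) {
  const data = JSON.parse(value); // context.value is a string
  return {
    customerId: data.id,
    email: columnValue(data.values.email),
  };
}

const parsed = parseCustomerResult(sampleValue);
console.log(parsed.customerId, parsed.email); // 1042 billing@acme.example
console.log(columnValue({ value: '77', text: 'ACME Corp' })); // 77
```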

Stage 3: reduce

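Called once per unique key written in map. Before reduce runs, the system groups (shuffles) every value written under the same key, so reduce is the place for aggregations.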

function reduce(context) {
    // context.key = the key written in map
    // context.values = array of all values written with that key (always strings)
    
    var count = context.values.length;
    log.debug('Reduce', 'Key: ' + context.key + ', Count: ' + count);
    
    // Write to summarize
    context.write({
        key: context.key,
        value: count
    });
}
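In the runtime, context.values holds strings even when map wrote numbers. Counting with `length`, as above, is safe, but summing needs explicit parsing. A plain JavaScript illustration of the pitfall, using made-up values:

```javascript
// context.values in reduce() holds strings, e.g. what map() wrote as 100.5 and 49.5.
const values = ['100.5', '49.5'];

// Pitfall: '+' with a string operand concatenates instead of adding.
const wrong = values.reduce((acc, v) => acc + v, 0);
console.log(wrong); // 0100.549.5

// Correct: parse each value before adding.
const right = values.reduce((acc, v) => acc + parseFloat(v), 0);
console.log(right); // 150
```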

Stage 4: summarize

Final stage for reporting and error handling.

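function summarize(summary) {
    // Log an error thrown in getInputData, if any
    if (summary.inputSummary.error) {
        log.error('Input Error', summary.inputSummary.error);
    }
    
    // Log errors from the map and reduce stages
    summary.mapSummary.errors.iterator().each(function(key, error) {
        log.error('Map Error', 'Key: ' + key + ', Error: ' + error);
        return true; // keep iterating
    });
    summary.reduceSummary.errors.iterator().each(function(key, error) {
        log.error('Reduce Error', 'Key: ' + key + ', Error: ' + error);
        return true;
    });
    
    // Add up the counts written by reduce (values are strings)
    var totalProcessed = 0;
    summary.output.iterator().each(function(key, value) {
        totalProcessed += parseInt(value, 10);
        return true;
    });
    
    log.audit('Complete', 'Total processed: ' + totalProcessed);
    log.audit('Usage', 'Units used: ' + summary.usage);
}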
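Every `each` callback above ends with `return true`. The hypothetical `makeIterator` below stands in for `summary.output.iterator()` to show why, assuming iteration continues only while the callback returns true and stops on any other return value, including a forgotten `return`.

```javascript
// Hypothetical stand-in for summary.output.iterator().
// Assumption: each() stops as soon as the callback returns anything but true.
function makeIterator(pairs) {
  return {
    each(callback) {
      for (const [key, value] of pairs) {
        if (callback(key, value) !== true) break;
      }
    },
  };
}

const output = [['processed', '3'], ['skipped', '2']];

// With "return true", every pair is visited.
let total = 0;
makeIterator(output).each((key, value) => {
  total += parseInt(value, 10);
  return true;
});
console.log(total); // 5

// Without it, the callback returns undefined and iteration stops after one pair.
let visited = 0;
makeIterator(output).each(() => {
  visited += 1;
});
console.log(visited); // 1
```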

3. Distributed Processing

Map/Reduce automatically distributes work across available queues:

⚠️ How It Works

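getInputData: Runs once, on a single queue
map: Each input can run on any available queue, in parallel with other map invocations
shuffle: The system groups all map output by key (no code to write)
reduce: Each key group runs on any available queue, in parallel
summarize: Runs once, after all map and reduce invocations complete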

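For example, with a concurrency limit of 5 and 1,000 input records, up to 5 map invocations run at the same time. Parallelism shortens the elapsed time; the total work stays the same.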
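The parallelism claim reduces to simple arithmetic: 1,000 equal-length invocations on 5 queues take about 1,000 / 5 = 200 time units instead of 1,000. The hypothetical `simulate` function below models this with a greedy assignment (each invocation goes to the queue that frees up first). It assumes every invocation takes the same time and ignores other jobs competing for the queues.

```javascript
// Hypothetical discrete-time model: each invocation is assigned to whichever
// queue becomes free first. Ignores contention from other scripts.
function simulate(durations, concurrency) {
  const freeAt = new Array(concurrency).fill(0); // time each queue becomes free
  for (const d of durations) {
    // Pick the queue that becomes free earliest.
    let q = 0;
    for (let i = 1; i < concurrency; i++) {
      if (freeAt[i] < freeAt[q]) q = i;
    }
    freeAt[q] += d;
  }
  return Math.max(...freeAt); // elapsed time until the last invocation finishes
}

// 1,000 map invocations of 1 time unit each.
const durations = new Array(1000).fill(1);
console.log(simulate(durations, 1)); // 1000
console.log(simulate(durations, 5)); // 200
```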

4. Complete Example

/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/search'], function(search) {
    
    function getInputData() {
        return search.create({
            type: search.Type.SALES_ORDER,
            filters: [
                ['status', 'anyof', 'SalesOrd:B'], // Pending Fulfillment
                'AND',
                ['mainline', 'is', 'T'] // header line only: one result per order
            ],
            columns: ['entity', 'total']
        });
    }
    
    function map(context) {
        var searchResult = JSON.parse(context.value);
        // entity is a select column, serialized as { value, text }
        var customerId = searchResult.values.entity.value;
        var total = parseFloat(searchResult.values.total);
        
        // Write customer ID as key for grouping
        context.write({
            key: customerId,
            value: total
        });
    }
    
    function reduce(context) {
        // context.values = array of order totals for this customer
        var customerId = context.key;
        var totalAmount = 0;
        
        context.values.forEach(function(value) {
            totalAmount += parseFloat(value);
        });
        
        log.debug('Customer Total',
            'Customer: ' + customerId + ', Total: $' + totalAmount.toFixed(2));
        
        context.write({
            key: customerId,
            value: totalAmount
        });
    }
    
    function summarize(summary) {
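        // Handle errors from each stage
        if (summary.inputSummary.error) {
            log.error('Input Error', summary.inputSummary.error);
        }
        summary.mapSummary.errors.iterator().each(function(key, error) {
            log.error('Map Error', 'Key: ' + key + ', Error: ' + error);
            return true;
        });
        summary.reduceSummary.errors.iterator().each(function(key, error) {
            log.error('Reduce Error', 'Key: ' + key + ', Error: ' + error);
            return true;
        });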
        
        // Count results
        var customerCount = 0;
        summary.output.iterator().each(function(key, value) {
            customerCount++;
            return true;
        });
        
        log.audit('Script Complete', 
            customerCount + ' customers processed. Units: ' + summary.usage);
    }
    
    return {
        getInputData: getInputData,
        map: map,
        reduce: reduce,
        summarize: summarize
    };
});
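Because the map and reduce logic in this example depends only on its context argument, it can be checked locally before deployment. The harness below is a hypothetical sketch: it copies the map and reduce bodies into `mapOrder` and `reduceCustomer`, replaces the NetSuite context with plain objects, groups values by key the way the shuffle step does, and feeds in invented sales order results whose JSON shape is an assumption.

```javascript
// Hypothetical local harness for the example's map/reduce logic.
// The NetSuite context is replaced by plain objects; sample data is invented.
function mapOrder(context) {
  const searchResult = JSON.parse(context.value);
  context.write({
    key: searchResult.values.entity.value,        // customer internal ID
    value: parseFloat(searchResult.values.total), // order total
  });
}

function reduceCustomer(context) {
  let totalAmount = 0;
  context.values.forEach((value) => {
    totalAmount += parseFloat(value);
  });
  context.write({ key: context.key, value: totalAmount });
}

// Runs map on every result, groups written values by key (like the shuffle
// step, storing them as strings), then runs reduce once per key.
function runLocally(serializedResults) {
  const groups = new Map();
  serializedResults.forEach((serialized, index) => {
    mapOrder({
      key: String(index),
      value: serialized,
      write: (pair) => {
        if (!groups.has(pair.key)) groups.set(pair.key, []);
        groups.get(pair.key).push(String(pair.value));
      },
    });
  });

  const totals = {};
  for (const [key, values] of groups) {
    reduceCustomer({
      key,
      values,
      write: (pair) => { totals[pair.key] = pair.value; },
    });
  }
  return totals;
}

// Invented search results; the { recordType, id, values } shape is an assumption.
const order = (id, customerId, total) => JSON.stringify({
  recordType: 'salesorder',
  id,
  values: { entity: { value: customerId, text: 'Customer ' + customerId }, total },
});

const totals = runLocally([
  order('501', '7', '100.50'),
  order('502', '9', '20.25'),
  order('503', '7', '49.50'),
]);
console.log(JSON.stringify(totals)); // {"7":150,"9":20.25}
```

Orders 501 and 503 both belong to customer 7, so reduce receives two values for that key and one for customer 9.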

5. Map/Reduce vs Scheduled

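Feature	Scheduled	Map/Reduce
Governance	10,000 units per execution; you track progress and reschedule manually	Per-invocation limits (for example, 1,000 units per map call); the framework yields and resumes automatically
Parallelism	Single queue	Multiple queues simultaneously, up to the concurrency limit
Error handling	An uncaught error stops the execution	Errors isolated per key and reported in summarize
Best for	Simple batch jobs	Large-volume data processing
Complexity	Simple	More stages to manage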

✅ Use Map/Reduce When:

Processing thousands of records
Need parallel processing
Individual record errors shouldn't stop everything
Need automatic governance handling

🏋️ Practice Exercises

Exercise 1: Basic Map/Reduce

Create a Map/Reduce script that finds all Invoices and logs each invoice number in the map stage.

Exercise 2: Aggregation

Modify the script to group invoices by customer (reduce) and calculate total amount owed per customer.

🎯 Key Takeaways

Map/Reduce has 4 stages: getInputData, map, reduce, summarize
getInputData provides data; map processes each item; reduce aggregates; summarize reports
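Automatic governance handling - the framework yields and resumes, so no manual checkpoints are needed
Distributed processing runs map and reduce invocations in parallel, up to the deployment's concurrency limit
Errors are isolated per key - one failure doesn't stop other records, and summarize can report it
Use for large datasets (thousands of records or more)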
