🎯 Learning Objectives
- Understand Map/Reduce architecture
- Implement the four stages: getInputData, map, reduce, summarize
- Leverage distributed processing for massive datasets
- Handle errors with the summarize stage
- Choose between Map/Reduce and Scheduled scripts
1. Map/Reduce Overview
Map/Reduce is the script type designed for processing large numbers of records, typically thousands or more. It breaks the work into stages, and the map and reduce stages can run in parallel across multiple queues.
- Automatic Governance - System handles checkpoints
- Distributed Processing - Uses all available queues
- Built-in Error Handling - Errors don't stop the whole script
- Yield Control - Can pause to let other scripts run
2. The Four Stages
getInputData → map → reduce → summarize
- getInputData: Get the data to work with
- map: Process each record
- reduce: Aggregate by key
- summarize: Report results
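To make the data flow between the stages concrete, here is a minimal sketch in plain JavaScript that runs outside NetSuite. `runPipeline` and the `stages` object are hypothetical names for illustration, not SuiteScript APIs; the real framework also distributes the work across queues and handles serialization for you.

```javascript
// Local simulation of the four-stage flow. Not a SuiteScript API:
// runPipeline is a hypothetical helper for illustration only.
function runPipeline(stages) {
    // Stage 1: getInputData runs once and returns the inputs (an array here).
    const inputs = stages.getInputData();

    // Stage 2: map runs once per input and may write key/value pairs.
    const grouped = new Map();
    inputs.forEach((input, index) => {
        stages.map({
            key: String(index),
            value: JSON.stringify(input),
            write: ({ key, value }) => {
                if (!grouped.has(key)) grouped.set(key, []);
                grouped.get(key).push(String(value));
            }
        });
    });

    // Stage 3: reduce runs once per distinct key with all of its values.
    const output = [];
    grouped.forEach((values, key) => {
        stages.reduce({
            key,
            values,
            write: ({ key, value }) => output.push([key, String(value)])
        });
    });

    // Stage 4: summarize runs once with the collected output.
    return stages.summarize(output);
}

const result = runPipeline({
    getInputData: () => [
        { customer: 'A', total: 10 },
        { customer: 'B', total: 5 },
        { customer: 'A', total: 7 }
    ],
    map: (context) => {
        const order = JSON.parse(context.value);
        context.write({ key: order.customer, value: order.total });
    },
    reduce: (context) => {
        const sum = context.values.reduce((acc, v) => acc + Number(v), 0);
        context.write({ key: context.key, value: sum });
    },
    summarize: (output) => Object.fromEntries(output)
});

console.log(result); // { A: '17', B: '5' }
```

The two orders written under key `A` in map arrive together in a single reduce call, which is the grouping behavior the diagram describes.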
Stage 1: getInputData
Returns the data to process. Can return a search, array, or object.
function getInputData() {
    // Return a search - most common
    return search.create({
        type: search.Type.CUSTOMER,
        filters: [['email', 'isnotempty', '']],
        columns: ['entityid', 'email']
    });

    // Or return an array
    // return [1, 2, 3, 4, 5];

    // Or return an object
    // return { key1: 'value1', key2: 'value2' };
}
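As a rough illustration of how the array and object return types turn into map inputs, the sketch below expands each into key/value pairs in plain JavaScript. `toMapInputs` is a hypothetical helper, not part of SuiteScript, and the serialization rule it uses (strings passed through, everything else JSON-encoded) is an assumption made for the example.

```javascript
// Hypothetical helper (not a SuiteScript API): shows the key/value pairs
// the map stage would see for array and object inputs.
function toMapInputs(input) {
    // Assumption: non-string values are serialized before reaching map.
    const serialize = (v) => (typeof v === 'string' ? v : JSON.stringify(v));

    if (Array.isArray(input)) {
        // Array: key is the element's index, value is the element.
        return input.map((v, i) => ({ key: String(i), value: serialize(v) }));
    }
    // Object: key is the property name, value is the property value.
    return Object.keys(input).map((k) => ({ key: k, value: serialize(input[k]) }));
}

const fromArray = toMapInputs([10, 20]);
const fromObject = toMapInputs({ key1: 'value1', key2: 'value2' });

console.log(fromArray);
console.log(fromObject);
```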
Stage 2: map
Called once per input. Process individual items here.
function map(context) {
    // context.key = index or key
    // context.value = the data (JSON string if from search)
    var data = JSON.parse(context.value);
    var customerId = data.id;
    var email = data.values.email;

    // Do processing...

    // Optionally write to reduce stage
    context.write({
        key: 'processed',
        value: customerId
    });
}
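When the input is a search, each `context.value` is a JSON string describing one result. The sketch below parses a sample string in plain JavaScript; the sample data and the `readCustomer` helper are made up for illustration, and the exact contents of `values` depend on the columns your search returns.

```javascript
// Sample of what context.value might look like for one customer result.
// The ID and field values here are invented for illustration.
const sampleValue = JSON.stringify({
    recordType: 'customer',
    id: '1234',
    values: { entityid: 'ACME Corp', email: 'billing@acme.example' }
});

// Hypothetical helper: extracts the fields the map stage needs.
function readCustomer(contextValue) {
    const data = JSON.parse(contextValue);
    return {
        customerId: data.id,
        name: data.values.entityid,
        // Fall back to null if the column is missing or empty.
        email: data.values.email || null
    };
}

const customer = readCustomer(sampleValue);
console.log(customer);
```

Pulling the parsing into one small function keeps the map stage short and makes the field access easy to check locally before deploying.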
Stage 3: reduce
Groups values by key from map stage. Use for aggregations.
function reduce(context) {
    // context.key = the key written in map
    // context.values = array of all values with that key
    var count = context.values.length;
    log.debug('Reduce', 'Key: ' + context.key + ', Count: ' + count);

    // Write to summarize
    context.write({
        key: context.key,
        value: count
    });
}
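One detail worth keeping in mind: the values that reach reduce are strings, so numeric work needs an explicit conversion. A small plain-JavaScript illustration follows, using a hand-built stand-in for the reduce context (the `fakeContext` object and its values are hypothetical).

```javascript
// Hand-built stand-in for a reduce context; values arrive as strings.
const fakeContext = { key: 'customer-42', values: ['100.50', '20', '4.5'] };

// Without conversion, + concatenates the strings instead of adding.
const concatenated = fakeContext.values.reduce((acc, v) => acc + v, '');

// Converting each value first gives the numeric total.
const total = fakeContext.values.reduce((acc, v) => acc + parseFloat(v), 0);

console.log(concatenated); // 100.50204.5
console.log(total);        // 125
```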
Stage 4: summarize
Final stage for reporting and error handling.
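function summarize(summary) {
    // Log an error thrown by getInputData, if any
    if (summary.inputSummary.error) {
        log.error('Input Error', summary.inputSummary.error);
    }

    // Log any errors from the map stage
    summary.mapSummary.errors.iterator().each(function(key, error) {
        log.error('Map Error', 'Key: ' + key + ', Error: ' + error);
        return true; // keep iterating
    });

    // Log any errors from the reduce stage
    summary.reduceSummary.errors.iterator().each(function(key, error) {
        log.error('Reduce Error', 'Key: ' + key + ', Error: ' + error);
        return true;
    });

    // Total the counts written by reduce
    var totalProcessed = 0;
    summary.output.iterator().each(function(key, value) {
        totalProcessed += parseInt(value, 10);
        return true;
    });

    log.audit('Complete', 'Total processed: ' + totalProcessed);
    log.audit('Usage', 'Units used: ' + summary.usage);
}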
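The error-handling pattern can be tried locally before deploying. The sketch below is plain JavaScript with a hand-built stand-in for the summary object: `collectErrors` and `fakeErrors` are hypothetical helpers, and the fake only mimics the `iterator().each()` shape used above, not the full SuiteScript object.

```javascript
// Hypothetical helper: gathers errors from every stage into one list.
function collectErrors(summary) {
    const errors = [];
    if (summary.inputSummary.error) {
        errors.push({ stage: 'getInputData', key: null, error: summary.inputSummary.error });
    }
    const stages = [['map', summary.mapSummary], ['reduce', summary.reduceSummary]];
    stages.forEach(([stage, stageSummary]) => {
        stageSummary.errors.iterator().each((key, error) => {
            errors.push({ stage, key, error });
            return true; // returning true continues the iteration
        });
    });
    return errors;
}

// Minimal stand-in for the iterator().each() pattern, for local testing only.
function fakeErrors(pairs) {
    return {
        iterator: () => ({
            each: (callback) => {
                for (const [key, error] of pairs) {
                    if (!callback(key, error)) break;
                }
            }
        })
    };
}

const fakeSummary = {
    inputSummary: { error: null },
    mapSummary: { errors: fakeErrors([['17', 'sample map failure']]) },
    reduceSummary: { errors: fakeErrors([]) }
};

const collected = collectErrors(fakeSummary);
console.log(collected);
```

Collecting errors into one list makes it straightforward to log them, count them, or send a single notification at the end of the run.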
3. Distributed Processing
Map/Reduce automatically distributes work across available queues:
- getInputData: Runs once on one queue
- map: Each record can run on ANY available queue in parallel
- reduce: Each key group runs on any available queue
- summarize: Runs once after all map/reduce complete
With 5 queues and 1000 records, up to 5 map executions run simultaneously!
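As a back-of-the-envelope check on those numbers, the sketch below estimates how the map stage's wall-clock time shrinks as queues are added. It is an idealized model in plain JavaScript: it assumes every record takes the same time and ignores scheduling overhead, and `estimateMapTime` and the 2-second cost per record are assumptions for illustration.

```javascript
// Hypothetical helper: idealized wall-clock estimate for the map stage.
function estimateMapTime(recordCount, queueCount, secondsPerRecord) {
    // At most queueCount records run at once, so work proceeds in "waves".
    const waves = Math.ceil(recordCount / queueCount);
    return waves * secondsPerRecord;
}

const oneQueue = estimateMapTime(1000, 1, 2);   // 2000 seconds
const fiveQueues = estimateMapTime(1000, 5, 2); // 400 seconds

console.log(oneQueue / fiveQueues); // 5
```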
4. Complete Example
/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/search'], function(search) {

    // Stage 1: find sales orders that are pending fulfillment
    function getInputData() {
        return search.create({
            type: search.Type.SALES_ORDER,
            filters: [
                ['status', 'anyof', 'SalesOrd:B'], // Pending Fulfillment
                'AND',
                ['mainline', 'is', 'T']
            ],
            columns: ['entity', 'total']
        });
    }

    // Stage 2: emit each order's total under its customer ID
    function map(context) {
        var searchResult = JSON.parse(context.value);
        var customerId = searchResult.values.entity.value;
        var total = parseFloat(searchResult.values.total);

        // Write customer ID as key for grouping
        context.write({
            key: customerId,
            value: total
        });
    }

    // Stage 3: sum the order totals for one customer
    function reduce(context) {
        // context.values = array of order totals (as strings) for this customer
        var customerId = context.key;
        var totalAmount = 0;

        context.values.forEach(function(value) {
            totalAmount += parseFloat(value);
        });

        log.debug('Customer Total',
            'Customer: ' + customerId + ', Total: $' + totalAmount.toFixed(2));

        context.write({
            key: customerId,
            value: totalAmount
        });
    }

    // Stage 4: report errors and the number of customers processed
    function summarize(summary) {
        // Handle errors from the map and reduce stages
        summary.mapSummary.errors.iterator().each(function(key, error) {
            log.error('Map Error', 'Key: ' + key + ', Error: ' + error);
            return true;
        });
        summary.reduceSummary.errors.iterator().each(function(key, error) {
            log.error('Reduce Error', 'Key: ' + key + ', Error: ' + error);
            return true;
        });

        // Count results
        var customerCount = 0;
        summary.output.iterator().each(function(key, value) {
            customerCount++;
            return true;
        });

        log.audit('Script Complete',
            customerCount + ' customers processed. Units: ' + summary.usage);
    }

    return {
        getInputData: getInputData,
        map: map,
        reduce: reduce,
        summarize: summarize
    };
});
5. Map/Reduce vs Scheduled
| Feature | Scheduled | Map/Reduce |
|---|---|---|
| Governance | 10,000 units, manual checkpoints | Automatic handling |
| Parallelism | Single queue | Multiple queues simultaneously |
| Error handling | One error stops script | Errors isolated per record |
| Best for | Simple batch jobs | Massive data processing |
| Complexity | Simple | More stages to manage |
Choose Map/Reduce when:
- You are processing thousands of records
- You need parallel processing
- Individual record errors shouldn't stop everything
- You need automatic governance handling
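To show what "manual checkpoints" means on the Scheduled side of the table, here is a plain-JavaScript sketch of the usual pattern: check the remaining governance units before each record and stop early when they run low. In a real Scheduled script the remaining units would come from `runtime.getCurrentScript().getRemainingUsage()`; here `getRemainingUsage` is a stand-in function, and `processWithBudget`, the threshold, and the per-record cost are all assumptions for illustration.

```javascript
// Hypothetical helper: processes items until the usage budget runs low.
// Returns the index to resume from (items.length means everything finished).
function processWithBudget(items, getRemainingUsage, processItem, threshold) {
    for (let i = 0; i < items.length; i++) {
        if (getRemainingUsage() < threshold) {
            return i; // a real script would reschedule itself from here
        }
        processItem(items[i]);
    }
    return items.length;
}

// Simulated governance: start with 100 units, each record costs 10.
let remaining = 100;
const processed = [];
const resumeAt = processWithBudget(
    ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l'],
    () => remaining,
    (item) => { processed.push(item); remaining -= 10; },
    20
);

console.log(resumeAt);         // 9
console.log(processed.length); // 9
```

Map/Reduce removes the need for this bookkeeping, which is the main reason to prefer it once the record count outgrows a single execution's budget.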
🏋️ Practice Exercises
Create a Map/Reduce script that finds all Invoices and logs each invoice number in the map stage.
Modify the script to group invoices by customer (reduce) and calculate total amount owed per customer.
🎯 Key Takeaways
- Map/Reduce has 4 stages: getInputData, map, reduce, summarize
- getInputData provides data; map processes each item; reduce aggregates; summarize reports
- Automatic governance handling - no manual checkpoints needed
- Distributed processing uses all available queues in parallel
- Errors are isolated - one failure doesn't stop other records
- Use for massive datasets (thousands of records)