MapReduce is a handy way of approaching any large data problem, and there are more than a few to deal with in Pipeworks.  Luckily, MapReduce is also an incredibly easy algorithm to implement at its core:  You take a large pile of data and use a map function to produce meaningful results from each item in the pile.  You then group the results by a common key and pass each group to a reduce function, which boils the mapped data down into a single summary result.  The value in a "good" MapReduce implementation is in allowing the map step and reduce step to be distributed across multiple machines and background jobs.  In other words, the value of a "good" MapReduce comes from things PowerShell already does just about perfectly.
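
To make the map, group, and reduce steps concrete, here is a minimal single-process sketch in plain PowerShell.  It is not how Start-MapReduce is implemented; the sample records and the Key/Value property names are made up for illustration.

```powershell
# Hypothetical sample data: a few records with a free-form location string
$records = @(
    New-Object PSObject -Property @{ GamerTag = 'Alpha';   Location = 'Seattle, WA'; Gamerscore = 1200 }
    New-Object PSObject -Property @{ GamerTag = 'Bravo';   Location = 'WA';          Gamerscore = 800 }
    New-Object PSObject -Property @{ GamerTag = 'Charlie'; Location = 'London, UK';  Gamerscore = 500 }
)

# Map: emit one Key/Value pair per location keyword found in each record
$mapped = foreach ($record in $records) {
    foreach ($word in ($record.Location -split ',')) {
        $key = $word.Trim()
        if ($key) {
            New-Object PSObject -Property @{ Key = $key; Value = $record }
        }
    }
}

# Group: collect every mapped value that shares a key
$grouped = $mapped | Group-Object -Property Key

# Reduce: collapse each group into a single summary object
$reduced = foreach ($group in $grouped) {
    $total = ($group.Group | ForEach-Object { $_.Value.Gamerscore } | Measure-Object -Sum).Sum
    New-Object PSObject -Property @{
        Location        = $group.Name
        PlayerCount     = $group.Count
        TotalGamerScore = $total
    }
}

$reduced | Sort-Object TotalGamerScore -Descending
```

In this sketch 'WA' ends up with two players and a total of 2000, because both Alpha and Bravo map to that keyword.  Everything runs in one process; the interesting part of a real implementation is spreading the map and reduce loops across jobs or machines.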

 

As of Pipeworks 1.9.8.4, Start-MapReduce is available to help you churn through large batches of data.   Here's a sample that uses it to churn through Xbox profile records in order to produce a list of profiles with location information (it's what is used to produce the Location Leaderboards on UnlockAchievement.com):

 

Search-AzureTable -TableName UnlockAchievementUsers -Filter "PartitionKey eq 'XboxProfile'" -StorageAccount startautomating -StorageKey (Get-SecureSetting AzureStorageAccountKey -ValueOnly) |

    Start-MapReduce {

        param($location, $profile)

 

        $FriendlyLocationLookup = @{

            'AL' =  'Alabama', 'United States'

            'AK' =  'Alaska', 'United States'

            'AZ' =  'Arizona', 'United States'

            'AR' =  'Arkansas', 'United States'

            'CA' =  'California', 'United States'

            'CO' =  'Colorado', 'United States'

            'CT' =  'Connecticut', 'United States'

            'DE' =  'Delaware', 'United States'

            'FL' =  'Florida', 'United States'

            'GA' =  'Georgia', 'United States'

            'HI' =  'Hawaii', 'United States'

            'ID' =  'Idaho', 'United States'

            'IL' =  'Illinois', 'United States'

            'IN' =  'Indiana', 'United States'

            'IA' =  'Iowa', 'United States'

            'KS' =  'Kansas', 'United States'

            'KY' =  'Kentucky', 'United States'

            'LA' =  'Louisiana', 'United States'

            'ME' =  'Maine', 'United States'

            'MD' =  'Maryland', 'United States'

            'MA' =  'Massachusetts', 'United States'

            'MI' =  'Michigan', 'United States'

            'MN' =  'Minnesota', 'United States'

            'MS' =  'Mississippi', 'United States'

            'MO' =  'Missouri', 'United States'

            'MT' =  'Montana', 'United States'

            'NE' =  'Nebraska', 'United States'

            'NV' =  'Nevada', 'United States'

            'NH' =  'New Hampshire', 'United States'

            'NJ' =  'New Jersey', 'United States'

            'NM' =  'New Mexico', 'United States'

            'NY' =  'New York', 'United States'

            'NC' =  'North Carolina', 'United States'

            'ND' =  'North Dakota', 'United States'

            'OH' =  'Ohio', 'United States'

            'OK' =  'Oklahoma', 'United States'

            'OR' =  'Oregon', 'United States'

            'PA' =  'Pennsylvania', 'United States'

            'RI' =  'Rhode Island', 'United States'

            'SC' =  'South Carolina', 'United States'

            'SD' =  'South Dakota', 'United States'

            'TN' =  'Tennessee', 'United States'

            'TX' =  'Texas', 'United States'

            'UT' =  'Utah', 'United States'

            'VT' =  'Vermont', 'United States'

            'VA' =  'Virginia', 'United States'

            'WA' =  'Washington', 'United States'

            'WV' =  'West Virginia', 'United States'

            'WI' =  'Wisconsin', 'United States'

            'WY' =  'Wyoming', 'United States'

            'UK' = 'United Kingdom'  

            'U.K' = 'United Kingdom'

            'USA' = 'United States'       

        }

 

 

        foreach ($word in $location -split ',' -ne '') {

            if ($FriendlyLocationLookup[$word.Trim()]) {

                foreach ($locationName in $FriendlyLocationLookup[$word.Trim()]) {

                    New-Object PSObject -Property @{

                        LocationKeyword = $locationName

                        Profile = $profile

                    }

                }

           

            } else {

                New-Object PSObject -Property @{

                    LocationKeyword = $word.Replace("#39;","'").Replace("&'", "'").Replace("&amp;", '&').Trim()

                    Profile = $profile

                }

            }

             

        }

   

 

    } -Reduce {

        param($LocationKeyword, $profiles)

       

 

        $totalGamerScore = 0

        foreach ($profile in $profiles) {

            $totalGamerScore += $profile.Profile.Gamerscore

        }

   

        New-Object PSObject -Property @{

            TotalGamerScore = $totalGamerScore

            Players = $profiles | Sort-Object { $_.Profile.Gamerscore -as [uint32] } -Descending | ForEach-Object { $_.Profile.GamerTag }

            Scores = $profiles | Sort-Object { $_.Profile.Gamerscore -as [uint32] } -Descending | ForEach-Object { $_.Profile.GamerScore }

            Location = $LocationKeyword

        }

   

    } 

 

 

With the code above, the Xbox profiles will be mapped and reduced in background jobs.  To make it quicker, or to distribute the MapReduce across multiple computers, just use the -Grid option to run it on a grid of machines.   Start-MapReduce is a handy new tool in Pipeworks that makes distributed data processing easier to achieve, and it will be useful for adding interesting new features to any site with a small (or large) haystack of data to mine.  Hopefully it gives you as much mileage as it has given me.
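
If you are curious what "mapped in background jobs" can look like, here is a rough sketch using only the built-in job cmdlets (Start-Job, Wait-Job, Receive-Job).  It is an illustration under my own assumptions, not the code inside Start-MapReduce: the batch size, the sample numbers, and the Key/Value property names are all hypothetical, and it does not show -Grid.

```powershell
# Hypothetical input: the numbers 1..20, standing in for a pile of records
$data = 1..20
$batchSize = 5

# Start one background job per batch; each job runs the map step on its own slice
$jobs = for ($i = 0; $i -lt $data.Count; $i += $batchSize) {
    $last  = [Math]::Min($i + $batchSize, $data.Count) - 1
    $batch = $data[$i..$last]

    # (,$batch) passes the whole batch as a single argument
    Start-Job -ArgumentList (,$batch) -ScriptBlock {
        param($items)
        foreach ($item in $items) {
            # Map: key each number by its parity
            $key = 'Even'
            if ($item % 2) { $key = 'Odd' }
            New-Object PSObject -Property @{ Key = $key; Value = $item }
        }
    }
}

# Collect the mapped output from every job, then clean the jobs up
$mapped = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job

# Group and reduce locally: one summary object per key
$reduced = $mapped | Group-Object -Property Key | ForEach-Object {
    New-Object PSObject -Property @{
        Key   = $_.Name
        Count = $_.Count
        Sum   = ($_.Group | Measure-Object -Property Value -Sum).Sum
    }
}

$reduced
```

Here the even numbers sum to 110 and the odd numbers to 100, ten items each.  Objects coming back from a job are serialized copies, so this pattern works best when the map step emits simple property bags like the ones above.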

 

Hope this helps,

 

James