Google
 

Thursday, April 1, 2010

Splitting csv file based on content in one line using PowerShell

Problem:
you have a csv file that contains department employees in a format like this:
Department,Employee
Sales,emp1
HR,emp2
Sales,emp3
Finance,emp4
Finance,emp5
Security,emp6
Security,emp7
Security,emp8
HR,emp9
And you need to split this file contents to separate files based on department name. So for the above example, we should get four files, Sales.csv, HR.csv, Finance.csv, and Security.csv. Each file contains only it's employees.
And the solution is really shows the power of PowerShell pipelining:

Import-Csv file.csv | Group-Object -Property "department" | Foreach-Object {$path=$_.name+".csv" ; $_.group | Export-Csv -Path $path -NoTypeInformation}

Dissecting the above commands:
Import-Csv file.csv:
Parses the csv file and returns an array of objects.

| Group-Object -Property "department":
Since we need to split by department, it makes sense to group objects by the department property.

| Foreach-Object {...}:
We need to apply an action for each group (department). So we pipeline the resulted groups to Foreach-Object

$path=$_.name+".csv":
Within the foreach, we need to create a temporary variable ($path) to be passed to the next pipeline responsible for the actual saving. Note that I use the semicolon ";" to separate this part from the next. And I used the name property of the group (which maps to department name in our case) to format the file name.

$_.group | Export-Csv -Path $path -NoTypeInformation
:
Then for each group we have, we need to export its contents (csv file rows) to the file path created in the past step. So we again pipeline the group property of the group item (which is an ArrayList of original objects) to the Export-CSV Cmdlt.

And the result should be files like:
Finance.csv:
"Department","Employee"
"Finance","emp4"
"Finance","emp5"

No comments: