I've moved my blog to!. Please update your links. This blog is no longer in use--you can find all posts and comments at my new blog; I will no longer be posting to this site and comments have been disabled.

Thursday, August 12, 2010

Getting an Inventory of All SharePoint Documents Using Windows PowerShell

I got an email today asking if I had anything that would generate a report detailing all the documents throughout an entire SharePoint Farm. As this wasn’t the first time I’ve been asked this same question I decided that I’d just go ahead and post the script for generating such a report.

The script is really quite straightforward – it simply iterates through all Web Applications, Site Collections, Webs, Lists, and finally, List Items. I skip any List that is not a Document Library (as well as the Central Admin site) and then build a hash table containing all the data I want to capture. I then convert that hash table to an object which is written to the pipeline.

All of this is placed in a function which I can call and then pipe the output to something like the Out-GridView cmdlet or the Export-Csv cmdlet. I also wrote the script so that it works with either SharePoint 2007 or SharePoint 2010 so that I don’t have to maintain two versions (I could have used cmdlets such as Get-SPWebApplication, Get-SPSite, and Get-SPWeb but there was little benefit to doing so and the script would be limited to SharePoint 2010).

One word of caution – in a large Farm this script should be run off hours or at least on a back facing server (not your WFE) – it’s going to generate a lot of traffic to your database.

function Get-DocInventory() {
    $farm = [Microsoft.SharePoint.Administration.SPFarm]::Local
    foreach ($spService in $farm.Services) {
        if (!($spService -is [Microsoft.SharePoint.Administration.SPWebService])) {
        foreach ($webApp in $spService.WebApplications) {
            if ($webApp -is [Microsoft.SharePoint.Administration.SPAdministrationWebApplication]) { continue }
            foreach ($site in $webApp.Sites) {
                foreach ($web in $site.AllWebs) {
                    foreach ($list in $web.Lists) {
                        if ($list.BaseType -ne "DocumentLibrary") {
                        foreach ($item in $list.Items) {
                            $data = @{
                                "Web Application" = $webApp.ToString()
                                "Site" = $site.Url
                                "Web" = $web.Url
                                "list" = $list.Title
                                "Item ID" = $item.ID
                                "Item URL" = $item.Url
                                "Item Title" = $item.Title
                                "Item Created" = $item["Created"]
                                "Item Modified" = $item["Modified"]
                                "File Size" = $item.File.Length/1KB
                            New-Object PSObject -Property $data
Get-DocInventory | Out-GridView
#Get-DocInventory | Export-Csv -NoTypeInformation -Path c:\inventory.csv


Anonymous said...

can you also get the number of versions for each document?

Gary Lapointe said...

Yup - you would just add that as another property to expose:

"Item Versions" = $item.Versions.Count

SharePoint Talk and Rantings said...

Hi - thanks for posting the inventory script. When I run it, I am able to get a CSV file with some info and then powershell errors out.

"Get-DocInventory : Exception has been thrown by the target of an invocation.
At C:\getdocs.ps1:41 char:17
+ Get-DocInventory <<<< | Export-Csv -NoTypeInformation -Path c:\inventory.csv
+ CategoryInfo : NotSpecified: (:) [Get-DocInventory], TargetInvocationExcepti
+ FullyQualifiedErrorId : System.Reflection.TargetInvocationException,Get-DocInventory

Any thoughts as to what I might be doing wrong?

Gary Lapointe said...

Not sure what the issue is - you might try adding some Write-Host statements in the script to see where exactly it is failing (or just use the $Error variable to dig into the stack trace if you know how to do that).

Maarten said...

In this case I don't think code or powershell is the right way to go due to the overhead and speed (lack thereof) of sharepoint api at grabbing a dataset like this. I much prefere (and use) this direct SQL query instead:

Webs.Title AS 'Web',
AllLists.tp_Title AS 'List Name',
AllDocs.DirName AS 'URL',
AllDocs.LeafName AS 'Name',
AllDocs.ExtensionForFile AS 'File Type',
AllDocStreams.Size AS 'File Size',
FROM AllDocs
JOIN AllLists
ON AllLists.tp_ID = AllDocs.ListId
JOIN AllDocStreams
ON AllDocStreams.Id = AllDocs.Id
ON Webs.Id = AllLists.tp_WebId
ORDER BY Webs.Title

^takes literally a thousand times less overall processing time than sharepoint api.

You can also filter results easily using something like this:
WHERE Webs.Title NOT LIKE '%SiteName%'
AND AllDocs.TimeCreated > '2010-05-10 00:00:00.000'

Gary Lapointe said...

Yes, hitting the database directly is always going to be faster but it's also not supported and puts you in an unsupported state with Microsoft. The supported route is to use the API - it's slower but...

bigKenny said...

A little Select statement never hurt nobody :)

David Tappan said...

I have two questions:

1. Does this script work on 2010 as well? It worked fine for me on 2007.
2. I need to use a similar approach to get the following properties from each list (not list item):
a. Content types configured on the list, and the scope of the content type (list, web or site)
b. Columns configured on the list, and again the scope (list or content type, and if content type, which one)

How do I go about finding out what properties are available on a list in PowerShell?

Gary Lapointe said...

Yes, this will work in both 2007 and 2010. To get the properties you can either use the SDK or, if you have an instance of a list (an SPList object), you can pass that object to the Get-Member cmdlet to see all the methods and properties that are available.