Tuesday, December 18, 2007

Save List Items and Files to Disk

I've seen numerous examples of people needing to save all the files from a document library or custom list (containing attachments) to disk. I didn't need this ability myself for the upgrade we are doing, but I did need a quick way to generate lots of different samples to make sure that my gl-addlistitem command was working correctly. So I decided to create a new command which would make my testing easier and also help the many people out there who need to save lots of files out to disk.

The command I created is gl-exportlistitem2. I already had a gl-exportlistitem command which used the deployment API, and I just wasn't feeling very creative with the name, so I added "2" (maybe "savelistdata" is better???). The command does two key things: it saves all the files to a specified path, and it creates a Manifest.xml file that contains information about the files and any list items that were in the list. This information can then be used by the gl-addlistitem command to import the data into another list.

For this initial version I've kept things fairly simple: there's no compression, no security information, and no version history. I'm only storing the file(s) (if present) and any field data (perhaps I'll handle more data in the future, but for now this met my needs). The nice thing is that if you don't need any of the other information, what I created actually works better than the deployment API: mine takes folder location into account, whereas the deployment API is extremely buggy when it comes to folders. I also included a simplified version of the code which simply dumps all the files to disk without the manifest information (the command does not use this, but I kept it in the source in case anyone needed it).

The code to do all of this is really straightforward. I broke it up into two chunks: the first gathers all the necessary data from the list and stores it in some custom data classes, and the second takes those classes, saves the files to disk, and creates the actual manifest file:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    using System.Xml;
    using Microsoft.SharePoint;

    // The members below belong to the command's class.

    /// <summary>
    /// Gets the item data.
    /// </summary>
    /// <param name="web">The web.</param>
    /// <param name="list">The list.</param>
    /// <param name="ids">The IDs of the items to export; an empty list exports every item.</param>
    /// <returns>The collected item data.</returns>
    private static List<ItemInfo> GetItemData(SPWeb web, SPList list, List<int> ids)
    {
        List<ItemInfo> itemData = new List<ItemInfo>();
        string rootUrl = list.RootFolder.Url;

        foreach (SPListItem item in list.Items)
        {
            // Skip items that were not requested (an empty ID list means "all items").
            if (!(ids.Count == 0 || ids.Contains(item.ID)))
                continue;

            ItemInfo info = new ItemInfo();
            itemData.Add(info);
            info.ID = item.ID;

            if (item.File != null)
            {
                info.File = new FileDetails(item.File.OpenBinary(), item.File.Name, item.File.Author, item.File.TimeCreated);
                info.Title = item.File.Name;
            }
            else
                info.Title = item.Title;

            // Folder path of the item relative to the list's root folder.
            info.FolderUrl = item.Url.Substring(rootUrl.Length, item.Url.LastIndexOf("/") - rootUrl.Length);

            try
            {
                foreach (string fileName in item.Attachments)
                {
                    SPFile file = web.GetFile(item.Attachments.UrlPrefix + fileName);
                    info.Attachments.Add(new FileDetails(file.OpenBinary(), file.Name, file.Author, file.TimeCreated));
                }
            }
            catch (ArgumentException)
            {
                // Deliberately ignored: presumably thrown when the item has no
                // attachments collection, in which case there is nothing to export.
            }

            foreach (SPField field in list.Fields)
            {
                if (!field.ReadOnlyField &&
                    field.InternalName != "Attachments" &&
                    field.InternalName != "FileLeafRef" &&
                    item[field.InternalName] != null)
                {
                    info.FieldData.Add(field.InternalName, item[field.InternalName].ToString());
                }
            }
        }
        return itemData;
    }

    /// <summary>
    /// Saves the item data to disk and writes the manifest file.
    /// </summary>
    /// <param name="itemData">The item data.</param>
    /// <param name="manifestPath">The directory in which to create Manifest.xml and the Data folder.</param>
    private static void SaveItemData(List<ItemInfo> itemData, string manifestPath)
    {
        if (string.IsNullOrEmpty(manifestPath))
            throw new ArgumentNullException("manifestPath", "No directory was specified for the manifest.");

        if (!Directory.Exists(manifestPath))
            Directory.CreateDirectory(manifestPath);

        string dataPath = Path.Combine(manifestPath, "Data");

        StringBuilder sb = new StringBuilder();

        XmlTextWriter xmlWriter = new XmlTextWriter(new StringWriter(sb));
        xmlWriter.Formatting = Formatting.Indented;

        xmlWriter.WriteStartElement("Items");

        foreach (ItemInfo info in itemData)
        {
            xmlWriter.WriteStartElement("Item");

            // Folder on disk that mirrors the item's folder in the list.
            string itemFolder = Path.Combine(dataPath, info.FolderUrl.Trim('\\', '/')).Replace("/", "\\");

            if (info.File != null)
            {
                if (!Directory.Exists(itemFolder))
                    Directory.CreateDirectory(itemFolder);

                string filePath = Path.Combine(itemFolder, info.File.Name);
                xmlWriter.WriteAttributeString("File", filePath);
                xmlWriter.WriteAttributeString("Author", info.File.Author.LoginName);
                xmlWriter.WriteAttributeString("CreatedDate", info.File.CreatedDate.ToString());
                File.WriteAllBytes(filePath, info.File.File);
            }
            xmlWriter.WriteAttributeString("LeafName", info.Title);
            xmlWriter.WriteAttributeString("FolderUrl", info.FolderUrl);

            xmlWriter.WriteStartElement("Fields");
            foreach (string key in info.FieldData.Keys)
            {
                xmlWriter.WriteStartElement("Field");
                xmlWriter.WriteAttributeString("Name", key);
                xmlWriter.WriteString(info.FieldData[key]);
                xmlWriter.WriteEndElement(); // Field
            }
            xmlWriter.WriteEndElement(); // Fields

            xmlWriter.WriteStartElement("Attachments");
            foreach (FileDetails file in info.Attachments)
            {
                // Attachments go in an "item_{ID}" sub-folder of the item's folder.
                string attachmentFolder = Path.Combine(itemFolder, "item_" + info.ID);
                if (!Directory.Exists(attachmentFolder))
                    Directory.CreateDirectory(attachmentFolder);

                string attachmentPath = Path.Combine(attachmentFolder, file.Name);
                xmlWriter.WriteElementString("Attachment", attachmentPath);
                File.WriteAllBytes(attachmentPath, file.File);
            }
            xmlWriter.WriteEndElement(); // Attachments

            xmlWriter.WriteEndElement(); // Item
        }

        xmlWriter.WriteEndElement(); // Items
        xmlWriter.Flush();

        File.WriteAllText(Path.Combine(manifestPath, "Manifest.xml"), sb.ToString());
    }

    #region Private Classes

    private class FileDetails
    {
        public byte[] File = null;
        public string Name = null;
        public SPUser Author = null;
        public DateTime CreatedDate = DateTime.Now;

        public FileDetails(byte[] file, string name, SPUser author, DateTime createdDate)
        {
            File = file;
            Name = name;
            Author = author;
            CreatedDate = createdDate;
        }
    }

    private class ItemInfo
    {
        public FileDetails File = null;
        public string FolderUrl = null;
        public List<FileDetails> Attachments = new List<FileDetails>();
        public Dictionary<string, string> FieldData = new Dictionary<string, string>();
        public int ID = -1;
        public string Title = null;
    }

    #endregion

The syntax of the command can be seen below:

C:\>stsadm -help gl-exportlistitem2

stsadm -o gl-exportlistitem2

Exports list items to disk (exported results can be used with addlistitem).

        -url <list view url to export from>
        -path <export path>
        [-id <list item ID (separate multiple items with a comma)>]

Here's an example of how to export list items:

stsadm -o gl-exportlistitem2 -url "http://intranet/documents/forms/allitems.aspx" -path "c:\documents"

Note that a "Data" folder will be created under the path specified. All files will be put in this folder, and the folder structure will mirror that of the list. The Manifest.xml file will be in the root of the folder specified. Attachments will be stored in sub-folders named "item_{ID}", where {ID} is the item ID. Once exported, you could then use the gl-addlistitem command to import these items to another list:

stsadm -o gl-addlistitem -url "http://intranet/documents2/forms/allitems.aspx" -datafile "c:\documents\manifest.xml" -publish

Update 1/31/2008: I've modified this command so that it now also supports exporting web part pages. The resultant exported manifest file can be used in conjunction with the gl-addlistitem command so that web part pages can be properly imported using that command.
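
Since the manifest is plain XML, it can also be read outside of gl-addlistitem. The following is only a minimal sketch, assuming the layout written by SaveItemData above (an Items root, Item elements with LeafName and FolderUrl attributes, and nested Fields/Field and Attachments/Attachment elements). The ManifestReader class and its Summarize method are hypothetical names used for illustration and are not part of the command.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ManifestReader
{
    // Hypothetical helper: returns one summary line per <Item> in the manifest.
    public static List<string> Summarize(string manifestXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(manifestXml);

        List<string> lines = new List<string>();
        foreach (XmlElement item in doc.SelectNodes("/Items/Item"))
        {
            // Count the exported fields and attachments recorded for this item.
            int fieldCount = item.SelectNodes("Fields/Field").Count;
            int attachmentCount = item.SelectNodes("Attachments/Attachment").Count;

            lines.Add(string.Format(
                "{0} (folder '{1}'): {2} field(s), {3} attachment(s)",
                item.GetAttribute("LeafName"),
                item.GetAttribute("FolderUrl"),
                fieldCount,
                attachmentCount));
        }
        return lines;
    }
}
```

For the export example above, this could be called as ManifestReader.Summarize(System.IO.File.ReadAllText(@"c:\documents\Manifest.xml")).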


thethicalhacker said...

Does this also allow the exportation of issues lists? Since you are not exporting versions as well, I would assume not.

Now since I need to export issues lists, I have tried to just export the entire site, but of course, life cannot be that easy and consequently, for large sites, the export typically fails under the banner of something not containing unique something or another. My presumption is that something was allowed to duplicate while the site was underneath 2003.

Gary Lapointe said...

I haven't tried it with an issues list but it may work (not sure if it uses history or folders (like discussions) to group and link information - haven't looked too closely). Did you try the exportlist/importlist commands with the retainobjectidentity flag? Also - I think SP1 may have fixed some issues regarding unique constraints (I seem to remember seeing something when I was looking through the release notes).

Frank-Ove Kristiansen said...

Hi, Gary.

First of all, I would just like to say I love what you've been doing, posting all this code and all. You're a lifesaver!

Anyways, I'm trying to use exportlistitem2 to export pages from the Pages library. But it fails on the web part pages. It gives me:

WARNING: Cannot export web part. Make sure that the web part assembly is in the GAC and is registered as a safe control

I've managed to debug this, and it only applies to my custom web part on the page. Funny thing is that my custom web part is indeed installed in the GAC and registered as a safe control. I also have a Telerik web part on the same page, and this gets exported just fine...

Hope you can give me any clue as to what might be wrong.

Once again, thanks a bunch!


Gary Lapointe said...

Frank-Ove - the addlistitem and exportlistitem2 commands are something that I've been doing a lot of work on and they're definitely not perfect (I just posted an update yesterday to address a minor issue with ListViewWebPart controls). With custom web parts it's hard to say - are you able to export the web part via the UI? Is the web part marked as exportable (shouldn't matter but...)? Are you doing anything in the loading of your web part that is dependent on the existence of SPContext (shouldn't matter if you're using a more recent version of the code as I now export using the web part web service)?

Frank-Ove Kristiansen said...

Hi again.
Thank you for quick feedback!

Yes, I am able to export my web part. But that aside, you were correct about me having code in the web part constructor. I had the following code:

    Guid pagesGuid = PublishingWeb.GetPagesListId(SPContext.Current.Web);
    string pagesName = PublishingWeb.GetPagesListName(SPContext.Current.Web);
    if (this.WebUrl.Equals(string.Empty) && this.ListGuid.Equals(string.Empty) && this.ListName.Equals(string.Empty))
        this.WebUrl = SPContext.Current.Web.ServerRelativeUrl;
    this.ListGuid = pagesGuid.ToString();
    this.ListName = pagesName;

After commenting these lines out, and a new build and deployment, it worked.
My web part inherits from the ContentByQueryWebPart. As you can see from my lines of code, I'm trying to point it to the current site's Pages library. Since this is not possible inside a .webpart file, I thought I might do it here.


Gary Lapointe said...

Not sure what version of the code you're using (I really should probably start versioning this stuff) but you may want to try the latest - I modified the code at one point so that it uses the built-in web services to get the exported web part XML so as to get around the specific issue of web parts requiring SPContext.

Frank-Ove Kristiansen said...

Sorry, forgot to mention in the previous post.
I downloaded the latest version this morning (Friday).

One other thing I ran into: When exporting from the Pages library, it's not a given that all the items are based on a Page Layout. The page newsarchive.aspx created via the site definition SPSNHOME is one example of this. In this case, you should add a check for this in exportlistitem2, and consequently also in addlistitem.


Peeter said...

Is it possible to export pictures? I tried this but:
Progress: Getting item data for item '7101'
Could not load file or assembly 'Microsoft.SharePoint.Publishing, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' or one of its dependencies. The system cannot find the file specified.

Gary Lapointe said...

My guess is that you are using WSS and not MOSS? I've only tested this command against MOSS. I have some things in there which try to address issues with publishing sites and I haven't done testing to add the necessary error handling when not working with MOSS.

romain said...

Hi Gary!

I'd just like to know if we can use it with SharePoint Portal Server 2001.
In fact I have to do a migration from SharePoint 2001 to a filesystem. I have about 7500 documents and 1000 folders to move, with their metadata in a file.


Gary Lapointe said...

Unfortunately, no - all of my commands are specific to the 2007 product.

romain said...

And do you have any idea how to do such a migration from SharePoint 2001?
Is STSADM available on SharePoint 2001?
Thank you for your answers!

Trent said...

I am trying to export all 3000+ documents from a document library, and I got the following error:
Exception of type 'System.OutOfMemoryException' was thrown.

The last message before this error was thrown:
Progress: Getting item data for item '1534'

Any idea? (The server has 16GB physical memory). Thanks!

Gary Lapointe said...

Yeah - it's because I didn't really code it very well - I originally built this to solve a quick issue and kept building on it without refactoring. Problem is that it suffers from one fundamental flaw - I store everything in memory before saving to disk so if you have a lot of stuff it will eventually run out of memory. I'd suggest you look at the gl-exportlistitem (or gl-exportlist) commands - I hope to one day rework this one but I've not yet had a need so haven't really worried about it.
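
The memory problem described above comes from collecting every file's bytes before anything is written. As a rough sketch of the alternative, and not the command's actual code, each item can be written to disk as soon as it is produced, with the manifest written straight to a file instead of a StringBuilder. StreamingExport, ExportItem, and Export are hypothetical names for illustration; ExportItem is a simplified stand-in for the post's ItemInfo class, and a real version would build each item from an SPListItem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingExport
{
    // Hypothetical, simplified stand-in for ItemInfo: one file and its name.
    public class ExportItem
    {
        public string Title;
        public byte[] Content;
    }

    // Writes each item as soon as the source yields it, so only one item's
    // bytes need to be held at a time. Returns the number of items written.
    public static int Export(IEnumerable<ExportItem> items, string exportPath)
    {
        string dataPath = Path.Combine(exportPath, "Data");
        Directory.CreateDirectory(dataPath); // also creates exportPath if needed

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        int count = 0;
        string manifestFile = Path.Combine(exportPath, "Manifest.xml");
        using (XmlWriter writer = XmlWriter.Create(manifestFile, settings))
        {
            writer.WriteStartElement("Items");
            foreach (ExportItem item in items)
            {
                // Save the file first, then record it in the manifest.
                string filePath = Path.Combine(dataPath, item.Title);
                File.WriteAllBytes(filePath, item.Content);

                writer.WriteStartElement("Item");
                writer.WriteAttributeString("File", filePath);
                writer.WriteAttributeString("LeafName", item.Title);
                writer.WriteEndElement(); // Item
                count++;
            }
            writer.WriteEndElement(); // Items
        }
        return count;
    }
}
```

The memory saving only materializes if the items argument is lazy, for example an iterator method that uses yield return and reads each file's bytes only when the loop asks for the next item; passing a fully populated list would reproduce the original problem.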

Trent said...

Thanks, Gary. gl-exportlistitem works, but I like gl-exportlistitem2 as it meets my needs better. Look forward to your modified version of gl-exportlistitem2.

I wanted to let you know that your STSADM custom extensions have been a great help to me. Super work!

Luke said...

I'll move my post to the proper thread.

Is there any way to just extract the metadata? I am not concerned with the files themselves, just the metadata associated with them.

Gary Lapointe said...

At present there isn't a way to get just the metadata without the files.