MOSS MVP

I've moved my blog to http://blog.falchionconsulting.com. Please update your links. This blog is no longer in use; you can find all posts and comments at my new blog. I will no longer be posting to this site and comments have been disabled.

Sunday, May 18, 2008

Setting Metadata

In my last post I mentioned a project which required me to move documents from one list to another list in a different farm one folder at a time.  Along with that was a requirement to set various field values (metadata) based on patterns in the folder name and/or filename.  I needed a reasonably flexible way to accomplish this, considering that the client didn't actually have a clue as to what they really wanted the rules to be.  I already had a command (gl-replacefieldvalues) which let me set the value of an existing field, but it didn't allow me to do it based on the values of other fields and there was no real filtering capability.  So I built a new command called gl-setmetadata which allows me to pass in an XML file containing various rules.

There's really not much to the code - the bulk of it is just parsing the XML and figuring out what to do.  There are two core methods.  The first, ProcessFolder, is responsible for getting the collection of items that should be processed using the provided rules.  This is done by using an SPQuery object and passing in the Query XML node if present.  The second, ApplyRule, is called by ProcessFolder for each Rule node found in the XML and is responsible for setting any field data based on the rules.

    // Namespaces used by this listing: System, System.Collections.Specialized, System.Diagnostics,
    // System.Text, System.Text.RegularExpressions, System.Xml and Microsoft.SharePoint.
    public SetMetaData()
    {
        SPParamCollection parameters = new SPParamCollection();
        parameters.Add(new SPParam("url", "url", true, null, new SPNonEmptyValidator(), "Please specify the url to search."));
        parameters.Add(new SPParam("quiet", "q"));
        parameters.Add(new SPParam("test", "t"));
        parameters.Add(new SPParam("inputfile", "input", true, null, new SPFileExistsValidator()));
        parameters.Add(new SPParam("logfile", "log", false, null, new SPDirectoryExistsAndValidFileNameValidator()));
        parameters.Add(new SPParam("recursefolders", "recurse"));

        StringBuilder sb = new StringBuilder();
        sb.Append("\r\n\r\nUpdates list field values based on the rules defined in the provided input file.  Use -test to verify your updates before executing.\r\n\r\nParameters:");
        sb.Append("\r\n\t-url <list folder url>");
        sb.Append("\r\n\t-inputfile <input file containing meta data rules>");
        sb.Append("\r\n\t[-recursefolders]");
        sb.Append("\r\n\t[-quiet]");
        sb.Append("\r\n\t[-test]");
        sb.Append("\r\n\t[-logfile <log file>]");

        Init(parameters, sb.ToString());
    }

    /// <summary>
    /// Gets the help message.
    /// </summary>
    /// <param name="command">The command.</param>
    /// <returns></returns>
    public override string GetHelpMessage(string command)
    {
        return HelpMessage;
    }

    /// <summary>
    /// Runs the specified command.
    /// </summary>
    /// <param name="command">The command.</param>
    /// <param name="keyValues">The key values.</param>
    /// <param name="output">The output.</param>
    /// <returns></returns>
    public override int Execute(string command, StringDictionary keyValues, out string output)
    {
        output = string.Empty;

        string url = Params["url"].Value.TrimEnd('/');
        bool quiet = Params["quiet"].UserTypedIn;
        bool testMode = Params["test"].UserTypedIn;
        string logFile = Params["logfile"].Value;
        string inputFile = Params["inputfile"].Value;
        bool recurseFolders = Params["recursefolders"].UserTypedIn;

        Verbose = !quiet;
        LogFile = logFile;

        // Load the rules file.
        XmlDocument metaDataDoc = new XmlDocument();
        metaDataDoc.Load(inputFile);

        using (SPSite site = new SPSite(url))
        using (SPWeb web = site.OpenWeb())
        {
            SPFolder folder = web.GetFolder(url);

            // The null check is unnecessary but it makes me feel better; it has to come first to be of any use.
            if (folder == null || !folder.Exists)
                throw new SPException("The specified list folder was not found.");

            SPList list = null;
            try
            {
                list = web.Lists[folder.ParentListId];
            }
            catch (ArgumentException)
            {
                // The list lookup failed; the null check below reports the error.
            }
            if (list == null) // This should never happen if we found a folder but again, it makes me feel better having it.
                throw new SPException("The specified list was not found.");

            // Process the folder.
            ProcessFolder(folder, list, metaDataDoc, recurseFolders, testMode);
        }
        return OUTPUT_SUCCESS;
    }

    /// <summary>
    /// Processes the folder.
    /// </summary>
    /// <param name="folder">The folder.</param>
    /// <param name="list">The list.</param>
    /// <param name="metaDataDoc">The meta data doc.</param>
    /// <param name="recurseFolders">if set to <c>true</c> [recurse folders].</param>
    /// <param name="testMode">if set to <c>true</c> [test mode].</param>
    private static void ProcessFolder(SPFolder folder, SPList list, XmlDocument metaDataDoc, bool recurseFolders, bool testMode)
    {
        // If we don't have any rules to process then there's no sense continuing so error out.
        XmlNodeList ruleElements = metaDataDoc.SelectNodes("//Rule");
        if (ruleElements.Count == 0)
            throw new SPException("Missing \"Rule\" node(s) which should be a child of the root \"MetaData\" node.");

        // Get a namespace manager so that we can retrieve the Query element if present.
        XmlNamespaceManager nsManager = new XmlNamespaceManager(metaDataDoc.NameTable);
        nsManager.AddNamespace("sp", "http://schemas.microsoft.com/sharepoint/");

        // Look for a Query element
        XmlElement queryElement = (XmlElement)metaDataDoc.SelectSingleNode("//sp:Query", nsManager);
        SPQuery query = new SPQuery();
        if (recurseFolders)
            query.ViewAttributes = "Scope=\"Recursive\"";
        // Set the root folder to query
        query.Folder = folder;
        if (queryElement != null)
        {
            // We have a Query element so do an initial filtering using the provided filter.
            // Note: SPQuery.Query expects the CAML without the enclosing <Query> tag, so if the
            // filter appears to be ignored, try queryElement.InnerXml here instead.
            query.Query = queryElement.OuterXml;
        }
        // If the user didn't provide a Query element the query is empty (no filtering).
        SPListItemCollection items = list.GetItems(query);

        Log("Beginning processing of {0} items...", items.Count.ToString());
        int modificationCount = 0;

        for (int i = 0; i < items.Count; i++)
        {
            SPListItem item = items[i];
            Log("Progress: Processing item {0}: {1}\r\n", item.ID.ToString(), item["ServerUrl"].ToString());

            if (item.FileSystemObjectType == SPFileSystemObjectType.Folder)
            {
                // Currently not handling folders - no particular reason, I just don't need this ability.
                // Commenting out this block will not hurt anything.
                Log("Progress: Item {0} is a folder - skipping.", item.ID.ToString());
                continue;
            }

            bool modified = false;

            // Loop through each Rule element, in document order, and apply its changes
            foreach (XmlElement ruleElement in ruleElements)
            {
                if (ApplyRule(item, ruleElement))
                    modified = true;
            }

            if (modified)
            {
                // The rules resulted in modified data so update the item if not in test mode.
                if (!testMode)
                    item.SystemUpdate();
                modificationCount++;
                Log("Progress: Item ID {0} was modified.", item.ID.ToString());
            }
            else
            {
                // There were no modifications made
                Log("Progress: Item ID {0} was NOT modified.", item.ID.ToString());
            }

            Log("Progress: Finished Processing item {0}\r\n\r\n", item.ID.ToString());
        }
        Log("Finished processing items.  {0} out of {1} items were modified.\r\n", modificationCount.ToString(), items.Count.ToString());
    }

    /// <summary>
    /// Applies the rule.
    /// </summary>
    /// <param name="item">The item.</param>
    /// <param name="ruleElement">The rule element.</param>
    /// <returns></returns>
    private static bool ApplyRule(SPListItem item, XmlElement ruleElement)
    {
        bool modified = false;
        string ruleName = ruleElement.GetAttribute("Name");

        XmlElement matchElement = (XmlElement)ruleElement.SelectSingleNode("Match");
        bool isMatch = true;

        // The Match element is optional and just provides some additional regular expression filtering beyond what the Query element can provide
        if (matchElement != null)
        {
            bool isAnd = true;
            if (matchElement.HasAttribute("Op"))
                isAnd = matchElement.GetAttribute("Op").ToLowerInvariant() == "and";
            // For "And" operations we default our starter item to true as everything must come back as true to be a match
            // For "Or" operations we default our starter item to false as we only need one item to come back as true to
            // be a match and we don't want that one item to be the starter item.
            bool fieldMatches = isAnd;

            // If we have a Match element then we need at least one Field element otherwise what's the point.
            if (matchElement.SelectNodes("Field").Count == 0)
                throw new SPException("Missing \"Field\" node(s) which should be a child of the \"Match\" node.");

            foreach (XmlElement fieldElement in matchElement.SelectNodes("Field"))
            {
                // The Field element needs a Name attribute and a value to use as the search pattern string
                if (!fieldElement.HasAttribute("Name"))
                    throw new SPException("Missing \"Name\" attribute of \"Field\" node.");
                if (string.IsNullOrEmpty(fieldElement.InnerText.Trim()))
                    throw new SPException(string.Format("Missing search pattern string value for match field '{0}'", fieldElement.GetAttribute("Name")));

                // We use the internal name for all field names
                SPField field = item.Fields.GetFieldByInternalName(fieldElement.GetAttribute("Name"));

                // Determine if we have a match for this field.  A null value is tested as an empty
                // string so that an unset field doesn't cause a NullReferenceException.
                object matchValue = item[field.Id];
                string matchText = matchValue == null ? string.Empty : matchValue.ToString();
                bool fieldMatch = Regex.IsMatch(matchText, fieldElement.InnerText);

                // Apply the match results to our fieldMatches variable to track the overall result
                if (isAnd)
                    fieldMatches = fieldMatches && fieldMatch;
                else
                    fieldMatches = fieldMatches || fieldMatch;
            }
            // Set the overall result
            isMatch = fieldMatches;
        }
        if (!isMatch)
        {
            Log("Progress: Unable to find match for rule '{0}'.", ruleName);
            return modified; // No match so evaluate the next rule
        }
        Log("Progress: Found match for rule '{0}'.", ruleName);

        // Every Rule element must have one and only one Set element
        XmlElement setElement = (XmlElement)ruleElement.SelectSingleNode("Set");
        if (setElement == null)
            throw new SPException("Missing \"Set\" node.");

        // Every Set element must have at least one Field element
        if (setElement.SelectNodes("Field").Count == 0)
            throw new SPException("Missing \"Field\" node(s) which should be a child of the \"Set\" node.");

        // Loop through all the Field elements and apply the indicated values
        foreach (XmlElement fieldElement in setElement.SelectNodes("Field"))
        {
            // Every Field element must have a Name attribute - the value can be empty which is the same as setting the field to null.
            if (!fieldElement.HasAttribute("Name"))
                throw new SPException("Missing \"Name\" attribute of \"Field\" node.");

            string fieldName = fieldElement.GetAttribute("Name");
            string fieldData = fieldElement.InnerText;
            SPField field = item.Fields.GetFieldByInternalName(fieldName);

            if (field.ReadOnlyField)
            {
                // We can't update read-only fields so log a warning and move on.
                Log("WARNING: Field '{0}' is read only and will not be updated.", EventLogEntryType.Warning, field.InternalName);
                continue;
            }

            if (field.Type == SPFieldType.Computed)
            {
                // We can't update computed fields so log a warning and move on.
                Log("Progress: Field '{0}' is a computed column and will not be updated.", EventLogEntryType.Warning, field.InternalName);
                continue;
            }

            object existingValue = item[field.Id];

            // If a SearchPattern attribute was provided then do a regular expression replace instead of just a straight up set.
            if (fieldElement.HasAttribute("SearchPattern"))
            {
                if (string.IsNullOrEmpty(fieldElement.GetAttribute("SearchPattern")))
                    throw new SPException(string.Format("SearchPattern attribute of Field node '{0}' is empty.", fieldName));

                if (existingValue == null)
                {
                    // We can't do a regex on a null value so move on
                    Log("Progress: Value of field '{0}' is 'null' - no replace operation will be performed.", field.InternalName);
                    continue;
                }
                fieldData = Regex.Replace(existingValue.ToString(), fieldElement.GetAttribute("SearchPattern"), fieldData);
            }
            // If the fieldData is empty then make sure it's set to null
            if (string.IsNullOrEmpty(fieldData))
                fieldData = null;

            // Compare as strings; a null existing value and null field data count as "no change".
            string existingText = existingValue == null ? null : existingValue.ToString();
            if (existingText != fieldData)
            {
                // The modified field data is different from the source so go ahead and apply the change
                Log("Progress: Applying modification to field '{0}' per rule '{1}'", fieldName, ruleName);
                if (field.Type == SPFieldType.URL)
                    item[field.Id] = new SPFieldUrlValue(fieldData);
                else
                    item[field.Id] = fieldData;

                modified = true;
            }
            else
            {
                Log("Progress: No change required for field '{0}' per rule '{1}'.", fieldName, ruleName);
            }
        }
        if (!modified)
            Log("Progress: Set rules resulted in no change from existing data for rule '{0}'.", ruleName);

        return modified;
    }
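
The Query lookup in ProcessFolder only uses System.Xml, so it can be tried outside SharePoint.  The following is a minimal sketch and not part of the command itself: the QueryLookupSketch class, the FindQuery helper and the tiny inline document are all invented here for illustration.  It shows why the namespace manager is needed (the Query element lives in the SharePoint namespace, so an unprefixed //Query finds nothing) and how OuterXml and InnerXml differ for the element that is found.

```csharp
using System;
using System.Xml;

public static class QueryLookupSketch
{
    // Hypothetical helper: returns the CAML Query element of a rules document, or null if there is none.
    public static XmlElement FindQuery(XmlDocument doc)
    {
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("sp", "http://schemas.microsoft.com/sharepoint/");
        return (XmlElement)doc.SelectSingleNode("//sp:Query", ns);
    }

    public static void Main()
    {
        // A cut-down rules document made up for this sketch.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<MetaData>" +
            "<Query xmlns=\"http://schemas.microsoft.com/sharepoint/\">" +
            "<Where><IsNotNull><FieldRef Name=\"Title\" /></IsNotNull></Where>" +
            "</Query>" +
            "<Rule Name=\"Example\" />" +
            "</MetaData>");

        // An unprefixed XPath name only matches elements in no namespace, so this lookup misses.
        Console.WriteLine(doc.SelectSingleNode("//Query") == null);

        XmlElement query = FindQuery(doc);
        Console.WriteLine(query.OuterXml); // includes the enclosing Query tag
        Console.WriteLine(query.InnerXml); // only the children, starting at Where
    }
}
```

The Rule elements carry no namespace declaration, which is why the plain "//Rule" lookup in the command works without the namespace manager.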

The core thing to understand with this command is the structure of the input file, and this is where things get a little more complicated.  I don't currently have an XSD for this (I may create one to aid in validation but I just didn't have the time).  So, failing a good XSD, here's a reasonably detailed example XML file with comments:

    <MetaData>
        <!-- Query is an optional CAML element and is used to filter the items that are to be considered.  Anything you can do with a standard CAML Query element you can put here (be sure to include the namespace attribute) -->
        <Query xmlns="http://schemas.microsoft.com/sharepoint/">
            <Where>
                <BeginsWith>
                    <FieldRef Name="FileRef" />
                    <Value Type="string">/Documents/Sub-Folder1/</Value>
                </BeginsWith>
            </Where>
        </Query>
        <!-- There must be at least one Rule element - multiple elements are processed in the order they appear -->
        <!-- The Rule element may contain an optional Name attribute which is a simple label used for logging -->
        <Rule Name="Set Content Type">
            <!-- Every Rule element must have one and only one Set element -->
            <Set>
                <!-- The Set element must contain one or more Field elements -->
                <!-- The Field element must have a Name attribute which corresponds to the field's internal name -->
                <!-- The value of the Field element is what will be set to the list item for that field -->
                <!-- A Field element may contain an optional SearchPattern attribute which can be used to update an existing value via a Regex.Replace() call -->
                <!-- If no SearchPattern attribute is present then existing data is ignored -->
                <Field Name="ContentType">Dublin Core Columns</Field>
            </Set>
        </Rule>
        <Rule Name="Set English Language">
            <!-- A Rule element can contain one optional Match element which is used to provide regular expression based filtering -->
            <!-- The Match element can contain an optional Op attribute used to indicate whether the match logic is "AND" or "OR" (default is "AND" if not present) -->
            <Match Op="OR">
                <!-- The Field element must have a Name attribute which corresponds to the field's internal name -->
                <!-- The value of the Field element is used in a Regex.IsMatch() call to determine whether the item should be processed -->
                <Field Name="FileLeafRef">(?i:.* Eng.*|.*ENGLISH ONLY.*|.*-EN.*)</Field>
                <Field Name="Title">(?i:.* Eng.*|.*ENGLISH ONLY.*|.*-EN.*)</Field>
            </Match>
            <Set>
                <Field Name="FileLeafRef" SearchPattern="(?i: -?Eng|ENGLISH ONLY)|-EN">-English</Field>
                <Field Name="Language">English</Field>
            </Set>
        </Rule>
        <Rule Name="Set Korean Language">
            <Match Op="And">
                <Field Name="FileLeafRef">(?i:.* Kor.*|.*KOREAN ONLY.*|.*-KO.*)</Field>
            </Match>
            <Set>
                <Field Name="FileLeafRef" SearchPattern="(?i: -?Kor|KOREAN ONLY)|-KO">-Korean</Field>
                <Field Name="Language">Korean</Field>
            </Set>
        </Rule>
    </MetaData>

Note that I don't claim to be a regular expression expert.  I've not extensively tested the regular expressions in the examples above, and I know that they have issues with more complex data, but for the purpose of a simple demonstration they do well enough.  The example above returns all documents in the folder "/Documents/Sub-Folder1" and sets the content type of every item to "Dublin Core Columns".  It then standardizes the name of the file (FileLeafRef) so that it only contains "*-English" or "*-Korean" using information in the filename, and it sets the Language field to English or Korean using this same information.
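
To see what the "Set English Language" rule does to a file name, the match and replace patterns can be run on their own.  This is a small sketch, not the command's code: the RulePatternSketch class, its method names and the sample file names are made up for illustration, and only the two pattern strings are taken from the XML above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RulePatternSketch
{
    // Patterns copied from the "Set English Language" rule.
    private const string MatchPattern = "(?i:.* Eng.*|.*ENGLISH ONLY.*|.*-EN.*)";
    private const string SearchPattern = "(?i: -?Eng|ENGLISH ONLY)|-EN";

    // Mirrors a Match Field: does the value qualify for the rule?
    public static bool IsEnglish(string fileName)
    {
        return Regex.IsMatch(fileName, MatchPattern);
    }

    // Mirrors a Set Field with a SearchPattern: Regex.Replace on the current value.
    public static string Standardize(string fileName)
    {
        return Regex.Replace(fileName, SearchPattern, "-English");
    }

    public static void Main()
    {
        // Hypothetical file names, chosen only to exercise the patterns.
        foreach (string name in new[] { "Report Eng.doc", "Report-EN.doc", "Report-KO.doc" })
        {
            bool match = IsEnglish(name);
            Console.WriteLine("{0}: match={1}, renamed={2}", name, match, match ? Standardize(name) : name);
        }
    }
}
```

Both "Report Eng.doc" and "Report-EN.doc" come out as "Report-English.doc", while "Report-KO.doc" is left for the Korean rule.  A name such as "Budget-Entry.doc" also satisfies the case-insensitive "-EN" alternative of the match pattern, so it would be tagged as English even though the case-sensitive replace leaves the name alone; this is the kind of issue with more complex data mentioned above.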

Probably the most important thing to remember when constructing your XML is that you need to use the internal field name and not the display name.

You can also do additional filtering using the command line parameters by restricting whether folders are recursed and by specifying a sub-folder instead of a root list folder.  The syntax of the command can be seen below:

C:\>stsadm -help gl-setmetadata

stsadm -o gl-setmetadata

Updates list field values based on the rules defined in the provided input file.  Use -test to verify your updates before executing.

Parameters:
        -url <list folder url>
        -inputfile <input file containing meta data rules>
        [-recursefolders]
        [-quiet]
        [-test]
        [-logfile <log file>]

Here's an example of how you would execute this command using the XML shown above as an input:

stsadm -o gl-setmetadata -url http://portal/documents -inputfile c:\metadata.xml -recursefolders -logfile c:\metadata.log

As with many of my batch-update commands, you can run this command in a test mode by passing in the "-test" parameter.  In test mode the rules are evaluated and the results are logged, but the items are not updated.

5 comments:

Anonymous said...

For "Choice" fields do we use the id or name? I keep getting a "Value does not fall within the expected range." error. Thanks!

Gary Lapointe said...

Choice fields are going to have the names separated by ";#". Lookup fields will be "ID;#Name". Not sure if this helps or not.

Luke said...

I'm trying to find if you have a tool to export metadata. Do you have any tool that does that?

Gary Lapointe said...

The gl-exportlistitem2 command might work for you.

Anonymous said...

Hi Gary,

I used this command a few days ago and found a problem with the query. OuterXml does not work with the XML file you give, so I modified your code to use the InnerXml property instead.

Can you confirm this problem?

thanks