RBS BlobId Columns in Application Tables - Registering and Indexing

I recently got a question on RBS BlobId columns needing to have an index. This blog post expands on the answer. Feel free to send us any questions using either direct email, comments on this blog or the "Discussions" tab on the RBS Codeplex site.

Registering and Unregistering

An application needs to register its RBS BlobId columns by calling the stored proc rbs_sp_register_column, once for each BlobId column. This is a required step for all RBS applications and RBS behavior is undefined if this is not done. Similarly, an RBS BlobId column can be un-registered by calling rbs_sp_unregister_column. This stored proc should be called before dropping/renaming a registered column or dropping/renaming the table containing a registered column. This must be done even if the intent is to create another table/column with the same name soon after dropping them. This is because RBS stores the object_id and column_id, and these will be different for a new table even if the names are the same. The view rbs_columns shows the list of currently registered columns.

Indexing

RBS BlobId columns in application tables should be indexed - for each BlobId column there should be a separate index with that column as the only index key column. If such an index is not present for a column, the stored proc rbs_sp_register_column gives a warning (raises error with level 10): Column < ColumnName > in Table < SchemaName.TableName > is not indexed.

This index is needed for garbage collection to work efficiently. The Reference Scan phase of garbage collection does a diff: (BlobIds RBS knows about) EXCEPT (BlobIds referenced by application tables). The resulting list of Blobs are the ones that were deleted by the application and should be garbage collected. Since the number of blobs can be very large, this diff is done for a few blobs at a time. This translates to multiple range queries. For such a query to be efficient, we need indices on all BlobId columns.

Note that garbage collection will still work even if the index is not present, but it will be terribly slow on large tables. So, it is highly recommended that an index is created on RBS BlobId columns.

 

- Pradeep

Comments

  • Anonymous
    March 27, 2010
    Pradeep Thanks for the posts. Just a question. Is this post suggesting that the index should be manually set or is it set by default?

  • Anonymous
    March 28, 2010
    It is the responsibility of the application to create the index. Put another way, whoever calls rbs_sp_register_column, it is their responsibility to create the index before calling the SP. This is typically the case if you are writing a custom app or a packaged app yourself. If you are using an app written by somebody else, you need to consult with them to make any changes to the database. If the index is not already present, they should either update their product or give you a reason why they did not index it already. If you are using SharePoint, you are not allowed to modify the ContentDB (e.g. add/change indexes). SharePoint should create the index for you.