OK, so this entry should be called SharePoint Conference Day Three but to be honest I only attended one session – the rest of the day was taken up with meetings with Microsoft and Partners.
So, as a rather sad indictment of the state of my life, the best session of the week for me was "Externalizing BLOB Storage in SharePoint 2010". I was absolutely enthralled with this session. Srini Acharya and Burzin Patel from the Microsoft SQL Server advisory team did a very nice job of addressing the concept of RBS (Remote BLOB Storage) and how it fits into the new SharePoint 2010 architecture.
They started with an overview of the general RBS concepts and also a demo of RBS using Microsoft’s own free file stream provider that just dumps the BLOBs onto a local file server. To be clear, RBS on its own without a provider will not do anything. Microsoft have the aforementioned file stream provider, and vendors are being encouraged to develop their own. During the Q&A session they made the point that customers should not write their own providers and that vendors should work with Microsoft on their offerings.
Burzin mentioned that "BLOBs typically account for 60-70% of total content" in the SharePoint content database. This is a lower number than I’ve heard quoted previously although most of the numbers you see quoted are from vendors trying to sell externalization solutions so maybe they cannot be trusted ;-)
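To put that range in perspective, here is a quick back-of-the-envelope sketch. The 500 GB content database is a made-up figure and `split_content_db` is just an illustrative helper; only the 60-70% range comes from the session.

```python
def split_content_db(total_gb, blob_fraction):
    """Split a content database size into BLOB and non-BLOB portions (in GB)."""
    if not 0.0 <= blob_fraction <= 1.0:
        raise ValueError("blob_fraction must be between 0 and 1")
    blob_gb = total_gb * blob_fraction
    return blob_gb, total_gb - blob_gb


# Hypothetical 500 GB content database, using the range quoted in the session.
for fraction in (0.60, 0.70):
    blob_gb, remaining_gb = split_content_db(500, fraction)
    print(f"{fraction:.0%} BLOBs: {blob_gb:.0f} GB could move out, "
          f"{remaining_gb:.0f} GB stays in SQL")
```

Even at the lower end of the range, well over half of the database would be a candidate for cheaper storage.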
EBS vs. RBS
It seems obvious that the team have been making an effort to bring SharePoint awareness into the RBS layer. They have added an ‘RBS Maintainer’ to sync certain operations between SharePoint and RBS. I think that the name of this component needs some Microsoft attention BTW.
They presented a table outlining the key advantages of RBS over EBS.
- RBS has a managed interface whereas EBS’s is unmanaged.
- The RBS BLOB store scope is per content DB, whereas EBS is per farm.
- SharePoint 2010 adds the Configurable Maintainer to sync operations from SharePoint down to RBS.
- RBS also has a PowerShell-based UI within SharePoint. This was pretty rudimentary, but again it shows a commitment to the integration.
- RBS supports multiple providers. I think that you will be able to register multiple providers for a farm and then select which ones are used and when – I don’t understand the scope of that decision though. I’d love to be able to enable a provider for site A and a different one for site B (or even by document library perhaps).
- Best of all…the RBS implementation includes PowerShell-based migration between content stores. Translation? You can move BLOBs from SQL to any RBS provider, from RBS provider to RBS provider and from EBS to RBS… They are looking at a ‘shallow migration’ where the BLOB would stay where it is but the management of it would be migrated (only for already externalized BLOBs). This is a huge thing – one of the senior engineering leads was sitting by me during the session and when this was announced he turned around and gave me a huge grin…given that I’d met with him to discuss RBS earlier in the day it looks like he’d intentionally kept this nugget from me…you know who you are and I will get revenge!!
Issues According to Microsoft
I have to take care when talking about the architectural challenges with storing BLOBs in SQL because I am not in the business of criticizing SharePoint (anymore) so the next section was a blessing for me. So here it is…Microsoft’s assessment of the issues with storing BLOBs in SQL…not mine, Microsoft’s…taken verbatim from their slide!!
- Optimal Capital Expense
  - Trade cost effective BLOB storage for expensive SQL Storage
  - Ability to group/store BLOB separate from Metadata
- Optimal Operational Expense
  - Storage management beyond SQL
  - Facilitates cost effective/optimal backup/DR policies for BLOB vs. Metadata
- Take advantage of advanced storage features provided by BLOB store vendors
  - Expunge, multiple storage locations, immutable writes
  - Guaranteed retention, guaranteed deletion
- Hierarchical Storage Management
  - SQL and layers of BLOB stores offer more savings in CapEx and OpEx
  - Efficient access patterns.
They then did a recap of where we are with RBS and SharePoint.
- RBS still needs a storage vendor to supply a provider and a BLOB store. OOTB RBS doesn’t do anything without a provider.
- RBS includes an RBS Maintainer and a Provider Library for SharePoint 2010. The maintainer synchronizes BLOB delete/orphan operations between SharePoint and RBS. The library is just that, a list of registered providers.
- All BLOBs in an active content DB will be externalized – the level of selectivity is still binary i.e. all or nothing.
How does it work? RBS defines and exposes three views for interaction:
- Application View
  - Interacts with the SharePoint WFE, Provider Library, SQL DB
  - Implemented by SharePoint 2010 - transparent to the user
- Administrator View
  - PowerShell cmdlets - call a set of SPs, function as in SQL
  - Installation, configuration, provisioning, RBS maintainer etc.
- Provider View
  - Defines an interface that should be implemented by the BLOB storage provider.
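To make the Provider View a bit more concrete, here is a minimal sketch of what "an interface implemented by the provider" could look like. This is not the real RBS provider contract, which was not shown in enough detail for me to reproduce; `BlobProvider`, `FileSystemProvider` and the `store`/`fetch`/`delete` methods are names I have made up purely for illustration.

```python
import os
import tempfile
import uuid
from abc import ABC, abstractmethod


class BlobProvider(ABC):
    """Hypothetical provider contract: store, fetch and delete a BLOB by ID."""

    @abstractmethod
    def store(self, data: bytes) -> str: ...

    @abstractmethod
    def fetch(self, blob_id: str) -> bytes: ...

    @abstractmethod
    def delete(self, blob_id: str) -> None: ...


class FileSystemProvider(BlobProvider):
    """Toy provider that writes each BLOB to a folder, named by a fresh GUID."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, blob_id):
        return os.path.join(self.root, blob_id)

    def store(self, data):
        blob_id = str(uuid.uuid4())
        with open(self._path(blob_id), "wb") as f:
            f.write(data)
        return blob_id

    def fetch(self, blob_id):
        with open(self._path(blob_id), "rb") as f:
            return f.read()

    def delete(self, blob_id):
        os.remove(self._path(blob_id))


# Usage: the caller only ever sees the abstract contract.
provider = FileSystemProvider(tempfile.mkdtemp())
blob_id = provider.store(b"hello")
print(provider.fetch(blob_id))
```

The point of the split is that the application side only depends on the contract, so a vendor can swap in a different store behind it without the application noticing.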
They then jumped into a really nicely done demo:
- Went to an RBS-enabled site
- Opened a PowerShell window
- Showed that the file stream provider was disabled
- Showed a SQL select on the WSS_Content table. There were 73 results but they all had RbsId of null so none of the objects were externalized.
- Imported an image
- Re-ran the query and saw 74 rows – still nothing with an RbsId.
- Enabled the RBS provider
- Added another image
- Re-ran the query
- 75 rows but one of the rows now has an RbsId
- Went to the local file system on the C drive
- Document was on the file system with an object name of the GUID (he opened it to show that it was the same file).
It didn’t look like much but for those customers struggling with SQL bloat this was manna from heaven.
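For anyone who wants to play with the idea, the flow of the demo can be imitated with a toy model. This is only a sketch: the `docs` table and its columns are invented (it is not the SharePoint schema), SQLite stands in for SQL Server, and a dictionary stands in for the folder on the C drive.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
# Hypothetical table: the real SharePoint schema is not reproduced here.
db.execute("CREATE TABLE docs (name TEXT, content BLOB, rbs_id TEXT)")
external_store = {}  # stands in for the BLOB store on the file system


def upload(name, data, rbs_enabled):
    """Store the bytes inline, or externally with only a GUID reference in SQL."""
    if rbs_enabled:
        rbs_id = str(uuid.uuid4())
        external_store[rbs_id] = data
        db.execute("INSERT INTO docs VALUES (?, NULL, ?)", (name, rbs_id))
    else:
        db.execute("INSERT INTO docs VALUES (?, ?, NULL)", (name, data))


def counts():
    """Return (total rows, rows that carry an external reference)."""
    total = db.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    externalized = db.execute(
        "SELECT COUNT(*) FROM docs WHERE rbs_id IS NOT NULL"
    ).fetchone()[0]
    return total, externalized


upload("first.png", b"image-1", rbs_enabled=False)   # provider disabled
before = counts()
upload("second.png", b"image-2", rbs_enabled=True)   # provider enabled
after = counts()
print(before, after)
```

As in the demo, only the item added after the provider is switched on gets an external reference; the existing rows are left where they were.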
Next they went on to describe the RBS Maintainer module. This is the piece that creates an element of awareness between SharePoint and RBS. Here are the key points:
- It tracks the deletions in SharePoint and pushes them to the BLOB store. Interestingly it includes the ability to set up rules to delay the delete from the BLOB store to make it easier to manage restores – if the BLOB is still in the external store when you restore SQL then you are probably in good shape.
- Does the clean-up of BLOBs for failed transactions - orphaned BLOBs. This and the line above are basically garbage collection management.
- Can be on the DB server or separate box
- Can be scheduled to run periodically – I guess it is a batch job rather than an inline process.
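The delayed-delete behaviour is easier to see in a small sketch. This is my own simplification of a batch garbage collector, not the actual maintainer: the `Maintainer` class, its `run` method and the one-hour delay are all assumptions made for illustration.

```python
import time


class Maintainer:
    """Toy batch job: removes unreferenced BLOBs only after a grace period."""

    def __init__(self, delete_delay_seconds):
        self.delete_delay = delete_delay_seconds
        self.pending = {}  # blob_id -> time it was first seen unreferenced

    def run(self, referenced_ids, blob_store, now=None):
        now = time.time() if now is None else now
        removed = []
        for blob_id in list(blob_store):
            if blob_id in referenced_ids:
                # Referenced (again), e.g. after a SQL restore: cancel the delete.
                self.pending.pop(blob_id, None)
                continue
            first_seen = self.pending.setdefault(blob_id, now)
            if now - first_seen >= self.delete_delay:
                del blob_store[blob_id]
                del self.pending[blob_id]
                removed.append(blob_id)
        return removed


store = {"a": b"1", "b": b"2"}
maintainer = Maintainer(delete_delay_seconds=3600)
first_pass = maintainer.run({"a"}, store, now=0)      # "b" is only marked
second_pass = maintainer.run({"a"}, store, now=3600)  # delay elapsed, "b" goes
print(first_pass, second_pass)
```

If the content database were restored between the two passes and "b" became referenced again, the pending delete would simply be cancelled, which is the restore-friendly behaviour described above.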
They went on to discuss how backup works. I think they oversimplified this section – my advice would be to think carefully about your chosen backup method. Here was their advice:
- Start SQL Backup first
- Then backup BLOB store
- First restore BLOBs
- Then Restore SQL
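My reading of why the order matters (the presenters did not spell it out, so treat this as an assumption) is that a BLOB backup taken after the SQL backup should contain everything the SQL backup refers to. The sketch below models references and BLOBs as plain sets of made-up IDs and ignores deletes, which is where the maintainer's delayed delete would come in.

```python
def missing_blobs(sql_backup, blob_backup):
    """Return IDs referenced in the SQL backup that are absent from the BLOB backup."""
    return sorted(set(sql_backup) - set(blob_backup))


# Live system state: SQL holds the references, the BLOB store holds the bytes.
sql_refs = {"doc1", "doc2"}
blob_store = {"doc1", "doc2"}

# Recommended order: SQL first, then BLOBs. An upload lands in between.
sql_backup = set(sql_refs)
sql_refs.add("doc3")
blob_store.add("doc3")
blob_backup = set(blob_store)
recommended = missing_blobs(sql_backup, blob_backup)

# Reverse order: BLOBs first, then SQL, with another upload in between.
blob_backup_early = set(blob_store)
sql_refs.add("doc4")
blob_store.add("doc4")
sql_backup_late = set(sql_refs)
reversed_order = missing_blobs(sql_backup_late, blob_backup_early)

print(recommended, reversed_order)
```

In the recommended order the worst case is an extra BLOB that nothing points to, whereas the reverse order can leave a row in SQL pointing at a BLOB that was never backed up.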
The team showed some performance data that I found questionable – it showed that externalization of data did not have a negative effect on the performance of SPP. I suspect that this was all they wanted to show but IMHO their data sample contained files that were too small and I’m guessing they were not suffering from the fragmentation issues that I mentioned in my earlier blog post. I’m confident that if they used a more realistic sample they’d see an improvement in performance (reads and writes) especially directly to an equivalent file system.
There were a large number of follow-up questions in the session. Many of them were about whether RBS would make it easier to get existing file system data into SharePoint… I’ll reserve my comments on that for a later entry I think.
Again, Srini and Burzin did a stellar job explaining this abstract concept to a packed audience.