This is part three of a thrilling series of entries related to the aggregation of SharePoint content. It relates back to reference architecture #7, RBS vs. EBS vs. Content Transfer vs. Shortcuts and An Overview of the Potential Solutions. In the previous entry I rambled on about the different options available to support the aggregation of data behind a SharePoint deployment; however, I left you hanging on for more about two recently added technologies. In this entry I’ll go into more detail about these sexy new options, why you might care and the pros and cons of each. I'll do my best to be technically accurate but bear in mind that I am a manager now so just stringing together non-monosyllabic words is difficult enough for me. My main concern is not the actual implementation details; I am more interested in whether either solution might deal with the inherent business problems.
RBS and EBS Review
The Problem...SharePoint stores everything related to an object in SQL Server. This includes the:
1. Content (PDFs, PPTs, Zip files, etc.)
2. Metadata (the object's title, project number, format, etc.)
3. Context (which site it came from, the folder location, security details, etc.)
#2 and #3 belong in SQL Server because they are represented by structured content. Storing #1 in a database is a travesty of the highest order. I've seen system architects tarred and feathered for doing this. Why? Databases excel at managing lots of ickle bits of data but they suck when it comes to managing large binary objects (called BLOBs - Binary Large OBjects). Go read more about these issues in the Eight Reference Architectures series. I have seen estimates that suggest that up to 96 petabytes of data will be archived from SharePoint instances over the next 5 years - that's 96 quadrillion bytes of data...not into a database, methinks!
The solution...Bottom line, you have to get the binary objects out of the database. Not necessarily out of SharePoint but out of SQL Server. In the previous entry I mentioned 5 ways of doing this but did not dig into options #4 and #5 - RBS and EBS.
RBS and EBS are both pretty new technologies. Given that they have very similar names it is not surprising that people get them confused so here's a primer:
RBS is implemented by SQL Server (only SQL Server 2008 and later); it has nothing to do with SharePoint directly. When you enable RBS, all BLOB streams that SQL Server would normally be compelled to store internally are spewed forth to the file system.
EBS is implemented by MOSS 2007 (available as a hot fix to MOSS 2007 SP1 and later). The EBS provider lives at the very bottom of the SharePoint stack, just above the interface into SQL Server. Just before the BLOB is passed to SQL Server, the EBS provider gives your process the opportunity to take ownership of the BLOB. You give SharePoint a token in exchange so it knows how to get the object back from you at a later date.
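To make the token exchange a little more concrete, here is a minimal Python sketch. It is not the real EBS provider interface; `ExternalBlobStore`, `store` and `retrieve` are hypothetical names, and an in-memory dictionary stands in for whatever external storage you would actually use. All it shows is the shape of the deal: you take the bytes, you hand back a token, and the token is enough to get the bytes back later.

```python
import uuid


class ExternalBlobStore:
    """Hypothetical stand-in for an external store (not the real EBS API)."""

    def __init__(self) -> None:
        # token -> raw bytes; a real store would be a file system or archive
        self._blobs: dict = {}

    def store(self, data: bytes) -> str:
        """Take ownership of a BLOB and return an opaque token for it."""
        token = uuid.uuid4().hex
        self._blobs[token] = data
        return token

    def retrieve(self, token: str) -> bytes:
        """Return the BLOB that was stored under this token."""
        return self._blobs[token]


# The caller keeps only the token and asks for the BLOB back on demand.
blob_store = ExternalBlobStore()
token = blob_store.store(b"%PDF-1.4 pretend this is a large document")
original = blob_store.retrieve(token)
```

In this sketch the token is just a random identifier, which is enough for the round trip because the store itself remembers where the bytes went.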
RBS vs. EBS…
There are pros and cons to both approaches and the balance will change over time according to the SharePoint product plans that we know of. Let me spoil the ending for you…I’d recommend EBS today but RBS later as it matures. Here’s the rationale:
Remote BLOB Storage (RBS)
- RBS is implemented in SQL Server and is application agnostic. That’s to say, if you turn RBS on then all BLOB objects from any SQL Server-based application will be externalized. If that’s what you want to happen then that’s great but if you need to be able to apply business logic to what is externalized and where it goes then you are severely restricted.
- It is simple – you turn RBS on and the content is simply stored on the local file system. If you have some kind of file system virtualization software in place then you can do some basic management tasks but only based on the file system attributes of the object.
- If you want access to the context and metadata of the object then you are going to have to dip into SQL Server and start hunting down SharePoint-based reference information; Microsoft do not recommend this - in fact they do not publish the DB schema for SharePoint so it would be potentially dangerous.
- The current thinking is that RBS might have more longevity than EBS. It is likely that EBS will fade out of the stack over time – obviously this is not 100% certain but likely.
- Getting the content out of SQL Server only solves 5% of the real issues according to 9 of my 10 personalities. Seriously, getting the BLOBs out of SQL Server gives you scalability but it does not deliver any of the IT efficiencies, compliance overlays, or re-purpose/re-use benefits of managing the externalized content.
- Intelligent archiving is the key to getting this right. You need to have the BLOB, the metadata, the context and the ability to manage the object – no less than this. The RBS model only provides the BLOB – no context and no ability to manage the object.
- No business rule mapping…RBS is all or nothing – you get all BLOBs all of the time. EBS is not much better but does support certain rules. For example, in theory you could configure EBS to not externalize content from certain sites or content less than 50KB in size.
- Needs SQL Server 2008 – not a huge deal but a consideration.
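The kind of business rule mentioned in the list above (skip certain sites, skip content under 50KB) can be sketched in a few lines of Python. This is an illustration only: `ExternalizationRules`, `should_externalize` and the example site URL are hypothetical and are not part of EBS or RBS configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalizationRules:
    """Hypothetical rule set deciding which BLOBs leave the database."""

    # Sites whose content should always stay in the database.
    excluded_sites: frozenset = frozenset()
    # BLOBs smaller than this stay in the database (50KB, as in the text).
    min_size_bytes: int = 50 * 1024

    def should_externalize(self, site: str, size_bytes: int) -> bool:
        """Return True if a BLOB from this site and of this size is externalized."""
        if site in self.excluded_sites:
            return False
        return size_bytes >= self.min_size_bytes


rules = ExternalizationRules(excluded_sites=frozenset({"/sites/hr"}))
small_doc = rules.should_externalize("/sites/projects", 10 * 1024)       # False
big_doc = rules.should_externalize("/sites/projects", 5 * 1024 * 1024)   # True
hr_doc = rules.should_externalize("/sites/hr", 5 * 1024 * 1024)          # False
```

An all-or-nothing model is the same function with the body replaced by `return True`, which is the restriction the list is complaining about.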
External BLOB Storage (EBS)
- EBS is provided by the SharePoint team and although it is lacking in some areas it does understand the context of the BLOB that it exposes. In other words, we do know what the BLOB object is and we can track changes/deletes on the object.
- The architecture allows us to provide an intelligent process for capturing the BLOB and, just as importantly, for returning the BLOB on demand (i.e. when you want to view it from SharePoint).
- Because we are interacting directly with the SharePoint processes we can perform more intelligent operations. For example, if the BLOB was deleted (with good reason) from the store then we could cascade that delete back up to SharePoint. Same with changes to the object or its status.
- It does not require SQL Server 2008.
- There are a lot of areas where I would improve EBS but for what we are doing at this point in time the only con is that EBS will probably not survive in the long term. For what it is worth, we have worked with Microsoft to ensure that a transition to RBS in the future would be seamless.
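The cascading delete described in the list above can be pictured with a small Python sketch. Everything here is assumed for illustration: `ManagedBlobStore` and `FakeSharePointLibrary` are made-up classes, not SharePoint or EBS APIs. The point is only that a store which knows its tokens can notify the owning application when an object goes away.

```python
from typing import Callable, Dict, List


class ManagedBlobStore:
    """Hypothetical external store that announces deletions to listeners."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._delete_listeners: List[Callable[[str], None]] = []

    def on_delete(self, listener: Callable[[str], None]) -> None:
        """Register a callback that receives the token of each deleted BLOB."""
        self._delete_listeners.append(listener)

    def put(self, token: str, data: bytes) -> None:
        self._blobs[token] = data

    def delete(self, token: str) -> None:
        """Remove the BLOB, then tell every listener which token is gone."""
        del self._blobs[token]  # raises KeyError for an unknown token
        for listener in self._delete_listeners:
            listener(token)


class FakeSharePointLibrary:
    """Hypothetical stand-in for the SharePoint side: item name -> token."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def add_item(self, name: str, token: str) -> None:
        self.items[name] = token

    def handle_blob_deleted(self, token: str) -> None:
        """Drop every item that pointed at the deleted BLOB."""
        self.items = {n: t for n, t in self.items.items() if t != token}


store = ManagedBlobStore()
library = FakeSharePointLibrary()
store.on_delete(library.handle_blob_deleted)

store.put("tok-1", b"contract bytes")
library.add_item("contract.pdf", "tok-1")
store.put("tok-2", b"budget bytes")
library.add_item("budget.xls", "tok-2")

store.delete("tok-1")  # library.items now holds only "budget.xls"
```

The same listener pattern would cover changes to the object or its status; only the event being announced differs.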
The Bottom Line
The fact that Microsoft have provided mechanisms to allow partners to hook into the underlying storage capabilities of SharePoint is testament to the fact that Microsoft recognize the value that other companies can add to SharePoint. I am often asked whether Microsoft might not just add all of the capabilities of a classic ECM solution to SharePoint - obviously they could but take it from me, they'd be better off focusing on usability, integrations, information worker productivity and nailing the Office integrations - that's their sweet spot. It took us 15 years to build up the suite of ECM functionality that you see today and it was painful!
So what's next?
Not surprisingly we have a set of products that leverage all of the pros of this new architecture and that have been designed to add all of the benefits of classic ECM without taking away anything from the SharePoint user experience. Contact me if you need more information under NDA.