Building a Private Cloud – Ganeti & CEPH
(This is part 1 of an ongoing series exploring IT at the BC Libraries Cooperative.)
The Co-op is largely a Software-as-a-Service (SaaS) operation. Sitka, LibPress, NNELS – all of these are online offerings that libraries and their patrons make use of over the network without having to maintain their own versions. And like other SaaS operations, the Co-op depends on a number of critical technologies. In addition to the obvious one, the network, we rely on virtualization and cloud computing to be able to deliver these services at scale.
What is the Cloud and why is ownership important to civic entities?
Many major SaaS companies make use of commercial cloud offerings such as Amazon’s EC2, Google Compute Engine, or Microsoft’s Azure. These systems offer inexpensive virtual machine hosting and storage without the need for costly capital expenditures, allowing companies to focus on their core value offerings, namely the software and its support services.
One major reason the Co-op avoided this “cloud route” was that such offerings did not comply with BC’s Freedom of Information and Protection of Privacy Act (FIPPA), as they were typically based in the United States.
While this reality is changing with an increasing number of Canadian firms and US-based companies now hosting services on Canadian soil, there are other major reasons the Co-op continues to rely on a private cloud solution which we have built and now own. By using commodity hardware and open-source software, not only is our private cloud consistently cost-competitive with what we’re seeing in the marketplace, it is fundamentally owned and democratically governed by all members, a sovereignty we believe is critical for civic-run institutions such as the Co-op.
How does the Co-op provision its private cloud?
But what’s actually under the hood? Well, there are four critical pieces that make up the Co-op’s private cloud offering. The first is that we use Linux (specifically Ubuntu) everywhere. This makes the second piece – the use of the Kernel-based Virtual Machine (KVM) hypervisor to virtualize these servers – even easier.
So far, so common: this is pretty standard stuff. Where we may differ from some other institutions is in the third piece, our use of Ganeti. Ganeti is a cluster management tool for virtual machines, developed by Google, and it offers a convenient way for us to manage our KVM virtual machines, of which there are currently about 60 in play at the Co-op. The fourth piece is CEPH, which we use for both block and object storage. CEPH is another open-source technology that provides large-scale, fault-tolerant storage across potentially distributed servers. It also offers an implementation of Amazon’s S3 interface, which we employ with both the NNELS.CA and Library Toolshed projects for the backend storage of resources.
Many other pieces go into a hosting environment, including networking and security, backups and disaster recovery, and a host of smaller technologies. In keeping with our open philosophy and practice, the Co-op uses open-source technologies almost exclusively for all of these pieces. They aren’t magic bullets – every technology choice has its tradeoffs – but to date we’ve been able to create a hosting environment serving hundreds of libraries with over 99.8% uptime, at a very reasonable cost, and with full control of our data.
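To put that uptime figure in concrete terms, here is a small back-of-the-envelope sketch in Python. It assumes the percentage is measured over a full, non-leap calendar year, which this post does not actually specify, and the function name is purely illustrative.

```python
# Rough conversion of an uptime percentage into a downtime budget.
# Assumption: uptime is measured over one non-leap calendar year.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def max_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime implied by a given uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


if __name__ == "__main__":
    hours = max_downtime_hours(99.8)
    print(f"99.8% uptime allows about {hours:.2f} hours of downtime per year")
```

Under that assumption, 99.8% works out to roughly 17.5 hours per year, so "over 99.8%" would mean less downtime than that.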