Feb 11, 2016

Deploying MongoDB Replica Sets with Docker

I recently set up a 3-node MongoDB replica set, with each node running MongoDB as a Docker image, as a first step toward getting the same scenario working in Kubernetes. It wasn't very intuitive, so I thought I'd write it up and share the recipe.

I started by deploying 3 EC2 instances running Fedora 23 (ami-7b8afa11) and Docker 1.9.1 (the Docker version that ships with F23). Make sure your security group has an inbound rule for TCP port 27017, as this is required for MongoDB.

On each node, do the following:
1. docker pull mongo 
This pulls the official MongoDB image from Docker Hub
2. mkdir /mnt/mongo 
This creates a directory on the Host which we will use to persist the Mongo container instance's database state. This allows the container to be stopped and restarted without losing any data.
3. chcon -Rt svirt_sandbox_file_t /mnt/mongo
This sets the SELinux label so that the container has permission to access the /mnt/mongo directory on the host.
4. docker run --net=host -d -v /mnt/mongo:/data/db mongo --replSet rs0
This starts the Mongo instance in a replica set called rs0, mounts your /mnt/mongo directory from the host into the container and binds the host's IP to the container instance.
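
Before moving on, it's worth sanity-checking that mongod came up cleanly on each node. A quick way to do this (the container ID is whatever docker ps reports on that node) is to tail the container logs and confirm mongod is waiting for connections on port 27017:

docker ps
docker logs <container ID> | tail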

Designate one of your nodes to be your Replica Set Primary, and do the following on that node only:

1. Docker exec into the container: run docker ps to get the container ID, then exec in using that ID
[root@node-1] $ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED
85690249e5b3        mongo               "/entrypoint.sh --rep"   32 minutes ago

[root@node-1] $ docker exec -it 85690249e5b3 bash

2. Run "mongo" at the container prompt
root@ip-172-31-148-167:/# mongo
MongoDB shell version: 3.2.1
connecting to: test
Welcome to the MongoDB shell.

3. Enter the following command to initiate the replica set
> rs.initiate()
{
 "info2" : "no configuration specified. Using a default configuration for the set",
 "me" : "ip-172-31-148-167.ec2.internal:27017",
 "ok" : 1
}

4. Enter the following to validate the replica set configuration
> rs.conf()

5. Then add the FQDNs of the other 2 EC2 nodes that you started the mongo Docker image on:
> rs.add("ip-172-31-148-218.ec2.internal")
> rs.add("ip-172-31-159-148.ec2.internal")

6. Check the status of the Replica Set
> rs.status()
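
rs.status() is fairly verbose. If you just want each member's name and state at a glance, a one-liner like this in the mongo shell does the trick (plain JavaScript over the rs.status() document):

> rs.status().members.forEach(function(m) { print(m.name + " : " + m.stateStr); })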

7. Congrats. That's it. You can now import data and test your Replica Set. Keep in mind that only the Primary node can handle writes, but all the other members of the replica set (the Secondary nodes) can serve reads.

If you want to test the read functionality of the replica set on a Secondary node, docker exec into the container on that node, run mongo at the prompt, and run the command below before you attempt any queries (reads).
rs0:SECONDARY> rs.slaveOk()
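
As a quick end-to-end smoke test, you can insert a document on the Primary and read it back on a Secondary. The collection name below is just an example; any database and collection will do.

On the Primary:
rs0:PRIMARY> db.greetings.insert({ msg : "hello from the primary" })

On a Secondary:
rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> db.greetings.find()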

Sep 9, 2015

Running Storage Platforms inside Kubernetes

I thought it would be interesting to explore a converged scenario where Kubernetes runs the containers for the compute runtimes as well as the containers for the storage runtimes. This is useful in that it allows you to have a single infrastructure for the full stack. Given that container orchestration is an emerging technology, a lot of folks just want to set up a simple POC where they can build some applications and get a feel for how it all works. Your POC installation may not have access to managed storage infrastructure, and thus it's handy to be able to deploy your own storage solution within the same cluster. This is also useful for analytics if you are looking to colocate compute and storage for improved I/O performance.

I thought I'd write up what I did to get it working, given that it's a new scenario for Kubernetes and it also uses some aspects of Kubernetes that aren't documented in much detail.

You can see the full walkthrough here on GitHub.

Jul 26, 2015

Manage Disruption by Building Your Own Emerging Technologies Function - Part I

CC flickr image courtesy of @boegh

One hears a lot about disruptive theory, and while the data and examples of disruption are compelling, there aren't a lot of concrete examples of what to do if you're an incumbent looking to proactively manage the impact of disruption to your business. I have worked in and led Emerging Technology teams at IBM, HP and Red Hat, and based on my experiences I'll discuss both why and how to build a function within your company (an Emerging Technologies team or CTO Office) to manage the impact of disruption. I'll be doing this across several blog posts. It will be useful if you are somewhat familiar with disruptive theory, diffusion theory and chasm theory, but it's not required. This first post is focused on why an incumbent should have an Emerging Technologies team.

The Lifecycle of Creative Destruction

Creative Destruction describes a phenomenon whereby a company or a technology that delivers an improved benefit, or performance, displaces a previous company or technology. Examining the rate of Creative Destruction is interesting in that it can give Incumbents (successful, established companies) a quantitative metric for their susceptibility to competition. A popular example to illustrate this point is the average duration of a company within the S&P 500 Index. A company entering the index in 1935 was able to maintain its position in the index for an average of 90 years, whereas a company entering the index in 2005 is forecast to remain there for only 15 years. Clearly, this trend is accelerating, which suggests that Incumbents are becoming increasingly susceptible to competition. Disruption is a common cause of Creative Destruction, and thus learning how to successfully manage the impact of disruption is something that every incumbent should have in their playbook.

Year Entered S&P 500    Average Duration until Exit
1935                    90 Years
1955                    55 Years
1975                    30 Years
1995                    22 Years
2005                    15 Years
Data courtesy of Ralph Katz, Professor at MIT Sloan School of Management

Managing your competition

Most incumbents have sophisticated product management and marketing functions that keep a close eye on the competition, so why is this not enough?

To better explore this question, let's start with a scenario. I am an incumbent. I have a company and it is very successful. I have a product that provides a benefit, using a particular technology, and there is a market that is willing to pay for the benefit. In terms of diffusion theory, I'm in a particular technology S-Curve. I have competitors, but we are all within the same S-Curve, as we're all using the same core technology to provide the benefit. For simplicity, let's assume both my own and my competitors' sales teams are equally competent and successful and that we both have wide market reach.

In order to differentiate myself from these competitors, I need to build a better product. To do this, Product Management and Marketing meet with customers and build a multi-year product roadmap, with the belief that as long as engineering can deliver on the roadmap, on the intended timeline, my company will stay differentiated and remain the market leader and the dominant vendor within my S-Curve. Engineering plans are created, streamlined and staffed according to this roadmap. The Sales team finds that the differentiation is appreciated by their customers and the product or service is selling well. Per innovation theory, this model is known as sustaining or continuous innovation.

Enter the disruption - Lately, I’ve started noticing that new and non-traditional competitors have emerged that are addressing the same opportunities in the market, but in a different manner using a different technology. A new S-Curve has emerged creating a discontinuous or disruptive innovation. This new technology has demonstrable performance advantages over the technology my product is based on. The companies commercializing this technology begin moving upmarket, resulting in a significant loss of my market share and revenue.  

Threats and Opportunities that fly under your radar

In the scenario I described above, the Incumbent was well equipped to deliver on continuous and sustaining innovation models, but the competitive analysis functions within those teams were not equipped or skilled in identifying, managing and responding to disruptive threats and opportunities. Why were they ill-equipped? There are a number of organizational and personal factors that prevent teams involved in sustaining innovations from successfully responding to threats and opportunities around disruptive technology. While Christensen's disruptive theory describes these factors in much more detail, here are a few that I run into fairly regularly:

Bias 1 - Optics. This is the first time you've ever encountered competition from a technology in a different S-Curve. Previously you've only dealt with competition from the same S-Curve, using the same core technologies as you. These new competitors don't look at all like the competitors you usually deal with. "I mean, come on, they're a startup and they're only 50 people. We're a company that's 100x or 1000x that size. How on earth are they getting into our accounts?"

Bias 2 - Pride. I’m the head of the product team at the incumbent and I know this market inside out. I’m right, they are wrong and they’ll go out of business.

Bias 3 - Sticking your head in the sand. If I admit we’re being disrupted, the consequences are too painful. I’m just going to hope this goes away. As an incumbent, I have millions of dollars (or hundreds of millions) and years of investment in the roadmap based on our current technology strategy. Going back to the drawing board and changing my strategy, product roadmaps and engineering plans to adjust to the new technology will be extremely inconvenient and costly, for everyone. 

Bias 4 - Efficiency - Those customers don't matter. Per Christensen's disruptive theory, disruptive innovations enter the market in sectors that the incumbent traditionally sees as low margin and then diffuse upmarket from there. The incumbent is not incentivized to defend these low margin sectors, and to quote, "Moving up the trajectory into successively higher-margin tiers of the market and shedding less-profitable products at the low end is something that all good managers must do in order to keep their margins strong and their stock price healthy. This ultimately means that in doing what they must do, every company prepares the way for its own disruption."

Bias 5 - Incentives - I'm not interested in this battle. I see the disruption, but I'm just a cog in the plans for the current strategy and I'm too busy executing on that. My annual performance review is measured against how well I execute on the current strategy. I don't have the time or energy to become a champion for revisiting the strategy.

What can you do to manage the impact of disruption to your company?

The first thing is that you have to come to grips with the fact that disruption is inevitable. If you concede this point, you have to decide whether you want to anticipate and manage disruption or just roll the dice and attempt an ad-hoc response. If you want to anticipate, identify and manage it, it helps to have a framework and a function within your company that is skilled at handling this. The issues around motivation and incentive I described above are key. If you want a response to disruption to be successful (or even happen in the first place), you need to put in place the appropriate structures that will increase the chances of success beforehand. In my subsequent blog posts I will describe how to build an emerging technologies function within your company that can identify and manage disruption ahead of your own product teams and your competitors.

Jul 22, 2015

Running Apache Spark in Kubernetes

Yesterday, the Kubernetes community officially launched version 1.0 at OSCON. This is a pretty big milestone for us and version 1.0 offers a lot of very useful features for those looking for a container orchestration solution. I thought a good way to demonstrate some of the cool features that 1.0 provides is to show how easily and simply Apache Spark can be deployed in Kubernetes and connected to a variety of network storage providers to build analytical applications.

Thanks to Matt Farrellee and Tim St. Clair, Red Hat Emerging Technologies have already contributed a set of Pods, ReplicationControllers and Services to run Apache Spark in Kubernetes. These can be found in the GitHub repo under examples/spark. To deploy Spark, one just needs a Spark Master Pod, a Spark Master Service and a Spark Worker Replication Controller.
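
Deploying these comes down to a few kubectl create calls along these lines. The file names below reflect the examples/spark directory at the time of writing; check the repo if the layout has moved:

# kubectl create -f examples/spark/spark-master.json
# kubectl create -f examples/spark/spark-master-service.json
# kubectl create -f examples/spark/spark-worker-controller.json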

However, the current solution is not configured to mount storage of any kind into the Spark Master or Spark Workers, so it can be a little difficult to use for analyzing data. The good news is that Red Hat Emerging Technologies are also actively contributing Kubernetes Volume Plugins, which allow Pods to declaratively mount various kinds of network storage directly into a given container. This means that you can connect your Spark containers (or any containers, for that matter) to a variety of network storage using one of the Kubernetes Volume Plugins. To date, we have contributed Volume Plugins for Ceph, GlusterFS, iSCSI and NFS (incl. NFS with NetApp and NFS with GlusterFS) and validated GCE Persistent Disks with SELinux and Amazon EBS Disks with SELinux, all in Kubernetes version 1.0. We also have FibreChannel, Cinder and Manila Kubernetes Volume Plugins in the works.
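
To give a feel for what mounting storage declaratively means here, this is roughly what a Pod using the GlusterFS Volume Plugin looks like in the v1 API. Treat it as a sketch: the image name and the glusterfs-cluster endpoints object are placeholders you'd replace with your own:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": { "name": "spark-worker-gluster" },
  "spec": {
    "containers": [
      {
        "name": "spark-worker",
        "image": "<your spark worker image>",
        "volumeMounts": [
          { "name": "glusterfsvol", "mountPath": "/mnt/glusterfs" }
        ]
      }
    ],
    "volumes": [
      {
        "name": "glusterfsvol",
        "glusterfs": { "endpoints": "glusterfs-cluster", "path": "MyVolume", "readOnly": false }
      }
    ]
  }
}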

I've provided a demo below that shows how to run Apache Spark against data mounted using a Kubernetes Volume Plugin. Given that Apache Spark is typically used in conjunction with a distributed file system, I've used the GlusterFS Volume Plugin as the exemplar. I have a Pull Request submitted to merge this example into Kubernetes, but we've temporarily frozen the repo in anticipation of our version 1.0 launch. In the interim, you can follow the guide off of my personal branch. The video below provides a walkthrough of the solution.

May 5, 2015

Using ceph-ansible to deploy and test a multi-node ceph cluster

Our team is presently working on building a Ceph Block Volume Plugin for Kubernetes. As such, I wanted a quick and easy way for everyone to be able to deploy a local Ceph cluster in virtual machines, so we can test the plugin's ability to provision a Ceph Block Device and mount it onto a given Docker Host (or Kubernetes Node) from a development environment or another virtual machine.

After spending a few days trying to find the most convenient solution, I settled on ceph-ansible as it uses a combination of vagrant (to provision the VMs) and ansible (to configure them) and the entire cluster is launched with literally one command (vagrant up). So here's how it works:

On the Developer's Machine:

1) Install Vagrant and install your Vagrant compatible Hypervisor of choice (I use VirtualBox as I find it has the broadest vagrant box support and I can't use KVM because I am on a Mac)

2) Install Ansible

3) Clone the ceph-ansible repository
  # git clone https://github.com/ceph/ceph-ansible.git
  # cd ceph-ansible

4) Edit ceph-ansible/roles/ceph-common/defaults/main.yml and set the following values to "false" in the CEPH CONFIGURATION section.
  cephx_require_signatures: false
  cephx_cluster_require_signatures: false
  cephx_service_require_signatures: false

5) Deploy the Ceph Cluster
  # vagrant up

6) Check the Status of the Ceph Cluster you just deployed
  # vagrant ssh mon0 -c "sudo ceph -s"

7) Copy the ceph configuration file and ceph keyring to each server you plan to mount ceph block devices onto (such as the Fedora 21 server shown in the diagram above).
  # vagrant ssh mon0
  # cd /etc/ceph/
  # sudo scp ceph.client.admin.keyring ceph.conf root@{IP of Fedora VM}:/etc/ceph/

Configuring the Ceph Client (Fedora 21 VM)

This section assumes that you have already provisioned another server on which to create, format and mount a ceph block device. In the diagram above, this is the Fedora 21 VM.

1) Install the ceph client libraries
  # yum -y install ceph-common

2) Create this directory or you will see exceptions when using rbd commands
  # mkdir /var/run/ceph/

3) Disable and Stop firewalld
  # systemctl disable firewalld;  systemctl stop firewalld

4) Create a block device called "mydisk"
  # rbd create mydisk --size 4096

5) Map the block device from the server into your local block device list
  # rbd map mydisk --pool rbd --name client.admin

6) Verify that a new block device (rbd0) has been added 
  # ls -l /dev/rbd?
  brw-rw----. 1 root disk 252,  0 May  5 15:34 /dev/rbd0

7) Format the Block Device
  # mkfs.ext4 -m0 /dev/rbd0

8) Mount the Block Device for use
  # mkdir /mnt/mydisk
  # mount /dev/rbd0 /mnt/mydisk/
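
To convince yourself the mount is live, check the filesystem and write a test file to it (the file name and content are arbitrary):

  # df -h /mnt/mydisk
  # echo "hello ceph" > /mnt/mydisk/hello.txt
  # ls -l /mnt/mydisk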

Feb 19, 2015

Enabling Docker Volumes and Kubernetes Pods to use NFS

Docker volumes allow you to mount a directory from your Host onto a path within your container. This is great if the data or path that you want to provide to your container is on your host, but what if you really want to give your container access to a centralized storage location?

In my previous post, I described how one could create a network FUSE mount of a Distributed FileSystem onto the Host of your Docker Container or Kubernetes Pod. This provides a local directory to access your distributed filesystem. One then passes in that local directory as a Docker Volume or a HostDir in your Kubernetes Pod file and your containers have access to a Distributed FileSystem.

I spent some time today getting NFS (which is another common central storage scenario) working using the same Host mount model and I thought I'd share the process I used in case there were other folks interested in setting it up. The diagram below provides an overview of the configuration and the instructions follow after it.

Setting up the NFS Server (Fedora 21)

- Designate the server that will be storing your data as the NFS Server, such as nfs-1.fed. Identify a path on that server that will store the data, such as /opt/data. In this example, we'll later serve a file from this location called hello.html

- Install the required NFS Packages
# yum -y install nfs-utils

- Create the /etc/exports file which specifies which path is being shared over NFS
# vi /etc/exports
/opt/data *(rw,sync,no_root_squash)

- Start the appropriate services
# systemctl start rpcbind 
# systemctl start nfs-server 
# systemctl enable rpcbind 
# systemctl enable nfs-server

- Flush iptables
# iptables -F
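
Before moving on to the client, it's worth verifying that the export is actually being served (showmount ships with nfs-utils). You should see /opt/data in the export list:

# showmount -e localhost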

Setting up the Docker Host (Fedora 21)

- Create the directory upon which to mount the NFS Share
# mkdir /mnt/nfs

- Flush iptables
# iptables -F

- Mount the NFS Share onto the local directory
# mount -t nfs nfs-1.fed:/opt/data /mnt/nfs 

- Run a directory listing and make sure that the local NFS mount is working
# ls -l /mnt/nfs

Using the NFS Share with Docker

- Launch a container and pass in the local NFS mount as the Docker volume

# docker run -v /mnt/nfs:/var/www/html/ php:5.6-apache
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using {Container IP}. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using {Container IP}. Set the 'ServerName' directive globally to suppress this message
[Thu Feb 19 00:06:11.833478 2015] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/5.6.5 configured -- resuming normal operations
[Thu Feb 19 00:06:11.833601 2015] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
- - [19/Feb/2015:00:06:31 +0000] "GET /hello.html HTTP/1.1" 200 282 "-" "curl/7.37.0"
- - [19/Feb/2015:00:51:52 +0000] "GET /hello.html HTTP/1.1" 200 282 "-" "curl/7.37.0"

- Verify that the volume mount worked by requesting the test file from the container
# curl http://{IP of PHP Container}/hello.html
Hello World, this is being served by the NFS Share
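
For the Kubernetes Pod side mentioned at the start of this post, the same host mount gets passed in via the HostDir volume type. Below is a sketch of what such a Pod file might look like against the v1beta1 API of the time; the id and image are illustrative, and the exact schema varied between early releases, so treat this as a starting point rather than a definitive manifest:

{
  "id": "php-nfs",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "php-nfs",
      "containers": [
        {
          "name": "php",
          "image": "php:5.6-apache",
          "volumeMounts": [
            { "name": "nfsvol", "mountPath": "/var/www/html" }
          ]
        }
      ],
      "volumes": [
        { "name": "nfsvol", "source": { "hostDir": { "path": "/mnt/nfs" } } }
      ]
    }
  }
}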

Feb 9, 2015

Building Distributed Containerized Applications using Kubernetes and GlusterFS

This post is a follow-on from Building a Simple LAMP Application using Docker

The Kubernetes project originated from Google and is an Apache-licensed platform for managing clustered containerized applications. While the project page provides a lot more detail, in the interest of time, I thought I'd provide a quick summary. Kubernetes is a distributed Master/Worker architecture and calls its workers "Minions". Containerized applications are described and deployed by a Kubernetes Pod. A Pod typically contains one or more containers that are generally intended to reside on the same Minion. Kubernetes makes it easy to ensure that a certain number of Pod Replicas exist via a runtime component called the ReplicationController. Lastly, Kubernetes ships with its own load balancer called a "Service", which round-robins requests between the Pod Replicas running in the cluster.

Let's walk through an actual use case (explicit instructions follow later). If I had a PHP Docker image and I wanted to deploy it in my Kubernetes cluster, I would write and submit a JSON or YAML Pod file that describes the intended deployment configuration of my PHP container. I would then write and submit a JSON or YAML ReplicationController that specifies that I want exactly 2 PHP Pods running at one time, and I would finish by writing and submitting a JSON or YAML Service file that specifies how I want my PHP Pod Replicas load balanced. This use case is demonstrated in the diagram below. Note that on Minion 3 the PHP Pods are not running, because I specified in the ReplicationController that I only want 2 PHP Pod Replicas running.

As you may have noticed, this really is a simple architecture. Now that we've covered how to deploy containerized runtimes, let's take a look at what options are available within Kubernetes to give Pods access to data. At present, the following options are provided:

- The ephemeral storage capacity that is available within the container when it is launched

- EmptyDir, which is temporary scratch space for a container that is provided by the Host

- HostDir, which is a Host directory that you can mount onto a directory in the container.

- GCEPersistentDisk, which provides block devices made available by Google Compute Engine's block storage service.

Given that using GCE block devices is really only something you would consider if you were running in GCE, this leaves the HostDir option as the only real means to obtain durability for any kind of Kubernetes persistence on premises.

To explore how the HostDir option might be used, let's assume that you want to build the same load balanced, clustered PHP use case. One approach would be to copy the web content you want each PHP container to serve to the same local directory (/data) on every single Kubernetes Minion. One would then specify that directory as the HostDir parameter that is mounted onto /var/www/html in the container. This works well, but it swiftly becomes operationally onerous when you have to make updates to the web content, as you now have to copy it out to every single Minion in the cluster. In this scenario, it would be much easier if you could store the web content in one central place in the cluster and then provide a mount of that central place as the HostDir parameter.

One way to do this is to store the web content in a distributed file system and then mount the distributed file system onto each Minion. To demonstrate this example, we are going to use GlusterFS, which is a POSIX Compliant Distributed Filesystem. This means it looks just like your local ext4 or XFS filesystem to the applications that are using it. The diagram below displays how each Kubernetes Minion (and therefore Docker Host) has its own FUSE mount to the GlusterFS Distributed File System.

Great, so how do I do this?

1) Firstly, you're going to need a GlusterFS volume. If you don't have one, you can build one reasonably quickly by using this vagrant recipe from Jay Vyas.

2) You're going to need a working Kubernetes cluster. If you don't have one, you can follow this tutorial for how to set one up in Fedora.

3) On each Minion within the cluster, create a FUSE mount of the GlusterFS volume by running the following commands (This assumes gluster-1.rhs is the FQDN of a server in your Gluster Storage Pool provisioned by Vagrant and that your volume is called MyVolume):

# mkdir -p /mnt/glusterfs/
# yum install -y glusterfs-fuse
# mount -t glusterfs gluster-1.rhs:/MyVolume /mnt/glusterfs  

4) On a Kubernetes Minion, copy your web content to /mnt/glusterfs/php/ in the Distributed File System. If you just want a simple test you can create a helloworld.html in /mnt/glusterfs/php. Then shell into a different Kubernetes Minion and validate that you can see the file(s) you just created in /mnt/glusterfs/php. If you can see these files, it means that your Gluster volume is properly mounted.
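
For example, to drop a trivial test file into the volume (the content is arbitrary):

# echo "Hello World, served from GlusterFS" > /mnt/glusterfs/php/helloworld.html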

5) Build and Submit a ReplicationController that produces a 2 Node PHP Farm that serves content from the distributed FileSystem by running the following commands on your Kubernetes Master:

# wget https://raw.githubusercontent.com/wattsteve/kubernetes/master/example-apps/php_controller_with_fuse_volume.json
# kubectl create -f php_controller_with_fuse_volume.json

6) Build and Submit the Load Balancer Service for the PHP Farm

# wget https://raw.githubusercontent.com/wattsteve/kubernetes/master/example-apps/php_service.json
# kubectl create -f php_service.json

7) Query the available services to obtain the IP that the Load Balancer Service is running on and submit a web request to test your setup.

# kubectl get services
NAME                LABELS                                    SELECTOR            IP                  PORT
kubernetes          component=apiserver,provider=kubernetes   <none>           443
kubernetes-ro       component=apiserver,provider=kubernetes   <none>          80
php-master          name=php-master                           name=php_controller         80

# curl http://{IP of php-master Service}/helloworld.html

Thanks to Bradley Childs, Huamin Chen, Mark Turansky, Jay Vyas and Tim St. Clair for their help in putting this solution together.