Sunday, August 10, 2014

Cluster management tools

Cluster management tools provide abstractions to run software applications on a collection of hardware (physical machines or VMs in a cloud). The tools allow you to declaratively specify the resources - e.g. tasks, services - required by the application. The mapping of resources to processes on the hardware is the responsibility of the tools.

Kubernetes

Kubernetes currently provides the following abstractions: pods, replication controllers and services. Pods are collections of containers that are akin to application virtual hosts. Services are used to setup proxies pointing to pods. The pods a service proxies are identified by labels.  Replication controllers are used to monitor a population of pods using labels. Kubernetes will ensure that the specified number of replicas are available. Kubernetes does not yet allow you to specify resource requirements (CPU, memory) for the application, but there are plans to support this.  This talk by Brandon Burns provides a good introduction to Kubernetes.  Kubernetes is still under active development.

Mesos

Mesos implements sharing of computing resources using resource offers. A resource offer is a list of free resources on multiple slaves. The decision about which resources to use are made by the programs sharing the resources offered by Mesos.  Mesos does not collect any resource requirements from these programs, instead the programs can “reject” offers. Mesos resources use OS isolation mechanisms like Linux containers which allow resources to be constrained by CPU and memory.  The isolation mechanism is pluggable via external isolator plugins like the Diemos plugin for Docker.  Diemos allows Mesos to use Docker containers with a specific image along with cpu and memory constraints as resources.  Mesosphere is a startup offering mesos on ec2 among other platforms. Currently, there does not seem to be any auto-scaling support for a Mesos cluster on ec2 - you preallocate ec2 instances to the cluster and Mesos will offer resources (e.g docker containers) from the set of ec2 instances.  This talk on Mesos for Cluster Management by Ben Hindman is a good introduction to Mesos.  For an academic perspective, see this Mesos paper from Berkeley.

There are several schedulers available for Mesos: Aurora, Marathon and soon .. Kubernetes-Mesos.

Aurora

Apache aurora was open-sourced by Twitter. Job definitions are written in python.
  import os 
  hello_world_process = Process(name = 'hello_world', cmdline = 'echo hello world')

 hello_world_task = Task(
  resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
  processes = [hello_world_process])

 hello_world_job = Job(
  cluster = 'cluster1',
  role = os.getenv('USER'),
  task = hello_world_task)

 jobs = [hello_world_job]

 Jobs are submitted using a cmd line client:
 aurora create cluster1/$USER/test/hello_world hello_world.aurora

Marathon

Marathon - by Mesosphere. It has a well documented REST API to create, start, stop and scale tasks. When creating tasks, you specify the number of instances to run and Marathon will ensure that those instances are available on the Mesos cluster.  Marathon allows for constraints to indicate that tasks should run only on certain slaves etc. You would have to run a load-balancer (e.g. HAProxy) on each host to proxy traffic from outside to the tasks.Jobs are defined in JSON format:

{
    "container": {
    "image": "docker:///libmesos/ubuntu",
    "options" : []
  },
  "id": "ubuntu",
  "instances": "1",
  "cpus": ".5",
  "mem": "512",
  "uris": [ ],
  "cmd": "while sleep 10; do date -u +%T; done"
}
Submit the job to Marathon via the REST API:
curl -X POST -H "Content-Type: application/json" localhost:8080/v2/apps -d@ubuntu.json

Kubernetes-Mesos

Kubernetes-Mesos - Work has just started for running Kubernetes pods on Mesos.

Omega

Google has been working on the problem of scheduling jobs on a cluster to maximize utilization.  Here is an overview of their work on the Omega scheduler.

Other tools

Apache Helix (https://github.com/linkedin/helix/) - opensourced by LinkedIn.
Autoscaling on EC2: http://aws.amazon.com/autoscaling/

No comments:

Post a Comment