Using MapReduce: MapReduce
is a software framework introduced by Google to support
distributed computing on large data sets on clusters of
computers. The framework is inspired by the map and reduce functions
commonly used in functional programming (Wikipedia). MapReduce
is heavily used by Google. One of the goals of the Hadoop
project is to provide an open source implementation of MapReduce that
can run on a cluster or on rented hardware, e.g., an Amazon EC2 cluster.
A project could involve the development of a parallel application
using Hadoop and an Amazon EC2 cluster. Examples include the
following:
- Extract useful information from a set of
documents. For example, you can use MapReduce to analyze a set of
log files. An example of this can be found here. Other sets of documents include blogs and news articles.
- Develop and run a scientific application on the Amazon EC2 cloud. Describe your experience.
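The log-file idea above can be sketched in a few lines of Python. This is only an in-memory illustration of the map, shuffle, and reduce steps, not Hadoop code. The log layout (status code as the second-to-last field), the sample lines, and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (status_code, 1) for one log line."""
    fields = line.split()
    # Assumed layout: the HTTP status code is the second-to-last field.
    if len(fields) >= 2 and fields[-2].isdigit():
        yield fields[-2], 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def count_status_codes(lines):
    """Run map, shuffle, and reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample data, made up for this sketch.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2010:00:00:01] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [01/Jan/2010:00:00:02] "GET /missing HTTP/1.0" 404 210',
    '10.0.0.1 - - [01/Jan/2010:00:00:03] "GET /about.html HTTP/1.0" 200 512',
]

if __name__ == "__main__":
    print(count_status_codes(SAMPLE_LOG))
```

In a real Hadoop job the shuffle is performed by the framework, and the map and reduce functions run on different machines. Only those two functions would need to be supplied.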
Implementation of a Subset of MapReduce: MapReduce
uses a master process and multiple worker processes for parallel
processing. The master process assigns work to the worker processes.
A project could implement this aspect of MapReduce, with support for
electing a new master process if the current master process goes
down. More information on implementation issues related to
MapReduce can be found
here.
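One possible starting point for the election part is sketched below. It is a single-process simulation, assuming heartbeat-based failure detection and a "highest live id wins" rule in the spirit of the bully algorithm. The class name, the timeout value, and the rule itself are assumptions for illustration and are not taken from MapReduce.

```python
class Cluster:
    """Tracks heartbeats and picks a master among live processes."""

    def __init__(self, process_ids, timeout=3.0):
        self.timeout = timeout
        # Every process is treated as last seen at time 0.0.
        self.last_heartbeat = {pid: 0.0 for pid in process_ids}
        self.master = max(process_ids)

    def heartbeat(self, pid, now):
        """Record that process pid was seen at time now."""
        self.last_heartbeat[pid] = now

    def alive(self, now):
        """Return the ids whose last heartbeat is within the timeout."""
        return [pid for pid, seen in self.last_heartbeat.items()
                if now - seen <= self.timeout]

    def check_master(self, now):
        """Elect the highest live id if the current master timed out."""
        live = self.alive(now)
        if self.master not in live and live:
            self.master = max(live)
        # If no process is live, the old master id is left unchanged.
        return self.master


if __name__ == "__main__":
    cluster = Cluster([1, 2, 3])
    cluster.heartbeat(1, 5.0)
    cluster.heartbeat(2, 5.0)  # process 3 has been silent since time 0.0
    print(cluster.check_master(6.0))
```

A distributed version would replace the shared dictionary with messages between processes, and would have to handle the case where processes disagree about who is alive.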
Monitoring Virtual Machines: Dynamic
replication means that the number of servers allocated to an application
changes dynamically. This requires suitable monitoring
techniques. The use of virtual machines, however, complicates
monitoring, since the host OS sees only the VM and not necessarily the
processes running in the VM. Approaches for dealing with this are
found in the paper
"Black-box and Gray-box Strategies for Virtual Machine Migration".
A project could implement one or more of the monitoring techniques
described in this paper.
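As a rough illustration of monitoring from outside the VM, the sketch below flags a VM as overloaded when enough recent utilization samples exceed a threshold. It assumes the host can observe only an aggregate per-VM utilization value between 0 and 1. The class name, threshold, and window sizes are made up for the example, and the rule is a generic sliding-window check rather than the paper's exact method.

```python
from collections import deque


class HotspotDetector:
    """Flags sustained overload from externally observed utilization."""

    def __init__(self, threshold=0.75, window=5, min_violations=3):
        self.threshold = threshold
        self.min_violations = min_violations
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def observe(self, utilization):
        """Add one sample; return True if the VM looks overloaded."""
        self.samples.append(utilization)
        violations = sum(1 for u in self.samples if u > self.threshold)
        return violations >= self.min_violations


if __name__ == "__main__":
    detector = HotspotDetector()
    for sample in [0.9, 0.2, 0.8, 0.3, 0.95]:
        print(sample, detector.observe(sample))
```

Requiring several violations within the window keeps a single short spike from triggering a change in the number of servers.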
New Computing Environment for a Department of Computer Science: In
this project, you would design (and partially implement) a system
that enhances the computing environment for educational purposes.
For example, different courses have different software needs, and
virtual machines can be configured specifically for each course.
Based on this, can we radically change the computing environment
found in most computer science departments (or at least the one at UWO)?