Projects
Due date: December 14
Here is a list of possible topics for your project. Please select one and send me
a message indicating your choice. You may also propose a different topic;
however, you must get my approval for it before you start working on it.
Every student is required to give a presentation and write a
report.
We will schedule the presentations later.
Your report and presentation must reflect
your understanding of the problems and algorithms that you have studied.
Here are some of the things
that I might look for when reading your report:
 Critical analysis of the importance of the problems and of the algorithms
studied.
 Your personal explanation of how algorithms work, why they work, and how to analyze
their performance.
 Potential new applications for the algorithms studied.
 Critical analysis of the strengths and weaknesses of the algorithms and solution
methodologies studied.
 Generalizations, extensions, and/or modifications of the algorithms and
solution methods studied to
make them applicable to new problems, or to overcome some of their shortcomings.
 Simplifications of the algorithms that you have studied or simplified analyses for them.
 New algorithms or ways of analyzing them.
 New problems and applications.
Below there is some information that might help you write your report
and prepare your presentation.
Topics
 Survey Papers
For each one of the following projects you are required to read, understand,
and critique several papers (at least 3) on the proposed topic,
and write a report on them. For some of the topics I list some papers that
might help you get started.
Part of your job is to find papers relevant to the
topic of your choice.
You must use a unified
notation and style when describing the different problems, solutions, and
ideas discussed in the selected papers. Your study must focus on the algorithmic
aspects of your topic.
The report should contain a clear description of relevant algorithms and
a sketch of their complexity analyses and proofs of correctness.
Several of the papers that you will read require some effort on your part to
understand. Your report should
contain clear explanations of the algorithms along with intuition explaining why the
algorithms work, or why they perform as claimed.
As mentioned above, a very important part of the report is your critical
analysis of the problems and algorithms
that you study. Try to propose ways in which the algorithms can
be improved. It is not essential that you give a formal proof that your ideas
will improve the algorithms, but you should explain the intuition behind
your ideas and why you think that they might work.
Discuss open problems
related to the ones that you study, or other problems that could be
solved by using the algorithms that you study.
The mark that you will get will depend on
how clearly your report reflects your understanding of the topic that you
have selected. If your report is a mere transcription/copy
of portions of the papers that you read,
then your mark will not be good.
 Game Theory and the Internet Roberto Barboza
The Internet is used and administered by a multitude of entities, all of them
competing to try to get the maximum performance out of it. Hence, the most appropriate
tools for designing efficient algorithms for it might come not only from
algorithmics, but also from Game Theory. In this topic you will explore the
relationship between Internet problems and Game Theory.
References
 Koutsoupias and Papadimitriou,
Worst-case equilibria,
STACS 99.
 Papadimitriou,
Algorithms, Games, and the Internet, STOC 2001 (extended abstract in ICALP 2001).
 Roughgarden, Tardos. How bad is selfish routing? JACM 49(2), 2002, 236-259.
 Cole, Dodis, Roughgarden.
How Much Can Taxes Help Selfish Routing?,
ACM Conference on Electronic Commerce, 2003, 98-107.
 Akella, Seshan, Karp, Shenker,
Papadimitriou. Selfish
behaviour and stability of the Internet: a game-theoretic analysis of TCP.
SIGCOMM 2002, 117-130.
 Energy efficient algorithms for wireless networks
One of the main concerns of the designers of wireless networks is battery life. Many
of the components of a wireless network are small mobile devices, whose only source of
energy is a battery storing a limited amount of charge. Routing algorithms must be
designed so that these mobile devices use the least possible amount of energy, as this would
maximize the lifetime of the network. For this project you are going to study some of these
algorithms and various methods that have been devised for minimizing the use of energy in
wireless networks.
 Structure of the Web Graph
Mahnaz Rabbani, Mengxue Zhou, Dongwook Lee
The web graph is a very interesting object of study with its billions of nodes and
edges, and seemingly endless exponential growth. There are many interesting
structural properties of the Web graph that we can exploit to design better
Internet applications. In this topic you will read about the properties
of the Web graph and you must indicate how they are measured and why they are
relevant to Internet
applications.
References
 Kleinberg, Kumar, Raghavan, Rajagopalan, and Tomkins, The web as a graph:
measurements, models, and methods, International Conference on Computing
and Combinatorics, 1999.
 Kleinberg and Lawrence,
The Structure of the Web. Science 294 (2001), 1849.
 Laura, Leonardi, Millozzi, Meyer, and Sibeyn, Algorithms and experiments for the webgraph, Proceedings of ESA 2003, LNCS 2832, 703-714, 2003.
 Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, and Wiener,
Graph structure in the web, Proceedings of the 9th WWW conference, 2000.
 Prefetching
Yi Zhang
Prefetching is used to reduce the time needed to download data from the
Internet. In this topic you will study algorithms that are used
to prefetch web pages.
There are many papers on web prefetching. Use a web search engine to find some
of them. An interesting current topic is how to incorporate prefetching into
interactive Web 2.0 applications.
 Locating and storing information in peer-to-peer networks
Masoud Ataei, Parisa Alvandi
We have discussed this topic in class and I described a solution that has been
proposed for finding information in peer-to-peer networks based on the use of
consistent hashing. For this project you will read about other
solutions that have been proposed for the problem of deciding where to store data
and how to find information in a peer-to-peer network.
References
 Ratnasamy, Francis, Handley, Karp, Shenker, A Scalable Content-Addressable Network,
ACM Sigcomm 2001.
 Zhao, Joseph, Kubiatowicz, Tapestry: a fault-tolerant wide-area network infrastructure, Sigcomm 2001 and UC Berkeley Tech. Report UCB/CSD-01-1141.
 Adamic, Puniyani, Lukose, and Huberman, Search in Power-Law Networks.
 Hildrum, Kubiatowicz, Rao, and Zhao, Distributed object location in a dynamic network, Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures, 2002.
 Search engines for multimedia information.
Elham Harirpoush, Eeda Naresh, Harsimranjit Singh
The search engines that will be discussed in class can only look for textual information
on the web. When looking for images, video, or sound, new algorithms and approaches
must be used. For this project you will study some of the algorithms that have
been proposed for searching for multimedia content in the Web.
References
 Yoshitaka and Ichikawa, A survey on content-based retrieval for multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11 (1), 1999, 81-93.
 Flickner et al., Query by image and video content: the QBIC system,
Computer, 1995, 23-32.
 Lew, Next generation web searches for visual content, IEEE Computer, 33, 2000, 46-53.
 Smith and Chang, Visually searching the web for content, IEEE
MultiMedia, 1997, 12-20.
 Gevers and Smeulders, Pictoseek: combining color and shape invariant features for image retrieval, IEEE Transactions on Image Processing, 9(1), 2000, 102-119.
 Thong, Moreno, Logan, Fidler, Maffey, and Moores, SpeechBot: an experimental speech-based search engine for multimedia content on the web, IEEE Transactions on Multimedia, 4 (1), 2002.
 Improving website design
Wenxiang Xu
Websites are sometimes poorly designed and they force users to follow many
links before they find the information that they need. For this
topic you will study some algorithms that can be used to redesign a
website so that information can be found faster.
References
 Bose, Czyzowicz, Gasieniec, Kranakis, Krizanc, Pelc, and Martin, Strategies for hotlink assignments, Proceedings of the 11th Symposium on Algorithms and Computation, 2000, 23-34.
 Kranakis, Krizanc, and Shende, Approximating hotlink assignments, Proceedings of the 12th Symposium on Algorithms and Computation, 2001, 756-767.
 Gerstel, Kutten, Matichin, and Peleg, Enhancement algorithms for Web directories, Proceedings of the 14th Symposium on Algorithms and Computation, 2003, 756-767.
 Aldous, Reorganizing large websites, American Mathematical Monthly, 108 (2001), 16-27.
 Fuhrmann, Krumke, and Wirth, Multiple hotlink assignment,
Proceedings of the Workshop on Graph-Theoretic Concepts in Computer Science, 2001, 189-200.
 Gerstel, Kutten, Laber, Matichin, Peleg, Pessoa, Souza,
Reducing human interactions in Web directory searches. ACM Transactions on Information Systems, 25 (4), 2007.
 k-Median problem Madhavi Mohan
Given a graph with nonnegative distances
on the edges and a value k, the problem is to select a set S of at
most k vertices such that the sum of distances from the vertices
to S is minimized. This is a fundamental clustering problem,
and it is known to be NP-hard.
This problem has many potential Internet applications. Mention some of these
applications, and overview the
different techniques that have been proposed for approximately solving the problem.
References
 K. Jain and V. Vazirani, Primal-dual approximation algorithms for
metric facility location and k-median problems, Proceedings of FOCS 1999.
 V. Arya, A. Meyerson, V. Pandit, N. Garg, K. Munagala, R. Khandekar,
Local search heuristics for k-median and facility location problems,
Proceedings of STOC 2001.
 M. Charikar, S. Guha, E. Tardos, and D. Shmoys, A constant-factor approximation algorithm for the k-median problem, Proceedings of STOC 1999,
1-10.
 J. Lin and J. Vitter, Approximation algorithms for geometric median
problems, Information Processing Letters, 44 (1992), 245-249.
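To make the k-median objective and the local-search technique from the Arya et al. paper above concrete, here is a small sequential sketch of the single-swap heuristic. The class name and the four-point instance are illustrative only; this is a starting point, not a full implementation.

```java
import java.util.*;

// Sketch: single-swap local search for k-median on an explicit distance
// matrix (the heuristic analyzed by Arya et al.). Toy instance only.
public class KMedianLocalSearch {
    // Cost of median set S: sum over all vertices of distance to nearest median.
    public static int cost(int[][] d, Set<Integer> S) {
        int total = 0;
        for (int v = 0; v < d.length; v++) {
            int best = Integer.MAX_VALUE;
            for (int s : S) best = Math.min(best, d[v][s]);
            total += best;
        }
        return total;
    }

    // Repeatedly swap one median for one non-median while the cost strictly
    // improves; stop at a local optimum.
    public static Set<Integer> localSearch(int[][] d, int k) {
        Set<Integer> S = new HashSet<>();
        for (int i = 0; i < k; i++) S.add(i);          // arbitrary start
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int out : new ArrayList<>(S)) {
                for (int in = 0; in < d.length && !improved; in++) {
                    if (S.contains(in)) continue;
                    Set<Integer> T = new HashSet<>(S);
                    T.remove(out); T.add(in);
                    if (cost(d, T) < cost(d, S)) { S = T; improved = true; }
                }
                if (improved) break;
            }
        }
        return S;
    }

    public static void main(String[] args) {
        // Toy metric: 4 points on a line at coordinates 0, 1, 10, 11.
        int[] x = {0, 1, 10, 11};
        int[][] d = new int[x.length][x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++) d[i][j] = Math.abs(x[i] - x[j]);
        Set<Integer> S = localSearch(d, 2);
        System.out.println(S + " cost=" + cost(d, S));
    }
}
```

On this instance the search moves from the starting medians {0, 1} to one median in each cluster, reaching cost 2.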
 Uncapacitated Facility location problem
This problem is like the k-median problem, except that now vertices
have costs. Given a graph with nonnegative distances
on the edges, and costs on the vertices,
the problem is to select a set S of vertices such that the sum of
distances from the vertices
to S plus the cost of the vertices in S is minimized.
Find some Internet related applications for the problem. Overview the
different techniques that have been proposed for approximately solving the problem.
References
 K. Jain and V. Vazirani, Primal-dual approximation algorithms for
metric facility location and k-median problems, Proceedings of FOCS 1999.
 V. Arya, A. Meyerson, V. Pandit, N. Garg, K. Munagala, R. Khandekar,
Local search heuristics for k-median and facility location problems,
Proceedings of STOC 2001.
 Chudak and Shmoys, Improved approximation algorithms for uncapacitated facility location problems, SIAM Journal on Computing, 33 (1), 2003, 1-25.
 Guha and Khuller, Greedy strikes back: improved facility location algorithms, Journal of Algorithms, 31 (1), 1999, 228-248.
 k-Center problem
Kyle Windsor
The k-center problem is as follows. Given a graph
G=(V,E) with positive lengths on the edges, it is desired to choose
a set S of k vertices so that the maximum distance from a vertex in
V-S to S is minimized.
Describe the problem and some of its most-studied variations: the asymmetric
k-center problem, the fault-tolerant k-center problem, the
capacitated k-center problem, the p-neighbor k-center
problem... Mention some of the Internet
applications of the problem. Overview the
different techniques that have been proposed for approximately
solving the problem.
References
 Hochbaum and Shmoys, A best possible heuristic for the k-center
problem, Mathematics of Operations Research, 10 (1985), 180-184.
 Khuller and Sussmann, The capacitated k-center problem,
Proc. 4th Ann. European Symp. on Algorithms, Lecture Notes in Comput. Sci. 1136,
Springer-Verlag, 1996, 152-166.
 Vishwanathan, An $O(\log^* n)$ approximation algorithm for the
asymmetric p-center problem,
Proc. 7th Ann. ACM-SIAM Symp. on Discrete Algorithms, ACM-SIAM, 1996, 1-5.
 Plesnik, A heuristic for the p-center problem in graphs,
Disc. Appl. Math. 17 (1987), 263-268.
 Khuller, Pless, and Sussmann,
Fault tolerant k-center problems,
Proc. 3rd Italian Conf. on Algorithms and Complexity, Lecture Notes in
Comput. Sci. 1203, Springer-Verlag, 1997, 37-48.
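One classical technique for the symmetric k-center problem is farthest-first traversal (Gonzalez's 2-approximation): start from an arbitrary center and repeatedly add the vertex farthest from the centers chosen so far. A brief sequential sketch, on an illustrative four-point metric:

```java
import java.util.*;

// Sketch: farthest-first traversal, a classical 2-approximation for the
// symmetric k-center problem. The instance is a stand-in only.
public class KCenterGreedy {
    // Pick k centers: start anywhere, then repeatedly add the vertex
    // farthest from the current centers.
    public static List<Integer> farthestFirst(int[][] d, int k) {
        List<Integer> centers = new ArrayList<>();
        centers.add(0);
        while (centers.size() < k) {
            int far = -1, farDist = -1;
            for (int v = 0; v < d.length; v++) {
                int near = Integer.MAX_VALUE;
                for (int c : centers) near = Math.min(near, d[v][c]);
                if (near > farDist) { farDist = near; far = v; }
            }
            centers.add(far);
        }
        return centers;
    }

    // Objective: maximum distance from any vertex to its nearest center.
    public static int radius(int[][] d, List<Integer> centers) {
        int r = 0;
        for (int v = 0; v < d.length; v++) {
            int near = Integer.MAX_VALUE;
            for (int c : centers) near = Math.min(near, d[v][c]);
            r = Math.max(r, near);
        }
        return r;
    }

    public static void main(String[] args) {
        // Points on a line at 0, 1, 10, 11 (distance = |xi - xj|).
        int[] x = {0, 1, 10, 11};
        int[][] d = new int[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++) d[i][j] = Math.abs(x[i] - x[j]);
        List<Integer> c = farthestFirst(d, 2);
        System.out.println(c + " radius=" + radius(d, c));  // radius 1
    }
}
```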
 Web search algorithms
Yao Shi
In class we will discuss the page ranking algorithm used by Google
to look for relevant information on the Web. This algorithm has several
drawbacks that will not be discussed in class. In this project you will learn
about other algorithms that have been proposed for finding information on the
Web. You are expected to discover some of the drawbacks of page rank and
related algorithms, discuss situations where these algorithms
do not work very well, and propose possible ways for improving them.
References
 Kleinberg,
Authoritative sources in a hyperlinked environment, Proc. 9th ACM-SIAM Symposium
on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46 (1999).
 Chakrabarti, Dom, Gibson, Kleinberg, Kumar, Raghavan, Rajagopalan, Tomkins,
Mining the link
structure of the World Wide Web, IEEE Computer, August 1999.
 Krishna Bharat and Monika R. Henzinger, Improved Algorithms for Topic Distillation in Hyperlinked Environments, Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1998, pp. 104-111.
 Lempel and Moran, The stochastic approach for link-structure analysis (SALSA) and the TKC effect, Computer Networks 33 (2000), 387-401.
 Dimitris Achlioptas, Amos Fiat, Anna R. Karlin, Frank McSherry, Web Search via Hub Synthesis, FOCS 2001: 500-509.
 Minimum weight spanning tree
Yiming Guan
Finding a minimum spanning tree is a classical graph algorithm that has been widely
studied for both sequential and distributed models of computation. For this project
you will learn of different distributed algorithms for computing a minimum spanning
tree and you will make a comparison of those algorithms in terms of their simplicity,
communication and time complexities. Indicate in which situations one algorithm
is superior to the other ones and propose ways in which some of these
algorithms can be improved.
References
 Garay, Kutten, Peleg, A sublinear
time distributed algorithm for minimum-weight spanning trees.
 Kutten, Peleg, Fast distributed construction of
k-dominating sets and applications. PODC 1995, 238-249.
 Lotker, Patt-Shamir, Pavlov, Peleg, Minimum-weight
spanning tree construction in O(log log n) communication rounds. SIAM J. Comput.
35 (2005), 120-131.
 Elkin, A faster distributed protocol for
constructing a minimum spanning tree.
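The distributed algorithms in these references build on the fragment-merging structure of Boruvka's algorithm: each fragment selects its minimum-weight outgoing edge, and fragments merge along the selected edges. A sequential sketch of that phase structure (illustrative instance; distinct edge weights assumed, as is standard for correctness):

```java
import java.util.*;

// Sketch: sequential Boruvka phases, the structure underlying the
// distributed MST algorithms cited above. Assumes distinct edge weights
// and a connected graph.
public class Boruvka {
    static int find(int[] p, int v) { return p[v] == v ? v : (p[v] = find(p, p[v])); }

    // edges[i] = {u, v, w}; returns the total weight of the spanning tree built.
    public static int mstWeight(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int total = 0, components = n;
        while (components > 1) {
            int[] cheapest = new int[n];
            Arrays.fill(cheapest, -1);
            // Each fragment (component) selects its lightest outgoing edge.
            for (int i = 0; i < edges.length; i++) {
                int a = find(parent, edges[i][0]), b = find(parent, edges[i][1]);
                if (a == b) continue;
                if (cheapest[a] == -1 || edges[i][2] < edges[cheapest[a]][2]) cheapest[a] = i;
                if (cheapest[b] == -1 || edges[i][2] < edges[cheapest[b]][2]) cheapest[b] = i;
            }
            // Merge fragments along the selected edges.
            for (int c = 0; c < n; c++) {
                int e = cheapest[c];
                if (e == -1) continue;
                int a = find(parent, edges[e][0]), b = find(parent, edges[e][1]);
                if (a == b) continue;
                parent[a] = b;
                total += edges[e][2];
                components--;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] edges = {{0,1,1},{1,2,2},{2,3,3},{3,0,4}};
        System.out.println(mstWeight(4, edges));   // MST uses weights 1+2+3 = 6
    }
}
```

In the distributed setting each "fragment selects its lightest outgoing edge" step is exactly what the processors compute by message passing, which is where the communication complexity comes from.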

Experimental Evaluation of Algorithms
Experimental evaluation of algorithms requires implementing them and fine-tuning
them so that they perform well on "typical" instances. For each one of
these projects you must implement several algorithms (at least 2) and compare their
performance (either their running time or approximation ratios) over a carefully
selected set of test inputs.
You must explain how the test cases were selected and generated. Argue that
the results of these tests are representative of the true performance of the
algorithms that you have implemented. From the tests, try to discover ways of
making the implementations faster/better. Report your findings. Also, show the
instances for which your algorithm has the worst running time/approximation
ratio, and explain why this is the case.
Some of the algorithms that you can evaluate are for the following problems. Look
at the references given above for details on the algorithms.
 Facility Location problem
Here is a
document giving some details on the implementation of the Jain and Vazirani
algorithm.
 k-Median problem Haoze Yuan
 k-Center problem

Implementations
 Simulator of an asynchronous distributed system
This system will receive as input a
set of parameters such as: number of processors in the system, set of
neighbours for each processor, the algorithm A that the processors
must execute, speed of each processor, delay of each communication link, probability
that each processor will fail, probability that each link will fail, whether
Byzantine failures are allowed, and so on.
The system must execute the specified distributed algorithm on each processor,
taking care of delivering all messages
that the processors exchange among themselves.
It might be possible, for example, that the speeds of processors change over time
or that the delay of communication links also varies as the algorithm is being
executed.
In Java one might implement this system, for example, by creating one thread for
each one of the processors. The code that the thread executes is the algorithm
A. Each thread has an associated queue of messages, and every time that
a message is sent to it, the message is simply stored in the queue. A receive
operation removes the next message from the queue.
Your simulator should be able to execute a variety of simple algorithms,
like the ones that we discussed in class. Look at the simulator for
synchronous distributed systems posted on the website to have an idea of
what is expected from you in this project. You have to design a graphical user
interface and you should design your simulator so that it is easy to use.
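A minimal sketch of the thread-per-processor design described above, using one blocking queue per processor for send/receive; the class names are illustrative, and link delays, processor speeds, and failures would be layered on top of this skeleton:

```java
import java.util.concurrent.*;

// Minimal sketch: each processor is a thread with its own message queue;
// send() enqueues into the destination's queue and receive() dequeues.
public class MiniSim {
    public static class Processor extends Thread {
        public final int id;
        public final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        public final Processor[] all;
        public Processor(int id, Processor[] all) { this.id = id; this.all = all; }

        public void send(int to, String msg) { all[to].inbox.add(msg); }
        public String receive() throws InterruptedException { return inbox.take(); }

        @Override public void run() {              // a toy "algorithm A"
            try {
                if (id == 0) send(1, "hello from 0");
                else System.out.println("processor " + id + " got: " + receive());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Processor[] procs = new Processor[2];
        for (int i = 0; i < procs.length; i++) procs[i] = new Processor(i, procs);
        for (Processor p : procs) p.start();
        for (Processor p : procs) p.join();        // prints: processor 1 got: hello from 0
    }
}
```

Because `take()` blocks until a message arrives, asynchrony falls out naturally; simulated link delay can be added by having a delivery thread sleep before enqueuing.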
 Implementation of a parallel web crawler
Nikita Sokolov, Jordan Gould
For this topic you will implement a parallel web crawler that performs a
specialized task: constructing a representation of the Web graph and computing
some of its properties.
The crawler must create several threads, or parallel processes,
to make the crawling process more efficient.
The information collected by each thread must be gathered and processed by your main
program, which will build a graph with the Web pages downloaded by the
threads. Each node of the graph corresponds to one Web page and stores
the URL of that page. Each edge in the graph will correspond to a
hyperlink.
Your program will receive as input the maximum number of nodes in your graph, as
otherwise, the crawler might run indefinitely. As the graph is built you will
compute some of its properties, like the distribution of the number of incoming links to a
node, and of the number of outgoing links from a node, the diameter of the
graph, the average distance between two nodes of the graph,
the number of strongly connected components, and any other information that you think
is useful to understand
the structure of the Web graph.
To help you get started, here is a simple Java program that
downloads
a web page and prints its contents.
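Once the crawler has assembled an adjacency list, properties such as the in- and out-degree distributions can be tabulated as sketched below; the three-page graph here is only a stand-in for crawled data:

```java
import java.util.*;

// Sketch: tabulating degree distributions of a crawled graph.
// adj.get(u) = list of pages that page u links to.
public class DegreeDist {
    // Returns a histogram: out-degree -> number of nodes with that out-degree.
    public static Map<Integer, Integer> outDegreeDistribution(Map<Integer, List<Integer>> adj) {
        Map<Integer, Integer> hist = new TreeMap<>();
        for (List<Integer> links : adj.values())
            hist.merge(links.size(), 1, Integer::sum);
        return hist;
    }

    // Returns a histogram: in-degree -> number of nodes with that in-degree.
    public static Map<Integer, Integer> inDegreeDistribution(Map<Integer, List<Integer>> adj) {
        Map<Integer, Integer> indeg = new HashMap<>();
        for (int u : adj.keySet()) indeg.put(u, 0);
        for (List<Integer> links : adj.values())
            for (int v : links) indeg.merge(v, 1, Integer::sum);
        Map<Integer, Integer> hist = new TreeMap<>();
        for (int d : indeg.values()) hist.merge(d, 1, Integer::sum);
        return hist;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(0, Arrays.asList(1, 2));   // page 0 links to pages 1 and 2
        adj.put(1, Arrays.asList(2));
        adj.put(2, Collections.emptyList());
        System.out.println("out: " + outDegreeDistribution(adj));
        System.out.println("in:  " + inDegreeDistribution(adj));
    }
}
```

For the real Web graph the interest is in whether these histograms follow a power law, as the references in the Web graph topic above discuss.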
 Implementation of a peertopeer system
Sakif Pritom
This system will receive as input a list of processors in the system, the
processor identifiers, and a set of keys to be stored in the nodes of the network.
The user can then specify some commands to interact with the network:
 find key to find the location of a particular key or to determine
that the
key is not stored anywhere in the system.
 add processorID adds a new processor to the system. The system must show
which keys are moved to the new processor.
 end processorID removes the specified processor. The keys stored at
the processor must be moved elsewhere so they are not lost.
 crash processorID makes the specified processor crash. Note that
the crashed processor does not have a chance to send messages to the other processors
indicating that it is leaving, hence these processors need to perform some actions
to discover that the processor has crashed so the system can keep working correctly.
You can implement the Chord protocol, or some of the other protocols with provable performance like CAN, Pastry, or Tapestry.
References
 Ratnasamy, Francis, Handley, Karp, Shenker, A scalable content-addressable network.
 Stoica, Morris, Karger, Kaashoek, Balakrishnan, Chord: A scalable peer-to-peer lookup service for Internet applications. Sigcomm 2001.
 Zhao, Huang, Stribling, Rhea, Joseph, Kubiatowicz, Tapestry: A resilient global-scale overlay for service deployment.
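The placement rule at the heart of Chord-style consistent hashing can be sketched in a few lines: identifiers live on a ring, and each key is stored at its successor, the first node clockwise from the key. The ring size and node identifiers below are illustrative:

```java
import java.util.*;

// Sketch: consistent-hashing placement as in Chord. Keys and nodes share
// an identifier ring of size 2^M; a key lives at its successor node.
public class Ring {
    static final int M = 6;                       // identifier space {0,...,63}

    // First node clockwise from the key (the key's successor on the ring).
    public static int successor(TreeSet<Integer> nodes, int key) {
        Integer s = nodes.ceiling(key % (1 << M));
        return (s != null) ? s : nodes.first();   // wrap around the ring
    }

    public static void main(String[] args) {
        TreeSet<Integer> nodes = new TreeSet<>(Arrays.asList(8, 21, 42, 55));
        System.out.println(successor(nodes, 20)); // stored at node 21
        System.out.println(successor(nodes, 60)); // wraps around to node 8
        // When node 30 joins, only the keys in (21, 30] move to it:
        nodes.add(30);
        System.out.println(successor(nodes, 25)); // now stored at node 30
    }
}
```

The implementation project is then about realizing this rule without any node knowing the whole ring: finger tables for `find`, and the key-migration behaviour shown above for `add`, `end`, and `crash`.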
 Implementation of a search engine
Christopher Brogly, Anas Ibrahim
For this project you will build a simple search engine that implements the
page rank algorithm or one of the other existing algorithms to rank pages. The search
engine will receive a set of user keywords and it will present to the user
a set of pages matching the query listed in decreasing order of rank.
You can either use a crawler to collect a set of Web pages to populate the
database of your search engine, or you can use an artificially built test set
of pages for the database. For the latter case you will have to think about
the link structure of your test set of pages so there is a good mixture of high
quality pages and low quality ones.
In either case you need to design a data structure to store the information
from the database so the search engine can respond to user queries in a
timely fashion.
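For the ranking step, page rank scores can be computed by power iteration, as sketched below; the damping factor 0.85 is the commonly used value, and the three-page graph is only a stand-in for your database:

```java
import java.util.*;

// Sketch: power iteration for page rank. links[u] lists the pages that
// page u links to; dangling pages spread their score evenly.
public class PageRank {
    public static double[] rank(int[][] links, int iters) {
        int n = links.length;
        double d = 0.85;                          // damping factor
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n);
        for (int t = 0; t < iters; t++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - d) / n);       // random-jump contribution
            for (int u = 0; u < n; u++) {
                if (links[u].length == 0) {       // dangling page
                    for (int v = 0; v < n; v++) next[v] += d * pr[u] / n;
                } else {
                    for (int v : links[u]) next[v] += d * pr[u] / links[u].length;
                }
            }
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        // Pages 0 and 1 link to page 2; page 2 links back to page 0.
        int[][] links = {{2}, {2}, {0}};
        double[] pr = rank(links, 50);
        System.out.println(Arrays.toString(pr));  // page 2 ranks highest
    }
}
```

The search engine then answers a query by retrieving the pages matching the keywords and sorting them by these precomputed scores.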

Algorithm Design
For the following problems you are required to design algorithms
as described. You must prove that your algorithm is correct and you must analyze it
by computing its communication complexity and its time complexity. You must
design the most efficient algorithm that you can.
 Fast Paths Andy Yu
When sending information between two nodes of the Internet, it is desirable to
use a path formed only by high speed links, because then the message can be
delivered quickly. Assume that each link of the Internet is labelled with
a positive value giving its speed. Given a path p between two nodes
u and v, the speed of the path is defined as the slowest speed
of the edges in the path. For this topic you are required to design efficient
distributed synchronous algorithms for:
 finding a fastest path between any given pair of vertices, and
 finding a multicast tree giving fastest paths from a vertex s to
a set of vertices T.
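Although the project asks for a distributed algorithm, the objective itself can be pinned down with a sequential sketch: a Dijkstra-style computation that maximizes the minimum edge speed along a path, instead of minimizing the sum of edge lengths. The three-node instance is illustrative:

```java
import java.util.*;

// Sketch: sequential maximum-bottleneck ("fastest path") computation.
// speed[u][v] = link speed, or 0 if there is no link. Returns the best
// achievable bottleneck speed from s to every vertex.
public class WidestPath {
    public static int[] fastest(int[][] speed, int s) {
        int n = speed.length;
        int[] best = new int[n];                  // 0 = not reachable yet
        best[s] = Integer.MAX_VALUE;              // empty path has no bottleneck
        boolean[] done = new boolean[n];
        for (int it = 0; it < n; it++) {
            int u = -1;                           // settle the vertex with the
            for (int v = 0; v < n; v++)           // best bottleneck so far
                if (!done[v] && (u == -1 || best[v] > best[u])) u = v;
            done[u] = true;
            for (int v = 0; v < n; v++)
                if (speed[u][v] > 0)
                    best[v] = Math.max(best[v], Math.min(best[u], speed[u][v]));
        }
        return best;
    }

    public static void main(String[] args) {
        // Links: 0-1 at speed 10, 1-2 at speed 3, 0-2 at speed 4.
        int[][] speed = {{0,10,4},{10,0,3},{4,3,0}};
        int[] best = fastest(speed, 0);
        System.out.println("to 1: " + best[1] + ", to 2: " + best[2]);
    }
}
```

Note that the direct link 0-2 (speed 4) beats the route through vertex 1, whose bottleneck is the speed-3 link; your distributed algorithm must reach the same answers by exchanging messages.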
 Wireless Transmission
Consider a wireless network composed of n computers and a set of
wireless communication channels that allow computers to exchange information.
This network is represented as a graph G = (V,E), where each node
in V corresponds to a computer and each edge in E is a
communication channel. Note that communication
channels are not physical connections between computers, rather a
communication channel (u,v) will be specified by two computers u,
v and a frequency f that the two computers can use to communicate
with each other.
Each computer has a wireless transmitter that can simultaneously send several
messages at different frequencies. If a computer u has edges in G
connecting it to several other computers, each one of the edges must be
assigned a different frequency, to prevent interference when u sends
messages to several neighbours. For example, for the
following figure processor 1 has three edges incident on it, which means that
it can exchange messages with three other processors: 2, 3, and 5.
These edges have
been assigned different frequencies (the numbers beside the edges), so
processor 1 can send simultaneously messages to all its neighbours by
choosing frequency 3 for the message to processor 2, frequency 2 for the
message to processor 3 and frequency 1 for the message to processor 5.
For this project you are required to design an efficient synchronous
distributed algorithm for assigning frequencies to the edges of a
network so that all edges incident on the same vertex receive different
frequencies. Your algorithm should not use more than 2D-2 different
frequencies for all the edges of a network, where D is the
degree of the network. The degree of a node is the number of its
neighbours and the degree of a network is the maximum degree of its
nodes. For example, for the network above, its degree is 4.
A more challenging problem is to use at most D+1 different
frequencies to label all the edges of a network.
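As a sequential reference point (the project itself asks for a distributed algorithm), a greedy assignment gives each edge the smallest frequency not used by the edges sharing an endpoint with it; since at most 2(D-1) frequencies are blocked for any edge, this never needs more than 2D-1 frequencies. The graph below is illustrative:

```java
import java.util.*;

// Sketch: sequential greedy frequency assignment (edge colouring). Each
// edge takes the smallest frequency unused by already-coloured edges that
// share one of its endpoints.
public class EdgeColor {
    // edges[i] = {u, v}; returns one frequency per edge, numbered from 1.
    public static int[] greedy(int[][] edges) {
        int[] freq = new int[edges.length];
        for (int i = 0; i < edges.length; i++) {
            Set<Integer> used = new HashSet<>();
            for (int j = 0; j < i; j++)
                if (sharesEndpoint(edges[i], edges[j])) used.add(freq[j]);
            int f = 1;
            while (used.contains(f)) f++;         // smallest free frequency
            freq[i] = f;
        }
        return freq;
    }

    public static boolean sharesEndpoint(int[] e, int[] g) {
        return e[0] == g[0] || e[0] == g[1] || e[1] == g[0] || e[1] == g[1];
    }

    public static void main(String[] args) {
        // A star centred at vertex 0 plus the edge (1,2).
        int[][] edges = {{0,1},{0,2},{0,3},{1,2}};
        System.out.println(Arrays.toString(greedy(edges)));
    }
}
```

The distributed challenge is that edges must pick frequencies concurrently without this global left-to-right order, and getting down to D+1 frequencies (the bound guaranteed by Vizing's theorem) is harder still.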
 Network Maintenance
Consider a computer network in which hardware can be installed in some
of the computers so a computer can check that the communication links
attached to it are working correctly. For example, for the network above
we could install link checking hardware in computers 1, 2, 3, 4, and 5:
computer 1 would check links (1,2), (1,3), and (1,5), computer 2 would
check links (2,1), (2,3), and (2,4), and so on. Notice that each link is
monitored by at least one computer. Since the hardware to check the communication
links is expensive, it is desired to choose the minimum number of
locations where it should be installed so that it checks the working
condition of every communication link. Therefore, a better solution than
the above would be to place the hardware at computers 1, 2, 4, and 5.
Notice that in this new placement every communication link is monitored,
and it requires installing specialized hardware in only 4 computers, as
opposed to in 5 as in the first solution.
For this project you are required to design a synchronous distributed
algorithm that selects a minimal set of computers where link-checking
hardware can be installed so that every communication link is monitored.
A set of computers is minimal if removing any computer from the set
would not give a set of locations from which all communication links
can be monitored. For example, the set {1,2,3,4,5} is not minimal because
if we remove computer 3, the resulting set {1,2,4,5} of computers can still
check every communication link. This latter set is minimal, as removing
any computer from it would not allow the remaining computers to check
all links.
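The pruning idea in this example can be sketched sequentially: start with every computer, then drop any computer whose removal still leaves all links monitored. The result is minimal in the sense defined above, though not necessarily minimum; the graph below is illustrative:

```java
import java.util.*;

// Sketch: sequential computation of a minimal monitoring set (a minimal
// vertex cover). Start with all vertices and greedily drop redundant ones.
public class MinimalCover {
    // True if every edge has at least one endpoint in S.
    public static boolean covers(Set<Integer> S, int[][] edges) {
        for (int[] e : edges)
            if (!S.contains(e[0]) && !S.contains(e[1])) return false;
        return true;
    }

    public static Set<Integer> minimalCover(int n, int[][] edges) {
        Set<Integer> S = new TreeSet<>();
        for (int v = 0; v < n; v++) S.add(v);
        for (int v = 0; v < n; v++) {             // try to drop each vertex
            S.remove(v);
            if (!covers(S, edges)) S.add(v);      // v was needed: put it back
        }
        return S;                                 // no remaining vertex is redundant
    }

    public static void main(String[] args) {
        // A triangle 1-2-3 with pendant links (1,5) and (2,4).
        int[][] edges = {{1,2},{1,3},{2,3},{1,5},{2,4}};
        System.out.println(minimalCover(6, edges));
    }
}
```

A distributed version has to make these keep-or-drop decisions by message exchange, with each computer knowing only its own neighbourhood.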