## Projects

Due date: December 7

Here a list of possible topics for your project. Please select one and send me a message indicating your choice. You can also propose a different topic, however, you should get my approval on your topic before you start working on it.

Every student is required to write a report. Your report must reflect your understanding of the problems and algorithms that you have studied. Here are some of the things that I might look for when reading your report:

• Critical analysis of the importance of the problems and of the algorithms studied.
• Your personal explanation of how algorithms work, why they work, and how to analyze their performance.
• Potential new applications for the algorithms studied.
• Critical analysis of the strengths and weaknesses of the algorithms and solution methodologies studied.
• Generalizations, extensions, and/or modifications of the algorithms and solution methods studied to make them applicable to new problems, or to overcome some of their shortcomings.
• Simplifications of the algorithms that you have studied or simplified analyses for them.
• New algorithms or ways of analyzing them.
• New problems and applications.

Below there is some information that might help you write your report.

## Topics

1. Survey Papers

For each one of the following projects you are required to read, understand, and critique several papers (at least 3) on the proposed topic, and write a report on them. For some of the topics I list some papers that might help you get started. Part of your job is to find papers relevant to the topic of your choice.

You must use a unified notation and style when describing the different problems, solutions, and ideas discussed in the selected papers. Your study must focus on the algorithmic aspects of your topic.

The report should contain a clear description of relevant algorithms and a sketch of their complexity analyses and proofs of correctness. Several of the papers that you will read requires some effort on your part to understand them. Your report should contain clear explanations of the algorithms along with intuition explaining why the algorithms work, or why they perform as claimed.

As mentioned above, a very important part of the report is your critical analysis of the problems and algorithms that you study. Try to propose ways in which the algorithms can be improved. It is not essential that you give a formal proof that your ideas will improve the algorithms, but you should explain the intuition behind your ideas and why you think that they might work. Discuss open problems related to the ones that you study, or other problems that could be solved by using the algorithms that you study.

The mark that you will get will depend on how clearly your report reflects your understanding of the topic that you have selected. If your report is a mere transcription/copy of portions of the papers that you read, then your mark will no be good.

• Game Theory and the Internet
The Internet is used and administered by a multitude of entities, all of them competing to try to get the maximum performance out of it. Hence, the most appropriate tools for designing efficient algorithms for it might come not only from algorithmics, but also from Game Theory. In this topic you will explore the relationship between Internet problems and Game Theory.

References

• Energy efficient algorithms for wireless networks Arshin Rezazadeh
One of the main concerns of the designers of wireless networks is battery life. Many of the components of a wireless network are small mobile devices, whose only source of energy are batteries which store a limited amount of charge. Routing algorithms must be designed so that these mobile devices use the least possible amount of energy, as this would maximize the lifetime of the network. For this project you are going to study some of these algorithms and various methods that have been devised for minimizing the use of energy in wireless networks.
• Efficient routing techniques in wireless body area networks. Fu Chi Chen

• Structure of the Web Graph
The web graph is a very interesting object of study with its billions of nodes and edges, and seemingly endless exponential growth. There are many interesting structural properties of the Web graph that we can exploit to design better Internet applications. In this topic you will read about the properties of the Web graph and you must indicate how they are measured and why they are relevant to Internet applications.

References
• Kleinbeg, Kumar, Raghavan, Rajagopalan, and Tomkins, The web as a graph: measurements, models, and methods, International Conference on Combinatorics and Computing, 1999.
• Kleinberg,Lawrence. The Structure of the Web. Science 294(2001), 1849.
• Laura, Leonardi, Millozzi, Meyer, and Sibeyn, Algorithms and experiments for the webgraph, Proceedings of ESA 2003, LNCS 2832, 703--714, 2003.
• Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, and Wiener, Graph structure in the web, Proceedings of the 9th WWW conference, 2000.

• Pre-fetching
Pre-fetching is used to reduce the time needed to download data from the Internet. In this topic you will study algorithms that are used to pre-fetch web pages.

There are many papers on web pre-fetching. Use a web search engine to find some of them. An interesting current topic is how to incorporate pre-fetching on interactive Web 2.0 applications.

• Locating and storing information in peer-to-peer networks Shahryar Monavvari, Chengyuan Zhang
We have discussed this topic in class and I described a solution that has been proposed for finding information in peer-to-peer networks based on the use of consistent hashing. For this project you will read about other solutions that have been proposed for the problem of deciding where to store data and how to find information in a peer-to-peer network.

References

• Search engines for multimedia information.
The search engines that will be discussed in class can only look for textual information on the web. When looking for images, video, or sound, new algorithms and approaches must be used. For this project you will study some of the algorithms that have been proposed for searching for multimedia content in the Web.

References
• Yoshitaka and Ichikawa, A survey on content-based retrieval for multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11 (1), 1999, 81-93.
• Flickner et al, Query by image and video content: the OBIC system, Computer, 1995, 23-32.
• Lew, Next generation web searches for visual content, IEEE Computer, 33, 2000, 46-53.
• Smith and Chang, Visually searching the web for content, IEEE MultiMedia, 1997, 12-20.
• Gevers and Smeulders, Pictoseek: combining color and shape invariant features for image retrieval, IEEE Transactions on Image Processing, 9(1), 2000, 102-119.
• Thong, Moreno, Logan, Fidler, Maffey, and Moores, Speechbot: an experimental speech based search engine for multimedia content on the web, IEEE Transactions on Multimedia, 4 (1), 2002.

• Improving website design Alex Santalov
Websites are sometimes poorly designed and they force users to follow many links before they find the information that they need. For this topic you will study some algorithms that can be used to re-design a website so that information can be found faster.

References
• Bose, Czywizowicz, Gasieniec, Kranakis, Krizanc, Pelc, and Martin, Strategies for hotlink assignments, Proceedings of the 11th Symposium on Algorithms and Computation, 2000, 23-34.
• Kranakis, Krizanc, and Shende, Approximating hotlink assignments, Proceedings of the 12th Symposium on Algorithms and Computation, 2001, 756-767.
• Gerstel, Kutten, Matichin, and Peleg, Enhancement algorithms for Web directories, Proceedings of the 14th Symposium on Algorithms and Computation, 2003, 756-767.
• Aldous, Reorganizing large websites, American Mathematical Monthly, 108 (2001), 16-27.
• Fuhrmann, Krumke, and Wirth, Multiple hotlink assignment , Proceedings of the Workshop on Graph-Theoretic Concepts in Computer Science, 2001, 189-200.
• Gerstel, Kutten, Laber, Matichin, Peleg, Pessoa, Souza, Reducing human interactions in Web directory searches. ACM Transactions on Information Systems, 25 (4), 2007.

• k-Median problem
Given a graph with non-negative distances on the edges and a value k, the problem is to select a set S of at most k vertices such that the sum of distances from the vertices to S is minimized. This is a fundamental clustering problem, and it is known to be NP-hard. This problem has many potential Internet applications. Mention some of these applications, and overview the different techniques that have been proposed for approximately solving the problem.

References

• Uncapacitated Facility location problem
This problem is like the k median problem, except that now vertices have costs. Given a graph with non-negative distances on the edges, and costs on the vertices, the problem is to select a set S of vertices such that the sum of distances from the vertices to S plus the cost of the vertices in S is minimized. Find some Internet related applications for the problem. Overview the different techniques that have been proposed for approximately solving the problem.

References

• k-Center problem
The k-center problem is as follows. Given a graph G=(V,E) with positive lengths on the edges, it is desired to choose a set S of p vertices so that the maximum distance from a vertex in V-S to S is minimized.

Describe the problem and some of its most-studied variations: the asymmetric k-center problem, the fault tolerant k-center problem, the capacitated k-center problem, the p-neighbor k-center problem... Mention some of the Internet applications of the problem. Overview the different techniques that have been proposed for approximately solving the problem.

References
• Hochbaum and Shmoys, A best possible heuristic for the k-center problem, Mathematics of Operations Research, 10 (1985), 180-184.
• Khuller and Sussmann (1996), The capacitated k-center problem, Proc. 4th Ann. European Symp. on Algorithms, Lecture Notes in Comput. Sci. 1136, Springer-Verlag, (1996) 152-166.
• Vishwanathan, An \$o\log^*n)\$ approximation algorithm for the asymmetric p-center problem, Proc. 7th Ann. ACM-SIAM Symp. on Discrete Algorithms, ACM-SIAM, (1996) 1-5.
• Plesnik, A heuristic for the p-center problem in graphs, Disc. Appl. Math. (1987) 17, 263-268.
• Khuller, Pless, and Sussmann, Fault tolerant k-center problems, Proc. 3rd Italian Conf. on Algorithms and Complexity, Lecture Notes in Comput. Sci. 1203, Springer-Verlag, (1997)37-48.

• Web search algorithms Mennatallah Massoud, Mahsa Kazeminooreddinvand
In class we will discuss the page ranking algorithm used by Google to look for relevant information on the Web. This algorithm has several drawbacks that will not be discussed in class. In this project you will learn about other algorithms that have been proposed for finding information on the Web. You are expected to discover some of the drawbacks of page rank and related algorithms, discuss situations where these algorithms do not work very well, and propose possible ways for improving them.

References

• Minimum weight spanning tree Sitong Li, Ningwei Shi
Finding a minimum spanning tree is a classical graph algorithm that has been widely studied for both sequential and distributed models of computation. For this project you will learn of different distributed algorithms for computing a minimum spanning tree and you will make a comparison of those algorithms in terms of their simplicity, communication and time complexities. Indicate in which situations one algorithm is superior to the other ones and propose ways in which some of these algorithms can be improved.

References
2. ### Experimental Evaluation of Algorithms

Experimental evaluation of algorithm requires implementing algorithms and fine tuning them so that they perform well on "typical" instances. For each one of these projects you must implement several algorithms (at least 2) and compare their performance (either their running time or approximation ratios) over a carefully selected set of test inputs.

You must explain how the test cases were selected and generated. Argue that the results of theses tests are representative of the true performance of the algorithms that you have implemented. From the tests, try to discover ways of making the implementations faster/better. Report your findings. Also, show the instances for which your algorithm has the worst running time/approximation ratio, and explain why this is the case.

Some of the algorithms that you can evaluate are for the following problems. Look at the references given above for details on the algorithms.

• Facility Location problem
Here is a document giving some details on the implementation of the Jain and Vazirani algorithm.
• k-Median problem

• k-Center problem Linxiao Wang
3. ### Implementations

• Simulator of an asynchronous distributed system Mathias Babin
This system will receive as input a set of parameters such as: number of processors in the system, set of neighbours for each processor, the algorithm A that the processors must execute, speed of each processor, delay of each communication link, probability that each processor will fail, probability that each link will fail, whether Bizantine failures are allowed, and so on. The system must execute the specified distributed algorithm on each processor, taking care of delivering all messages that the processors exchange among themselves. It might be possible, for example, that the speeds of processors change over time or that the delay of communication links also varies as the algorithm is being executed.

In Java one might implement this system, for example, by creating one thread for each one of the processors. The code that the thread executes is the algorithm A. Each thread has associated a queue of messages, and every time that a message is sent to it, the message is simply stored in the queue. A receive operation removes the next message from the queue.

Your algorithm should be able to execute a variety of simple algorithms, like the ones that we discussed in class. Look at the simulator for synchronous distributed systems posted on the website to have an idea of what is expected from you in this project. You have to design a graphical user interface and you should design your simulator so that it is easy to use.

• Implementation of a parallel web crawler Wang Lyu, Konstantinos Tsiounis, Wenbo Yu, Anthony Caradillo
For this topic you will implement a parallel web crawler that performs a specialized task: constructing a representation of the Web graph and computing some of its properties.

The crawler must create several threads, or parallel processes, to make the crawling process more efficient. The information collected by each thread must be gathered and processed by your main program, which will build a graph with the Web pages downloaded by the threads. Each node of the graph corresponds to one Web page and it will store in it the URL of that page. Each edge in the graph will correspond to a hyperlink.

Your program will receive as input the maximum number of nodes in your graph, as otherwise, the crawler might run indefinitely. As the graph is built you will compute some of its properties, like the distribution of the number of incoming links to a node, and of the number of outgoing links from a node, the diameter of the graph, the average distance between two nodes of the graph, the number of strongly connected components, and any other information that you think is useful to understand the structure of the Web graph.

To help you get started, here is a simple Java program that downloads a web page and prints its contents.

• Implementation of a peer-to-peer system Mingda Sun, Zaid Albirawi
This system will receive as input a list of processors in the system, the processor identifiers, and a set of keys to be stored in the nodes of the network. The user can then specify some commands to interact with the network:
• find key to find the location of a particular key or to determine that the key is not stored anywhere in the system.
• add processorID adds a new processor to the system. The system must show which keys are moved to the new processor.
• end processorID removes the specified processor. The keys stored at the processor must be moved elsewhere so they are not lost.
• crash processorID makes the specified processor crash. Note that the crashed processor does not have a chance to send messages to the other processors indicating that it is leaving, hence these processors need to perform some actions to discover that the processor has crashed so the system can keep working correctly.
You can implement the Chord protocol, or some of the other protocols with provable performance like CAN, Pastry, or Tapestry.

References

• Implementation of a search engine
For this project you will build a simple search engine that implements the page rank algorithm or other of the existing algorithms to rank pages. The search engine will receive a set of user keywords and it will present to the user a set of pages matching the query listed in decreasing order of rank.

You can either use a crawler to collect a set of Web pages to populate the database of your search engine, or you can use an artificially built test set of pages for the database. For the later case you will have to think about the link structure of your test set of pages so there is a good mixture of high quality pages and low quality ones. In either case you need to design a data structure to store the information from the database so the search engine can respond to user queries in a timely fashion.

References
• ### Algorithm Design

For the following problems you are required to design algorithmssw as described. You must prove that your algorithm is correct and you must analyze it by computing its communication complexity and its time complexity. You must design the most efficient algorithm that you can.

• Fast Paths Andrew Bloch-Hansen
When sending information between two nodes of the Internet, it is desirable to use a path formed only by high speed links, because then the message can be delivered quickly. Assume that each link of the Internet is labelled with a positive value giving its speed. Given a path p between two nodes u and v the speed of the path is defined as the slowest speed of the edges in the path. For this topic you are required to design efficient distributed synchronous algorithms for:

• finding a fastest path between any given pair of vertices, and
• finding a multicast tree giving fastest paths from a vertex s to a set of vertices T.

• Wireless Transmission Santiago Gomez Rosero
Consider a wireless network composed of n computers and a set of wireless communication channels that allow computers to exchange information. This network is represented as a graph G = (V,E), where each node in V corresponds to a computer and each edge in E is a communication channel. Note that communication channels are not physical connections between computers, rather a communication channel (u,v) will be specified by two computers u, v and a frequency f that the two computers can use to communicate with each other.

Each computer has a wireless transmitter that can simultaneously send several messages at different frequencies. If a computer u has edges in G connecting it to several other computers, each one of the edges must be assigned a different frequency, to prevent interference when u sends messages to several neighbours. For example, for the following figure processor 1 has three edges incident on it, which means that it can exchange messages with three other processors: 2, 3, and 5. These edges have been assigned different frequencies (the numbers beside the edges), so processor 1 can send simultaneously messages to all its neighbours by choosing frequency 3 for the message to processor 2, frequency 2 for the message to processor 3 and frequency 1 for the message to processor 5.

For this project you are required to design an efficient synchronous distributed algorithm for assigning frequencies to the edges of a network so that all edges incident on the same vertex receive different frequencies. Your algorithm should not use more than 2D-2 different frequencies for all the edges of a network, where D is the degree of the network. The degree of a node is the number of its neighbours and the degree of a network is the maximum degree of its nodes. For example, for the network above, its degree is 4.

A more challenging problem is to use at most D+1 different frequencies to label all the edges of a network.

• Network Maintenance
Consider a computer network in which hardware can be installed in some of the computers so a computer can check that the communication links attached to it are working correctly. For example, for the network above we could install link checking hardware in computers 1, 2, 3, 4, and 5: computer 1 would check links (1,2), (1,3), and (1,5), computer 2 would check links (2,1), (2,3), and (2,4), and so on. Notice that each link is monitored by at least one computer. Since the hardware to check the communication links is expensive, it is desired to choose the minimum number of locations where it should be installed so that it checks the working condition of every communication link. Therefore, a better solution than the above would be to place the hardware at computers 1, 2, 4, and 5. Notice than in this new placement every communication link is monitored, and it requires installing specialized hardware in only 4 computers, as opposed to in 5 as in the first solution.

For this project you are required to design a synchronous communication algorithm that selects a minimal set of computers where link checking hardware can be installed so every communication link is monitored. A set of computers is minimal if removing any computer from the set would not give a set of locations from which all communication links can be monitored. For example, the set {1,2,3,4,5} is not minimal because if we remove computer 3, the resulting set {1,2,4,5} of computers can still check every communication link. This latter set is minimal, as removing any computer from it would not allow the remaining computers to check all links.