A Java Programmer's Guide to Reading C++ -- Part 3

This is part 3. There are two other parts: part 1 and part 2.

As mentioned in part 02, here I want to look at a program that is larger than just one file. This is different in C++ than in Java, because in Java, the compiler can look at all the .java and .class files and see what is necessary for methods to be called with the right parameter types and to have their return value's type match the expectations of the invoking code. However, in C and C++, this is not the case -- this information isn't available in the .o files that are ultimately linked together to build a C or C++ program. We have already seen a bit of this with the example of a forward declaration in the previous lesson. When a program extends across multiple files, there is much more opportunity for this issue to arise. The C/C++ solution is to have header files (usually ending in .h) for all the information that other files/modules need to know in order to make use of the information in the corresponding .c or .cpp file. We will see an example of this shortly.

There are many of examples of programs that have multiple source files. According to list of C++ applications, we could look at the Chromium web browser or Windows XP, but I was looking for something more interesting (all right, something more manageable). As an example of a reasonable small multi-file C++ program, let us consider tinisat. The program tinisat is a SAT solver, (i.e., it reads in a collection of Boolean equations in conjunctive normal form and it tries to find a binding of the variables that cause all the clauses to be true). This is a classic NP-complete problem, but much progress has been made on it in the past decade or so. Many of these newer solvers are quite large and complicated, so tinisat was written to show that in roughly 550 lines of C++ code, a solver could be written that performed nearly as well as its more complicated cousins. By the way, there is tinisat-j, a java implementation of tinisat, for those interested in seeing how the same algorithms work out in different languages.

To download tinisat:

   mkdir zz
   cd zz
   wget http://users.cecs.anu.edu.au/~jinbo/tinisat/tinisat0.22.tar.gz
   gunzip tini*.gz
   tar xvf tini*.tar
   cd *22
where you now have the files:
 Cnf.cpp
 Cnf.h
 CnfManager.cpp
 CnfManager.h
 LICENSE
 main.cpp
 Makefile
 README
 SatSolver.cpp
 SatSolver.h
LICENSE is just the 1991 version of the GNU Public License. README just tells us to run make to build the program and give it a CNF formatted file to run the program. The makefile
 CC = g++ -Wall -O3 -DNDEBUG
 HEADERS = Cnf.h CnfManager.h SatSolver.h
 OBJS = Cnf.o CnfManager.o SatSolver.o main.o

 tinisat: $(OBJS) 
	$(CC) $(OBJS) -o tinisat  

 $(OBJS): $(HEADERS) Makefile

 .cpp.o: 
	$(CC) -c $< 
shows us that the author used the assert package discussed in Lesson 02 (grep assert *.cpp | wc shows us that it was used exactly 14 times). Typing make, shows us that the actual build sequence is:
  g++ -Wall -O3 -DNDEBUG -c Cnf.cpp 
  g++ -Wall -O3 -DNDEBUG -c CnfManager.cpp 
  g++ -Wall -O3 -DNDEBUG -c SatSolver.cpp 
  g++ -Wall -O3 -DNDEBUG -c main.cpp 
  g++ -Wall -O3 -DNDEBUG Cnf.o CnfManager.o SatSolver.o main.o -o tinisat  
Wall is turning on all warnings (we get a few about unused function results, but nothing important). If we look up CNF format, we get given the example CNF formatted text file:
 c This is a comment
 c This is another comment
 p cnf 6 3
 1 -2 3 0
 2 4 5 0
 4 6 0
which represents the Boolean equations:
  A or not B or C
  B or D or E
  D or F
where 1 represents A, 2 represents B, 3 represents C, 4 represents D, 5 represents E, and 6 represents F. This set of equations has many solutions (for example, when A, B, and D are true, all three of these equations are true). When we try this out with tinisat, we get:
prompt: cat > x.cnf
c This is a comment
c This is another comment
p cnf 6 3
1 -2 3 0
2 4 5 0
4 6 0

prompt: ./tinisat x.cnf
c Tinisat 0.22
c solving x.cnf
c 6 variables, 3 clauses
s SATISFIABLE
c 1 decisions, 0 conflicts, 0 restarts
c solved in 0.00s
which doesn't tell us what the solutions are, but it tells us that they exist, which is good enough for our purposes here.

To see how this fits together as a program, we first look at main.cpp (where we would expect the program to start). Ignoring the i/o, the core of main is the lines:

        #include "SatSolver.h"
        ...
	Cnf *cnf = new Cnf(argv[1]);
        ...
	SatSolver solver(*cnf); 
	delete cnf;
        ...
       	solver.printStats();

The work of the program is actually invoked indirectly by creating a SatSolver object for the Cnf object that had just been read in (the reading of the Cnf file was implicit in constructing the Cnf object cnf). When we look at SatSolver.h, we see that it includes CnfManager.h and defines to classes, the struct Luby and the class SatSolver (which makes use of the struct Luby). The struct Luby contains the code for the methods associated with it. But the class SatSolver is just a shell:

  class SatSolver: public CnfManager{
	unsigned nVars;		// num of variables in varOrder
	Luby luby;		// restart scheduler
	unsigned lubyUnit;	// unit run length for Luby's
	unsigned nextDecay;	// next score decay point
	unsigned nextRestart;	// next restart point

	int selectLiteral();
	bool verifySolution();
  public:
	SatSolver(){};
	SatSolver(Cnf &cnf);
	bool run();
	void printStats();
	void printSolution(FILE *); 	
  };
Note that the class SatSolver inherits from CnfManager. Inheritance should be familar to Java programmers, but as with many things, C++ adds a twist in that it is possible for a C++ class to inherit from more than one class (see multiple inheritance [wikipedia]).

When we look at SatSolver.cpp, we see the definitions for the methods declared above, e.g.,

  void SatSolver::printSolution(FILE *ofp){
	for(unsigned i = 1; i <= vc; i++)
		if(vars[i].value == _POSI) fprintf(ofp, "%d ", i);
		else if(vars[i].value == _NEGA) fprintf(ofp, "-%d ", i);
	fprintf(ofp, "0\n");
  }
tells us how the printSolution method in SatSolver should be implemented. In the method header:
   SatSolver::SatSolver(Cnf &cnf): CnfManager(cnf){
we see CnfManager being run on the parameter cnf before going into the body of the method. In the code for this method, we see the line:
   	assertLiteral(-i, NULL);
This is not defined in SatSolver.cpp. However, recall that SatSolver.cpp includes SatSolver.h. Looking in SatSolver.h, we find it isn't declared there either, but SatSolver.h includes CnfManager.h. We find that it is declared (but not defined) in CnfManager.h as part of the class CnfManager. Looking in CnfManger.cpp (visible because the class SatSolver inherited from the class CnfManager), we find it is defined there. It turns out that the interesting algorithmic part of this program resides mostly in CnfManager.cpp.

Incidently, CnfManager.h included Cnf.h. Cnf.h includes the declaration for the struct Cnf (which is used to hold the information read in from the CNF formatted input file). The declaration of Cnf looks like:

 struct Cnf{
	unsigned vc;	// var count
	unsigned cc;	// clause count
	int **clauses;	// 2-dim. array with entries same as in cnf file
	unsigned lc;	// literal count
	unsigned *cl;	// clause length
	Cnf(char *fname);
	~Cnf();
 };
The last method ~Cnf is some new C++ notation for us. Looking in Cnf.cpp, we find it is defined as:
 Cnf::~Cnf(){
	if(clauses){ 
		for(unsigned i = 0; i < cc; i++) free(clauses[i]);
		free(clauses);
	}
	free(cl);
 }
As discussed in the wikipedia entry on destructor, ~Cnf is C++ notation for the routine that is used to destroy an object (like Cnf is used to construct the object). If the object was created as a local object in a method, then the destructor is invoked when you leave that method (or whatever scope the object was declared in). If the object was created using a new (so it is on the heap rather than on the stack), then it is invoked when delete is performed on the object. In our case, the object was created using new back in main.cpp and delete is invoked on it there as well. Note that in the code of ~Cnf, free is used rather than delete. That is because those parts of the object being freed were created (see the source for the constructor Cnf) using calloc and realloc (traditional C memory allocators -- note that new is not part of traditional C syntax). The C++ FAQ has an interesting entry on destructors.

Well, that is enough for these lessons. If you want to learn more about C++, I recommend taking a look at the links in the Appendix to lesson 1. Another source of C++ code that you could look at to see what people are doing in this language is the C++ section of code.google.com. It contains over 900 examples of C++ code, ranging from the V8 JavaScript interpreter, googletest (Google's C++ Testing Framework), and googleMock (Google's C++ Mocking Framework) to cppbtree (a C++ B-tree template library) and rsa (an implementation in C++ of the RSA encryption algorithm).