A Java Programmer's Guide to Reading C++

This is part 1, which demonstrates the basic mechanics of working with a program written in C++. An appendix of useful links related to C++ is also included. Part 2 goes over a bunch of short C++ programs looking at how different features of the language are used. Part 3 looks at how larger programs work in C++ (the example is roughly 550 lines of C++ code spread across 4 code files and three header files). As a pdf file, part 1 prints off the Chrome web browser as 10 pages (the last two being the appendix); part 2 prints off as 9 pages; and part 3 prints off as 4 pages. Although many code excerpts are quoted in this document, links are provided to the full programs (as well as to where their specifications can be found) which are worthwhile to study. A few exercises for the reader are also included (marked **).

The Rosetta Stone [wikipedia] was a stone tablet where a particular government announcement was written in three languages: Ancient Egyptian hieroglyphs, Demotic script, and Ancient Greek. As scholars were already familiar with Ancient Greek, this multi-language tablet allowed them to learn Ancient Egyptian. It turns out that there is a similar multi-language collection of programs that have been written in both Java and in C++. These are many of the solutions in the USC Programming Contest.

The posted solutions in this contest will give us problem tasks to work on as well as known solutions in Java and C++ to see these two languages relate. Both languages have aspects that don't have ready correspondance in the other, but these contest problems don't usually raise such issues -- most of the solutions being rather straightfoward code a few pages in length. While we will go further than this, it gives us a good place to start.

Let us start with the Spring 2005 contest. All six contest problems are stated in a single pdf document.

Lets start with the first problem, School Colors. Here the problem is to take in a list of colors (defined as R, G, B triplets) and return the pair that are furthest apart (using Euclidean distance). We are provided with sample test data colors.in and the corresponding results colors.out as well as solutions in Java ( colors.java) and C++ ( colors.cpp).

When I downloaded the various textfiles and viewed them using cat -A, I found that they were handling carriage return wrong (probably created under Windows). To fix this, I wrote a short script called deDos:

#!/bin/bash
# for options, see: man cat
# make executable with command: chmod 755 deDos
mv $1 $1.x
cat $1.x | sed -e 's/\r$//' - > $1
After cleaning up the files, we are ready to compile and run colors.cpp
  g++ -g colors.cpp -o colors
  ./colors
As is typical of the programs in this contest, colors has the input file colors.in hardwired into the program, so you don't need to tell it where to read from. It outputs to standard out. Looking at the output, it is immediately obvious that something is wrong since all the output values are 0.

To find out what is going on, we invoke gdb.

prompt: gdb colors
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
(gdb) list
56	
...
64	int main()
65	{
(gdb) list
66		ifstream f("colors.in");
67		int numTestcases;
68	
69		f>>numTestcases;
70	
71		while(numTestcases--)
72		{
73			f>>numColors;
74			init();
75			
(gdb) list
76			for(int i=1;i<=numColors;i++)
77			{
78				f>>Red[i]>>Green[i]>>Blue[i];
79			}
80	
81			calculate();
82		}
(gdb) break 80
Breakpoint 1 at 0x40101c: file colorsORIG.cpp, line 80.
(gdb) run
Starting program: /home/webber/Downloads/colorsORIG 

Breakpoint 1, main () at colorsORIG.cpp:81
81			calculate();
(gdb) print numTestcases
$1 = 11
(gdb) print numColors
$2 = 0
(gdb) quit
A debugging session is active.

	Inferior 1 [process 14360] will be killed.

Quit anyway? (y or n) y
which tells us that before starting to do the calculation, the program thinks the number of colors is 0, which is odd. Restarting gdb and going slower
prompt: gdb colors
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
(gdb) break 65
Breakpoint 1 at 0x400f3b: file colors.cpp, line 65.
(gdb) run
Starting program: colors 

Breakpoint 1, main () at colors.cpp:65
65	{
(gdb) step
66		ifstream f("colors.in");
(gdb) step
69		f>>numTestcases;
(gdb) step
71		while(numTestcases--)
(gdb) print numTestcases
$1 = 12
(gdb) step
73			f>>numColors;
(gdb) print numTestcases
$2 = 11
(gdb) step
74			init();
(gdb) print numColors
$3 = 3
So far, everything is as it should be.
(gdb) step
init () at colors.cpp:23
23		for(int i=1;i<=N;i++)
Hmmm, when we look at the program, N is a macro with the value 4002. We probably don't want to single step through all that.
(gdb) break 75
Breakpoint 2 at 0x400f9a: file colors.cpp, line 75.
(gdb) continue
Continuing.

Breakpoint 2, main () at colors.cpp:76
76			for(int i=1;i<=numColors;i++)
(gdb) print numColors
$4 = 0

Hmmm, numColors was 3 before we went into init, but now it is 0. Somehow init changed the value of numColors (which you wouldn't, in the normal scheme of things, expect it to do). Looking at the function init, we see it never mentions numColors. However, looking at the declarations of numColors, we see it comes immediately after the declarations of Red, Green, and Blue (which init does reference). In particular, Blue is defined to have size 4002, which would mean its valid indices run from 0 to 4001. However, the code in init initializes all its values running the index from 1 to 4002. Thus we have a classic C/C++ bug of running off the end of the array. What happens when this is done depends on the compiler, so apparently the contest person got lucky and this didn't cause their entry to fail for them. A lazy fix to this is to just change the array sizes from N to N+1. To be safe, I make this change for the array declarations Red, Green, Blue, Contrast, and ContrastAnswer. Then, recompile and run again. This time I get a more interesting set of results. I shove them in a file called x and diff them against colors.out

./colors > x
diff x colors.out | more
and find out that the only difference between the program's result and the posted answer is whether or not there is a : at the end of the line indicating which data set is being reported on (apparently this was close enough for the contest).

Looking at these two programs, we see that the two programmers have taken rather different approaches to the problem. The Java programmer has created 4 private classes, which just exist to create an object where values for a particular type (DataSet, RGB, OneSet, and Output) can be stored. The program holds all the data sets in an array ds (as opposed to just tracking the current data set). The mechanics of the program sit in just one method (with calcContrast being a small supporting method that computes Euclidean distances), the constructor for the class COLORS. After all the data sets are read in, the calculation is performed for each data set. The array oneSet is filled with the distance between each pair of colors. The contrast values are then copied to the array called array, which is then sorted using Java's built-in method so that the maximum value can be extracted easily from the sorted list. Then all the pairs whose contrast is the maxium value are put into the ArrayList output. Then the switched pairs are removed from output (i.e., [x,y] and [y,x] don't both appear in output). Finally, the values of output are printed.

On the other hand, the C++ programmer does not create any user-defined classes. The program looks much like a C program, with a few minor differences. The first difference is the standard headers. The C++ program starts out:

#include < iostream >
#include < fstream >
using namespace std;
[note: the spaces around the angle brackets are to keep the web browser from processing them as html commands. In C++, putting in the spaces generates an error complaining that iostream couldn't be found.] While the Java program needed to bring in a Math class because it was taking the square root, the C++ program doesn't bother since we aren't actually interested in the Euclidean distance, but rather in the comparison between two Euclidean distances (and if both x and y are positive, then the relation between x and y is the same as the relation between x squared and y squared). So all we are getting from these headers are file i/o and standard i/o.

#define N 4002
tells the compiler that where-ever it sees an identifier N in the remaining source, replace it with the integer 4002.

int Red[N];
requests space for an array of N integers. There is no `new' involved as this is something that is done by the compiler, rather than at run time (much like allocating space in an assembly program).

struct Answer
{
  int x;
  int y;
}
ContrastAnswer[N];
similarly is asking that ContrastAnswer be allocated space for N things of type Answer (which is a pair of integers). This is rather similar to the use of private classes in the Java program. In C++, struct is a class where all the members are public. The reason for the special name for this situation is that it makes things backward compatible with C, where there were no classes, but there was the notion of grouping together variables into a package (called a record). Note that this could also have been written:
struct Answer
{
  int x;
  int y;
};

Answer ContrastAnswer[N];

The `methods' init, calculate, and main exist outside of a class definition. By convention, main is the method/function executed when the program starts up.

 ifstream f("colors.in");
declares an object f of type ifstream initialized with the file name colors.in. Recall the include of fstream in the headers. You could think of this as being equivalent to setting up the BufferedReader in the Java code.

f >> numColors;
shows C++ notation for reading from a file into a value. Rather than being cast as a function call, instead the right shift operator is used. This is part of a bigger issue in C++ that allows programmers to associate their own meanings with operators (called operator overloading [wikipedia]). The types of the arguments to the operator determine what it does (in this case f is an input file stream and numColors is an integer, so an integer is read from the input file stream and stored in numColors). A list of all the operators in C and C++ can be found on wikipedia.

When we look a bit further down and see:

  f >> Red[i] >> Green[i] >> Blue[i];
we see what happens when we ask what value does the right shift operator return. If we think of it as storing a value into Red[i], but returning the value f, then the next shift operator is again a read that stores into Green[i]. Similarly, the third value gets read into Blue[i].

You will not that all this i/o is more compact than the corresponding Java code, but it is a matter of taste whether or not you like writing things this way.

Both the init and calculate functions are written much like they would have been in Java. Here, ContrastAnswer is built-up to hold the same thing as oneSet in the Java code. However, the current maximum is tracked as the constrasts are calculated and ContrastAnswer is built up for the current maximum (a new maximum just causes the ContrastAnswer index to restart at 0 overwriting the old values). Rather than eliminating symmetric answers, only half the possiblities are calculated in the first place.

The notation for output is similar to the notation for input:

cout << "Data Set " << counter << endl;
with endl being predefined to give you an end of line marker. cout is a predefined value similar to System.out in Java (analogously, cin corresponds to Java's System.in and cerr corresponds to Java's System.err).

** By now, you should be able to fix this program so that it outputs :'s the same way as colors.out expects them. Take a moment and try it.

In looking at the two programs, we see the Java program broke the task up into smaller tasks and then incrementally progressed to a solution. The C++ program collapsed knowledge such as that the distance between x and y is the same as the distance between y and x as well as worked out how to track ContrastAnswer efficently. However, the contest wasn't for the most efficient solution but rather to get through a collection of programming tasks in a fixed time limit. One can see that there are more potential places to make mistakes in the design of the C++ program (and indeed, the array overflow error was made although apparently didn't effect execution on the compiler being used). Both languages would have supported either style of program development.

The other problem from this contest that we have both Java and C++ solutions for is Problem F: Campus Buildings, which tries to figure out which, from a list of buildings, a particular abbreviation might match (see problems list for more details).

This time, the C++ headers are a bit larger, bringing in the math and string definitions, although only the string ones appear to be used in this program:

#include < iostream >
#include < fstream >
#include < string >
#include < math.h >

using namespace std;
The string type appears in the program, but is treated like a class, so when we want to do something like test against the length of a string called comp, it is written:
   if(j == comp.length())
we also see the string being treated as an array as in
  if(building[l] == comp[j] || building[l] - 32 == comp[j] || building[l] == comp[j] - 32)
Note that in C hence C++, characters can be treated as integers and that the codes for upper case characters and for lower case characters are exactly 32 steps apart -- upper case A being 65 and lower case a being 97 (see wikipedia entry for ASCII).

If we look up the C++ string library, we see that length is a public member function and that [] is an operator defined for the class as well. Like Java, the + operator handles concatenation and the shift operators have been overloaded for i/o, however, this program uses getline (also part of the string definition). If you read the discussion of both methods i/o shift operator and getline, you find that the shift operator uses blanks as string separators, so a building whose name was more than one word wouldn't get read properly, which is why getline is used in this C++ program.

Another thing that is interesting about this code is the empty function declaration

  bool findAbb(string building, string comp, int j, int fl);
that is just above the definition of main. Note that findAbb is actually defined after main
bool findAbb(string building, string comp, int j, int firstLetter)
{
but the first declaration lets the function be used by main before the implementation has been made available in the text (assuming the compiler reads the program from top to bottom). This is called a forward declaration [wikipedia].

Comparing the C++ and Java implementations, we see that in Java, the buildings are just kept in a Vector that will be the length that it needs to be, whereas the building list in C++ is an array of exactly 1000 entries (actually, the problem spec indicates at most 100 buildings and the C++ never checks for the error of too many buildings -- in general one doesn't see much error checking in the contest solutions).

There is one other problem (C: Orange Bowl) that has a C++ solution, but no Java solution. This problem tries to select a sequence of football plays with the maximum probability of success under certain assumptions. Taking a quick look at football.cpp, we see that it is mostly using things we have already seen. However, there are a few new things going on here. If we look at the headers:

#include < iostream >
#include < fstream >
#include < stdio.h >
#include < math.h >

using namespace std;
we see that C++ headers (iostream and fstream) are being mixed with C headers (stdio.h and math.h). C++ started out as rather minor extensions to C, but over time the amount of extensions has increased significantly (particularly in something like the 2011 C++ standard), but backward compatibility with C is still being maintained (which limits the usefulness of C++ in any situation where one cares about things being done correctly or securely -- the old saying being that a system can either be cheap, efficient, or correct; it is good to get one of the three, rare to get two out of three, and no one gets all three). However, like most things in C++, the backward compatibility isn't quite what it appears as detailed in the C++ FAQ entry on Is C a subset of C++? Also of interest is the C++ FAQ entry on How to mix C and C++ which shows how C code can walk around inside a C++ class (sometimes you will see programs that intensionally subvert the class concept in this manner leading to some code that looks rather strange to a Java programmer).

We see the C math libraries being used with:

double nextEff = pow( 1.0 * plays[i].p, 1.0 * yards / geff );
(see definition of pow) and the C i/o libraries being used with:
printf( "%.2f\n", prob( n ) );
(see definition of printf).

The other thing that is new in this code is the use of pointers. Following C notation, we see the line:

  play *plays;
which declares plays to contain the address of something of type play (the declaration of the play record appears a few lines above this line).

C++, like C, thinks of memory as one large array, so plays can be thought of as a variable that is large enough to hold an index into that array (where the index will indicate the start of record/class of type play). Although memory is one large array, generally different features of C++/C access different sections of that array (although enforcement of boundaries between the sections is problematic). There is generally a section of memory thought of as the stack where information regarding function calls and local variables and parameters of function instances are kept. There is generally another section of memory, called the heap, where space that is created by C++'s new or various C alloc routines is kept. The third section of memory is for space known about at compile time (such as global variables and arrays). Note that the plays declaration has not actually reserved any space for a record of type play. That happens later with:

    plays = new play[m];
which gives us a sequence of m records of type play and causes plays to contain an index for the location in memory of the start of this sequence. The C++ FAQ has an interesting entry on memory management.

This does introduce an interesting bug into the code called a memory leak [wikipedia]. The problem is that the assignment to plays is done inside a loop. So each time around the loop, we allocate space for another array of size m (where m is changing each time around the loop). However, when we are done with the array, we just lose track of it. So, if the loop goes around often enough, memory fills up and suddenly the program fails because it can't allocate yet another array as there is no room left for it. The solution is to return the memory space that has been allocated when we are through with it. This would be done by placing the line:

    delete [] plays;
at the end of the loop (i.e., just after the printf). Note: the square brackets are there because plays is an array (if plays was a non-array created by new, then one would just say delete plays). Interestingly, when I left out the square brackets, I didn't get an error message (instead something strange happened that may havve still left me with a memory leak). By the way, delete and new are operators in C++, and so can be overloaded (redefined) like other operators.

In Java, we don't do this because there is a routine called a garbage collector (see garbage collection wikipedia) that looks for blocks of memory that were allocated but are no longer being used and it recycles them to be used again for future memory requests. Although it is possible to set this up in C++, it is not a default feature of the programming language and so most C++ programs you enounter will not be using a garbage collector. Thus they will require the programmer to remember to return (delete) any memory that they are no longer using. Of course, just as forgetting to delete unused memory can result in a memory leak, returning memory for reuse when it is still in use can lead to interesting bugs in code as well.

Back to our consideration of the plays array declaration. We can read into the fields of the records of this array much as you would expect from Java:

for( int i = 0; i < m; i++ )
		{
			fin >> plays[i].g >> plays[i].p;
		}
The notion that an array is really just an index in memory to the start of a sequence of records of a certain type reminds us that C views the computer much like assembly language does and that C++ stays backward compatible with that view even as it offers other views. By declaring plays to contain an address of something that will be created later, the author of this code avoids the problem in the buildings.cpp code where array (lousy variable name) is declared to be of size 1000, rather than being declared to be the size needed.

** Go back to the buildings.cpp code and fix it so that it doesn't need a fixed size array of buildings by instead declaring array to be:

    string *array;
(note that the input file for buildings.cpp is BUILDING.IN). Don't forget to consider memory leaks. This should only require adding a couple of lines of code to buildings.cpp and the result should work the same as the original on the provided test data.

At this point you may want to consider some of the other resources for learning C++ mentioned in the Appendix below, or you may want to continue on with this approach in part 2.

Appendix: Background notes

This series of notes is aimed at helping Java programmers learn C++. Creating such introductions is something that many people have done before, for example:

More extensive coverage of C++ is also available:

In basic C++, there is little to appeal to a Java programmer. However, both Java and C++ are languages that are constantly evolving. There was a major update to C++ for 2011 called C++11 [wikipedia entry] (which can be run from gcc if you use the right options) and another expected in 2017 (called C++17). The main difference between C++ and Java is that C++ views memory like assembly programs view memory (i.e., as one big array) and so is much more error prone than Java which views memory as a bunch of isolated parts. There are attempts to handle this using a programming style that implements a smart pointer [wikipedia] data type and, in C++11, an optional `garbage collector' (see GC API), but this still doesn't provide the safety of Java. Given the number of security problems that trace down to C/C++-style memory errors, it is questionable why anyone would use C/C++. However, C++ dates back to the mid-80s, and Java didn't come along til the mid 90s, so there is a certain amount of legacy code in C++ (written in that decade) that one may have to deal with. So, ultimately, learning C++ is about managing already existing code bases rather than something one would start a new project in. As such, it makes sense to learn C++ by reading code actually written in it (rather than learning a purer textbook coding style that you won't actually encounter when reading legacy code and, perhaps, would even leave you more confused by the way people have been writing C++ code).

Incidently, while C++ is more prone to security problems than Java, writing secure code in Java can also be challenging. For those interested, as a start, see the documents:

To see what is currently going on in C++, see: talks from CppCon 2014 including: