- 21 Apr 2011: Marks have been handed in, but it is university policy
not to distribute final marks nor final exam marks before the registrars
office releases them. I would observe that two people got between 67 and
68 points when each of the 7 exam problems was marked out of 10, so other
than taking the mark out of 68 rather than 70, the final exam marking was
pretty straightforward. With respect to homework 3, 10 people handed in
working solutions (6 more handed in broken solutions which they were aware of
and 2 people who took the final handed in nothing for h3). The performance
measure I used for ranking was based on the random seed 0xBAD0095L with
max_index 63 in H3RandomTester.
37335295536 15196 * 42 * 58498 (57508 orig seed)
38030316786 15902 * 49 * 48807 (47979 orig seed)
38417899520 16640 * 32 * 72149 (70925 orig seed)
41897611338 17754 * 37 * 63781 (62701 orig seed)
47325095360 18328 * 20 * 129106 (128783 orig seed)
47791022916 18902 * 21 * 120398 (118472 orig seed)
55462690800 19514 * 45 * 63160 (63174 orig seed)
58331332280 20026 * 52 * 56015 (56015 orig seed)
71444982660 15902 * 30 * 149761 (149761 orig seed)
208626733172 17834 * 26 * 449933 (414287 orig seed)
- 12 Apr 2011: Oops, the 7 Apr list of problems from the textbook
should not have included 6.16, since it rests on understanding pipeline
material which we did not cover in class and will not be covering on the
exam.
- 8 Apr 2011: Confirmations of all handins upto and including 103 of h3
have been sent out. So far 16 people have handed in h3. If anyone is still
working on the weekend, remember that Saturday and Sunday count as two separate
days, so at 10% a day, Saturday midnight is 30% off and Sunday midnight is 40%
off. As with past homeworks, you are best off handing in whatever you got as
soon as you got it no matter how bad it is knowing you can always handin again
later if you were able to improve it more than the acrued late penalties would
subtract from it.
- 7 Apr 2011: Overview of Final Exam: Recall the Final Exam is Comprehensive. A substantial part of it will be recovering ground covered in the midterm.
Two differences are that I expect you to be better with sequential circuit
stuff now that you have had two homeworks to practice on and the other is
the coverage of chapters 4, 5, 7, and 8 (with short pieces of 6 and 10).
As mentioned in class, 4 provides
background definitions of ARC assembly -- not to be explicitly tested, but
indirectly questioned by being relevant to questions regarding chapters 5
and 7. For chapter 5, the exam focus will be on microcoding: generating
microcode bit patterns, tracing execution of microcode, and writing short
sections of microcode (see, for example exercises 5.6, 5.7, 5.8, 5.9a, 5.10a,
5.11, 5.14, 5.16, and 5.18 from text). In chapter 6, we looked at section
6.5.1 -- relevant exercises from the text are 6.16, 6.18, and 6.19. We also
looked at section 10.1.1 of the text, but where the book uses p in Amdahl's
Law (for number of processors in a parallel setting), we used s-enhanced for
speedup of the enhanced part. Question 10.2 is the closest we find to a
relevant exercise in the text. Remember that when using s-enhanced rather
than p, the same formula can be used to talk about speedup of memory
accesses or disk accesses or instruction interpretation (basically anywhere
you want to talk about how speeding up an aspect of a task impacts the
speed of the whole task). In chapter 7, our main interest was section 7.5
on caching -- a typical problem being to evaluate how different caching
methods impact the performance of a specific assembly program. Besides
the problem we worked in class, exercises 7.6, 7.7, 7.8, 7.10, and 7.11
from the text are of interest. In chapter 8, my interest was in disks,
making exercises 8.4, 8.5, 8.6, 8.7, 8.8, and 8.9 of interest.
In terms of the big questions from the midterm, short questions relating to
the material from 6, 10, and 8, will probably be merged together into one
big question.
Major themes from the midterm were (repeat of what was posted at midterm time):
- ability to determine number of transistors in a CSD circuit
and depth of a CSD circuit. note CSD assumes certain gates are
primitive and the depth and number of transistors for those gates
follows the CMOS designs in Appendix A of the text.
- you should be familiar with the basic properties of Boolean algebra
(Table A.1) and in particular, be able to use DeMorgan's Law to reduce
depth and number of transistors in circuits formulated using AND, OR,
and NOT definitions that build on the primitives of the CSD system.
- you should also be familiar with truth tables and sum of products
formula derived from such tables.
- you should be able to both read and construct combinational circuits
in the CSD notation, including making appropriate usage of the run feature
of CSD.
- you should understand how to use as well as how to implement the
digital components of Appendix A.
- you should be able to trace the negative edge-triggered D flip-flop
and show how it works
on various inputs.
- you should be familiar with the Mealy model of finite state machines
(FSM). both being able to draw them to illustrate how the controller
for a particular architecture should work and how to implement them
using negative edge-triggered D flip-flops (either as a diagram or
in CSD).
- you should be able to translate simple C programs not involving
arrays into sequential circuits. both designing an architecture appropriate
to the problem and designing a controller that would cause the architecture
to compute the same thing as the original C program.
- you should be familiar with the terminology and methods of Chapters 1,
2, and up to (and including) 3.2.1 of Chapter 3 of the textbook.
- 6 Apr 2011: Sorting out h2 marking, I decided to mark the
assignments that passed all the test data (including the delay test)
on a scale from 7 to 10 and the assignments that passed all the test
data except the delay test on a scale from 6.5 to 9. The scale is
roughly logarithmic, i.e., double the performance does not mean double
the mark. Running H2RandomTester with the seed 0xCAFF33 on 128 test
points for those passing all the test points yielded the performance
list:
935M = 5470 * 21 * 8140
1344M = 7928 * 23 * 7372
3193M = 6882 * 57 * 8141
3231M = 8254 * 32 * 12235
3683M = 11658 * 49 * 6448
10928M = 11010 * 47 * 21120
32594M = 9026 * 51 * 70808
For those who passed all the test data except delay:
927M = 6332 * 18 * 8140
1065M = 10762 * 23 * 4303
1288M = 6882 * 23 * 8143
2402M = 6814 * 82 * 4300
2907M = 9138 * 74 * 4300
84455M = 10282 * 116 * 70810
As things are running late at this point, it looks like the emails won't
go out until Thursday.
- 6 Apr 2011: With respect to the question of how do you generate
a CSD listing with line numbers, the unix utility nl does that (see
man page on nl).
- 6 Apr 2011:
A few people have noticed that as with past assignments, the h3 document
indicates that a maximum of 45% of the mark can be earned by circuits
that don't pass the posted testers. The question has been raised that
if one is handing in a non-working csd file, what can one do to maximize
the amount of that 45% one does get.
In the past, one thing you have been encouraged to hand in is a C program
that indicates the overall design/approach for handling the problem.
Initially, this meant code using if statements and assignments statements
limited to usage of components in the standard library of functions we
know how to do with CSD (i.e., whose code is in CSDTestData). With
controllers, we add a single while statement around a case statement
that switches on what state one is in. With ram, we observe that at
any given time, we can only get one value from ram, so any operation
that requires multiple values from ram would need to first store some
of them in variables, for example, if a[x] < a[y] should be written as
q = a[x] ; if (q < a[y]) . As always, the idea being that the C code
can be directly translated into CSD without further complication.
Thus, the C code is a compact way of presenting the high level design
of your code and should be completely debugged.
The next question would be how close did you get to translating your
C code into CSD. What you would want to provide on that issue would
be a line numbered listing of your CSD code and then an indication of
which lines of the CSD code correspond to each line of the C code.
For example a line like if x < y might correspond to line 45 of the
CSD code because line CSD is where the output of the comparison is
used in the calculation of next_state to transfer control in parallel
to the C code and to lines 7 and 15 where the multiplexors route the
values of x and y to the comparison circuit.
The highest level of non-working circuit would be one where plausible
CSD existed for all the parts of the C design (as described in the
previous paragraph) and it actually worked on some, but not all, of
the test data. To justify that level, providing some sample test data
to indicate what works and doesn't work would be useful.
As noted above, the most one would be aiming for here is 45% of the
assignment value. A well written C design as described in the paragraph
starting `in the past', would net 30%, so the remaining notes are about
positioning one's assignment within the remaining 15% gap.
The above documentation material can be handed in online as usual. If
you have already handed in a non-working assignment and you want to
add documentation but won't have it ready til Thursday afternoon
(after Wednesday midnight deadline), you can hand me a printed copy of
the above non-working documentation at Thursday's class (or office
hours). The latest online handin sets the time for late penalty
calculations and the non-working code documented has to be exactly the
code that was handed in online.
- 5 Apr 2011: The TA will have extra office hours for preparing
the the final exam once classes are over:
Friday: 10:30-12:30 (2hrs)
Monday: 10:30-11:30 (his regular hour)
Wednesday: 11:30-1:30 (1 additional hour following my regular)
- 4 Apr 2011: By the way, you may recall that during the week h2 was
originally due a question came up about the meaning of holding the answer
til the next request. A tester to address that issue has been put in
gaul:~webber/CS350 under the name H2Delay.java (of the 13 h2s that passed
H2PreTester, 6 failed H2Delay). At the moment it is unclear how this is
going to resolve marking wise, but, as stated originally, people whose
solutions conform to the spec should be preferred.
My solution can also be found under the name h2d in the same directory.
- 4 Apr 2011: Am currently looking at runs on H2RandomTester where
numberTestCases is set to 32 and randomSeed is set to 0xCAFF33 .
- 4 Apr 2011: Of 16 h2s that compiled, 13 passed H2PreTester.
- 4 Apr 2011: Of 17 h2s handed in, only one wouldn't compile.
- 4 Apr 2011: Regarding h2, to get a sense of what was handed in,
the following, as yet unverified claims, were made in people's README
files:
10762 * 23 * 2839 = 702 million 31/.7 = 44.3
6332 * 18 * 8377 = 954 million 1/.7 = 1.4
5470 * 21 * 8377 = 962 million 1/1 = 1
7830 * 23 * 7097 = 1,294 million 1.3/1 = 1.3
6882 * 23 * 8342 = 1,320 million 1.3/1.3 = 1
6814 * 82 * 2836 = 1,584 million 1.6/1.3 = 1.2
9138 * 82 * 2836 = 1,917 million 1.9/1.6 = 1.2
11658 * 49 * 4375 = 2.499 million 2.5/1.9 = 1.3
8254 * 32 * 10874 = 2,872 million 2.9/2.5 = 1.2
6882 * 57 * 8749 = 3,432 million 3.4/2.9 = 1.2
7156 * 45 * 12681 = 4,083 million 4/3.4 = 1.8
--------- 7 billion was the declared 75-80 mark range 7/4.1 = 1.7
9202 * 48 * 35131 = 15,517 million 15/7 = 2.1
9026 * 51 * 67111 = 30,893 million 31/15 = 2.1
10282 * 116 * ?
no README
no README
not working
- 30 Mar 2011: CSv2.110330.tar.gz has just been released in ~webber/CS350
on gaul. The only change was to include a new tester called
H3RandomTester.java that lets you test your circuit on random arrays of
size 64. My sample solution has the same performance on this tester as
it did on H3PreTester.java . I have also updated
h3.html. Among other things, I decided not to set a maximum number of
transistors for homework 3. I also added a question to the README file
asking what sort algorithm you based your circuit on. Note: at this point
no one has handed in a solution to homework 3.
- 25 Mar 2011: As emailed to class:
Finally got a chance to finish a sample solution to H3. I implemented
an O(n**2) sort using already built components from previous assignments
and ended up with 19,122 transistors, a circuit depth of 33, and to
sort a list of 64 items (max_index of 63) it took 85563 clock cycles.
Clearly optimizing components that we have been using in past assignments
and using a better sort should yield better results, so I figure this
is basically the 75 level for this assignment (19,122 * 33 * 85,563).
While debugging my solution, I noticed that H3PreTester was handling
max_index wrong (in that it was copying values to indices < max_index
rather than <= max_index). Also, I noticed that there was a place where
input data was being generated at 16 bits width rather than 17 bit.
Finally, when not in quiet mode, I found it useful to have something
printed each clock cycle so that you could tell if you were still in
data entry mode or had switched to data reading mode, so I added a few
print statements to the tester. Finally, the original tester from h2
had decreased the cycle count so that you weren't billed for resets, but
since there are no resets in this assignment, there was no need for the
decrease. All these fixes resulted in a new H3PreTester.java which is
in a new CSv2.110325.tar.gz file in gaul:~webber/CS350
p.s., as mentioned in class, the state machine I built had 19 states.
when working with that number of states, rather than having 19 wire
variables is_state_0, is_state_1, ..., it turns out to be much nicer
to have one wire array wire[19] is_state . As mentioned for homework 2,
putting your is_state values on the output line so that you can watch
your state machine move from one state to the next is very useful when
debugging.
- 17 Mar 2011: CSv2.110317.tar.gz is now available on gaul. You may
recall when discussing H3PreTester in class on Tuesday, it was noticed that
there was an error in the handing of the maxIndex instruction by the pretester
in the CSv2.110314.tar.gz version. This should now be fixed. However, I
still haven't built a sample solution and so the pretester continues to be
untested (although, RamTester , which it was based on, was tested).
- 17 Mar 2011: No new handins for last night.
- 16 Mar 2011: No new handins for last night.
- 15 Mar 2011: Have decrypted the files handed in last night and
sent emails to their senders letting them know they could be successfully
decrypted. Note: as noticed in class, H3PreTester has an error in it in
that it sets up the bit pattern for the maxIndex command but then doesn't
issue it. This will be fixed in the next softare release, sigh.
- 14 Mar 2011: Still coughing, Tuesday office hours cancelled.
- 14 Mar 2011: As of roughly 5:30pm, the most recent handin was
13:30 on 14 Mar. For everyone who has handed something in on h2 as
of then, I have been able to successfully decrypt the most recent handin
and send email to the person telling them that I did. If you haven't
gotten such an acknowledgement from me, let me know soonest and resubmit
your h2 solution since it didn't end up in the CS3350 queue. If you are
still working on h2, be sure and submit your best solution by midnight
tonight (Monday Mar 14th), to minimize the number of
late points it gets in case it isn't
fixed tommorrow. Note you can submit solutions more than once, only the
most recent submission counts (with the late penalty appropriate to its
handin time).
- 14 Mar 2011: A preliminary specification for h3 is now available
at h3.html. A new version of the CS system is
now available in my CS350 directory on gaul which supports the new
primitive RAM64X8 discussed at last Friday's class. There is also a
sample usage in CSDTestData called ram (and ram.nl). As mentioned Friday,
your task is to sort an array of 8-bit integers of length from 1 to 64.
For more details see the spec. As with h2, there is also included an
H3PreTester . Although it is set for a length of 64, initial testing should
probably be done on shorter lists (such as 4) as they should process much
faster. I have yet to create a sample solution, so have not yet set the
transistor size and/or depth and/or number of cycle limits on what is an
acceptable circuit for h3. I expect to do that later this week as well as
create a H3RandomTester which will be the basis for benchmark testing of
h3.
- 10 Mar 2011: To take into account various problems arising from me being
out the past few days, the h2 due date has been extended to the end of Monday,
March 14th. If you have any problems, please contact TA.
- 10 Mar 2011: Am still feeling a bit under the weather. My office hours
are cancelled for this week. As we get into the busy part of the semester,
you don't want to risk catching what I caught. Please raise any questions
you have with the TA.
- 10, 4,3,2,1 Mar 2011: By the way, for people wondering how fast random test
cases coverge on the theoretical average, so far on my design:
from seed: 12345L
2 runs, average 10759.5
4 runs, average 8492.25
8 runs, average 17175.375
16 runs, average 18991.5
32 runs, average 18041.344
64 runs, average 16572.453
128 runs, average 16288.5
256 runs, average 16488.348
512 runs, average 16375.031
or
from the see/bean : (long) 0xC0FFEE
2 runs, average 14448.5
4 runs, average 19310.25
8 runs, average 19125.0
16 runs, average 14348.625
32 runs, average 14917.344
64 runs, average 15212.094
128 runs, average 15083.773
256 runs, average 15525.867
512 runs, average 16191.111
(note: the theoretical average for my design should be around 16000).
In terms of run time with regards to these test cases, as I double the
number of cases, the run time doubles if the number of cycles held
steady -- since it doesn't, the runtime varies depending on how much
the number of average number of cycles increased. For example, in
going from 4 runs to 8 runs, it doubled for the number of runs, but
the average length of a run also doubled, causing the overall time to
run the simulation to increase by 4.
- 2 Mar 2011: An overview of the rest of the text/semester:
- Chpt 4: Introduces ARC (simplified version of SPARC) instruction
set. This is essentially a spec for lookinig at issues in implementing
a CPU design. As mentioned in the Preface and Appendix B, there is a
simulator for ARC implemented in Java provided by the textbook authors.
I don't expect to be using it, but it is there for anyone who is interested.
- Chpt 5: This discusses the main issues in implementing a circuit design
for a CPU. Section 5.5 introduces VHDL (VHSIC Hardware Design Language)
used by professional circuit designers (along with Verilog) who haven't
been exposed to the wonders of CSD.
- Measuring performance: Sections 6.5 and 10.1.1
- Chpt 7: Introduces how main memory works and issues relating to
hardware caching of memory accesses. Also virtual memory is discussed.
- Chpt 8: Introduces I/O devices, including internal communication
protocols and performance issues. Although having a fast CPU is important,
many programs are bottlenecked by their accesses to i/o devices and/or
main memory, so that 5, 7, and 8 all contribute important material
regarding understanding why some programs are faster than others.
- Chpt 6: The first part we will skip, but Example 6.2 gives us an
introduction to doing I/O programming in ARC. Section 6.5 talks about
how we measure the performance of a CPU. 6.6 thru 6.9 talk about some
advanced issues in improving the performance of an ARC CPU such as
pipelining, register windows, and power consumption.
- Chpt 10: introduces parallel computing and, in particular, Amdahl's Law.
- Chpt 9: Networking.
At this point, I expect to cover 4, 5, 7, and 8 as well as
sections 6.5 and 10.1.1. Beyond that, we will have to see how things
go.
- 1 Mar 2011: Regarding Homework 2. The maximum allowed circuit
size has been set to 12,000 transistors. The test program H2RandomTester
with seed 12345L is set to generate 8 test cases. Depending on the
efficiency of your solution, it may or may not be plausible to run
8 random test cases against it on Gaul. A new item has been added to
for you to record the number of test cases you were able to run.
It would be reasonable for a final timing run to target something roughly
equivlant to four times the amount of time it takes to run H1Tester on
my solution in CSDTestData/h1.nl on whatever machine you are using (on gaul,
this would mean slightly over a half hour). Note this is only an approximate
measure of the performance of your chip as I will be running it on a different
random seed and probably a longer test series for marking evaluation (using
a machine faster than gaul). In line with this, for a mark around 75 to
80, I am looking for a total performance figure (#transistors * depth of
circuit * average number of clock cycles per task) of 7 billion. Note:
this is average measured from a different random seed on a series at least 8
test cases long. The homework spec has been updated to include the above
(in bold) at can be found at h2.html.
- 1 Mar 2011: Notes on solutions to
the Feb 11 midterm
- 28 Feb 2011: An initial rough solution to h2 took 7116 transistors
with a depth of 73 and an average task cost of 33559.375 according to
H2RandomTester, for a performance value of 7116 * 73 * 33559.375 =
17,433,021,412.5 per task. A bit of restructuring brings it to
8094 transistors, 45 depth, and 33559.375 average per task or
12,223,331,156.25 per task. Starting to follow the optimization
discussed in the original h2 spec, I then produced a solution of
9054 transistors with a depth of 45 and a 17175.375 average per task
or a total performance measure of 6,997,763,036.25 per task.
- 28 Feb 2011: Regarding Tuesday's class, the midterm will be handed
back then. I expect class will be taken up with covering the exam and
fielding questions about h2, so it won't be til Thursday that we start
in on Chapter 4. I ended up marking the midterm out of 56, giving 1 100%,
1 in 90s, 4 in 80s, 3 in 70s, 3 in 60s, 5 in 50s, 2 in 30s, 2 in 20s
(21 total took midterm, average was 63.4% -- average passing mark was 71.37%).
- 28 Feb 2011: A new version of the simulator is in the usual place
on gaul under the name CSv2.110228.tar.gz . Two things of significance
about this version of the simulator: 1) on my h1 solution, it runs 4.7
times faster (old simulator took 41.3 minutes to verify on gaul and new
simulator takes 8.75 minutes to verify using H1Tester); and 2) a new
tester for H2 is included (H2RandomTester.java). The new tester,
H2RandomTester, seeds the random number generator with the long 12345L
and extracts 8 test cases from it. The significance of this tester is
that it is closer to how your h2 solution will be evaluated from a performance
point of view, and, in particular,
it should remind people that if inputs are uniformly randomly distributed,
half of them will be larger than 32,000. I expect to soon be revising
the h2 spec to include reference to this new tester. Other than the different
source of test points, it follows exactly the pattern of the H2PreTester
already available. As an added benefit, it automatically calculates
average cycles per task.
- 17 Feb 2011: With the midterm over, thoughts fondly turn to assignment
2. There is certainly another release of the CSD system coming out soon,
but I just wanted to remind you that the version released 12 Feb 2011 is
fine for getting started on assignment 2. The H2PreTester in that distribution
allows you to pick your own test points for your circuit. My goal for the
next release is to speed up the simulator, which will then let me know how
many test points can feasibly be used for the timing benchmark. Of course,
as many people observed during assignment 1, the smaller the circuit, the
faster it simulates. From the point of view of class room material, in
preparing for the midterm, everything you need to know to do assignment 2
has been explained. As mentioned before, after reading week, we will be
in Chapter 4 of the text.
- 15 Feb 2011: Looking at past exams of 2009 and earlier, they are
mostly wrapped up in the different circuit description notations we
were using at the time. However, last year, we were a bit more free form
about that and we had a bunch of homeworks and a take home final. An
indicator of which of these might be useful for practicing for the
exam can be found at oldProblems.html
- 13 Feb 2011: Working solutions to the first assignment came in
at the following performance values:
- 65,960 = 1940 * 34
- 124,852 = 2548 * 49
- 206,920 = 2956 * 70
- 264,364
- 273,504
- 276,912
- 281,561
- 301,688
- 312,060 (me) = 2972 * 105
- 459,040
- 3,258,716
- 4,454,830 = 13298 * 335
All the circuits under a million transistors had better depths than
my solution. The circuits under a quarter million combined better
depth with fewer transistors. It looks like many people gave up as
soon as they got a solution better than mine, but I suspect we still
haven't found the best performance solution possible.
There were some other large circuits that didn't work on the value
0x8000 (the most negative negative number). Since large circuits are
difficult to test, it is likely the tester wasn't run long enough to
turn this up (all claimed their circuits passed all tests).
In total, 7 circuits didn't work, of which 3 didn't compile.
- 12 Feb 2011: As the midterm nears, you may be wondering what sort
of questions you are expected to be able to answer at this point in the
course. Recalling that it is an open book/ open notes (no electronics)
exam, the questions will basically be problem solving of one sort or
another. Major themes for the exam will be:
- ability to determine number of transistors in a CSD circuit
and depth of a CSD circuit. note CSD assumes certain gates are
primitive and the depth and number of transistors for those gates
follows the CMOS designs in Appendix A of the text.
- you should be familiar with the basic properties of Boolean algebra
(Table A.1) and in particular, be able to use DeMorgan's Law to reduce
depth and number of transistors in circuits formulated using AND, OR,
and NOT definitions that build on the primitives of the CSD system.
- you should also be familiar with truth tables and sum of products
formula derived from such tables.
- you should be able to both read and construct combinational circuits
in the CSD notation, including making appropriate usage of the run feature
of CSD.
- you should understand how to use as well as how to implement the
digital components of Appendix A.
- you should be able to trace the negative edge-triggered D flip-flop
and show how it works
on various inputs.
- you should be familiar with the Mealy model of finite state machines
(FSM). both being able to draw them to illustrate how the controller
for a particular architecture should work and how to implement them
using negative edge-triggered D flip-flops (either as a diagram or
in CSD).
- you should be able to translate simple C programs not involving
arrays into sequential circuits. both designing an architecture appropriate
to the problem and designing a controller that would cause the architecture
to compute the same thing as the original C program.
- you should be familiar with the terminology and methods of Chapters 1,
2, and up to (and including) 3.2.1 of Chapter 3 of the textbook.
- 12 Feb 2011: h2 has been updated to reflect
the new version of the CSD system.
- 12 Feb 2011: A new version of the CSD system, CSv2.110212.tar.gz
is now available in my CS350 directory on gaul. Two things of interest
in it:
- In CSDTestData you can now find my solution to the first assignment
under the name h1 (C version under the name h1a.c)
- In CurrentSimulator, you can now find H2PreTester.java . H2PreTester
lets you do some preliminary testing on your assignment 2 solution.
Unlike H2Tester, H2PreTester does not generate random test points. Instead
H2PreTester tests the values used to initialize a global static array
called testData (which I initialized to the values 0 through 7).
Current H2PreTester shows you the output of every clock cycle of your
circuit. If you don't want that, then just set the global static variable
quiet to be true (when true, the only output you will get is when your
circuit computes a wrong answer). There is also a global variable called
maxClockCycles which I use to make sure the tester halts even if the
circuit is an infinite loop. How it works is explained in a comment
underneath its declaration -- you can, of course, reset it to whatever
makes sense for your circuit design.
Still on the list of things to do is to write H2Tester and to pick an
efficiency level for mid marking range. To give you an idea what things
might look like before any optimization begins, my first solution was
a circuit that required 7116 transistors, had a depth of 73, and took 59
clock cycles (not counting resets) to computer the 8 fibmod values (values
0 through 7) in the supplied H2PreTester. I did this using components
that were already in CSDTestData. The state machine had only 3 states.
The number of signals between the architecture and the controller was
smaller than in GCD as well.
- 10 Feb 2011: Delaying h1 by a week has scrunched up h2 with the
midterm and reading week. So, we might as well delay h2 by a week
(making its due date the end of the day on Wednesday Mar 9th). Anticipating
the obvious, we also might as well delay h3's due date a week (make its
due date Wednesday April 6th). Note: April 7th is the last day of the
semester, so there isn't any further room for delays.
- 9 Feb 2011: Some clarifications were made to h2
as of 9pm today concerning values being held on the input and output wires.
- 9 Feb 2011: You may also find the wikipedia page on Fibonacci numbers
of interest.
- 9 Feb 2011: Spec for h2 is now available.
It is still missing a tester for it. I still haven't done a sample
solution for it. Also, the final version of the
assignment will include a maximum circuit size (for reasons noted in
the assignment spec).
- 8 Feb 2011: A new version of the CSD system can be found
on gaul in my CS350 directory with the name CSv2.110208.tar.gz .
This one includes in CSDTestData gcd.c and gcd1 (a CSD file for the
gcd problem) which are examples we will look at in class today.
I also found a few more places where Java throws an exception due
to a syntax error in one's CSD code and fixed them to generate a
CSD-specific error message instead. It is also worth noting that
the version of the simulator we are using is different from the
one used for the first homework in that I made modifications to how
it handles flipflops to simplify things a bit.
- 7 Feb 2011: A new version of the CSD system can be found
on gaul in my CS350 directory with the name CSv2.110207.tar.gz .
This one introduces the notion of building sequential circuits using
the keyword begin_sequential_chip and the predefined FLIPFLOP primitive.
The relevant documentation has been added to the end of CSD.Manual .
A new example, counter4 has been added to CSDTestData and a new
Tester, ResetTester has been added to the simulator (it was used to
test both test9 and counter4). I expect to have another release tomorrow
with the gcd example we discussed in class last Friday.
- 7 Feb 2011: As of 9pm, nothing new has been handed in since last Friday.
- 5 Feb 2011: messages acknowledging successful decryption of all but
one handin as of 22:20:23 on Fri Feb 04 were just emailed out. So far,
18 people have handed in something -- one person handed in files, but not
yet an encryption form and another person has handed in an encryption form,
but not yet a handin.
- 5 Feb 2011: In response to CSD code that was causing the compiler
to generate a Java exception, I was able to track down the issue and
cause the compiler to give better error messages for certain types of
syntax errors (such as a wire name that starts with a digit). This
new release can be found on gaul in my CS350 directory with the name
CSv2.110205.tar.gz .
- 4 Feb 2011: messages acknowledging successful decryption of all
handins as of 5:33pm today have just been emailed out.
- 4 Feb 2011: Just tracked down an interesting error in a CSD circuit.
The problem was the person had not set a value to zero (the least significant
wire in a left shift in this case) by assigning GND() to it. Instead, they
had assumed it would default to carrying a zero. While it does actually
default to zero, it still causes a problem in the simulator. As mentioned
before, the simulator does a breadth first traversal of your circuit to
compute its output. How this actually works is that all the input wires
and all the wires that are directly connected to GND() or POWER() are put
on a list as wires whose incoming value is known. Then all the gates that
they are inputs for are visited and a count on the number of known inputs
for that gate is incremented in each case appropriately. As soon as all the
inputs to a gate are known, the output wire of that gate becomes known and
the process is continued. So, if you have a wire that never gets set to
a value, then any gate that uses that wire as an input never gets updated
meaning that the output of that gate is never known so anything looking for
that output never gets updated and so on. The effect is that an unset wire
can cause a large number of gates to never be processed, so when their
outputs are read, you always see the default value of 0.
This raises the question of how can you tell if this is happening in your
circuit. Currently there is not an easy way, but there is a new CSD release
(CSv2.110204.tar on gaul in my CS350 directory) that makes it possible.
In the CurrentSimulator subdirectory, I have added a new tester called
QuickTester . What it does is run all inputs of 0 into your circuit and
then dump the value on every wire in the circuit. When it dumps the value
of a wire in the circuit, it also prints the depth of that wire. The wires
that are causing the problem all show up with a depth of 0. Unfortunately,
there are wires whose depth actually is 0 (specifically the input wires and
the wires that are directly connected to either GND() or POWER()). So,
in order to tell if you have this problem, you need to look at the .nl
file . At the end of the .nl file there is the keyword OUTPUTS followed
by a list of wire names starting with the letter t (the list ends with
the keyword END_OF_CIRCUIT). These correspond to the wires on the output
side of the the_chip definition. If any of these wires has a depth of zero
and it is the output of a gate, then you have the problem and this wire
is part of the circuit that isn't functioning properly.
That was the easy part, now to find what is causing the problem, basically
you have to trace the wire backwards through the circuit described in the
.nl file using the output of QuickTester to tell you which wires have depth
of 0 and which don't. You are only interested in the wires with depth of
0. Eventually you will reach a wire that is not the output of any gate,
nor POWER, GND, or an input wire (those appear in the .nl file between
the keywords INPUTS and OUTPUTS). At that point, you look at the comments
in the .nl file and should should indicate where in the CSD file you need
to start trying to track the problem from. By track back, let us say
that you are worried about the wire t48. In the .nl file, if you saw
JOIN t23 | t48
this would mean that the value on t48 actually comes from the wire t23,
which you would then need to investigate. If in the .nl file you say
NOR t43 t44 | t48
then you would know that t48 was the output of a NOR gate whose inputs
were t43 and t44. At this point, you would look up the depths of t43
and t44 -- expecting that the trouble lies along the one whose depth
is zero. Eventually, tracking back through these wires of depth zero,
you should encounter one that is not on the output side of any gate
(POWER and GND are gates in the .nl file that have no inputs) and is
not an input wire. At which point, you have found the source of the
problem.
Clearly it would be best not to have such a problem in your CSD file.
I will certainly be giving some thought as to finding a better way to
deal with this sort of problem. But the above should let you get h1
done.
- 3 Feb 2011: As mentioned in the h1 spec, I have been sending out daily
confirmation emails telling people that the files they have handed in
decrypted successfully. I just finished sending out the batch covering
all files submitted up til 1:44 pm on Thursday 03 Feb 2011. So far, 16
people have submitted something and 17 people have given me encryption
key forms.
- 2 Feb 2011: Well, it appears we are the only university in Ontario
that didn't close today -- I am sure you are proud of the administration
hanging in there and not being intimidated by the storm. As you are finishing
up assignment 1 (due at the end of today, 2 Feb 2011, online via the submit
program on gaul), I thought it might be useful to remind you of a few points:
- This first assignment is worth 10% of the total mark. The late
penalty for it is 1% off the total course mark (10% of the value of the
maximum value of the assignment) per day late (a week has 7 days). There will
be part marks below 50% of assignment for non-working circuits. Working
circuits are judged according to the product of number of transistors and
depth (as described in the assignment spec).
- So far 13 people have handed in their encryption key forms for the
gaul submit program and 11 people have submitted something so far (3:30 pm).
You can still hand something in on time without the encryption form, but
in order for it to be counted as on time, you will need to get the encryption
form to me on Thursday (either at office hours 2-3pm or after class 5:30 - 6).
Be sure and read the instructions on the encryption form.
- The midterm is 3:30 - 5:30pm on Thursday Feb 17th. Roughly two weeks
from now. It counts 25% of your mark. It will involve some circuit tracing,
some CSD writing, some C translation, etc. In the textbook, we will certainly
have finished chapters 1, 2, and 3 and Appendix 1. After the midterm, we
will be moving through the textbook at a quicker pace -- at that point
we will have covered most of the stuff about how to build circuits, so
the material will be simpler.
- The second assignment will be due March 2nd and will also count 10%
of the mark. It will also use the CSD system, but I will add to CSD a
primitive for the negative edge-triggered D flipflop we talked about in
class on Tuesday. The task and software release will go out next week.
Don't forget, between now and March 2nd, there is reading week.
- After the above, we will be left with one last assignment (counting
15% of mark, also in CSD) and a cummulative final (counting 40% of mark).
- 1 Feb 2011: By the way, it is unclear what the weather is going
to be on the 2nd. You may recall in early December there were some
storms that actually caused school to close. If UWO closes school
due to snow on Wednesday, then the due date will become the first day
they open school back up.
- 1 Feb 2011: Got a complaint about the compiler generating a Java
exception instead of a reasonable error message in one situation. In
tracking down the problem, found out that expressions inside a define
weren't being checked to make sure the sizes match up. For example,
if a run wanted 6 wires and you gave it 10, it just took the first 6
and went along without any error message (even though you probably
didn't intend to give it 10 wires). If, however, it wanted 6 and you
gave it 4, then you would get a Java exception about array being out
of bounds on some line in the Vector code. I have now fixed the compiler
so that it catches these situations and gives the appropriate error messages.
The new compiler is on gaul in my CS350 directory under the name
CSv2.110201.tar.gz . If you have a circuit that isn't working right,
it would probably be worthwhile downloading this new version and see
if it gives you better error messages about your CSD description.
Because some cases end up not generating an error message from the old
compiler, it is theoretically possible that there are some CSD codes
that currently work, but don't make it through the new compiler.
I will use the old compiler for compiling circuit descriptions that
claim to work, so if you handed in something that worked under the old
compiler, that won't be a problem. If for some bizarre reason you manage
to build something that only works under the new compiler, make note of
that in the README file you hand in.
- 29 Jan 2011: Have heard some people are finding the test runs
unusually slow. There are a few possible issues involved here and
some ways to make things better.
- First thing to note is that ExhaustiveTester runs much slower
than H1Tester. This is because ExhaustiveTester outputs each row
and usually file i/o is slower than simulating a single test point.
- Both H1Tester and ExhaustiveTester have a similar program structure
(since the source is included, you can see it quite easily).
In the main of both, there is a do-while loop that runs through all
the values of a variable called value. They are set up to run from
0 to -1 on oldvalue which is showing the sign extended version of
the 16-bit value stored in value which runs from 0 up. Rather than
run through all these cases every time you are testing your circuit,
you could run through a smaller range (which would go much faster).
For example, from -8 to 8 or whatever range of numbers covers the
issue you are currently trying to fix in your circuit. Of course,
before handing in, you are going to want to run over the whole range.
- It is worthwhile noting that the larger your circuit, the slower
the simulator. The simulator is basically a breadth-first traversal
of the graph that represents the circuit, which in your case should
be a forest. The simulator has checks in it for cycles in the circuit
graph which have been used by at least three previous classes and so
should be working. All the changes I have made so far to the simulator
have been to the interface (specifically allowing the testers to be
built) and not to the basic algorithms themselves.
As you know,
the size of my circuit was roughly 3000 transistors and a depth
of roughly 100. To see how many internal nodes there were in the graph,
I grepped for NAND and came up with 291 hits
(grep NAND h1.nl | grep -v define | wc and then subtract 3 for table
definitions) and
similarly 142 hits for NOR, making it a graph of roughly 1300 internal
nodes (not counting POWER and GND).
The h1 text file containing all the definitions was
a puny 172 lines (4,678 characters) which was compiled to a larger nl
file of 11351 lines (357,184 characters).
H1Tester on it on a 2.26Ghz Quad-Core Intel Xeon took 3 minutes.
I would expect runtime to increase roughly linearly with the number of internal
nodes in the graph.
- If your circuit is very large, it is possible for java to run out
of memory. This is usally an internal java thing rather than that your
computer has actually run out of memory. Most java compilers allow you
to set the amount of memory that will be allocated to the heap (which is
were this program does most of its work). So, one thing to consider is
increasing the heapsize above your default settings -- if you are seeing
out of memory errors.
- The other way you could run out of memory is you were running
ExhaustiveTester, is to not have enough file space for the resulting
output file. On the full h1 circuit, the ExhaustiveTester output
would be 6.5 megabytes (which you didn't really want to read anyway).
- Gaul is probably a bit busy right now, but if you are running on
your own computer, you may find some benefit from parallelizing the
test run. As mentioned before, H1Tester starts at 0 and increments by
1 until it reaches -1. If you aren't doing anything else on your home
computer, then making two copies of H1Tester, for example, H1TesterEven
and H1TesterOdd, and then have H1TesterEven run value from 0 in increments
of 2 until it reaches -2 and H1TesterOdd run value from 1 in increments of
2 until it reaches -1. In theory, this might make things run twice as
fast, although actual speedup would vary with the computer system used
(and in rare cases it can actually run faster than twice as fast as we
will discuss later in the semester). One of the early usage of large
multiprocessor computers was for circuit testing since the process
parallelizes so simply (as you can imagine on chip designs involving
billions of transistors simulations can be quite slow and there are a lot
more cases to cover than in our situation).
- 25 Jan 2011: Also in class mentioned TA's email address (special
for course) on gmail, see in italics just below office hours list at
top of this web page.
- 25 Jan 2011: Mentioned that the late penalties associated with
the encryption key form mentioned on the form itself are overridden by
the penalties discussed in h1.
- 24 Jan 2011: h1 has been updated to include
a comment regarding a performance of 312,060 on a working circuit would
be worth between 75 and 80 out of 100. Also, a note on how it is ok to
hand stuff in multiple times (for example, if you realize a better way to
do the circuit after handing the circuit in early).
By the way, in working out my solution, I first built a C algorithm as
discussed in class and spent a bit of time debugging it. Having done
that, when I translated it to CSD, I had only two errors that the
compiler didn't catch. In both cases, the problem was I had the inputs
to a word mux backwards. As you have hopefully figured out by now, it
is hard to put a print statement in a circuit design. What I did to track
these down was make a copy of the CSD design into another file and then
change the output of the chip to be whichever wires I was worried about
not being right -- if you will, a print statement where you only get one
print statement per run rather than being able to drop them in at many
points in the circuit at once. While there is no limit on the number
of output wires you could use for debugging, there is a limit to the
number of 1's and 0's one can easily make sense of at a given time.
- 21 Jan 2011: By the way, I just sent out email to the class list
mentioning that the homework 1 spec was available. In two cases,
the mail program refused to deliver because the student's email account
@uwo.ca was over quota. Email communication is part of the marking
process in computer science, so I thought it worth reminding you that
you need to keep an eye on your @uwo.ca account to make sure it doesn't
go over quota and so you miss warnings, etc.
- 21 Jan 2011: The assignment spec for homework 1 is now available
at h1. There shouldn't be anything surprising
here about the actual task, but there are a lot of details about
handing stuff in that we will probably discuss a bit in next week's
classes (the remaining homeworks/assignments will have similar
handin processes). The only thing missing from the spec is the
performance value that I think is reasonable (i.e., worthy of
roughly 80 percent of the mark). Once I have solved h1, I will
post an update including that value, but hopefully you are already
starting on the assignment by now.
- 21 Jan 2011: A new version of the software is availabe as
CSv2.110121.tar.gz in the usual place (on gaul in ~webber/CS350).
The main item of interest in this release is H1Tester.java in
the CurrentSimulator subdirectory. Once you have an h1.nl file,
you can see if it works (after compiling the simulator) by typing:
java H1Tester < h1.nl
this will then print out a message saying
listing the data points that are in error:
If the next thing after this is the line "Statistics:", then there
were no errors and the circuit should be fine in terms of correctness
(this program does exhaustive testing, so it runs all 64000+ possible
inputs for this circuit). If there are inputs that don't work, then
it tells you what the input was, what the circuit produced, and what
the answer should have been (on three adjacent lines) followed by four
dashes to separate it from the next error (if there are more than one
errors). At the moment, I still haven't started to implement the h1
circuit. So first I built another tester called AdderTester.java
(also in same directory) from ExhaustiveTester.java and used it to
test adder and a new file called
adderBroken (where I introduced an error into the working adder file).
Then I made minimal modifications to AdderTester.java to produce
H1Tester.java . Finally, I created a new circuit called h1Broken
(also in CSDTestData directory) which takes the 16-bit input and
zero-fills it to make it a 32-bit output result -- very efficient,
zero transistors and zero depth :-) . Like a stopped
clock, this circuit only gets the right answer on one test point.
Anyway, I ran it through H1Tester.java . As expected, I got a list
of all the 16-bit shorts (except 0) and the corresponding floating point values
(where it tells me what the answer should have been). So I am pretty
confident that H1Tester works.
- 20 Jan 2011: A new version of the software is available as
CSv2.110120.tar.gz in the usual place. The only changes are some new
examples, many prepared specifically for today's class.
- 19 Jan 2011: By the way, soon there will be a detailed spec for homework
1 (the central part of the spec can be found below dated 14 Jan 2011 (the
message furthest down the page with that date)). One issue with the homework
is handing it in. The only mechanism allowed for handing the homework in is
the submit program on gaul (this will allow us to do our own testing of your
assignment). As usual, this is done by:
submit cs3350 stuff
what is a bit unusual is that when you do this, it will ask you for your
encryption key. This way assignments don't lay around in plaintext where
anyone could read them. In order for this to work, it is necessary for
me to know the encryption key you used. Online mechanisms are pretty useless
for this sort of thing, so it involves you filling out a paper form.
I hope to bring copies of the form to class Thursday. In case you lose
the form or need another, both pdf
and postscript versions are available
(it should print to just one page). The form is only valid if handed directly
to the professor (not TA or random other trustworthy looking person) and the
person handing in the form presents valid identification (school id or
driver's license). After class and office hours are good times to handle this.
- 19 Jan 2011: Software that should allow you to do homework one is
now available. It can be found on gaul in the directory /faculty/webber/CS350
under the name CSv2.110119a.tar.gz . First you are going to want to create
a private directory to work in and copy this file into it. On gaul, you
would then open it by:
gunzip CSv2.110119a.tar.gz
tar xvf CSv2.110119a.tar
You can then enter it by
cd CSv2.110119a
There you will find the java source for the circuit language compiler.
You want to build the compiler by typing
javac *.java
Also in this directory, you will find a copy of the manual I handed out
in class in the file CSD.Manual . You would probably describe this as
an alpha level software release. All the testing for it so far can be
found in the directory CSDTestData . Currently in that directory are
nine simple test files and the more interesting file adder (which is a
circuit that does four-bit addition). If $t refers to the CSDTestData
directory, then the testing of this file went as follows. From the
main directory I did
javac *.java
java CSDDesign < $t/adder > $t/adder.nl
I use the suffix nl to indicate a netlist, which is a low level description
of your circuit suitable for simulation (or potentially manufacture).
Then I simulated it by
cd CurrentSimulator
javac *.java
java ExhaustiveTester < $t/adder.nl > $t/adder.result
The file adder.result essentially is a truth table for the boolean function
corresponding to your circuit. Since adder had two 4-bit inputs and one 4-bit
output, it is 256 rows showing 8 columns of input and 4 columns of output.
At the end of the file, you will find the performance statistics for the
circuit, in this case: 280 transistors and a depth of 17 (so, I would view
the quality of this implementation as 4760, if you look at the circuit, you
should find much room for improvement).
The program ExhaustiveTester automatically adapts to circuit size as long
as the number of input wires is less than 32. Thus it could be used for
testing your homework 1. However, the table of 64000+ rows would be a bit
large. So, in the next release, I expect to have a special tester for
homework 1 that will just output the results that are wrong. Making special
purpose testers is rather straightforward, as you can see if you look at
the ExhaustiveTester.java file. The `complicated' simulation stuff happens
in the other files that ExhaustiveTester users as libraries.
At the moment, I only know of three problems that you need to be aware
of:
- If you use a name that is not defined anywhere in the program, the
compiler assumes it is a define that it will see later. If this occurs
in a place where an expression of some sort is expected, the error message
you generally see is that the next token is not a left parenthesis (which
is what should follow a define name). Looking at the code, sometimes this
message can be puzzling, if you aren't aware of what is causing it.
- Right now, there is no check to see if you forgot to put a value on
the input side a wire. I hope to fix that in a later version. I think
wires whose value are undefined currently behave as if they were grounded,
but don't rely on it as that is subject to change.
- The other wiring mistake is to put two different inputs on the same
wire. Right now that is not caught by the compiler. So it is not until
you try to run the simulation that you get an error message referring to
one of the cryptic wires in the netlist file. Hopefully a later version of
the compiler will catch this error sooner, but for now your best bet is
to open up the netlist file and try to figure out which line(s) of your
code are relevant to the problem. There are comments in the netlist code
to help with this process.
- 14 Jan 2011: One operation that is relevant to homework 1 is shifting.
On Tuesday, I will spend some time talking about shifting. If you are trying
to get started this weekend, the textbook isn't really good on this particular
circuit, you may find
- 14 Jan 2011: As noted above, we now have office hours for both prof
and TA. Also, as noted, the TA office hours don't start til the Wednesday
the 19th -- on Wednesday, he will hold an extra office hour so that it will
run from 11:30 - 1:30 on the 19th only.
- 14 Jan 2011: As mentioned in yesterday's class, the problem for
homework 1 will ask you to design a circuit that has 16 input wires
carrying values to be interpreted as 16-bit 2's complement signed numbers
and 32 output wires that your circuit will put values on corresponding to
the 32-bit IEEE 754 single-precision floating point number with the same
`value'. The size of the input was specifically chosen so that there
wouldn't be any rounding involved -- but normalization is still an issue.
All you need to know about IEEE 754 for this assignment is in Chapter 2 of
your textbook. The notation that you will use to describe the circuit is
the CSD system discussed in yesterday's class. Details on how to run
a simulation and test the circuit design should appear next week. As per
the 10 Jan 2011 announcement below, the due date for this is 2 Feb 2011.
As per the course outline, these are individual (not group) homeworks.
As discussed, the quality of an answer will be measured by its correctness
and its efficiency (where efficiency is number of transistors required times
depth of circuit as computed by the simulator).
- 10 Jan 2011: Looking at the way things are developing, we would
be better off if the due date for H1 were moved from 26th of Jan (a
Wednesday) to the 2nd of Feb (Wednesday a week later). While I still
plan on announcing the assignment this week, all the tools for people
to do the assignment may not be in place til next week.
- 7 Jan 2011: Set office hours for prof (see top of this page).