The University of Western Ontario
London, Canada

Department of Computer Science 

CS 9864b

Software Engineering for Big Data Applications and Analytics

 
Course Outline – Winter 2017

 

Logistics and Instruction:

 

Class Venue

MC 316

Day and Hours

Fri 9.30 am – 12.30 pm

Instructor

Nazim H. Madhavji (last-name <<aatt>> geee-may-l J)

Office Hours:

Anytime you can catch me

 

Sessional Dates

Term begin

Mon 5th January, 2017.

First class

Fri 13th January, 2017.

Last class

Fri 7th  April, 2017.

Term end

Fri 7th April, 2017.

Reading week

Mon 20th  – Fri 24th  February, 2017.

 

 

Important Announcements

 

NEW/RECENT

OLDER

Date

Description

 

o   For all the course resources (restricted access), please click here.

 

 

 

12-Jan-2017

Please bring your laptops to class; we will need them!

 

 

 

 

 

 

1. Introduction

This course focuses on the development, maintenance and evolution of applications that deal with large volumes of data (called “Big Data”). Data has long been everywhere, so its existence is not new. With recent advances in technologies (e.g., ubiquitous computing, the internet of things, cloud computing), however, it has become more practical to capture and process large volumes of both structured and unstructured data (e.g., patient records, traffic data, video data, images, sporting statistics, events, logistics data, and so on). Also, new kinds of data that did not previously exist have become available with the mass use of internet and communication technologies (e.g., online access, e-commerce, social media data, mobile data, and others). There is thus considerable and growing interest amongst organisations and institutions in analysing such data for their own purposes. We are only at the beginning of this paradigm shift.

In this course, we shall focus on such topics as:

·       models of lifecycle processes in the context of Big Data environments;

·       technical processes for the development, maintenance, and evolution of Big Data applications;

·       underlying technologies for operational support for Big Data applications;

·       scalable data analytics; and

·       business models centered on Big Data.

This is an emerging area in the field of software engineering.  

Big Data refers to data sets on a massive scale, usually produced by, or obtained from, different sources. Initially, such data was characterised by the attributes: volume (amount of data), variety (different types of data), and velocity (speed with which data arrives and is processed). Subsequently, other data characteristics have emerged: variability (inconsistency in the data set); veracity (accuracy of the data upon which analysis depends greatly); complexity (inherent complexity involved in linking, connecting, and correlating data items from different sources, so that meaningful information can be inferred and conveyed to the stakeholders); validity (relevance for intended use); volatility (retention aspects, including change and length of life issues), and value (to the stakeholders). The first three characteristics are popularly referred to as “the 3 V’s of Big Data” and there are debates about inclusion/exclusion of other Vs in the core set of attributes.

Traditional database management and processing systems do not have the capacity or processing power to deal with Big Data. This has called for the creation of novel algorithms and system architectures to store, curate, manage and process Big Data. The field of Big Data provides new opportunities for analysing such data and for discovering interesting trends and unknown or unforeseen relations among the data items. This technical domain is referred to as Data Analytics.

Researchers and practitioners in the software engineering and technologies community have recognised that there is a need to create novel architectures and frameworks to support the storage, curation, management and processing of big data sets. Thus, distributed reference architectures and programming paradigms have been developed recently, giving rise to concepts such as the Map-Reduce processing model. Examples of frameworks supporting this processing model are Hadoop and MongoDB. In addition, Big Data encompasses unstructured data, the modelling, searching and processing of which requires novel techniques such as Latent Semantic Indexing (LSI) and its variants.  
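To make the Map-Reduce processing model mentioned above more concrete, the following is a minimal single-machine sketch in Python. It is not the API of Hadoop or MongoDB: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the word-count task are illustrative assumptions, and a real framework would distribute each phase across a cluster of machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key (here, by summing)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


# Two tiny "documents" stand in for a large distributed data set.
counts = word_count(["big data big", "data analytics"])
print(counts)  # {'big': 2, 'data': 2, 'analytics': 1}
```

Because each call to map_phase depends on only one document and each reduction on only one key, both phases could in principle run in parallel on separate machines, which is the property such frameworks exploit.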

In parallel, the past few years have seen a paradigm shift in enterprise computing, moving from static, in-house systems to large clusters of private, public, or hybrid systems (referred to as clouds). Furthermore, resources (e.g., infrastructures, platforms, and software applications) are provisioned as services (over virtual platforms), referred to respectively by the acronyms IaaS, PaaS, and SaaS. Such services are provisioned under the paradigm of “service-oriented” systems embodying service-oriented architectures (SOA).

Given that “data” is at the centre of systems, it is inevitable that “Data as a Service” (DaaS) would be added as a model for conducting business. Example services include: continuous data security for clients, provision of data, facilitating data sharing among collaborating partners, etc. With SOA, data may reside on any of the platforms, and services can be provided on demand to geographically remote clients.

This type of deployment gives rise to new architectures referred to as System-of-Systems (SoS) or Ultra-Large-Scale (ULS) systems. These systems produce massive amounts of transactions and internal data for monitoring purposes. The analysis of such data is critical for the maintenance (perfective, corrective, and adaptive) and verification of such systems. For instance, Big Data Solution (BDS) components talk to each other and to their subcomponents, which are distributed across a cluster of computers. In this arrangement, for example, a database's failure to access data might be caused not by a defect in the database but by corruption in the underlying distributed storage. Detailed logs generated by the Big Data system can reach tens of terabytes. Data gathered from multiple systems require petascale storage. Manually analysing such data is not practical, thus calling for innovative techniques for supporting system recovery and maintenance.

To date, little has been accomplished as to how to use big data analytics to support SoS and ULS maintenance and evolution. Issues related to the collection, modelling, storage and processing of massive amounts of logged data for maintenance, evolution, compliance and verification purposes are a subject of emerging research.

Big Data also needs to be represented and denoted by formalisms that facilitate efficient storage and processing. Conceptual modelling as well as knowledge management research has produced a number of frameworks that permit structural representation and semantic interpretation of large data sets. Also, the software engineering community has created novel meta-languages so that large data sets can be efficiently modelled. Example meta-languages include the Meta-Object Facility (MOF), the Resource Description Framework (RDF), and the Web Ontology Language (OWL), which give rise to efficient data representation models such as Linked Data. Tools have emerged to support such modelling activities. Examples of tool frameworks for defining domain models and schemas include the Eclipse Modeling Framework (EMF) and the XML Metadata Interchange (XMI).

While progress is being made in the technological areas of Big Data and Analytics, there is little movement in the area of disciplined development of “applications” for processing and generating Big Data. For example, little is known about lifecycle processes for engineering requirements and architectures, or for testing applications, with a particular focus on Big Data.

 

2. Style of Course

This course relies almost exclusively on published papers and third-party reports. Students will be expected to have read the scheduled material before attending classes, where sessions will be driven primarily by discussions and by questions and answers (both assessed throughout the term). Guest speakers will be invited as appropriate. In groups, students will conduct an in-depth search of the relevant literature, critically analyse it, and present their findings. In addition, there will be an application development class project focused on Big Data.

 

3. Learning outcomes

The following learning outcomes are anticipated:

·     Domain understanding of Big Data and Data Analytics

·     Literature review and analysis skills

·     Presentation skills and defence

·     Identification of research gaps

·     Reading and comprehension of literature on Big Data and Data Analytics

·     Understanding of engineering, maintenance and evolution of Big Data applications software

 

 

4. Course Evaluation

Each component is listed with its % marks and deadline.

·       Individual work: Weekly readings of assigned literature, with a summary of readings (weekly deliverables throughout the term). Marks: 20%. Deadline: weekly.

·       Group work: In-class discussions (Q&A). Marks: 15%. Deadline: weekly.

·       Group work: Topic presentation. Marks: 20%. Deadline: to be scheduled.

§  literature search on an approved topic (approx. 5 or more substantive papers expected)

§  creating an analytic spreadsheet

§  creating a PowerPoint presentation

§  presentation

·       Group work: Software development project. Marks: 45%. Deadline: to be scheduled.

§  Preliminary: Core idea and core requirements

§  Intermediate-1: Refined requirements; preliminary architecture

§  Intermediate-2: Refined architecture; preliminary demo

§  Intermediate-3: Near-final demo

§  Final: Delivery of documentation and system

·       Attendance: mandatory. Marks: minus 5% per class missed.

 

 

 

 

Note:

(1)  The instructor reserves the right to adjust (lower or raise) a particular student’s marks for the listed components based on his judgment of the student’s participation in the course and on the articulated knowledge and understanding of the subject matter during the term.

(2)  In group work, each member is expected to contribute equitably. There will be peer reviews which will be considered in moderating an individual’s mark. See separate presentation on group and individual responsibility.

(3)  The grading criteria and detailed conditions, as applied to each evaluation component, will be described on the assignment/project/test/exam as appropriate.

(4)  Late submissions of assignments and projects will not be accepted, so please start each task as soon as it is assigned.

(5)  If for any reason the instructor cannot carry out an evaluation component, the remaining marks will be prorated.

(6)  For individual assignments, you are encouraged to engage in problem understanding with other class students or staff; however, the solution and the actual details of the work must be your own individual effort.

(7)  Attendance is mandatory! University-regulated exceptions, such as illness, apply.

 

 

5. Prerequisite

·       Registration in a graduate program.

·       Undergraduate-level course on Software Engineering (instructor’s discretion for equivalent background and experience).

6. Course Material

Selected weekly readings for in-class discussion.

 

7. Other

Email Contact

We will occasionally need to send email messages to the whole class, or to students individually. By default, email will be sent to the UWO email address assigned to students by Information Technology Services (ITS), i.e. your email address @uwo.ca or to your Computer Science Departmental email address. It is each student’s responsibility to read this email on a frequent and regular basis, or to have it forwarded to an alternative email address if preferred. See the ITS website for directions on forwarding email. 

However, note that email at ITS (your UWO account) and other email providers such as hotmail.com or yahoo.com establish quotas or limits on the amount of space available to you. If you let your email accumulate there, your mailbox may fill up and you may lose important email from your instructors.  Losing email is not an acceptable excuse for not knowing about the information that was sent.

Accessibility

Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.

Support Services

Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.

Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.

Additional student-run support services are offered by the USC, http://westernusc.ca/services.

The website for Registrarial Services is http://www.registrar.uwo.ca.

 

Academic Accommodation for Medical Illness

If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or other supporting documentation to your Dean's office as soon as possible and contact your instructor immediately.  It is the student's responsibility to make alternative arrangements with their instructor once the accommodation has been approved and the instructor has been informed. In the event of a missed final exam, a "Recommendation of Special Examination" form must be obtained from the Dean's Office immediately. For further information please see:

 

https://www.uwo.ca/sci/undergrad/academic_counselling/resources_and_self_service/forms.html

 

 

 

 

A student requiring academic accommodation due to illness should use the Student Medical Certificate when visiting an off-campus medical facility, or request a Records Release Form (located in the Dean's Office) for visits to Student Health Services. The form can be found here: https://studentservices.uwo.ca/secure/medical_document.pdf

 


This section applies only if there is a Midterm Exam in the current course schedule. There will be no makeup Midterm Exam, except for students requesting a Special Midterm Exam for religious reasons. These students must have notified the course instructor and filed documentation with their Dean's office at least 2 weeks prior to the Midterm Exam.

If you miss the Midterm Exam for any other reason, follow the procedure for Academic Accommodation for Medical Illness given above. If accommodation is approved by your Dean’s office, your Final Exam mark will be reweighted to include the weight of the Midterm Exam. 

Assignments

Submission of Assignments: 

-        Instructions for submitting assignments will be given in the assignment description documents.

Assignment Marking:

-        All assignments will be marked primarily by teaching assistants, following guidelines developed by the instructor.

Appeals of Assignment Marks:

-        Appeals of assignment marks should be addressed to the TA first. If you and the TA cannot agree then the TA will discuss the situation with the instructor.

-        Appeals must be made within one week of the first day that the marked assignments were made available to students. After that one-week period, no more appeals will be considered.

Assignment Backups:

-        It is your responsibility to keep up-to-date backups of assignment files in case of system crashes or inadvertently erased files. Keep electronic copies of all material handed in, as well as the actual graded assignment, to guard against the possibility of lost assignments or errors in recording marks.

Ethical Conduct

Scholastic offences are taken seriously, and students are directed to read the appropriate policy, specifically the definition of what constitutes a Scholastic Offence, at the following Web site:

http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf

Plagiarism: Students must write their essays and assignments in their own words. Whenever students take an idea, or a passage from another author, they must acknowledge their debt both by using quotation marks where appropriate and by proper referencing such as footnotes or citations. Plagiarism is a major academic offence.

You may discuss approaches to problems among yourselves; however, the actual details of the work (coding, answers to concept questions, etc.) must be an individual effort.

The standard departmental penalty for assignments that are judged to be the result of academic dishonesty is, for the student's first offence, a mark of zero for the assignment, with an additional penalty equal to the weight of the assignment also being applied. You are responsible for reading and respecting the Computer Science Department's policy on Scholastic Offences and Rules of Ethical Conduct.

The University of Western Ontario uses software for plagiarism checking. Students may be required to submit their written work and programs in electronic form for plagiarism checking.