Difference between revisions of "Introduction to Data Science I"

From Introduction to Data Science
(Timeline (Tentative))
(Important Dates)
 
(22 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
 
 
== Course outline for COMPSCI 4414A/9637A/9114A ==
'''The University of Western Ontario<br />'''
'''London, Ontario, Canada<br />'''
'''Department of Computer Science<br />'''
'''Course Outline - Fall (September - December) 2017<br />'''
+
'''Course Outline - Fall (September - December) 2018<br />'''
 +
 
 +
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.</span>
 +
 
 +
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span>
  
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>
+
=== Prerequisites ===
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' -->
 
  
=== Objective ===
+
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''
 +
 
 +
=== Instructor Information ===
 +
 
 +
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363
 +
* '''Teaching Assistant''': Nathan Phelps - nphelps3 at uwo dot ca
 +
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM
 +
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']
 +
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' -->
 +
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.
 +
 
 +
=== Course Description and Objectives ===
  
 
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which specific DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their findings to their peers in the class. '''Although this course does not assume prior machine learning or visualization  knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''
 
Line 18: Line 30:
 
* Like to '''read''' - have a desire to understand substantive problems
* Like to '''think''' - make connections between methods and problems
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability
+
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability
 
* Like to '''speak''' - teach us about what you found
 
 
=== Prerequisites ===
 
 
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.
 
 
=== Logistics ===
 
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363
 
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)
 
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM
 
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']
 
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']-->
 
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.
 
  
 
===Important Dates===
 
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week -->
+
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week -->
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week -->
+
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week -->
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week -->
+
* Project Draft Due Friday, 23 Nov at 5pm <!-- End of 11th Week -->
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class -->
+
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class -->
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class -->
+
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class -->
  
 
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should feel free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)
Line 44: Line 44:
 
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.
 
  
=== Materials ===
+
=== Course Materials ===
 
* '''Required Texts'''
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]
Line 53: Line 53:
 
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]
+
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)
 
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]
* '''Other Resources'''
Line 133: Line 133:
 
=== Evaluation ===
 
  
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].
+
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].
  
 
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].
 
  
==== Daily Quizzes – 5% ====
+
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====
 
 
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''
 
 
 
==== Midterm - 35% ====
 
  
 
Assessing competencies from the fundamentals taught in the first half of the class.
 
  
==== Brainstorming Session – 5% ====
+
==== Brainstorming Session – 10% ====
  
 
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.
 
  
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====
+
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====
  
 
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.
 
Line 161: Line 157:
 
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.
 
  
==== Peer Review – '''9637 only:''' 5% ====
+
==== Peer Review – '''9637 only:''' 10% ====
  
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.
+
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.
  
 
==== Participation and Effort ====
 
Line 169: Line 165:
 
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.
 
  
=== Accessibility and Support Available at Western ===
+
=== Accommodation and Accessibility ===
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.
+
 
Support Services
+
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in NCB 280, and can be contacted at scibmsac@uwo.ca.
 +
 
 +
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.
 +
 
 +
=== Academic Policies ===
 +
 
 +
The website for Registrarial Services is http://www.registrar.uwo.ca.
 +
 
 +
In accordance with policy, http://www.uwo.ca/its/identity/activatenonstudent.html,
 +
the centrally administered e-mail account provided to students will be considered the individual’s official university e-mail address. It is the responsibility of the account holder to ensure that e-mail received from the University at his/her official university address is attended to in a timely manner.
 +
 
 +
Electronic devices are not permitted for the midterm.
 +
 
 +
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at the following Web site: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.
 +
 
 +
All required papers may be subject to submission for textual similarity review to the commercial plagiarism detection software under license to the University for the detection of plagiarism. All papers submitted for such checking will be included as source documents in the reference database for the purpose of detecting plagiarism of papers subsequently submitted to the system. Use of the service is subject to the licensing agreement, currently between The University of Western Ontario and Turnitin.com (http://www.turnitin.com).
 +
 
 +
Computer-marked multiple-choice tests and exams may be subject to submission for similarity review by software that will check for unusual coincidences in answer patterns that may indicate cheating.
 +
 
 +
=== Support Services ===
 +
 
 +
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Student Accessibility Services (SAS) at 661-2147 if you have any questions regarding accommodations.
 +
 
 +
The policy on Accommodation for Students with Disabilities can be found here: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_disabilities.pdf
 +
 
 +
The policy on Accommodation for Religious Holidays can be found here:
 +
http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_religious.pdf
 +
 
 
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.
 
 +
 
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.
 
 +
 
Additional student-run support services are offered by the USC, http://westernusc.ca/services.
 
The website for Registrarial Services is http://www.registrar.uwo.ca.
 
  
=== Missed Course Components ===
+
=== Timeline (Tentative) ===
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible.
+
 
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.
+
* 6 Sep - Lectures:
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an
+
** 11 Sep - Lectures:
off-campus medical facility.
+
* 13 Sep - Lectures:
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.
+
** 18 Sep - Lectures:
 +
* 20 Sep - Lectures:
 +
** 25 Sep - Lectures:
 +
* 27 Sep - Lectures:
 +
** 2 Oct - Lectures:
 +
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures:
 +
** ''9 Oct - '''Fall Reading Week''' ''
 +
* ''11 Oct - '''Fall Reading Week''' ''
 +
** 16 Oct - Lectures:  
 +
* 18 Oct -  Lectures:  
 +
** 23 Oct - Lectures:
 +
* 25 Oct - '''Project Proposal Due 26 Oct at 5pm''' - Lectures:  
 +
** 30 Oct - Lectures:
 +
* 1 Nov - Lectures:
 +
** 6 Nov - Lectures:
 +
 
 +
* 8 Nov - Midterm Review and Q&A
 +
** 13 Nov - Midterm
  
== Timeline (Tentative) ==
+
* 15 Nov - GUEST LECTURE
  
* 7 Sep - Lectures: Welcome
+
** 20 Nov - Brainstorming: 1,2,3,4,5,6
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics
+
* 22 Nov - '''Project Draft Due 23 Nov at 5pm''' - Brainstorming:  1,2,3
* 14 Sep - Lectures: Introduction to Statistics
+
** 27 Nov - Brainstorming: 1,2,3,4,5,6
** 19 Sep - Lectures: Supervised Learning
+
* 29 Nov - Brainstorming: 1,2,3
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation
+
** 4 Dec - Brainstorming: 1,2,3,4,5,6
** 26 Sep - Lectures: Performance Evaluation, Model Selection
+
* 6 Dec - Brainstorming: 1,2,3
* 28 Sep - Lectures: Classification
 
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification
 
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification
 
** ''10 Oct - '''Fall Reading Week''' ''
 
* ''12 Oct - '''Fall Reading Week''' ''
 
** 17 Oct - Lectures:
 
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.
 
** 24 Oct - Lectures:
 
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization
 
** 31 Oct - Lectures:
 
* 2 Nov - Lectures: Midterm Review/Q&A
 
** 7 Nov - '''Midterm'''
 
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi* <br />'''9637 Slots 3:30pm-4:30pm''': Mahtab Ahmed, *Nick DelBen*
 
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps
 
* 16 Nov - Brainstorming: *slot1*, Gurpreet Singh, Erica Yarmol-Matusiak<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang
 
** 21 Nov - Brainstorming: Cole Fisher, Xiaoyu Yang & Sachi Elkerton, Felipe Urra, Tianzhi Zhu
 
* 23 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming:  Nanditha Rao, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*
 
** 28 Nov - Brainstorming: *Yancong Wang & Jiayi JI*, Mohammad, Angela Zhao & Yanbing Zhu, Yu Zhu, *Gagan Verma*, *Zeyu Wang*
 
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, *slot*, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''
 
** 5 Dec - Brainstorming: (Sanjay Ghanathey, Jenna Le, Tanvi Kumar), *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *Rifayat Samee*
 
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*
 
  
* '''Project Document Due Friday 8 December 5pm'''
+
* '''Project Document Due Friday 7 December 5pm'''
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''
+
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''

Latest revision as of 20:52, 10 September 2018

Course outline for COMPSCI 4414A/9637A/9114A

The University of Western Ontario
London, Ontario, Canada
Department of Computer Science
Course Outline - Fall (September - December) 2018

Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.

Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.

Prerequisites

0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; and written permission of the Department obtained by applying as above.

Instructor Information

  • Instructor: Dan Lizotte – dlizotte at uwo dot ca – Office MC363
  • Teaching Assistant: Nathan Phelps - nphelps3 at uwo dot ca
  • Time: Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM
  • Place: Talbot College TC-205
  • Communication: We will be using OWL for electronic communication.

Course Description and Objectives

The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which specific DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their findings to their peers in the class. Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The lectures give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.

This course is designed for students who:

  • Like to read - have a desire to understand substantive problems
  • Like to think - make connections between methods and problems
  • Like to wrangle - be willing to wrangle data into usability
  • Like to speak - teach us about what you found

Important Dates

  • Pick Brainstorming Slot by Friday, 5 Oct at 5pm
  • Project Proposal Due Friday, 26 Oct at 5pm
  • Project Draft Due Friday, 23 Nov at 5pm
  • Project Report Due Friday, 7 Dec at 5pm
  • Paper Reviews Due Friday, 14 Dec at 5pm

Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should feel free to make improvements to any part of the wiki (e.g. if you find some useful software or other resources).

Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page by Friday, 5 Oct at 5pm, or Dan will pick a slot for you.

Course Materials

  • Required Texts
    • JWHT: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer. [Free through Western]
  • HTF: The Elements of Statistical Learning by Hastie, Tibshirani and Friedman. Expanded version of required text. [Free online]
  • LW: Leland Wilkinson's The Grammar of Graphics (2005). [Free from Springer]
  • ggplot2 book by creator Hadley Wickham (2016). [Free through Western]
  • Review if you need to catch up:
    • Calculus Review from Penn State University. Includes basic mathematical notation.
    • Linear algebra review - up to and including Section 3.7 - The Inverse
    • Larry Wasserman's All of Statistics. (Available through UWO Library)
    • Devore, J. L., & Berk, K. N. (2007). Modern mathematical statistics with applications. 2nd ed. Springer. [Free through Western]
  • Other Resources

Topics (anticipated)

  • Introduction to Data Science
    • Definitions
    • Components
    • Relationships to Other Fields
  • Data Munging
    • Working with structured data: selecting, filtering, joining, aggregating
    • Web scraping
    • Simple visualizations
    • Sanity checking
  • (Re)-introduction to Statistics
    • Data Summaries
    • Randomness, Sample Spaces and Events, Probability
    • Random Variables, CDF, PMF, PDF
    • Expectation
    • Estimation
    • Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap
    • Inference: Hypothesis testing, P-values, Confidence Intervals
    • Multivariate Statistics: conditional probability, correlation, independence
  • Supervised Machine Learning, Predictive Models
    • Supervised Learning
      • Regression
      • Classification
    • Reinforcement Learning and Sequential Decision Making
  • Evaluation
    • Variance: Test set, cross-validation, bootstrap
    • Bias: Confounding, causal inference
  • Unsupervised Machine Learning, Representations, and Feature Construction
    • Clustering
    • Dimensionality reduction
    • Domain-specific Feature Development
      • Images
      • Sounds
      • Text
  • Visualization
    • Topics to be determined

Evaluation

There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. Graduate students (9637) will additionally submit peer reviews of other class projects. For detailed requirements, see Project Guidelines.

Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.

Midterm - 4414/9114: 35% 9637: 30%

Assessing competencies from the fundamentals taught in the first half of the class.

Brainstorming Session – 10%

Each student will prepare a presentation explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be no more than 10 minutes. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback from the brainstorming session.

Project Proposal – 4414/9114: 15% 9637: 10%

Document detailing the plan for the project. See Project Guidelines for detailed requirements.

Report Draft – 5%

A draft of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.

Project Report – 35%

Each student will prepare a research paper detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.

Peer Review – 9637 only: 10%

Each graduate student enrolled in CS9637 will prepare two reviews of their classmates' work.

Participation and Effort

The success of the course as a useful learning experience hinges on the active participation and effort of the students. Students are expected to attend all classes and to participate actively in the brainstorming sessions.

Accommodation and Accessibility

If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in NCB 280, and can be contacted at scibmsac@uwo.ca.

For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.

Academic Policies

The website for Registrarial Services is http://www.registrar.uwo.ca.

In accordance with policy, http://www.uwo.ca/its/identity/activatenonstudent.html, the centrally administered e-mail account provided to students will be considered the individual’s official university e-mail address. It is the responsibility of the account holder to ensure that e-mail received from the University at his/her official university address is attended to in a timely manner.

Electronic devices are not permitted for the midterm.

Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at the following Web site: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.

All required papers may be subject to submission for textual similarity review to the commercial plagiarism detection software under license to the University for the detection of plagiarism. All papers submitted for such checking will be included as source documents in the reference database for the purpose of detecting plagiarism of papers subsequently submitted to the system. Use of the service is subject to the licensing agreement, currently between The University of Western Ontario and Turnitin.com (http://www.turnitin.com).

Computer-marked multiple-choice tests and exams may be subject to submission for similarity review by software that will check for unusual coincidences in answer patterns that may indicate cheating.

Support Services

Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Student Accessibility Services (SAS) at 661-2147 if you have any questions regarding accommodations.

The policy on Accommodation for Students with Disabilities can be found here: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_disabilities.pdf

The policy on Accommodation for Religious Holidays can be found here: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_religious.pdf

Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.

Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.

Additional student-run support services are offered by the USC, http://westernusc.ca/services.

Timeline (Tentative)

  • 6 Sep - Lectures:
  • 11 Sep - Lectures:
  • 13 Sep - Lectures:
  • 18 Sep - Lectures:
  • 20 Sep - Lectures:
  • 25 Sep - Lectures:
  • 27 Sep - Lectures:
  • 2 Oct - Lectures:
  • 4 Oct - Pick Brainstorming Slot by 5 Oct 5pm - Lectures:
  • 9 Oct - Fall Reading Week
  • 11 Oct - Fall Reading Week
  • 16 Oct - Lectures:
  • 18 Oct - Lectures:
  • 23 Oct - Lectures:
  • 25 Oct - Project Proposal Due 26 Oct at 5pm - Lectures:
  • 30 Oct - Lectures:
  • 1 Nov - Lectures:
  • 6 Nov - Lectures:
  • 8 Nov - Midterm Review and Q&A
  • 13 Nov - Midterm
  • 15 Nov - GUEST LECTURE
  • 20 Nov - Brainstorming: 1,2,3,4,5,6
  • 22 Nov - Project Draft Due 23 Nov at 5pm - Brainstorming: 1,2,3
  • 27 Nov - Brainstorming: 1,2,3,4,5,6
  • 29 Nov - Brainstorming: 1,2,3
  • 4 Dec - Brainstorming: 1,2,3,4,5,6
  • 6 Dec - Brainstorming: 1,2,3
  • Project Document Due Friday 7 December 5pm
  • Reviews (graduate students only) Due Friday 14 December 5pm