Difference between revisions of "Introduction to Data Science I"
Dan Lizotte (talk | contribs) (Initial copy-and-paste of old cs4437 wiki) |
|||
Line 1: | Line 1: | ||
+ | == Course outline == | ||
+ | |||
+ | '''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' | ||
+ | |||
+ | === Objective === | ||
+ | |||
+ | The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which specific DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their findings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable to their project. The lectures give an overview of important methods, but the lecture content alone is not sufficient to produce a high-quality course project.''' | ||
+ | |||
+ | This course is designed for students who: | ||
+ | |||
+ | * Like to '''read''' - have a desire to understand substantive problems | ||
+ | * Like to '''think''' - make connections between methods and problems | ||
+ | * Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability | ||
+ | * Like to '''speak''' - teach us about what you found | ||
+ | |||
+ | === Prerequisites === | ||
+ | |||
+ | At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024). This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students. | ||
+ | |||
+ | === Logistics === | ||
+ | * '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363 | ||
+ | * '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below) | ||
+ | * '''Time''': Tuesday from 11:30AM – 1:30PM, and on Thursday from 3:30PM – 4:30PM | ||
+ | * '''Place''': Talbot College [http://accessibility.uwo.ca/doc/floorplan/bf-tc.pdf '''TC342'''] | ||
+ | * '''Question and Collaboration Hour:''' Thursday from 4:30pm - 5:30pm in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320'''] | ||
+ | * '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. | ||
+ | |||
+ | ===Important Dates=== | ||
+ | * Pick Brainstorming Slot by Friday, 3 February at 5pm <!-- End of 4th Week --> | ||
+ | * Project Proposal Due Friday, 17 Feb at 5pm <!-- End of 6th Week --> | ||
+ | * Project Draft Due Friday, 17 Mar at 5pm <!-- End of 10th Week --> | ||
+ | * Project Report Due Friday, 7 Apr at 5pm <!-- Last Day of Class --> | ||
+ | * Paper Reviews Due '''Thursday''', 13 Apr at 5pm <!-- Week after Last Day of Class --> | ||
+ | |||
+ | Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should feel free to make improvements to any part of the wiki (e.g. if you find some useful software or other resources). | ||
+ | |||
+ | Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page by '''Friday, 3 Feb at 5pm''', or Dan will pick a slot for you. | ||
+ | |||
+ | === Materials === | ||
+ | * '''Required Texts''' | ||
+ | :* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]] | ||
+ | :* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://www-stat.stanford.edu/~tibs/ElemStatLearn/ online]] | ||
+ | :* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]] | ||
+ | :* ggplot2 book by creator Hadley Wickham (2009). ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387981406 Western]] | ||
+ | * '''Review''' if you need to catch up: | ||
+ | :* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]] | ||
+ | :* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]] | ||
+ | :* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse | ||
+ | * '''Other Resources''' | ||
+ | :* Cheat Sheets | ||
+ | :** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet | ||
+ | :** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet | ||
+ | :* Texts | ||
+ | :** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ] | ||
+ | :** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup. | ||
+ | :** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup) | ||
+ | :** C. M. Bishop, Pattern Recognition and Machine Learning (2006) | ||
+ | :** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998) | ||
+ | :** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004. | ||
+ | :** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003. | ||
+ | :** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001. | ||
+ | :* Other Links | ||
+ | :** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception] | ||
+ | :** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism] | ||
+ | :* Software | ||
+ | :** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good. | ||
+ | :** The TensorFlow Library (Python, C++) [https://www.tensorflow.org/] | ||
+ | |||
+ | === Topics (anticipated) === | ||
+ | * '''Introduction to Data Science''' | ||
+ | ** Definitions | ||
+ | ** Components | ||
+ | ** Relationships to Other Fields | ||
+ | |||
+ | * '''Data Munging''' | ||
+ | ** Working with structured data: selecting, filtering, joining, aggregating | ||
+ | ** Web scraping | ||
+ | ** Simple visualizations | ||
+ | ** Sanity checking | ||
+ | |||
+ | * '''(Re)-introduction to Statistics''' | ||
+ | ** Data Summaries | ||
+ | ** Randomness, Sample Spaces and Events, Probability | ||
+ | ** Random Variables, CDF, PMF, PDF | ||
+ | ** Expectation | ||
+ | ** Estimation | ||
+ | ** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap | ||
+ | ** Inference: Hypothesis testing, P-values, Confidence Intervals | ||
+ | ** Multivariate Statistics: conditional probability, correlation, independence | ||
+ | |||
+ | * '''Supervised Machine Learning, Predictive Models''' | ||
+ | ** Supervised Learning | ||
+ | *** Regression | ||
+ | *** Classification | ||
+ | ** Reinforcement Learning and Sequential Decision Making | ||
+ | |||
+ | * '''Evaluation''' | ||
+ | ** Variance: Test set, cross-validation, bootstrap | ||
+ | ** Bias: Confounding, causal inference | ||
+ | |||
+ | * '''Unsupervised Machine Learning, Representations, and Feature Construction''' | ||
+ | ** Clustering | ||
+ | ** Dimensionality reduction | ||
+ | ** Domain-specific Feature Development | ||
+ | *** Images | ||
+ | *** Sounds | ||
+ | *** Text | ||
+ | |||
+ | * '''Visualization''' | ||
+ | ** Topics to be determined | ||
+ | |||
+ | === Evaluation === | ||
+ | |||
+ | There will be a midterm test but no final exam. Each student will lead a brainstorming session and produce a proposal, a draft, and a report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]]. | ||
+ | |||
+ | Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf]. | ||
+ | |||
+ | ==== Daily Quizzes – 5% ==== | ||
+ | |||
+ | Starting with the second lecture, there will be a very short quiz at the beginning of class covering the previous day's material. The final quiz will be on 2 Mar. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.''' | ||
+ | |||
+ | ==== Midterm - 35% ==== | ||
+ | |||
+ | The midterm assesses competencies in the fundamentals taught in the first half of the class. | ||
+ | |||
+ | ==== Brainstorming Session – 5% ==== | ||
+ | |||
+ | Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session. | ||
+ | |||
+ | ==== Project Proposal – '''4437:''' 15% '''9637:''' 10% ==== | ||
+ | |||
+ | Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements. | ||
+ | |||
+ | ==== Report Draft – 5% ==== | ||
+ | |||
+ | A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project. | ||
+ | |||
+ | ==== Project Report – 35% ==== | ||
+ | |||
+ | Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem. | ||
+ | |||
+ | ==== Peer Review – '''9637 only:''' 5% ==== | ||
+ | |||
+ | Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work. | ||
+ | |||
+ | ==== Participation and Effort ==== | ||
+ | |||
+ | Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''. | ||
+ | |||
+ | === Accessibility and Support Available at Western === | ||
+ | Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation. | ||
+ | Support Services | ||
+ | Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling. | ||
+ | Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help. | ||
+ | Additional student-run support services are offered by the USC, http://westernusc.ca/services. | ||
+ | The website for Registrarial Services is http://www.registrar.uwo.ca. | ||
+ | |||
+ | === Missed Course Components === | ||
+ | If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. | ||
+ | If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html. | ||
+ | A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an | ||
+ | off-campus medical facility. | ||
+ | For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf. | ||
+ | |||
+ | == Timeline (Tentative) == | ||
+ | |||
+ | * 7 Sep - Lectures: Introduction to Data Science, Data Cleaning | ||
+ | ** 12 Sep - Lectures: Re-introduction to Statistics | ||
+ | * 14 Sep - Lectures: Re-introduction to Statistics | ||
+ | ** 19 Sep - Lectures: Supervised Learning | ||
+ | * 21 Sep - Lectures: Supervised Learning | ||
+ | ** 26 Sep - Lectures: Supervised Learning | ||
+ | * 28 Sep - Lectures: Cancelled | ||
+ | ** 3 Oct - Lectures: Cancelled | ||
+ | * 5 Oct - '''Pick Brainstorming Slot''' - Lectures: Linear Models | ||
+ | ** 17 Oct - Lectures: Linear Models | ||
+ | * 19 Oct - Lectures: Linear Models / Nonlinear Models | ||
+ | ** 24 Oct - Lectures: Nonlinear Models | ||
+ | * 26 Oct - '''Project Proposal Due 17 Feb at 5pm''' - TBA | ||
+ | ** 31 Oct - TBA | ||
+ | * 2 Nov - TBA | ||
+ | ** 7 Nov - '''Midterm''' | ||
+ | * 9 Nov - Brainstorming: *slot1*, *slot2*, *slot3* | ||
+ | ** 14 Nov - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6* | ||
+ | * 16 Nov - Brainstorming: *slot1*, *slot2*, *slot3* | ||
+ | ** 21 Nov - '''Project Draft Due 17 Mar at 5pm''' - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6* | ||
+ | * 23 Nov - Brainstorming: *slot1*, *slot2*, *slot3* | ||
+ | ** 28 Nov - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6* | ||
+ | * 30 Nov - Brainstorming: *slot1*, *slot2*, *slot3* | ||
+ | ** 5 Dec - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6* | ||
+ | * 7 Dec - Brainstorming: *slot1*, *slot2*, *slot3* | ||
+ | |||
+ | * '''Project Document Due Friday 7 April 5pm''' | ||
+ | * '''Reviews Due Thursday 13 April 5pm''' | ||
+ | |||
Revision as of 19:26, 1 August 2017
Course outline
From Dan: This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. Therefore, all graduate students who are not in the MSc or PhD programme within the Department of Computer Science must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.
Objective
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which specific DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their findings to their peers in the class. Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable to their project. The lectures give an overview of important methods, but the lecture content alone is not sufficient to produce a high-quality course project.
This course is designed for students who:
- Like to read - have a desire to understand substantive problems
- Like to think - make connections between methods and problems
- Like to hack - be willing to munge data into usability
- Like to speak - teach us about what you found
Prerequisites
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024). This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.
Logistics
- Instructor: Dan Lizotte – dlizotte at uwo dot ca – Office MC363
- Teaching Assistant: Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)
- Time: Tuesday from 11:30AM – 1:30PM, and on Thursday from 3:30PM – 4:30PM
- Place: Talbot College TC342
- Question and Collaboration Hour: Thursday from 4:30pm - 5:30pm in Middlesex College MC320
- Communication: We will be using OWL for electronic communication.
Important Dates
- Pick Brainstorming Slot by Friday, 3 February at 5pm
- Project Proposal Due Friday, 17 Feb at 5pm
- Project Draft Due Friday, 17 Mar at 5pm
- Project Report Due Friday, 7 Apr at 5pm
- Paper Reviews Due Thursday, 13 Apr at 5pm
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should feel free to make improvements to any part of the wiki (e.g. if you find some useful software or other resources).
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page by Friday, 3 Feb at 5pm, or Dan will pick a slot for you.
Materials
- Required Texts
- JWHT: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer. [Free through Western]
- HTF: The Elements of Statistical Learning by Hastie, Tibshirani and Friedman. Expanded version of required text. [Free online]
- LW: Leland Wilkinson's The Grammar of Graphics (2005). [Free from Springer]
- ggplot2 book by creator Hadley Wickham (2009). [Free through Western]
- Review if you need to catch up:
- Larry Wasserman's All of Statistics. [Free from Springer]
- Devore, J. L., & Berk, K. N. (2007). Modern mathematical statistics with applications. 2nd ed. Springer. [Free through Western]
- linear algebra review - up to and including Section 3.7 - The Inverse
- Other Resources
- Cheat Sheets
- ggplot2 cheat sheet
- Data Wrangling cheat sheet
- Texts
- Phil Spector. (2008). Data Manipulation with R New York: Springer. [ Free through Western ]
- probability review from Stanford University by way of Doina Precup.
- List of resources from COMP-652 at McGill (courtesy Doina Precup)
- C. M. Bishop, Pattern Recognition and Machine Learning (2006)
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)
- Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
- David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.
- Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.
- Other Links
- Software
- The dplyr package documentation. The "vignettes" are particularly good.
- The TensorFlow Library (Python, C++): https://www.tensorflow.org/
Topics (anticipated)
- Introduction to Data Science
- Definitions
- Components
- Relationships to Other Fields
- Data Munging
- Working with structured data: selecting, filtering, joining, aggregating
- Web scraping
- Simple visualizations
- Sanity checking
- (Re)-introduction to Statistics
- Data Summaries
- Randomness, Sample Spaces and Events, Probability
- Random Variables, CDF, PMF, PDF
- Expectation
- Estimation
- Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap
- Inference: Hypothesis testing, P-values, Confidence Intervals
- Multivariate Statistics: conditional probability, correlation, independence
- Supervised Machine Learning, Predictive Models
- Supervised Learning
- Regression
- Classification
- Reinforcement Learning and Sequential Decision Making
- Evaluation
- Variance: Test set, cross-validation, bootstrap
- Bias: Confounding, causal inference
- Unsupervised Machine Learning, Representations, and Feature Construction
- Clustering
- Dimensionality reduction
- Domain-specific Feature Development
- Images
- Sounds
- Text
- Visualization
- Topics to be determined
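As a small taste of one topic from the list above (the bootstrap, which appears under Sampling Distributions and again under Evaluation), here is a minimal sketch in Python. It is an illustration only and not course material: the function name `bootstrap_mean_ci`, its parameters, and the data are all made up for this example, and it uses the simple percentile method with only the standard library rather than any package the course prescribes.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`.

    Hypothetical helper written for this sketch; not part of any library.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original data.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    means.sort()
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles
    # of the resampled means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Made-up measurements, used only to exercise the function.
data = [12.1, 9.8, 11.4, 10.2, 13.5, 9.1, 10.9, 12.7, 11.0, 10.4,
        9.6, 12.3, 11.8, 10.1, 13.0, 9.9, 11.2, 10.7, 12.0, 11.5]

lo, hi = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"approximate 95% bootstrap interval for the mean: ({lo:.2f}, {hi:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it was chosen here because it needs nothing beyond sorting the resampled means.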
Evaluation
There will be a midterm test but no final exam. Each student will lead a brainstorming session and produce a proposal, a draft, and a report for a course project. Graduate students (9637) will additionally submit peer reviews of other class projects. For detailed requirements, see Project Guidelines.
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.
Daily Quizzes – 5%
Starting with the second lecture, there will be a very short quiz at the beginning of class covering the previous day's material. The final quiz will be on 2 Mar. The lowest quiz mark will be dropped. Quiz marks will only be excused for medical reasons.
Midterm - 35%
The midterm assesses competencies in the fundamentals taught in the first half of the class.
Brainstorming Session – 5%
Each student will prepare a presentation explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be no more than 10 minutes. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback from the brainstorming session.
Project Proposal – 4437: 15% 9637: 10%
Document detailing the plan for the project. See Project Guidelines for detailed requirements.
Report Draft – 5%
A draft of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.
Project Report – 35%
Each student will prepare a research paper detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.
Peer Review – 9637 only: 5%
Each graduate student will prepare two reviews of their classmates' work.
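The component weights above can be checked and combined with a few lines of arithmetic. The sketch below is only an illustration: the weights are copied from the headings in this section, but the function names, the example marks, and the way quiz marks are averaged after dropping the lowest one are assumptions made for the example, not the instructor's official calculation.

```python
# Component weights in percent, copied from the Evaluation headings.
WEIGHTS = {
    "4437": {"quizzes": 5, "midterm": 35, "brainstorming": 5,
             "proposal": 15, "draft": 5, "report": 35},
    "9637": {"quizzes": 5, "midterm": 35, "brainstorming": 5,
             "proposal": 10, "draft": 5, "report": 35, "peer_review": 5},
}


def quiz_average(quiz_marks):
    """Mean quiz mark (0-100) after dropping the single lowest mark.

    Assumption for this sketch: the remaining quizzes count equally.
    """
    if len(quiz_marks) < 2:
        raise ValueError("need at least two quiz marks to drop one")
    kept = sorted(quiz_marks)[1:]  # discard the lowest mark
    return sum(kept) / len(kept)


def final_mark(course, marks):
    """Weighted final mark (0-100) from per-component marks (0-100)."""
    weights = WEIGHTS[course]
    return sum(weights[name] * marks[name] for name in weights) / 100


# Both weightings should account for the whole course mark.
for course, weights in WEIGHTS.items():
    assert sum(weights.values()) == 100, course

# Hypothetical marks for an undergraduate (4437) student.
example = {
    "quizzes": quiz_average([100, 80, 0]),  # the 0 is dropped
    "midterm": 80,
    "brainstorming": 100,
    "proposal": 70,
    "draft": 100,
    "report": 80,
}
print(final_mark("4437", example))
```

Note that the graduate weighting moves 5% from the proposal to the peer reviews, so both versions of the course total 100%.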
Participation and Effort
Success of the course as a useful learning experience hinges on active participation and effort of the students. Students are expected to attend all classes and are expected to actively participate in the brainstorming sessions.
Accessibility and Support Available at Western
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.
Support Services
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.
Additional student-run support services are offered by the USC, http://westernusc.ca/services.
The website for Registrarial Services is http://www.registrar.uwo.ca.
Missed Course Components
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html. A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an off-campus medical facility. For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.
Timeline (Tentative)
- 7 Sep - Lectures: Introduction to Data Science, Data Cleaning
- 12 Sep - Lectures: Re-introduction to Statistics
- 14 Sep - Lectures: Re-introduction to Statistics
- 19 Sep - Lectures: Supervised Learning
- 21 Sep - Lectures: Supervised Learning
- 26 Sep - Lectures: Supervised Learning
- 28 Sep - Lectures: Cancelled
- 3 Oct - Lectures: Cancelled
- 5 Oct - Pick Brainstorming Slot - Lectures: Linear Models
- 17 Oct - Lectures: Linear Models
- 19 Oct - Lectures: Linear Models / Nonlinear Models
- 24 Oct - Lectures: Nonlinear Models
- 26 Oct - Project Proposal Due 17 Feb at 5pm - TBA
- 31 Oct - TBA
- 2 Nov - TBA
- 7 Nov - Midterm
- 9 Nov - Brainstorming: *slot1*, *slot2*, *slot3*
- 14 Nov - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6*
- 16 Nov - Brainstorming: *slot1*, *slot2*, *slot3*
- 21 Nov - Project Draft Due 17 Mar at 5pm - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6*
- 23 Nov - Brainstorming: *slot1*, *slot2*, *slot3*
- 28 Nov - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6*
- 30 Nov - Brainstorming: *slot1*, *slot2*, *slot3*
- 5 Dec - Brainstorming: *slot1*, *slot2*, *slot3*, *slot4*, *slot5*, *slot6*
- 7 Dec - Brainstorming: *slot1*, *slot2*, *slot3*
- Project Document Due Friday 7 April 5pm
- Reviews Due Thursday 13 April 5pm