https://www.csd.uwo.ca/~dlizotte/teaching/IDS/api.php?action=feedcontributions&user=Dan+Lizotte&feedformat=atomIntroduction to Data Science - User contributions [en]2018-10-23T02:58:05ZUser contributionsMediaWiki 1.29.0https://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=168Project Guidelines2018-10-18T17:57:11Z<p>Dan Lizotte: Adjusted "student" to "team" in brainstorming section.</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for students to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. The ''substantive field'' refers to the field of science (not data science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*Projects are to be completed in groups of two or three individuals. <br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''9637''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact the instructor at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each group will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 6 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a team, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the team's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=167Project Guidelines2018-10-16T17:59:52Z<p>Dan Lizotte: /* Proposal */ student -> group</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for students to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. The ''substantive field'' refers to the field of science (not data science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*Projects are to be completed in groups of two or three individuals. <br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''9637''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact the instructor at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each group will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 6 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a student, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the student's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=166Lecture Materials2018-10-16T16:36:39Z<p>Dan Lizotte: Updated with Classification and Non-linear models for F18</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* Introduction to Statistical Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=165Lecture Materials2018-10-02T16:52:20Z<p>Dan Lizotte: Update to F18 Classification</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* Introduction to Statistical Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/7_Classification/classification.pdf pdf]]<br />
<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=164Lecture Materials2018-09-25T15:24:14Z<p>Dan Lizotte: 2018 model selection</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* Introduction to Statistical Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=163Lecture Materials2018-09-20T18:07:33Z<p>Dan Lizotte: </p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* Introduction to Statistical Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=162Lecture Materials2018-09-20T16:18:47Z<p>Dan Lizotte: /* Lecture Materials */</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* Introduction to Statistical Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Performance Evaluation<br />
* Model Selection<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=161Lecture Materials2018-09-18T18:08:15Z<p>Dan Lizotte: Updating reintroduction to statistics</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* Introduction to Statistical Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/4_Reintroduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=160Lecture Materials2018-09-13T18:01:58Z<p>Dan Lizotte: /* Lecture Materials */ Added Lecture 3</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* Introduction to Statistical Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/3_Intro%20to%20Statistical%20Learning/intro_to_statistical_learning.pdf pdf] ]<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=159Lecture Materials2018-09-11T21:03:47Z<p>Dan Lizotte: /* Lecture Materials */</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Tidyness [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pptx ppt] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/TidyData.pdf pdf] ]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=158Lecture Materials2018-09-11T18:14:55Z<p>Dan Lizotte: Update lecture 1 F18</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=157Lecture Materials2018-09-10T21:12:46Z<p>Dan Lizotte: Intro lecture F18</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F18/Lectures/1_Welcome/welcome.pdf Welcome]<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=156Introduction to Data Science I2018-09-10T20:52:45Z<p>Dan Lizotte: /* Important Dates */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Instructor Information ===<br />
<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps - nphelps3 at uwo dot ca<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
=== Course Description and Objectives ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 23 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Course Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accommodation and Accessibility ===<br />
<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in NCB 280, and can be contacted at scibmsac@uwo.ca. <br />
<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Academic Policies ===<br />
<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
In accordance with policy, http://www.uwo.ca/its/identity/activatenonstudent.html, <br />
the centrally administered e-mail account provided to students will be considered the individual’s official university e-mail address. It is the responsibility of the account holder to ensure that e-mail received from the University at his/her official university address is attended to in a timely manner.<br />
<br />
Electronic devices are not permitted for the midterm.<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at the following Web site: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.<br />
<br />
All required papers may be subject to submission for textual similarity review to the commercial plagiarism detection software under license to the University for the detection of plagiarism. All papers submitted for such checking will be included as source documents in the reference database for the purpose of detecting plagiarism of papers subsequently submitted to the system. Use of the service is subject to the licensing agreement, currently between The University of Western Ontario and Turnitin.com (http://www.turnitin.com).<br />
<br />
Computer-marked multiple-choice tests and exams may be subject to submission for similarity review by software that will check for unusual coincidences in answer patterns that may indicate cheating.<br />
<br />
=== Support Services ===<br />
<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Student Accessibility Services (SAS) at 661-2147 if you have any questions regarding accommodations.<br />
<br />
The policy on Accommodation for Students with Disabilities can be found here: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_disabilities.pdf<br />
<br />
The policy on Accommodation for Religious Holidays can be found here:<br />
http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_religious.pdf<br />
<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - '''Project Proposal Due 26 Oct at 5pm''' - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Midterm Review and Q&A<br />
** 13 Nov - Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 23 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=155Introduction to Data Science I2018-09-10T20:50:58Z<p>Dan Lizotte: /* Timeline (Tentative) */ Reconciled dates</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Instructor Information ===<br />
<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps - nphelps3 at uwo dot ca<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
=== Course Description and Objectives ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Course Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accommodation and Accessibility ===<br />
<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in NCB 280, and can be contacted at scibmsac@uwo.ca. <br />
<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Academic Policies ===<br />
<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
In accordance with policy, http://www.uwo.ca/its/identity/activatenonstudent.html, <br />
the centrally administered e-mail account provided to students will be considered the individual’s official university e-mail address. It is the responsibility of the account holder to ensure that e-mail received from the University at his/her official university address is attended to in a timely manner.<br />
<br />
Electronic devices are not permitted for the midterm.<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at the following Web site: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.<br />
<br />
All required papers may be subject to submission for textual similarity review to the commercial plagiarism detection software under license to the University for the detection of plagiarism. All papers submitted for such checking will be included as source documents in the reference database for the purpose of detecting plagiarism of papers subsequently submitted to the system. Use of the service is subject to the licensing agreement, currently between The University of Western Ontario and Turnitin.com (http://www.turnitin.com).<br />
<br />
Computer-marked multiple-choice tests and exams may be subject to submission for similarity review by software that will check for unusual coincidences in answer patterns that may indicate cheating.<br />
<br />
=== Support Services ===<br />
<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Student Accessibility Services (SAS) at 661-2147 if you have any questions regarding accommodations.<br />
<br />
The policy on Accommodation for Students with Disabilities can be found here: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_disabilities.pdf<br />
<br />
The policy on Accommodation for Religious Holidays can be found here:<br />
http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_religious.pdf<br />
<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - '''Project Proposal Due 26 Oct at 5pm''' - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Midterm Review and Q&A<br />
** 13 Nov - Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 23 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=154Introduction to Data Science I2018-09-09T17:32:56Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */ Update to comply with 2018 course outline requirements</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Instructor Information ===<br />
<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps - nphelps3 at uwo dot ca<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
=== Course Description and Objectives ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Course Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accommodation and Accessibility ===<br />
<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in NCB 280, and can be contacted at scibmsac@uwo.ca. <br />
<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Academic Policies ===<br />
<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
In accordance with policy, http://www.uwo.ca/its/identity/activatenonstudent.html, <br />
the centrally administered e-mail account provided to students will be considered the individual’s official university e-mail address. It is the responsibility of the account holder to ensure that e-mail received from the University at his/her official university address is attended to in a timely manner.<br />
<br />
Electronic devices are not permitted for the midterm.<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at the following Web site: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf.<br />
<br />
All required papers may be subject to submission for textual similarity review to the commercial plagiarism detection software under license to the University for the detection of plagiarism. All papers submitted for such checking will be included as source documents in the reference database for the purpose of detecting plagiarism of papers subsequently submitted to the system. Use of the service is subject to the licensing agreement, currently between The University of Western Ontario and Turnitin.com (http://www.turnitin.com).<br />
<br />
Computer-marked multiple-choice tests and exams may be subject to submission for similarity review by software that will check for unusual coincidences in answer patterns that may indicate cheating.<br />
<br />
=== Support Services ===<br />
<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Student Accessibility Services (SAS) at 661-2147 if you have any questions regarding accommodations.<br />
<br />
The policy on Accommodation for Students with Disabilities can be found here: http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_disabilities.pdf<br />
<br />
The policy on Accommodation for Religious Holidays can be found here:<br />
http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_religious.pdf<br />
<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Midterm Review and Q&A<br />
** 13 Nov - Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=153Introduction to Data Science I2018-09-09T17:22:15Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps - nphelps3 at uwo dot ca<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Midterm Review and Q&A<br />
** 13 Nov - Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=152Introduction to Data Science I2018-09-06T17:36:34Z<p>Dan Lizotte: /* Timeline (Tentative) */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps - nphelps3 at uwo dot ca<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Midterm Review and Q&A<br />
** 13 Nov - Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=151Introduction to Data Science I2018-09-06T17:33:21Z<p>Dan Lizotte: /* Logistics */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps - nphelps3 at uwo dot ca<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Brainstorming: Midterm Review and Q&A<br />
** 13 Nov - Brainstorming: Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=150Lecture Materials2018-09-06T17:14:41Z<p>Dan Lizotte: /* Lecture Materials */</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
<!--<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
--><br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=149Introduction to Data Science I2018-09-06T14:13:55Z<p>Dan Lizotte: /* Materials */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' (Available through UWO Library)<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Brainstorming: Midterm Review and Q&A<br />
** 13 Nov - Brainstorming: Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=148Lecture Materials2018-09-06T13:55:15Z<p>Dan Lizotte: </p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
<br />
= Previous Offerings =<br />
<br />
== From F17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=147Introduction to Data Science I2018-09-04T13:47:00Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">Note that this course is in high demand. Now that those who submitted a proposal successfully have been registered, the course is open to all computer science students (who can register themselves online) subject to space availability. If there is space remaining after 21 September, students from other Departments and Faculties may be admitted. Those interested should attend lectures anyway.''</span><br />
<br />
<span style="color:#EE0000">'''Note that Master of Data Analytics students are exempt from this and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Brainstorming: Midterm Review and Q&A<br />
** 13 Nov - Brainstorming: Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=146Introduction to Data Science I2018-08-29T15:21:34Z<p>Dan Lizotte: /* Logistics */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications."'''</span><br />
<br />
<span style="color:#EE0000">To join the site, log into OWL and go to your Home page. Choose "Membership" from the menu on the left, then click the Joinable Sites tab. Search for "Data Science" and join the site. You will then be able to submit the summary as an assignment.</span><br />
<br />
<span style="color:#EE0000">Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Nathan Phelps<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Talbot College [http://www.music.uwo.ca/pdf/resources/TC-03.pdf '''TC-205''']<br />
<!-- * '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' --><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Brainstorming: Midterm Review and Q&A<br />
** 13 Nov - Brainstorming: Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=145Project Guidelines2018-08-03T19:35:39Z<p>Dan Lizotte: /* Final Report */</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for students to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. The ''substantive field'' refers to the field of science (not data science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*Projects are to be completed in groups of two or three individuals. <br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''9637''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact the instructor at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each student will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 6 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a student, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the student's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=144Project Guidelines2018-08-03T19:29:51Z<p>Dan Lizotte: /* Goal */</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for students to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. The ''substantive field'' refers to the field of science (not data science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*Projects are to be completed in groups of two or three individuals. <br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''9637''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact the instructor at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each student will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 4 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a student, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the student's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=143Project Guidelines2018-08-03T19:26:32Z<p>Dan Lizotte: /* Structure and Regulations */</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for the student to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. Here I'm using the word ''substantive'' in the way a statistician might: the ''substantive field'' refers to the field of science (not statistical science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*Projects are to be completed in groups of two or three individuals. <br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''9637''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact the instructor at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each student will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 4 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a student, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the student's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=142Project Guidelines2018-08-03T19:20:54Z<p>Dan Lizotte: /* Structure and Regulations */</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for the student to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. Here I'm using the word ''substantive'' in the way a statistician might: the ''substantive field'' refers to the field of science (not statistical science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''9637''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact the instructor at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each student will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 4 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a student, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the student's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=141Introduction to Data Science I2018-08-03T19:15:46Z<p>Dan Lizotte: /* Evaluation */ Updated marking scheme for 2018. Removed quiz points.</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications."'''</span><br />
<br />
<span style="color:#EE0000">To join the site, log into OWL and go to your Home page. Choose "Membership" from the menu on the left, then click the Joinable Sites tab. Search for "Data Science" and join the site. You will then be able to submit the summary as an assignment.</span><br />
<br />
<span style="color:#EE0000">Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will co-lead a brainstorming session, and co-produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Midterm - '''4414/9114:''' 35% '''9637:''' 30% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 10% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414/9114:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 10% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Brainstorming: Midterm Review and Q&A<br />
** 13 Nov - Brainstorming: Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=140Introduction to Data Science I2018-08-03T19:09:57Z<p>Dan Lizotte: Updated schedule for 2018. 27 brainstroming slots</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications."'''</span><br />
<br />
<span style="color:#EE0000">To join the site, log into OWL and go to your Home page. Choose "Membership" from the menu on the left, then click the Joinable Sites tab. Search for "Data Science" and join the site. You will then be able to submit the summary as an assignment.</span><br />
<br />
<span style="color:#EE0000">Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
=== Timeline (Tentative) ===<br />
<br />
* 6 Sep - Lectures: <br />
** 11 Sep - Lectures: <br />
* 13 Sep - Lectures: <br />
** 18 Sep - Lectures: <br />
* 20 Sep - Lectures: <br />
** 25 Sep - Lectures: <br />
* 27 Sep - Lectures: <br />
** 2 Oct - Lectures: <br />
* 4 Oct - '''Pick Brainstorming Slot by 5 Oct 5pm''' - Lectures: <br />
** ''9 Oct - '''Fall Reading Week''' ''<br />
* ''11 Oct - '''Fall Reading Week''' ''<br />
** 16 Oct - Lectures: <br />
* 18 Oct - '''Project Proposal Due 19 Oct at 5pm''' - Lectures: <br />
** 23 Oct - Lectures: <br />
* 25 Oct - Lectures: <br />
** 30 Oct - Lectures: <br />
* 1 Nov - Lectures:<br />
** 6 Nov - Lectures:<br />
<br />
* 8 Nov - Brainstorming: Midterm Review and Q&A<br />
** 13 Nov - Brainstorming: Midterm<br />
<br />
* 15 Nov - GUEST LECTURE<br />
<br />
** 20 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 22 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: 1,2,3<br />
** 27 Nov - Brainstorming: 1,2,3,4,5,6<br />
* 29 Nov - Brainstorming: 1,2,3<br />
** 4 Dec - Brainstorming: 1,2,3,4,5,6<br />
* 6 Dec - Brainstorming: 1,2,3<br />
<br />
* '''Project Document Due Friday 7 December 5pm'''<br />
* '''Reviews (graduate students only) Due Friday 14 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=139Introduction to Data Science I2018-08-01T15:15:30Z<p>Dan Lizotte: /* Peer Review – 9637 only: 5% */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications."'''</span><br />
<br />
<span style="color:#EE0000">To join the site, log into OWL and go to your Home page. Choose "Membership" from the menu on the left, then click the Joinable Sites tab. Search for "Data Science" and join the site. You will then be able to submit the summary as an assignment.</span><br />
<br />
<span style="color:#EE0000">Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate student enrolled in CS9637''' will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=138Introduction to Data Science I2018-06-20T15:11:31Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications."'''</span><br />
<br />
<span style="color:#EE0000">To join the site, log into OWL and go to your Home page. Choose "Membership" from the menu on the left, then click the Joinable Sites tab. Search for "Data Science" and join the site. You will then be able to submit the summary as an assignment.</span><br />
<br />
<span style="color:#EE0000">Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=137Project Guidelines2018-06-20T14:53:40Z<p>Dan Lizotte: /* Structure and Regulations */</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for the student to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. Here I'm using the word ''substantive'' in the way a statistician might: the ''substantive field'' refers to the field of science (not statistical science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''graduate''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact Dan at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each student will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 4 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a student, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the student's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=136Introduction to Data Science I2018-06-20T14:48:23Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications.'''</span><br />
<br />
<span style="color:#EE0000">To join the site, log into OWL and go to your Home page. Choose "Membership" from the menu on the left, then click the Joinable Sites tab. Search for "Data Science" and join the site. You will then be able to submit the summary as an assignment.</span><br />
<br />
<span style="color:#EE0000">Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=135Introduction to Data Science I2018-06-20T14:46:52Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications.''' <br />
<br />
To join the site, log into OWL and go to your Home page. Choose "Membership" from the menu on the left, then click the Joinable Sites tab. Search for "Data Science" and join the site. You will then be able to submit the summary as an assignment.<br />
<br />
Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=134Introduction to Data Science I2018-06-14T16:39:48Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */ had wrong OWL url</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/V3CrNO OWL site] "Intro to Data Science I - Enrolment Applications.''' Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=133Introduction to Data Science I2018-06-14T16:36:42Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */ linked to enrolment owl site</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the [https://owl.uwo.ca/x/kuLdBO OWL site] "Intro to Data Science I - Enrolment Applications.''' Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=132Introduction to Data Science I2018-06-14T16:34:05Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */ Took away reference to proposal guidelines</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the OWL site "Intro to Data Science I - Enrolment Applications.''' Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=131Introduction to Data Science I2018-06-14T16:31:58Z<p>Dan Lizotte: /* Course outline for COMPSCI 4414A/9637A/9114A */ added requirement for name, id, programme</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the OWL site "Intro to Data Science I - Enrolment Applications.''' (See the Proposal Guidelines for the general idea.) Ensure that your 1/2 page summary document includes your name, programme, and student number. This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=130Introduction to Data Science I2018-06-14T16:27:00Z<p>Dan Lizotte: Updating with enrolment application requirement, updated some dates to 2018. Weekly sched needs updating still.</p>
<hr />
<div>== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2018<br />'''<br />
<br />
<span style="color:#EE0000">This is a very high-demand course that interests students in various programs across campus. The diversity of backgrounds assembled in the class makes for a better learning experience for all; however, space is limited. Because of the volume of requests I receive, I am not able to manage a wait list. '''All students who who wish to register for the course must submit a written a 1/2 page proposal sketch on the project they would like to pursue to the OWL site "Intro to Data Science I - Enrolment Applications.''' (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 31 July 2018 and does not guarantee enrolment. Enrolment will be decided based on space available, quality of the proposal sketch, and program. '''Note that Master of Data Analytics students are exempt from this requirement and will be registered in 9114A.'''</span><br />
<br />
<span style="color:#EE0000">'''THE CONTENT BELOW IS NOT FINALISED AND MAY CHANGE</span><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''wrangle''' - be willing to [https://en.wikipedia.org/wiki/Data_wrangling wrangle] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
0.5 course from Biology 2244A/B, Statistical Sciences 2035, Statistical Sciences 2141A/B, Statistical Sciences 2143A/B, Statistical Sciences 2244A/B or Statistical Sciences 2858A/B; 1.0 course from Computer Science 1025A/B, Computer Science 1026A/B, Computer Science 1027A/B, Computer Science 1037A/B, Computer Science 2120A/B, Computer Science 2121A/B, Digital Humanities 2220A/B, Digital Humanities 2221A/B, Engineering Science 1036A/B; and 0.5 course from Mathematics 1229A/B, Mathematics 1600A/B, Applied Mathematics 1411A/B; '''and written permission of the Department obtained by applying as above.'''<br />
<br />
=== Logistics ===<br />
To be determined.<br />
<-- <br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320'''<br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication. --><br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 5 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 26 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 16 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 7 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 14 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=127Lecture Materials2017-11-24T21:58:01Z<p>Dan Lizotte: Added classification performance evaluation materials</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
'''Materials with associated video lectures (see OWL)'''<br />
<br />
* Classification Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/10_Classification%20Performance%20Evaluation/classification_performance_evaluation.pdf pdf] ]<br />
<br />
<br />
= Previous Offerings =<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=126Introduction to Data Science I2017-11-23T19:17:30Z<p>Dan Lizotte: /* Timeline (Tentative) */</p>
<hr />
<div><br />
<br />
== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2017<br />'''<br />
<br />
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>'''<br />
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' --><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']--><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
== Timeline (Tentative) ==<br />
<br />
* 7 Sep - Lectures: Welcome<br />
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics<br />
* 14 Sep - Lectures: Introduction to Statistics<br />
** 19 Sep - Lectures: Supervised Learning<br />
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation<br />
** 26 Sep - Lectures: Performance Evaluation, Model Selection<br />
* 28 Sep - Lectures: Classification<br />
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification<br />
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification<br />
** ''10 Oct - '''Fall Reading Week''' ''<br />
* ''12 Oct - '''Fall Reading Week''' ''<br />
** 17 Oct - Lectures: <br />
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.<br />
** 24 Oct - Lectures: <br />
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization<br />
** 31 Oct - Lectures: <br />
* 2 Nov - Lectures: Midterm Review/Q&A<br />
** 7 Nov - '''Midterm'''<br />
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi* <br />'''9637 Slots 3:30pm-4:30pm''': Mahtab Ahmed, *Nick DelBen*<br />
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps<br />
* 16 Nov - Brainstorming: *slot1*, Gurpreet Singh, Erica Yarmol-Matusiak<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang<br />
** 21 Nov - Brainstorming: Cole Fisher, Xiaoyu Yang & Sachi Elkerton, Felipe Urra, Tianzhi Zhu<br />
* 23 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: Nanditha Rao, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*<br />
** 28 Nov - Brainstorming: Yancong Wang & Jiayi JI, Mohammad, Angela Zhao & Yanbing Zhu, Yu Zhu, Gagan Verma & Kerlin Lobo, Zeyu Wang<br />
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, Roopa Bose, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''<br />
** 5 Dec - Brainstorming: (Sanjay Ghanathey, Jenna Le, Tanvi Kumar), *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *Rifayat Samee*<br />
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*<br />
<br />
* '''Project Document Due Friday 8 December 5pm'''<br />
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=125Introduction to Data Science I2017-11-23T19:09:20Z<p>Dan Lizotte: /* Timeline (Tentative) */</p>
<hr />
<div><br />
<br />
== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2017<br />'''<br />
<br />
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>'''<br />
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' --><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']--><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
== Timeline (Tentative) ==<br />
<br />
* 7 Sep - Lectures: Welcome<br />
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics<br />
* 14 Sep - Lectures: Introduction to Statistics<br />
** 19 Sep - Lectures: Supervised Learning<br />
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation<br />
** 26 Sep - Lectures: Performance Evaluation, Model Selection<br />
* 28 Sep - Lectures: Classification<br />
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification<br />
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification<br />
** ''10 Oct - '''Fall Reading Week''' ''<br />
* ''12 Oct - '''Fall Reading Week''' ''<br />
** 17 Oct - Lectures: <br />
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.<br />
** 24 Oct - Lectures: <br />
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization<br />
** 31 Oct - Lectures: <br />
* 2 Nov - Lectures: Midterm Review/Q&A<br />
** 7 Nov - '''Midterm'''<br />
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi* <br />'''9637 Slots 3:30pm-4:30pm''': Mahtab Ahmed, *Nick DelBen*<br />
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps<br />
* 16 Nov - Brainstorming: *slot1*, Gurpreet Singh, Erica Yarmol-Matusiak<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang<br />
** 21 Nov - Brainstorming: Cole Fisher, Xiaoyu Yang & Sachi Elkerton, Felipe Urra, Tianzhi Zhu<br />
* 23 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: Nanditha Rao, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*<br />
** 28 Nov - Brainstorming: *Yancong Wang & Jiayi JI*, Mohammad, Angela Zhao & Yanbing Zhu, Yu Zhu, *Gagan Verma*, *Zeyu Wang*<br />
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, *slot*, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''<br />
** 5 Dec - Brainstorming: (Sanjay Ghanathey, Jenna Le, Tanvi Kumar), *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *Rifayat Samee*<br />
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*<br />
<br />
* '''Project Document Due Friday 8 December 5pm'''<br />
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=124Introduction to Data Science I2017-11-21T20:11:32Z<p>Dan Lizotte: /* Timeline (Tentative) */</p>
<hr />
<div><br />
<br />
== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2017<br />'''<br />
<br />
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>'''<br />
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' --><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']--><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
== Timeline (Tentative) ==<br />
<br />
* 7 Sep - Lectures: Welcome<br />
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics<br />
* 14 Sep - Lectures: Introduction to Statistics<br />
** 19 Sep - Lectures: Supervised Learning<br />
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation<br />
** 26 Sep - Lectures: Performance Evaluation, Model Selection<br />
* 28 Sep - Lectures: Classification<br />
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification<br />
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification<br />
** ''10 Oct - '''Fall Reading Week''' ''<br />
* ''12 Oct - '''Fall Reading Week''' ''<br />
** 17 Oct - Lectures: <br />
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.<br />
** 24 Oct - Lectures: <br />
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization<br />
** 31 Oct - Lectures: <br />
* 2 Nov - Lectures: Midterm Review/Q&A<br />
** 7 Nov - '''Midterm'''<br />
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi* <br />'''9637 Slots 3:30pm-4:30pm''': Mahtab Ahmed, *Nick DelBen*<br />
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps<br />
* 16 Nov - Brainstorming: *slot1*, Gurpreet Singh, Erica Yarmol-Matusiak<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang<br />
** 21 Nov - Brainstorming: Cole Fisher, Xiaoyu Yang & Sachi Elkerton, Felipe Urra, Tianzhi Zhu<br />
* 23 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: Nanditha Rao, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': Roopa Bose, *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*<br />
** 28 Nov - Brainstorming: *Yancong Wang & Jiayi JI*, Mohammad, Angela Zhao & Yanbing Zhu, Yu Zhu, *Gagan Verma*, *Zeyu Wang*<br />
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, *slot*, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''<br />
** 5 Dec - Brainstorming: (Sanjay Ghanathey, Jenna Le, Tanvi Kumar), *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *Rifayat Samee*<br />
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*<br />
<br />
* '''Project Document Due Friday 8 December 5pm'''<br />
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=120Introduction to Data Science I2017-11-15T20:48:11Z<p>Dan Lizotte: /* Timeline (Tentative) */ Updated draft due date</p>
<hr />
<div><br />
<br />
== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2017<br />'''<br />
<br />
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>'''<br />
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' --><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']--><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
== Timeline (Tentative) ==<br />
<br />
* 7 Sep - Lectures: Welcome<br />
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics<br />
* 14 Sep - Lectures: Introduction to Statistics<br />
** 19 Sep - Lectures: Supervised Learning<br />
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation<br />
** 26 Sep - Lectures: Performance Evaluation, Model Selection<br />
* 28 Sep - Lectures: Classification<br />
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification<br />
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification<br />
** ''10 Oct - '''Fall Reading Week''' ''<br />
* ''12 Oct - '''Fall Reading Week''' ''<br />
** 17 Oct - Lectures: <br />
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.<br />
** 24 Oct - Lectures: <br />
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization<br />
** 31 Oct - Lectures: <br />
* 2 Nov - Lectures: Midterm Review/Q&A<br />
** 7 Nov - '''Midterm'''<br />
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi* <br />'''9637 Slots 3:30pm-4:30pm''': Mahtab Ahmed, *Nick DelBen*<br />
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps<br />
* 16 Nov - Brainstorming: *slot1*, Gurpreet Singh, Erica Yarmol-Matusiak<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang<br />
** 21 Nov - Brainstorming: Cole Fisher, Angela Zhao, *Xiaoyu Yang & Sachi Elkerton*, Nanditha Rao, Felipe Urra, *TianzhiZhu*<br />
* 23 Nov - '''Project Draft Due 24 Nov at 5pm''' - Brainstorming: *slot*, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': Roopa Bose, *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*<br />
** 28 Nov - Brainstorming: *Yancong Wang*, Mohammad, Yanbing Zhu, Yu Zhu, *Gagan Verma*, *Zeyu Wang*<br />
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, *Jiayi Ji*, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''<br />
** 5 Dec - Brainstorming: *Sanjay Ghanathey*, *Jenna Le*, *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *Rifayat Samee*<br />
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*<br />
<br />
* '''Project Document Due Friday 8 December 5pm'''<br />
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Project_Guidelines&diff=118Project Guidelines2017-11-10T19:46:39Z<p>Dan Lizotte: /* Report Submission and Reviewing */</p>
<hr />
<div>== Goal ==<br />
<br />
The goal of this project is for the student to gain experience in understanding a substantive problem/question, acquiring data relevant to the problem/question, and applying appropriate data science techniques in an effort to address the problem/question. Here I'm using the word ''substantive'' in the way a statistician might: the ''substantive field'' refers to the field of science (not statistical science) containing the problem to be addressed. Example substantive fields include medicine, chemistry, astronomy, and computer networks. All project must include a visualization component, which may be static or dynamic.<br />
<br />
== Structure and Regulations ==<br />
<br />
*The project will be submitted as three deliverables, a project [[#Proposal|proposal]] early in the term, a [[#Report Draft|draft]] partway through the term, and a final research [[#Final Report|report]] at the end of the term. '''All of these must be submitted as pdfs generated by Markdown, LaTeX, or Word; see instructions below.''' After this, each '''graduate''' student will [[#Review Guidelines|review]] a subset of projects; reviews are due one week after final project submission.<br />
*Projects are to be completed '''individually'''.<br />
*All projects ''must'' be based on a dataset that is '''sufficiently interesting''' for our purposes as judged by the instructor. Note that any [http://archive.ics.uci.edu/ml/ UCI] dataset that was donated prior to 2007 is considered '''un'''interesting and is therefore disallowed.<br />
*You are encouraged to contact Dan at any point to determine if your project topic is suitable<br />
*'''No Spam Filters. Furthermore, the Enron-Spam datasets are explicitly forbidden'''<br />
<br />
== Proposal ==<br />
<br />
For the proposal, each student will identify an applied problem (or a few related problems) that could be solved using data science methods, identify an appropriate dataset, and give a detailed plan for analyzing the data that includes what pre-processing will be required, what kind of feature development will be necessary, and what analysis and visualization methods might be applied. Don't forget to include details for how you will assess the performance of any models you build. The proposal should have '''three main headings''':<br />
<br />
* Description of Applied Problem<br />
* Description of Available Data<br />
* Plan for Analysis and Visualization<br />
<br />
The main body of the proposal document should be 2 pages long, single spaced. Page 3 and after may only contain references, tables, and figures. If you are using LaTeX, use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ CS4637/CS9637 style files], which are based on the ICML style files. There is no style file for markdown, but keep in mind that if you use Markdown, you still need to have proper references. [http://www.chriskrycho.com/2015/academic-markdown-and-citations.html This resource] may help, as might a bit of Google/StackExchange searching, but in the end the onus is on you. If using word, use 3/4" margins and a 12 point serif font.<br />
<br />
Include a brief abstract of a few sentences. '''At least two appropriate references''' must be listed for works (papers or books) that discuss and describe the applied problem, '''at least one reference''' that describes the available data (may be URL(s)) and '''at least two references''' that describe the methods you plan to explore in your analysis and visualization plan.<br />
<br />
'''Whether you are using LaTeX, Markdown, or Word, submit your proposal as a PDF file. Proposals must submitted through OWL. Late submissions will not be accepted.'''<br />
<br />
== Report Draft ==<br />
<br />
A draft of the final report will be due approximately 2/3 of the way through the term. Use Word, Markdown, or LaTeX with the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], just as you must for the final report. To ensure you get useful feedback, the draft should have a complete abstract, background section, and analysis and visualization plan. The rest of the paper should at least be sketched in, perhaps in point form, to give a sense of the final shape of the document. '''The precise content of the draft is not specified, but the more you provide, the better feedback you will get.'''<br />
<br />
'''Report drafts must be submitted <!-- to EasyChair [https://www.easychair.org/conferences/?conf=amlf14 https://www.easychair.org/conferences/?conf=amlf14] --> through OWL by 5pm on the due date. *Do not e-mail the instructor your draft.*''' Late submissions will not be accepted. <!-- Later, to submit your final report, you will simply "Update" your draft submission with a new .pdf (and maybe title.) --><br />
<br />
== Final Report ==<br />
<br />
The report must be no more than 4 pages long, single spaced, not including references. '''If you wish''', you may also include an additional appendix with an unlimited number of pages that contain '''only figures, figure captions, and tables'''. Use Word, or use the [http://www.csd.uwo.ca/~dlizotte/teaching/stylefiles/ style files], which are based on the ICML style files, or use Markdown. Include a brief abstract. As mentioned above, all reports must include a visualization component.<br />
<br />
An outstanding report might resemble an application-focussed publication in a workshop at one of the top machine learning or AI conferences, like for example ICML or [http://www.aaai.org/Library/IAAI/iaai-library.php IAAI]. (Note however that you are required to include a visualization component, which such papers may not have.) Here are some examples. Note that just because a paper is listed here does not mean it is perfect; you must always read with a fair but critical eye.<br />
<br />
*Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, Doina Precup. [http://www.aaai.org/ocs/index.php/IAAI/IAAI10/paper/view/1597 A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery.]<br />
*Weiss, Page, Peissig, Natarajan, and McCarty. [http://www.aaai.org/ocs/index.php/IAAI/IAAI-12/paper/view/4778/5451 Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records]<br />
*Chad Cumby, Rayid Ghani [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3528 A Machine Learning Based System for Semi-Automatically Redacting Documents.]<br />
*Mitja Luštrek, Hristijan Gjoreski, Simon Kozina, Božidara Cvetković, Violeta Mirchevska, Matjaž Gams [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/2753 Detecting Falls with Location Sensors and Accelerometers]<br />
* Ben George Weber, Michael John, Michael Mateas, Arnav Jhala [http://www.aaai.org/ocs/index.php/IAAI/IAAI-11/paper/view/3526/4029 Modeling Player Retention in Madden NFL 11]<br />
<br />
=== Specific expectations for the report ===<br />
<br />
'''Reproducibility''': The report '''must''' contain enough detail about the methods used to allow a future researcher to reproduce the results if they had access to the appropriate data and access to all appropriate works cited. (Some projects may use proprietary data; that is fine.) Reports that do not contain sufficient method detail will not receive full marks.<br />
<br />
'''Integrity''': The report must adhere to the standards of [http://www.lib.uwaterloo.ca/gradait/content/documents/credit_your_sources.pdf academic honesty].<br />
<br />
'''Formality''': The report should be written in formal academic language appropriate for a technical report/workshop/conference/journal publication. The author should refer to him/herself in the second person plural, i.e. using "we." ("We present a novel analysis...")<br />
<br />
'''Writing Quality''': The writing must of the quality level expected of a senior undergraduate or graduate student at a world-class university. The [http://www.sdc.uwo.ca/writing/ Writing Support Centre] at UWO can help you reach this level.<br />
<br />
== Report Submission and Reviewing ==<br />
<br />
'''Final report submissions will be done through OWL.'''<br />
<br />
Following report submission, each '''Computer Science graduate (9637)''' student will be randomly assigned two project reports to review over the week following the due date but before the end of the exam period.<br />
<br />
* The main purpose of reviewing is to provide feedback to authors that they can make use of in their future careers, which gives them a better return on the investment they have made in their course project.<br />
* The secondary purpose is to give students a view of the variety of work that has been done in the course.<br />
* '''Reviews from other students will not affect the grade of the author in any way.'''<br />
* Reviewing will be single-blind: Authors will not know who reviews their project.<br />
* Reviewers are expected to provide feedback that is '''constructive'''. Constructive feedback '''makes concrete suggestions on improving the work''' under review. Feedback that is both negative and non-constructive will not be tolerated.<br />
<br />
=== Review Guidelines ===<br />
'''Students must follow the review guidelines below. Include headings where appropriate'''<br />
<br />
* '''Summary:''' Summarize the goal of the project. What are the authors trying to achieve? Then summarize the contributions of the project in a few sentences. Describe the substantive problem, the data used, and the analysis applied. Describe the results. Note that not every project will have "good results" and for this project that is not necessarily a fault; the meta-goal of this project is for each author to gain experience with DS methods. Keep that in mind when you summarize: did the authors sufficiently explore the space of appropriate methods?<br />
* After the summary, comment on the following aspects of the report:<br />
** '''Background''': Comment on whether the report clearly explains the problem to be tackled, and whether it clearly describes how the substantive problem will be formulated as a data science problem.<br />
** '''Data''': Comment on whether you were able to clearly understand what data were available and how they were used in the analysis.<br />
** '''Analysis and Visualization''': Comment on the appropriateness of the DS methods used, and '''comment on the reproducibility of the results''' as described above. Comment on the evaluation measures use.<br />
** '''Future work''': Make some suggestions on how the work could be extended in the future.<br />
<br />
Depending on the project, these sections of the review may be longer or shorter. Use your judgement. Be sure to have at least a few interesting sentences under each heading.<br />
<br />
== Brainstorming ==<br />
<br />
A brainstorming session will consist of a 10-minute presentation by a student, followed by a class discussion for a total of 15 minutes. The presenter may choose to take questions during the talk, or save them until the end. The presentation should detail an applied problem, dataset, and potential DS methods that could be useful, much like the project proposal. The Brainstorming Session '''''may or may not''''' be on the student's project topic, but of course it may be advantageous to use your brainstorming slot to get feedback and ideas.<br />
<br />
* Guidelines<br />
** Presentations should use projected slides<br />
** Presentations should cover more or less the same topics as a project proposal: Description of Applied Problem, Description of Available Data, Plan for Analysis and Visualization<br />
** Presenters will receive a 5-minute warning, but presentations *will* be terminated at the 15-minute mark.<br />
<br />
* Evaluation (by instructor) is based on <br />
** Effective explanation of the problem<br />
** Effective explanation of the available data. It is often a good idea to show a specific example of a single "data item" from the available data, whatever that might mean for the specific project.<br />
** Effective explanation potential DS methods<br />
** Ability to answer questions about the data and the analysis and visualization plan<br />
** Working within the strict 10+5 minute timeslot<br />
<br />
In general, it is better to *show* your plan rather than tell it. Use actual examples from your dataset where possible. Show how feature vectors and any class labels/regression targets are constructed.</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=115Introduction to Data Science I2017-11-05T12:56:49Z<p>Dan Lizotte: /* Timeline (Tentative) */ Kerlin will co-present with Gagan</p>
<hr />
<div><br />
<br />
== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2017<br />'''<br />
<br />
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>'''<br />
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' --><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']--><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
== Timeline (Tentative) ==<br />
<br />
* 7 Sep - Lectures: Welcome<br />
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics<br />
* 14 Sep - Lectures: Introduction to Statistics<br />
** 19 Sep - Lectures: Supervised Learning<br />
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation<br />
** 26 Sep - Lectures: Performance Evaluation, Model Selection<br />
* 28 Sep - Lectures: Classification<br />
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification<br />
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification<br />
** ''10 Oct - '''Fall Reading Week''' ''<br />
* ''12 Oct - '''Fall Reading Week''' ''<br />
** 17 Oct - Lectures: <br />
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.<br />
** 24 Oct - Lectures: <br />
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization<br />
** 31 Oct - Lectures: <br />
* 2 Nov - Lectures: Midterm Review/Q&A<br />
** 7 Nov - '''Midterm'''<br />
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi*, Sachi Elkerton<br />'''9637 Slots 3:30pm-4:30pm''': *slot*, *slot*, *slot*, *Nick DelBen*<br />
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps<br />
* 16 Nov - '''Project Draft Due 17 Nov at 5pm''' - Brainstorming: *slot1*, Gurpreet Singh, Erica Yarmol-Matusiak<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang<br />
** 21 Nov - Brainstorming: Cole Fisher, Angela Zhao, *Xiaoyu Yang*, Nanditha Rao, Felipe Urra, *TianzhiZhu*<br />
* 23 Nov - Brainstorming: Mahtab Ahmed, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': Roopa Bose, *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*<br />
** 28 Nov - Brainstorming: *Yancong Wang*, Mohammad, Yanbing Zhu, Yu Zhu, *Gagan Verma*, *Zeyu Wang*<br />
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, *Jiayi Ji*, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''<br />
** 5 Dec - Brainstorming: *Sanjay Ghanathey*, *Jenna Le*, *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *Rifayat Samee*<br />
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*<br />
<br />
* '''Project Document Due Friday 8 December 5pm'''<br />
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=114Lecture Materials2017-11-02T20:20:57Z<p>Dan Lizotte: /* Lecture Materials */ Fixed link to nonlinear models pdf</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
= Previous Offerings =<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Lecture_Materials&diff=111Lecture Materials2017-10-31T18:07:30Z<p>Dan Lizotte: Updated unsupervised learning lecture materials F2017</p>
<hr />
<div>= Lecture Materials =<br />
Materials from the most recent run of the course will be posted here. They will be updated as the term progresses.<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Performance Evaluation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/5_Performance%20Evaluation/performance_evaluation.pdf pdf]]<br />
* Model Selection [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/6_Model%20Selection/model_selection.pdf pdf]]<br />
* Classification [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/7_Classification/classification.pdf pdf]]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/8_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Nonlinear%20Models/nonlinear_models.pdf pdf] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4414_F17/Lectures/9_Unsupervised%20Learning/unsupervised-learning.pdf pdf] ]<br />
<br />
= Previous Offerings =<br />
<br />
== From W17 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/2_Data%20Preparation/data_preparation.pdf pdf] ]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/3_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.pdf pdf]]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/4_Supervised%20Learning/supervised_learning.pdf pdf]]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/6_Linear%20Models/linear_models.pdf pdf] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models.pdf pdf] ] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/7_Nonlinear%20Models/nonlinear_models_continuous.html html] ]<br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/8_Unsupervised%20Learning/unsupervised-learning_continuous.html html] ]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.html slides] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W17/Lectures/D_Performance%20Measures/performance_measures_continuous.html html] ]<br />
<br />
* Information Visualisation<br />
:* [https://www.youtube.com/watch?v=oJNY5eUbSQI Lecture] on what I would call "Principles of Information Visualisation"<br />
:* [https://public.tableau.com/en-us/s/gallery Inspiration] from the Tableau public gallery. (Recall Tableau is free for students.)<br />
<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
<br />
== From W16 ==<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/1_Welcome/welcome.pdf Welcome]<br />
* Data Preparation [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/2_Data%20Preparation/data_preparation.Rmd Rmd] ]<br />
* Google Flu Trends [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/Google%20Flu%20Trends.pdf pdf] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/3_Google%20Flu%20Trends/google_flu_trends.Rmd Rmd] ]<br />
:* Flu trends papers: On [https://owl.uwo.ca/ OWL]<br />
* (Re)introduction to Statistics [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/4_(Re)introduction%20to%20Statistics/reintroduction_to_statistics.Rmd Rmd] ]<br />
* Supervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/5_Supervised%20Learning/supervised_learning.Rmd Rmd] ]<br />
* Linear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/6_Linear%20Models/linear_models.Rmd Rmd] ]<br />
* Nonlinear Models [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/7_Nonlinear%20Models/nonlinear_models.Rmd Rmd] <br />
* Unsupervised Learning [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/8_Unsupervised%20Learning/unsupervised-learning.Rmd Rmd] ]<br />
* Visual Analytics '''Guest Lecture''' by Arman Didandeh [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/A_Visual%20Analytics/InfoViz4DataScience.pdf pdf]]<br />
* MapReduce '''Guest Lecture''' by Hanan Lutfiyya [[http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/B_MapReduce/mapReduce.pdf pdf]]<br />
* Performance Measures and Class Imbalance [ [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.html html] | [http://www.csd.uwo.ca/~dlizotte/teaching/cs4437_W16/Lectures/D_Performance%20Measures/performance_measures.Rmd Rmd] ]<br />
* Feature Selection and Construction '''Video Lectures''' by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
<br />
= Tutorials and Summaries = <br />
<br />
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
* [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
<br />
= Other Resources =<br />
<br />
* [http://cs229.stanford.edu/materials.html Materials from Stanford's ML class] by Andrew Ng. Excellent notes.<br />
<br />
* [http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf Classic tutorial on HMMs by Rabiner]<br />
<br />
* <span id="colinbib">Bibliography</span>/suggested reading from Colin Cherry's lecture:<br />
**Structured Perceptron<br />
***Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002. [http://www.aclweb.org/anthology-new/W/W02/W02-1001.pdf]<br />
**Some applications:<br />
***Scott Miller; Jethran Guinness; Alex Zamanian. Name Tagging with Word Clusters and Discriminative Training. NAACL 2004. [http://www.aclweb.org/anthology/N/N04/N04-1043.pdf]<br />
***Robert C. Moore. A Discriminative Framework for Bilingual Word Alignment. EMNLP 2005. [http://www.aclweb.org/anthology-new/H/H05/H05-1011.pdf]<br />
**Passive Aggressive Algorithm and MIRA:<br />
***Koby Crammer and Yoram Singer. Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 2003. [http://www.ai.mit.edu/projects/jmlr/papers/v3/crammer03a.html]<br />
***Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 2006. [http://jmlr.csail.mit.edu/papers/v7/crammer06a.html]<br />
**Applications (of MIRA):<br />
***Ryan McDonald; Koby Crammer; Fernando Pereira Online Large-Margin Training of Dependency Parsers. ACL 2005. [http://www.aclweb.org/anthology/P/P05/P05-1012.pdf]<br />
***Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak. Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion. ACL 2008. [http://www.aclweb.org/anthology/P/P08/P08-1103.pdf]<br />
**Pegasos<br />
***Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML 2007. [http://www.cs.huji.ac.il/~shais/papers/ShalevSiSr07.pdf]<br />
**Structured SVM:<br />
***I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. ICML 2004. [http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf]<br />
***B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. Neural Information Processing Systems Conference [http://www.seas.upenn.edu/~taskar/pubs/mmmn.pdf]<br />
<br />
== Previous Incarnations of This Course: CS886 at the University of Waterloo ==<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/02-1-logreg-nb-svm.pdf Lecture 3,4,5,6] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-knn.pdf Lecture 7] - k-NN and related methods<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/TUT-trees.pdf Lecture 8] - Decision Trees, Documents<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/f14/Docs-Images-Clustering-Dimred.pdf Lecture 9] - Documents, Images, Clustering, Dimensionality Reduction<br />
* Watch-On-Your-Own - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 10] - Introduction to HMMs - Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/doucette-guest-lecture.pdf Lecture 11] - Machine Learning Words of Wisdom - John Doucette<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/WaterlooTalk_Oct17_14_Online.pdf Lecture 12] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
<br />
=== S13 ===<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-1-intro.pdf Lecture 1] - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/01-2-intro.pdf Lecture 2] - Model Selection, Empirical Evaluation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-1-logreg-nb-svm.pdf Lecture 3,4,5] - Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/02-3-LearningTheory.pdf Lecture 6] - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/07-documents-and-images.pdf Lecture 7] - Documents and Images<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/08-clustering.pdf Lecture 8] - Clustering<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/09-timeseries-and-dimensionality-reduction.pdf Lecture 9] - Sound Features, Dimensionality Reduction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/WaterlooTalk_Jun06_13_Online.pdf Lecture 10] - Scaling Up with Online Learning - Dr. Colin Cherry<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/DataMiningCS886.pdf Lecture 11] - Data Mining - Luiza Antonie<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 12] - Introduction to HMMs - Michelle Karg<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-trees.pdf Short Lecture 1] - Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/s13/TUT-knn.pdf Short Lecture 2] - K-Nearest-Neighbours<br />
<br />
=== EarlierTerms ===<br />
<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-1-intro.pdf Lecture 1] - (F12) - Intro, Regression<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/01-2-intro.pdf Lecture 2] - (F12) - Overfitting, Performance Evaluation, Cross-Validation<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-1-logreg-nb-svm.pdf Lecture 3,4] - (F12) - More Classification: Logistic Regression, Naive Bayes, SVMs<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-2-knn-trees.pdf Lecture 5,6] - (F12) - Non-linear Classifiers: Knn, Decision Trees<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/02-3-LearningTheory.pdf Lecture 6] - (F12) - Learning Theory Light<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/04-image-features-and-clustering.pdf Lecture 7] - (F12) - Image Features, Clustering<br />
** [http://www.ifp.illinois.edu/~jyang29/papers/CVPR09-ScSPM.pdf Paper] on SIFTs + VQ (or Sparse Coding) for classification<br />
** [http://www.vlfeat.org/~vedaldi/code/sift.html Open-Source SIFT (and other) software]<br />
** [http://ufldl.stanford.edu/eccv10-tutorial/ ECCV Tutorial] on Feature Learning for Image Classification. Kai Yu and Andrew Ng<br />
* Lecture 8 - Lectures on feature selection and construction by Isabelle Guyon of [http://www.clopinet.com/ ClopiNet]<br />
** [http://videolectures.net/bootcamp07_guyon_ifs/ Isabelle Guyon] on Feature Selection ([http://videolectures.net/mmdss07_guyon_fsf/ longer version])<br />
** [http://videolectures.net/bootcamp07_guyon_fcon/ Isabelle Guyon] on Feature Construction (starts at 1:00:00)<br />
** [http://clopinet.com/isabelle/Projects/ETH/ Course] on feature selection/construction<br />
** [http://jmlr.csail.mit.edu/papers/special/feature03.html Special issue on features] in JMLR<br />
** [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf paper] by Guyon et al. on feature selection/construction<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/05-timeseries-and-dimensionality-reduction.pdf Lecture 9] - (F12) - Audio Features, Dimensionality Reduction (PCA)<br />
**[http://videolectures.net/mcvc08_frank_fea/ Feature extraction from audio and their application in music organization and transient enhancement in recorded music]<br />
**[http://videolectures.net/mcvc08_kohler_acs/ Audio Content Search]<br />
**Related [http://ismir2003.ismir.net/papers/McKinney.PDF paper]: Martin F. McKinney and Jeroen Breebaart. Features for Audio and Music Classification.<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/wagstaff-demud.pptx Lecture 10] by Dr. Kiri Wagstaff<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf Lecture 11] by Dr. Michelle Karg<br />
* [http://www.csd.uwo.ca/~dlizotte/teaching/cs886_slides/colin/WaterlooTalk_Oct18_12_Online.pdf Lecture 12] by Dr. [http://sites.google.com/site/colinacherry/ Colin Cherry] - (F12) - See also the [[#colinbib|bibliography]]</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=103Introduction to Data Science I2017-10-17T16:03:07Z<p>Dan Lizotte: /* Materials */</p>
<hr />
<div><br />
<br />
== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2017<br />'''<br />
<br />
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>'''<br />
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' --><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']--><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
== Timeline (Tentative) ==<br />
<br />
* 7 Sep - Lectures: Welcome<br />
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics<br />
* 14 Sep - Lectures: Introduction to Statistics<br />
** 19 Sep - Lectures: Supervised Learning<br />
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation<br />
** 26 Sep - Lectures: Performance Evaluation, Model Selection<br />
* 28 Sep - Lectures: Classification<br />
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification<br />
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification<br />
** ''10 Oct - '''Fall Reading Week''' ''<br />
* ''12 Oct - '''Fall Reading Week''' ''<br />
** 17 Oct - Lectures: <br />
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.<br />
** 24 Oct - Lectures: <br />
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization<br />
** 31 Oct - Lectures: <br />
* 2 Nov - Lectures: Midterm Review/Q&A<br />
** 7 Nov - '''Midterm'''<br />
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi*, Sachi Elkerton<br />'''9637 Slots 3:30pm-4:30pm''': *Roopa Bose*, *Jenna Le*, *Sanjay Ghanathey*, *slot7*<br />
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps<br />
* 16 Nov - '''Project Draft Due 17 Nov at 5pm''' - Brainstorming: Kerlin Lobo, Gurpreet Singh, *slot3*<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang<br />
** 21 Nov - Brainstorming: Cole Fisher, Angela Zhao, *Xiaoyu Yang*, Nanditha Rao, Felipe Urra, *TianzhiZhu*<br />
* 23 Nov - Brainstorming: Mahtab Ahmed, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': *Rifayat Samee*, *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*<br />
** 28 Nov - Brainstorming: *Yancong Wang*, Mohammad, Yanbing Zhu, Yu Zhu, *Gagan Verma*, *Zeyu Wang*<br />
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, *Jiayi Ji*, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''<br />
** 5 Dec - Brainstorming: *slot1*, *slot2*, *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *slot6*<br />
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*<br />
<br />
* '''Project Document Due Friday 8 December 5pm'''<br />
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''</div>Dan Lizottehttps://www.csd.uwo.ca/~dlizotte/teaching/IDS/index.php?title=Introduction_to_Data_Science_I&diff=102Introduction to Data Science I2017-10-17T16:01:53Z<p>Dan Lizotte: /* Materials */ Added deep learning resources</p>
<hr />
<div><br />
<br />
== Course outline for COMPSCI 4414A/9637A/9114A ==<br />
'''The University of Western Ontario<br />'''<br />
'''London, Ontario, Canada<br />'''<br />
'''Department of Computer Science<br />'''<br />
'''Course Outline - Fall (September - December) 2017<br />'''<br />
<br />
'''From Dan:''' This is a very high-demand course that interests students in various programs across campus. I think this is great because the diversity of backgrounds assembled in the class makes for a better learning experience for all. (Myself included!) However, space is limited. <span style="color:#EE0000">Because of the volume of requests I receive, I am not able to manage a wait list. Students will have to monitor the registration website for available spots. However, all are welcome to sit in the room if there is space.</span>'''<br />
<!-- <span style="color:#EE0000">Therefore, '''all ''graduate'' students who are ''not'' in the MSc or PhD programme within the Department of Computer Science, and who are not in the MDA programme, must e-mail me a 1/2 page proposal sketch on the project they would like to pursue. (See the Proposal Guidelines for the general idea.) This must be submitted by 5pm on 15 December 2016 and does not guarantee enrolment. Enrolment will be decided based on space available and quality of the proposal sketches.</span>''' --><br />
<br />
=== Objective ===<br />
<br />
The objective of this course is to introduce students to data science (DS) techniques, with a focus on application to substantive (i.e. "applied") problems. Students will gain experience in identifying which problems can be tackled by DS methods, and learn to identify which speciﬁc DS methods are applicable to a problem at hand. During the course, students will gain an in-depth understanding of a particular (substantive problem, DS solution) pair, and present their ﬁndings to their peers in the class. '''Although this course does not assume prior machine learning or visualization knowledge, it does require students to show substantial initiative in investigating methods that are applicable for their project. The [[Lecture Materials|lectures]] give an overview of important methods, but the lecture content alone is not sufficient to produce a high quality course project.'''<br />
<br />
This course is designed for students who:<br />
<br />
* Like to '''read''' - have a desire to understand substantive problems<br />
* Like to '''think''' - make connections between methods and problems<br />
* Like to '''hack''' - be willing to [http://en.wikipedia.org/wiki/Data_munging munge] data into usability<br />
* Like to '''speak''' - teach us about what you found<br />
<br />
=== Prerequisites ===<br />
<br />
At least one undergraduate programming course (e.g. CS2035) and at least one statistics course (e.g. STAT1024.) This course entails a significant amount of self-directed learning and is directed toward fourth-year undergraduate and graduate students.<br />
<br />
=== Logistics ===<br />
* '''Instructor''': Dan Lizotte – dlizotte at uwo dot ca – Office MC363<br />
* '''Teaching Assistant''': Brent Davis - bdavis56 at uwo dot ca - Runs Q/C Hour (see below)<br />
* '''Time''': Tuesday from 2:30PM – 4:30PM, and on Thursday from 2:30PM – 3:30PM<br />
* '''Place''': Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC-105B''']<br />
* '''Question and Collaboration Hour:''' Tuesday from 4:30pm - 5:30pm '''Location MC 320''' <!-- in Middlesex College [http://accessibility.uwo.ca/doc/floorplan/bf-mc.pdf '''MC320''']--><br />
* '''Communication''': We will be using [https://owl.uwo.ca OWL] for electronic communication.<br />
<br />
===Important Dates===<br />
* Pick Brainstorming Slot by Friday, 6 Oct at 5pm <!-- End of 4th Week --><br />
* Project Proposal Due Friday, 27 Oct at 5pm <!-- End of 7th Week --><br />
* Project Draft Due Friday, 17 Nov at 5pm <!-- End of 11th Week --><br />
* Project Report Due Friday, 8 Dec at 5pm <!-- Last Day of Class --><br />
* Paper Reviews Due Friday, 15 Dec at 5pm <!-- Week after Last Day of Class --><br />
<br />
Register for a wiki account. You will need to use the wiki to let us all know about data sources you find, indicate which dataset you are using, and slot yourself in for brainstorming. Also, everyone should free to make improvements to any part of the wiki. (E.g. if you find some useful software or other resources.)<br />
<br />
Slot yourself in for a brainstorming session in the Timeline portion at the bottom of this page before end of '''Friday, 6 Oct at 5pm''' or Dan will pick a slot for you.<br />
<br />
=== Materials ===<br />
* '''Required Texts'''<br />
:* '''JWHT''': James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ''An introduction to statistical learning with applications in R.'' New York: Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-7138-7 Western]]<br />
:* '''HTF''': ''The Elements of Statistical Learning'' by Hastie, Tibshirani and Friedman. Expanded version of required text. ['''Free''' [http://web.stanford.edu/~hastie/ElemStatLearn/ online]]<br />
:* '''LW''': Leland Wilkinson's ''The Grammar of Graphics'' (2005). ['''Free''' from [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/book/10.1007/0-387-28695-0 Springer]]<br />
:* ggplot2 book by creator Hadley Wickham (2016). ['''Free''' through [https://alpha.lib.uwo.ca/record=b6962637~S20 Western]]<br />
* '''Review''' if you need to catch up:<br />
:* [https://onlinecourses.science.psu.edu/statprogram/calculus_review Calculus Review] from Penn State University. Includes basic mathematical notation.<br />
:* [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/linalg-review.pdf linear algebra review] - up to and including Section 3.7 - The Inverse<br />
:* [http://www.stat.cmu.edu/~larry/all-of-statistics/ Larry Wasserman's] ''All of Statistics.'' ['''Free''' from [http://link.springer.com/book/10.1007/978-0-387-21736-9 Springer]]<br />
:* Devore, J. L., & Berk, K. N. (2007). ''Modern mathematical statistics with applications.'' 2nd ed. Springer. ['''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://link.springer.com/978-1-4614-0391-3 Western]]<br />
* '''Other Resources'''<br />
:* The [[Data and Software]] Page<br />
:* Cheat Sheets<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf ggplot2] cheat sheet<br />
:** [https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf Data Wrangling] cheat sheet<br />
:* Texts<br />
:** Phil Spector. (2008). ''Data Manipulation with R'' New York: Springer. [ '''Free''' through [https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://www.springer.com/us/book/9780387747309 Western] ]<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/Materials/prob-review.pdf probability review] from Stanford University by way of Doina Precup.<br />
:** [http://www.cs.mcgill.ca/~dprecup/courses/ML/resources.html List of resources] from COMP-652 at McGill (courtesy Doina Precup)<br />
:** C. M. Bishop, Pattern Recognition and Machine Learning (2006)<br />
:** R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (1998)<br />
:** Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.<br />
:** David J. C. MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003.<br />
:** Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification. Second Edition", Wiley & Sons, 2001.<br />
:* Other Links<br />
:** [https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception Data Visualization for Human Perception]<br />
:** [http://datadrivenjournalism.net/news_and_analysis/is_data_journalism_for_everyone Data Journalism]<br />
:* Software<br />
:** The dplyr package [https://cran.r-project.org/web/packages/dplyr/ documentation]. The "vignettes" are particularly good.<br />
:** The Tensorflow Library (Python, C++) [https://www.tensorflow.org/]<br />
:* Deep Learning Resources (courtesy Ethan Jackson)<br />
:** Tutorials on Word2Vec in Python. Learns semantic relationships between words in very large corpora by mapping each word to a high-dimensional word embedding. Semantic relationships are estimated using contextual frequency, i.e. how often a word appears given a context of other words. I can give you more details about the training algorithms if you like.<br />
:***https://radimrehurek.com/gensim/models/word2vec.html<br />
:***https://rare-technologies.com/word2vec-tutorial/<br />
:**Some ideas about using t-SNE for visualization<br />
:***https://www.jeffreythompson.org/blog/2017/02/13/using-word2vec-and-tsne/<br />
:**Digit classification on MNIST dataset using TensorFlow<br />
:***https://www.tensorflow.org/get_started/mnist/beginners<br />
:**Autoencoders for MNIST in Keras (a very high level interface for deep learning libraries including TensorFlow)<br />
:***https://blog.keras.io/building-autoencoders-in-keras.html<br />
:**Convolutional neural networks for image recognition on CIFAR-10 dataset in TensorFlow. Great starting point for image classification using deep learning.<br />
:*** https://www.tensorflow.org/tutorials/deep_cnn<br />
<br />
=== Topics (anticipated) ===<br />
* '''Introduction to Data Science'''<br />
** Definitions<br />
** Components<br />
** Relationships to Other Fields<br />
<br />
* '''Data Munging'''<br />
** Working with structured data: selecting, filtering, joining, aggregating<br />
** Web scraping<br />
** Simple visualizations<br />
** Sanity checking<br />
<br />
* '''(Re)-introduction to Statistics'''<br />
** Data Summaries<br />
** Randomness, Sample Spaces and Events, Probability<br />
** Random Variables, CDF, PMF, PDF<br />
** Expectation<br />
** Estimation<br />
** Sampling Distributions: Law of Large Numbers, Central Limit Theorem, The Bootstrap<br />
** Inference: Hypothesis testing, P-values, Confidence Intervals<br />
** Multivariate Statistics: conditional probability, correlation, independence<br />
<br />
* '''Supervised Machine Learning, Predictive Models'''<br />
** Supervised Learning<br />
*** Regression<br />
*** Classification<br />
** Reinforcement Learning and Sequential Decision Making<br />
<br />
* '''Evaluation'''<br />
** Variance: Test set, cross-validation, bootstrap<br />
** Bias: Confounding, causal inference<br />
<br />
* '''Unsupervised Machine Learning, Representations, and Feature Construction'''<br />
** Clustering<br />
** Dimensionality reduction<br />
** Domain-specific Feature Development<br />
*** Images<br />
*** Sounds<br />
*** Text<br />
<br />
* '''Visualization'''<br />
** Topics to be determined<br />
<br />
=== Evaluation ===<br />
<br />
There will be a midterm test but no final exam. Each student will lead a brainstorming session, produce a proposal, draft, and report for a course project. '''Graduate students (9637)''' will additionally submit peer reviews of other class projects. For detailed requirements, see [[Project Guidelines]].<br />
<br />
Scholastic offences are taken seriously and students are directed to read the appropriate policy, specifically, the definition of what constitutes a Scholastic Offence, at this website: [http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf].<br />
<br />
==== Daily Quizzes – 5% ====<br />
<br />
Starting on the second lecture, there will be a very short quiz at the beginning of class covering the previous day's materials. The final quiz will be on 31 Oct. The lowest quiz mark will be dropped. '''Quiz marks will only be excused for medical reasons.'''<br />
<br />
==== Midterm - 35% ====<br />
<br />
Assessing competencies from the fundamentals taught in the first half of the class.<br />
<br />
==== Brainstorming Session – 5% ====<br />
<br />
Each student will prepare a [[Project Guidelines#Brainstorming|presentation]] explaining an applied problem, as well as some potential data science methods that could be applied to the problem. The presentation should be '''no more than 10 minutes'''. We will then discuss the problem as a class, along with possible approaches for solving the problem using data science methods. '''The student is expected to be prepared to answer deep questions about the nature of their problem to ensure that they receive high quality feedback''' from the brainstorming session.<br />
<br />
==== Project Proposal – '''4414:''' 15% '''9637:''' 10% ====<br />
<br />
Document detailing the plan for the project. See [[Project Guidelines]] for detailed requirements.<br />
<br />
==== Report Draft – 5% ====<br />
<br />
A [[Project Guidelines#Report Draft|draft]] of the final report will be due approximately midway through the term. The purpose of the draft is to allow the instructor to provide feedback on the quality of the writing and the direction of the project.<br />
<br />
==== Project Report – 35% ====<br />
<br />
Each student will prepare a [[Project Guidelines|research paper]] detailing a substantive problem, the data available, the applicable data science methods, and empirical results obtained on the problem.<br />
<br />
==== Peer Review – '''9637 only:''' 5% ====<br />
<br />
Each '''graduate''' student will prepare two [[Project Guidelines#Report Submission and Reviewing|reviews]] of their classmates' work.<br />
<br />
==== Participation and Effort ====<br />
<br />
Success of the course as a useful learning experience hinges on active participation and effort of the students. '''Students are expected to attend all classes''' and are expected to '''actively participate in the brainstorming sessions'''.<br />
<br />
=== Accessibility and Support Available at Western ===<br />
Please contact the course instructor if you require lecture or printed material in an alternate format or if any other arrangements can make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 ext. 82147 if you have questions regarding accommodation.<br />
Support Services<br />
Learning-skills counsellors at the Student Development Centre (http://www.sdc.uwo.ca) are ready to help you improve your learning skills. They offer presentations on strategies for improving time management, multiple-choice exam preparation/writing, textbook reading, and more. Individual support is offered throughout the Fall/Winter terms in the drop-in Learning Help Centre, and year-round through individual counselling.<br />
Students who are in emotional/mental distress should refer to Mental Health@Western (http://www.health.uwo.ca/mental_health) for a complete list of options about how to obtain help.<br />
Additional student-run support services are offered by the USC, http://westernusc.ca/services.<br />
The website for Registrarial Services is http://www.registrar.uwo.ca.<br />
<br />
=== Missed Course Components ===<br />
If you are unable to meet a course requirement due to illness or other serious circumstances, you must provide valid medical or supporting documentation to the Academic Counselling Office of your home faculty as soon as possible. <br />
If you are a Science student, the Academic Counselling Office of the Faculty of Science is located in WSC 140, and can be contacted at 519-661-3040 or scibmsac@uwo.ca. Their website is http://www.uwo.ca/sci/undergrad/academic_counselling/index.html.<br />
A student requiring academic accommodation due to illness must use the Student Medical Certificate (https://studentservices.uwo.ca/secure/medical_document.pdf) when visiting an<br />
off-campus medical facility.<br />
For further information, please consult the university’s medical illness policy at http://www.uwo.ca/univsec/pdf/academic_policies/appeals/accommodation_medical.pdf.<br />
<br />
== Timeline (Tentative) ==<br />
<br />
* 7 Sep - Lectures: Welcome<br />
** 12 Sep - Lectures: Data Preparation, Introduction to Statistics<br />
* 14 Sep - Lectures: Introduction to Statistics<br />
** 19 Sep - Lectures: Supervised Learning<br />
* 21 Sep - Lectures: Supervised Learning, Performance Evaluation<br />
** 26 Sep - Lectures: Performance Evaluation, Model Selection<br />
* 28 Sep - Lectures: Classification<br />
** 3 Oct - Lectures: Classification, Performance Evaluation for Classification<br />
* 5 Oct - '''Pick Brainstorming Slot by 6 Oct 5pm''' - Lectures: Nonlinear Classification<br />
** ''10 Oct - '''Fall Reading Week''' ''<br />
* ''12 Oct - '''Fall Reading Week''' ''<br />
** 17 Oct - Lectures: <br />
* 19 Oct - Lectures: '''Guest Lecture by Amanda Holden''' of SAS. Topic TBA.<br />
** 24 Oct - Lectures: <br />
* 26 Oct - '''Project Proposal Due 27 Oct at 5pm''' - Lectures: '''Guest Lecture by Dr. Kemi Ola''' on Visualization<br />
** 31 Oct - Lectures: <br />
* 2 Nov - Lectures: Midterm Review/Q&A<br />
** 7 Nov - '''Midterm'''<br />
* 9 Nov - Brainstorming: Ethan Jackson, *Zaid Albirawi*, Sachi Elkerton<br />'''9637 Slots 3:30pm-4:30pm''': *Roopa Bose*, *Jenna Le*, *Sanjay Ghanathey*, *slot7*<br />
** 14 Nov - Brainstorming: Ashutosh Mishra, Brandon Glied-Goldstein, Jonathan Tan, Duff Jones, Patrick Carnahan, Nathan Phelps<br />
* 16 Nov - '''Project Draft Due 17 Nov at 5pm''' - Brainstorming: Kerlin Lobo, Gurpreet Singh, *slot3*<br />'''9637 Slots 3:30pm-4:30pm''': Ruoxi Shi, Valeria Cesar, Mingda Sun, Xindi Wang<br />
** 21 Nov - Brainstorming: Cole Fisher, Angela Zhao, *Xiaoyu Yang*, Nanditha Rao, Felipe Urra, *TianzhiZhu*<br />
* 23 Nov - Brainstorming: Mahtab Ahmed, Jumayel Islam, Sabyasachi Patjoshi<br />'''9637 Slots 3:30pm-4:30pm''': *Rifayat Samee*, *Hao Jiang*, *Abdelkareem Jaradat*, *Debanjan Guha Roy*<br />
** 28 Nov - Brainstorming: *Yancong Wang*, Mohammad, Yanbing Zhu, Yu Zhu, *Gagan Verma*, *Zeyu Wang*<br />
* 30 Nov - Brainstorming: *Marios-Stavros Grigoriou*, *Jiayi Ji*, *Paul Bartlett*<br />'''9637 Slots 3:30pm-4:30pm''': '''CANCELLED'''<br />
** 5 Dec - Brainstorming: *slot1*, *slot2*, *Kun Xie*, *Nasim Samei*, *Jacob Hunte*, *slot6*<br />
* 7 Dec - Brainstorming: *Nima khairdoodt*, *Sana Ahmadi*, *Mohsen shirpour*<br />'''9637 Slots 3:30pm-4:30pm''': *Hengyu Yue*, *Zhongwen Zhang*, *Yifang Liu*, *Andrew Bloch-Hansen*<br />
<br />
* '''Project Document Due Friday 8 December 5pm'''<br />
* '''Reviews (graduate students only) Due Thursday 15 December 5pm'''</div>Dan Lizotte