CSC 380: Principles of Data Science
Overview
This course introduces students to principles of data science that are necessary for computer scientists to make effective decisions in their professional careers. A number of computer science sub-disciplines now rely on data collection and analysis. For example, computer systems are now complicated enough that comparing the execution performance of two different programs becomes a statistical estimation problem rather than a deterministic computation. This course teaches students the basic principles of how to properly collect and process data sources in order to derive appropriate conclusions from them. The course has three main components: data analysis, machine learning, and a project where students apply the concepts discussed in class to a substantial open-ended problem.
Logistics info
Time and venue: Tuesday and Thursday 5:00-6:15pm at ILC 130
- Syllabus
- Gradescope
- D2L course webpage
- Piazza link (access code: wildcats)
We will be using Piazza for important announcements and Q&A. Some general rules:
- If you have technical questions, try to pose them as generally as possible, to promote discussion among the class.
- If you have private questions, please make a private Piazza post instead of sending an email. This will help us process your requests much faster.
Course staff
- Instructor: Xinchen Yu
- Teaching assistants:
Office hours:
- Xinchen Yu: Tuesday 2:00pm-3:00pm, Gould-Simpson 854
Textbook
There is no single designated textbook for this course. Much of the course material and many of the assigned readings will be based on the following books:
WJ: Watkins, J. “An Introduction to the Science of Statistics: From Theory to Implementation.”
MK: Murphy, K. “Machine Learning: A Probabilistic Perspective.” MIT Press, 2012 (accessible online via UA library)
WL: Wasserman, L. “All of Statistics: A Concise Course in Statistical Inference.” Springer, 2004 (accessible online via UA library)
Other useful resources
You should have no difficulty with Python programming.
Notes for probability review and linear algebra review from Stanford’s CS 229 course.
The Matrix Cookbook, The Probability and Statistics Cookbook, and Calculus cheatsheet (recommended by Prof. Kwang-Sung Jun).
You may find LaTeX helpful for writing homework or reports. Some useful LaTeX resources: Learn LaTeX in 30 minutes by Overleaf; Introduction to LaTeX by MIT Research Science Institute.