CSC 380: Principles of Data Science
Overview
This course introduces students to principles of data science that are necessary for computer scientists to make effective decisions in their professional careers. A number of computer science sub-disciplines now rely on data collection and analysis. For example, computer systems are now complicated enough that comparing the execution performance of two different programs becomes a statistical estimation problem rather than a deterministic computation. This course teaches students the basic principles of how to properly collect and process data in order to draw appropriate conclusions from it. The course has three main components: basic probability; basic statistics and data wrangling; and basic data analysis using programming libraries.
Logistics info
Time and venue: Tuesday and Thursday 5:00-6:15pm at ILC 130
- Syllabus
- Gradescope
- D2L course webpage
- Piazza link (access code: wildcats)
- Lecture participation self-report form
We will be using Piazza to make important announcements and do Q&As. Some general rules:
- If you have technical questions, try to pose them as generally as possible to promote discussion among the class.
- If you have private questions, please make a private Piazza post instead of sending an email. This will help us process your requests much more quickly.
Course staff
- Instructor: Xinchen Yu
- Teaching assistants: Thang Nhat Duong, Tian Tan, Haris Riaz
Office hours:
- Xinchen Yu: Thursday 12:00pm-2:00pm, Gould-Simpson 829.
  - You are welcome to drop in at any time during this period. However, students in this course will be given priority from 1:00pm-2:00pm. If you arrive outside of this priority window, please understand that wait times may vary depending on whether students from another course are being helped.
Textbook
There is no single designated textbook for this course. Much of the course material and many of the assigned readings will be based on the following books:
WJ: Watkins, J., “An Introduction to the Science of Statistics: From Theory to Implementation”
WL: Wasserman, L., “All of Statistics: A Concise Course in Statistical Inference”, Springer, 2004 (accessible online via UA library)
Sam Lau, Joey Gonzalez, Deb Nolan, “Learning Data Science”, O’Reilly, 2023
Steven S. Skiena, “The Data Science Design Manual”, Springer, 2017
Other useful resources
You should be comfortable programming in Python.
Notes for probability review and linear algebra review from Stanford’s CS 229 course.
The matrix cookbook, The Probability and Statistics Cookbook, and Calculus cheatsheet (recommended by Prof. Kwang-Sung Jun).
You may find LaTeX helpful for writing homework or reports. Some useful LaTeX resources: Learn LaTeX in 30 minutes by Overleaf; Introduction to LaTeX by MIT Research Science Institute.