ECE 204: Data Science & Engineering

We live in the age of data, where algorithms make decisions that affect our day-to-day lives. As we embrace these new tools, it is crucial to understand how they work to ensure that everyone is treated in an equitable and unbiased manner. ECE 204 (formerly ECE 379) is a hands-on introduction to data science using the Python programming language. The course is intended for freshmen and sophomores of any major who have limited prior experience in computer programming or data science. The course teaches how to think about data-centric problems in a computational way: given data from real-world phenomena, students will learn to describe and analyze the data and to make predictions from it. To this end, the course also introduces programming in Python, one of the most widely used programming languages in data science. Topics covered include how to import, manipulate, summarize, and visualize data of various types; how to perform descriptive analyses such as clustering and principal component analysis; how to perform predictive analyses such as classification and regression; and notions of bias, fairness, and ethics in data science.

Note: ECE 204 is a required course for the new named option in Machine Learning and Data Science.

Prerequisites: Students should have a solid foundation in college algebra, demonstrated through MATH 112, 114, or 171, or through math placement. We will provide all of the other tools you need and teach you how to use them. Most importantly, we will equip you with the knowledge and ability to keep using what you’ve learned long after you complete the class, for the rest of your career as a student and beyond.

What degree requirements does this course satisfy? The information below is accurate as of April 2021. Please check with your academic advisor for the most up-to-date information as degree requirements change over time. For engineering students needing to meet progression requirements, ECE 204 will count toward your core GPA.

  • ECE: ECE 204 counts as a Professional Elective.
  • ISyE: ECE 204 counts as an Engineering Science Elective and a Stats Elective.
  • CEE: ECE 204 counts as an Engineering Elective outside of Civil Engineering.

Note: you should always check your DARS after registering for classes to make sure they are counting toward your degree as you expected.

Lectures:

  • Fall and Spring: Tue/Thu, 9:30am–10:45am

Instructors:

  • Eduardo R. Arvelo (Spring 20, Fall 20, Spring 21, Summer 21)
  • Laurent Lessard (Spring 19, Fall 19)
  • Kangwook Lee (Fall 21)
  • Matt Malloy (Spring 22)

Learning outcomes:

In other words: what are the skills you will acquire upon completing this class?

  1.   Write working code in Python to import, manipulate, analyze, visualize, and otherwise interact with datasets of various types. If you don’t know what “writing code” even means, you’ll learn that too!
  2.   Perform descriptive analyses to extract, summarize, and interpret salient features from datasets.
  3.   Perform predictive analyses to model trends and make predictions from datasets.
  4.   Apply techniques to identify and clean data that contains missing entries, outliers, or other forms of noise or uncertainty.
  5.   Recognize and evaluate potential issues pertaining to bias, fairness, privacy, and ethics in applying data science techniques, and understand the limits of what data can do.
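
To give a flavor of outcomes 1, 2, and 4, here is a minimal sketch of importing a small dataset, cleaning it, and summarizing it. The dataset, the column name height_cm, the helper names, and the plausible-range thresholds are all made up for illustration. The sketch uses only the Python standard library so that it runs anywhere; it is not taken from the course materials, which list Pandas among the topics.

```python
import csv
import io
import statistics

# Hypothetical toy dataset (made up for illustration, not course data).
# One height is missing and one is an obvious data-entry outlier.
RAW_CSV = """name,height_cm
Ana,162
Ben,
Cho,175
Dev,168
Eli,1700
"""

def load_heights(text):
    """Parse CSV text into a list of (name, height) pairs; missing heights become None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["height_cm"].strip()
        rows.append((row["name"], float(value) if value else None))
    return rows

def clean(rows, low=50.0, high=250.0):
    """Drop missing entries and values outside an assumed plausible range."""
    return [(name, h) for name, h in rows if h is not None and low <= h <= high]

rows = load_heights(RAW_CSV)
cleaned = clean(rows)
heights = [h for _, h in cleaned]

# Numerical summary of the cleaned data.
summary = {
    "count": len(heights),
    "mean": statistics.mean(heights),
    "median": statistics.median(heights),
}
print(summary)
```

Dropping rows is only one way to handle missing or implausible values; whether it is appropriate depends on the dataset and the question being asked.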

Evaluation:

Evaluation is based on a combination of in-class activities, homework assignments, midterm exams, and a final exam. These will largely be hands-on activities where you will complete tasks on your computer and submit your answers electronically.

Materials required:

The only thing you will need is a laptop. All course-related materials and software will be provided.

Topics covered (as of Spring 2021):

  • Introduction to Python
  • Jupyter Notebooks
  • Pandas
  • Data Visualization
  • Histograms and Numerical Summaries
  • Clustering with K-Means
  • Principal Component Analysis
  • Data Cleaning
  • Pivot Tables
  • Classification with K-Nearest-Neighbors
  • Classification with Decision Trees
  • Model Selection via Cross-validation
  • Linear Regression
  • Polynomial Regression
  • Autoregression
  • Bias in Data
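
As an illustration of two of the topics above, classification with k-nearest-neighbors and model selection via cross-validation, here is a minimal sketch in plain Python. The data points, labels, function names, and candidate values of k are hypothetical and chosen only to keep the example small; it uses leave-one-out cross-validation as one simple variant and is not meant to reflect how the course implements these methods.

```python
import math
from collections import Counter

# Hypothetical 2-D points with class labels, made up for illustration.
DATA = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.1), "a"), ((1.1, 1.3), "a"),
    ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b"), ((4.1, 4.3), "b"),
]

def knn_predict(train, point, k):
    """Return the majority label among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation: hold out each point once and predict it from the rest."""
    correct = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, point, k) == label:
            correct += 1
    return correct / len(data)

# Compare several odd values of k and keep the one with the best held-out accuracy.
scores = {k: loo_accuracy(DATA, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With only four points per class, a held-out point has three same-class neighbors left, so a large k lets the other class outvote them. Comparing held-out accuracy across values of k is what exposes this.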