Intermediate Data Programming

Intermediate Data Programming is a fast-paced, college-level programming class. Topics include:

  1. Writing programs that manipulate different types of data
  2. Leveraging the growing ecosystem of tools and libraries for data programming
  3. Writing programs that are both efficient and elegant
  4. Writing medium-scale programs (100 to 200 lines)

The term “data programming” is loosely defined as “the programming required to be an effective data scientist”. Generally, data science classes focus on statistical and algorithmic concepts applied to a domain or context. This data programming class focuses more on the algorithmic side and on the programming required to get the data into a data science tool in the first place.

Students will be introduced to common methods data scientists use to analyze data (data visualization, machine learning, etc.), but the focus is on the programming that supports those methods.

It’s a seriously tough course, so students should be ready to work. Students will spend the first part of the course learning Python very quickly! Python and its libraries support loading, organizing, and displaying data through a cornucopia of data structures: built-in lists, dictionaries, and sets, plus data frames from the Pandas library. Once students are comfortable with these tools, they learn why Python can be slow when wielded improperly (e.g. algorithmic complexity and efficiency).
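As a rough illustration of the efficiency point, the sketch below counts shared values between two collections in two ways. It is not taken from the course materials; the function names and sample data are made up for this example. Both versions return the same answer, but the first rescans a list on every lookup, while the second builds a set once so each lookup takes roughly constant time on average.

```python
def count_common_list(a, b):
    """Count items of a that also appear in b.

    `x in b` scans the list b from the start each time, so the total
    work grows roughly with len(a) * len(b).
    """
    return sum(1 for x in a if x in b)


def count_common_set(a, b):
    """Same result, but membership tests run against a set.

    Building the set costs one pass over b; after that each `x in b_set`
    lookup is a hash lookup, so the total work grows roughly with
    len(a) + len(b).
    """
    b_set = set(b)
    return sum(1 for x in a if x in b_set)


# Hypothetical sample data: even numbers and multiples of 3 below 2000.
evens = list(range(0, 2000, 2))
threes = list(range(0, 2000, 3))

slow_result = count_common_list(evens, threes)
fast_result = count_common_set(evens, threes)
print(slow_result, fast_result)
```

On inputs this small the difference is hard to notice, but with hundreds of thousands of items the list-based version can take minutes while the set-based version stays near-instant.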

Data is displayed and explored in various ways: tabular, textual, graphical, and geospatial. The class will use an ecosystem of data science tools, including Jupyter Notebook and libraries such as scikit-image, scikit-learn, and Pandas, for data visualization, machine learning, and data analysis. Yes, students will build machine learning models!


Table of contents