The Data Science accelerators are intensive, non-credit training courses held daily over five weeks. Every accelerator combines a theoretical component with a substantial hands-on component, including in-class problem solving with popular software packages on real-world data sets. All work is completed in the classroom.
The accelerators are aimed at practitioners from diverse backgrounds who do not wish to pursue a formal degree program. They cover a wide range of topics, from refresher material on fundamentals to the theoretical and practical aspects of data science.
In this accelerator, students will become familiar with the math and computer science fundamentals needed to pursue a career in Data Science. The material includes basic calculus, linear algebra, probability and statistics, and Python programming. The accelerator includes 75 hours of instructor-led training and in-class hands-on work.
Course Outline
Calculus
Functions
Derivatives
Integrals
Linear Algebra
Vector spaces
Linear independence, bases and dimension
Matrix representation of linear functions
Matrix algebra and solving linear equations
Eigenvectors and eigenvalues
Probability
Discrete and continuous random variables
Histograms and probability distributions
Expectation, variance, covariance
Normal distributions
Statistics
Common statistical tests
Linear regression
Methods for curve fitting
Programming
Basic programming constructs
Basic data structures
Web programming
Basic database design
Algorithms
Sorting and searching, divide and conquer, greedy algorithms
Complexity
Graph algorithms
String algorithms
Software tools
Python
HTML
XML
JSON
SQL
Prerequisites: None
Outcomes: Become familiar with math and computer science fundamentals required for a career in Data Science.
This accelerator provides the essential skills required to extract actionable intelligence from data resources. Topics include the principles of information-retrieval system design and management, as well as basic tools and analytic techniques for extracting, reporting and visualizing information from data. The course includes 75 hours of instructor-led training and in-class hands-on work.
Course Outline
Data Collection and Organization
Describing and visualizing data
Pivot tables
Summary statistics
Data preprocessing
Data cleaning
Data integration
Data transformation and normalization
Data reduction
Correlation analysis
Data Structures
Spatial
Time series
Multi-dimensional
Data Visualization
Dashboards and scorecards
What-if analysis
Parameters for reports/analysis
Graphs
Big Data Storage
Databases
Key-value stores
Star and snowflake schemas
Data lakes
Data Mining
Multi-dimensional modeling
Data warehousing
Online Analytical Processing (OLAP)
Association rule mining
Clustering
Supervised Learning
Decision trees
Naïve Bayesian methods
Neural networks
Support vector machines
Software tools
Java
WEKA
Tableau
Tez
Hive
Pig
SAP
Prerequisites: None
Outcomes: Acquire the skills and tools needed to perform data analysis and reporting and to design visual dashboard solutions.
This accelerator focuses on the practical application of data collection, storage, organization and processing. Students will learn how to build the big data infrastructure, interfaces and mechanisms needed to deliver data for analysis by data scientists. In addition to big data infrastructure, coursework also covers data modeling, data integration and pipeline processing. Basic programming skills and a working knowledge of data structures are required. The course includes 75 hours of instructor-led training and in-class hands-on work.
Course Outline
The Big Data Ecosystem
High performance computing
Grid and cloud computing
Mobile computing
Big Data Storage
Hadoop
HDFS
Big Data Management
Key-value tables
Documents
Graphs
Databases
Relational databases
NoSQL
Big Data Computing
MapReduce and Spark
YARN
Big Data Workflows
Tez and Storm
Oozie
Big Data Analytics
Clustering & classification
Recommendation engines
Machine learning
Data Cleansing Techniques
Missing (NA) values and format correction
Encoding and normalization
Big Data Visualization
Challenges
2D, multidimensional data
Hierarchical/network data
Tableau and Power BI
Mahout
K-means clustering, Bayesian classification
Software tools
Hadoop
MySQL
HBase
MapReduce
Spark
Storm
Pig
Hive
Tez
Oozie
Java/Python
Power BI
Tableau
Mahout
Prerequisites: Basic programming skills, working knowledge of data structures
Outcomes: Ability to build big data infrastructure and deliver data for analysis by data scientists.
This accelerator covers statistical inference, machine learning, predictive modeling, data visualization, data mining and big data, all of which are central to the daily work of a data scientist. During the course, students will apply methods from the Python scikit-learn library to typical data processing tasks such as classification, regression and clustering. Basic Python programming skills and a working knowledge of data structures and algorithms are required, as are the fundamentals of calculus, linear algebra, probability and statistics. The course includes 75 hours of instructor-led training and in-class hands-on work.
Course Outline
Machine learning basics
Linear algebra
Probability and statistics
Python programming
Bayesian learning
Gaussian classification
Nearest centroids
Naive Bayes
Linear classification
Least squares
Support vector machines (SVM)
Logistic regression
Non-linear classification
Kernel methods
Neural networks
Feature engineering
Feature selection
Dimensionality reduction
Feature learning
Trees
Decision trees
Random forests
Boosting
Unsupervised learning
k-means
Spectral clustering
Regression
Linear regression
Support vector regression
Time series
ARIMA
Regression
Long short-term memory (LSTM)
Text processing
Encoding and classification
Software tools
Python
Scikit-learn
Pandas
Prerequisites: Basic Python programming, data structures and algorithms, calculus and linear algebra, probability and statistics
Outcomes: Ability to apply methods from the Python scikit-learn library to perform classification, regression and clustering of data.