Data Analysis 4: Prediction Analytics with introduction to Machine Learning
Timing: Full time: weekday
Data analysis in business and policy applications is often aimed at prediction. The course introduces tools for evaluating predictions, such as loss functions and the Brier score. It emphasizes the importance of out-of-sample prediction, the role of stationarity, the dangers of overfitting, and the use of training and test samples and cross-validation. The course presents and compares the most important predictive models that may be useful in various situations. We will discuss classification (with logistic regression) as well as tree-based machine learning methods (CART and random forest). Prediction methods for time series data will also be presented. Working with real-life data and discussing the difficulties of feature engineering will be an integral part of the course, too.
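
As a rough illustration of one evaluation tool named above, the Brier score for a binary outcome is the mean squared difference between predicted probabilities and observed 0/1 outcomes, so lower values indicate better probabilistic predictions. The sketch below is not course material: the function name `brier_score` and the sample numbers are made up for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predicted probabilities and observed outcomes (illustrative only).
predicted = [0.9, 0.2, 0.7, 0.4]
observed = [1, 0, 1, 1]

score = brier_score(predicted, observed)

# An uninformative forecast of 0.5 for every case, as a simple benchmark.
baseline = brier_score([0.5] * len(observed), observed)

print(f"model: {score:.3f}, constant 0.5 baseline: {baseline:.3f}")
```

In this made-up example the model's score is lower than that of the constant 0.5 forecast. In practice the score would be computed on a test sample or within cross-validation rather than on the data used to fit the model.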