Data science
Master's program: Analyse et politique économique. Track: Statistique et économétrie
Description
1) The Data science part is structured in four macro blocks:
1. The art of learning from data. What is learning; supervised learning and function approximation; bias-variance trade-off; model accuracy, assessment and selection; cross validation.
2. Regression methods and regularization. Least squares revisited; model selection and regularization; subset selection methods; shrinkage methods (ridge, LASSO, LARS, elastic nets); dimension reduction methods (PCA, PLS).
3. Classification. Linear regression on indicator matrices; logistic regression; linear and quadratic discriminant analysis (LDA and QDA); hyperplane separation theorems; optimal separating hyperplane; “kernel trick”; Support Vector Machines (SVM).
4. Tree-based methods. Stratified feature space; tree-building process; recursive binary splitting and pruning.
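As an illustration of how cross validation (block 1) and shrinkage (block 2) fit together, the sketch below picks a ridge penalty by k-fold cross validation. It is a minimal example under stated assumptions, not course material: the data are synthetic, the model has a single feature and no intercept, and the helper names `ridge_slope` and `kfold_mse` are hypothetical.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for the one-feature model y ~ beta * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def kfold_mse(xs, ys, lam, k=5):
    """Average held-out mean squared error over k contiguous folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        # Fit on everything outside the fold, evaluate on the fold itself.
        beta = ridge_slope(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], lam)
        errs = [(y - beta * x) ** 2 for x, y in zip(xs[lo:hi], ys[lo:hi])]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k

# Synthetic data (an assumption for illustration): true slope 2.0 plus Gaussian noise.
# The points are drawn at random, so contiguous folds need no extra shuffling.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: kfold_mse(xs, ys, lam) for lam in lambdas}
best_lam = min(cv_scores, key=cv_scores.get)
print("CV error by penalty:", cv_scores)
print("Selected penalty:", best_lam)
```

A very large penalty shrinks the slope toward zero and raises the held-out error, which is the bias side of the bias-variance trade-off listed in block 1.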
2) The Deep Learning part is structured in five macro blocks:
1. Machine learning paradigm; overfitting and underfitting; bias and variance; gradient-based learning; motivations for deep models; historical trends in artificial neural networks research.
2. Architecture design for deep feedforward neural networks; hidden layers, hidden and output units; universal approximation theorem; the language of computational graphs; back-propagation algorithm.
3. Surrogate loss functions; batch/minibatch deterministic and stochastic methods; main challenges in neural network optimization (ill-conditioning, local minima, flat regions, cliffs, etc.); stochastic gradient descent; momentum; Nesterov momentum; parameter initialization strategies; algorithms with adaptive learning rates; supervised pre-training.
4. Regularization strategies for deep models; parameter norm penalties; data augmentation and sparse representation; early stopping algorithm; ensemble methods; dropout; adversarial training.
5. Introduction to convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
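The momentum and Nesterov momentum updates named in block 3 can be sketched on a toy problem. The example below is an assumption-laden illustration rather than course code: it uses the exact gradient of a small ill-conditioned quadratic instead of a minibatch estimate, and the function names, learning rate and momentum coefficient are hypothetical choices.

```python
def grad(w):
    """Gradient of the ill-conditioned quadratic f(w) = 0.5 * (w[0]**2 + 25 * w[1]**2)."""
    return [w[0], 25.0 * w[1]]

def momentum_descent(w0, lr=0.03, mu=0.9, steps=300, nesterov=False):
    """Gradient descent with classical or Nesterov momentum.

    Update rule: v <- mu * v - lr * g, then w <- w + v.
    """
    w = list(w0)
    v = [0.0, 0.0]
    for _ in range(steps):
        # Nesterov momentum evaluates the gradient at the look-ahead point w + mu * v;
        # classical momentum evaluates it at the current point w.
        point = [wi + mu * vi for wi, vi in zip(w, v)] if nesterov else w
        g = grad(point)
        v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w

w_momentum = momentum_descent([1.0, 1.0])
w_nesterov = momentum_descent([1.0, 1.0], nesterov=True)
print("classical momentum:", w_momentum)
print("Nesterov momentum:", w_nesterov)
```

Both variants drive the iterate toward the minimizer at the origin. In a stochastic setting, `grad` would be replaced by a gradient computed on a randomly drawn minibatch.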
Learning outcomes
Upon completion of this course, students will have solid theoretical knowledge of the most effective (supervised) machine learning techniques and practical experience implementing them. They will be able to:
- Select the appropriate method based on the scope and available data.
- Implement a range of regression and classification methods.
- Develop predictive tools for economics and business problems.
- Source, store and pre-process heterogeneous (large scale) data.
- Choose, design and train supervised machine learning techniques.
- Code in R and Python.
- Speak in public to present an empirical project.
Course organization and assessment
- Supervised learning: lectures (in English) [22h] and computer exercises with Python [8h].
- Deep learning: lectures (in English) [14h] and computer exercises with Python [6h].
Disciplines
- Economics
Bibliography
Part 1:
- Hastie T., R. Tibshirani, J. Friedman, 2009, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer.
- James G., D. Witten, T. Hastie, R. Tibshirani, 2013, An Introduction to Statistical Learning with Applications in R, Springer.
Part 2:
- Goodfellow I., Y. Bengio, A. Courville, 2016, Deep Learning, MIT Press.
- Chollet F., J. J. Allaire, 2017, Deep Learning with R, Manning Publications.
- Chollet F., 2017, Deep Learning with Python, Manning Publications.