Machine Learning in Econometrics?

Machine learning has two main branches: supervised and unsupervised. In supervised machine learning, a set of inputs (say x's) is used to predict an output (say y), and the correct output is already known for every example in the training data. The model makes predictions on a training set, and wherever those predictions are wrong its parameters are adjusted to reduce the error. The training process continues until the desired level of accuracy is achieved.

Example - Back Propagation Neural Network
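The predict-and-correct loop described above can be sketched with a single linear neuron, the simplest case of the error-correction idea that backpropagation extends to multi-layer networks. This is only an illustration: the data, the learning rate, the tolerance and the function name train_linear_neuron are all assumptions made for the example.

```python
# Minimal sketch: error-correction training of one linear neuron.
# All numbers below (data, learning rate, tolerance) are illustrative assumptions.

def train_linear_neuron(xs, ys, learning_rate=0.1, tolerance=1e-6, max_epochs=10_000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for epoch in range(max_epochs):
        # Predict, then measure how wrong each prediction is.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / n
        if mse < tolerance:          # desired accuracy reached, stop training
            break
        # Correct the parameters in the direction that reduces the error.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, mse, epoch

# Hypothetical training set where the known outputs follow y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b, final_mse, epochs_used = train_linear_neuron(xs, ys)
```

After training, w and b should be close to the 2 and 1 used to generate the outputs.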

There are a number of supervised ML methods, such as support vector machines and random forests. Most ML algorithms choose the complexity of the model by cross-validation: the error, i.e. the squared difference between the expected and the predicted output, is computed on data that was held out from fitting, and the level of complexity with the lowest error is kept. The fundamental idea of supervised ML is that a model is selected by testing its accuracy on sample inputs, and the accuracy of prediction determines whether the model is a good fit. Even in cross-sectional econometrics, researchers judge a model by comparing it against alternative specifications. As we come across data sets with many covariates, systematic model selection will become a standard part of econometrics. Prediction of this kind, however, is not the same thing as causal inference.
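As a rough sketch of choosing model complexity by cross-validation, the example below picks the number of neighbours for a nearest-neighbour regression by comparing mean squared errors on held-out folds. The simulated data, the candidate values and the helper names knn_predict and cross_validated_mse are assumptions made for illustration, and nearest neighbours stands in for whichever ML method is actually being tuned.

```python
import random

def knn_predict(train, x, k):
    """Average the outputs of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cross_validated_mse(data, k, n_folds=5):
    """Mean squared prediction error over the held-out folds."""
    squared_errors = []
    for fold in range(n_folds):
        held_out = data[fold::n_folds]   # every n_folds-th point, starting at `fold`
        train = [pair for i, pair in enumerate(data) if i % n_folds != fold]
        for x, y in held_out:
            squared_errors.append((knn_predict(train, x, k) - y) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Hypothetical data: a curved relationship plus noise.
random.seed(0)
data = []
for _ in range(100):
    x = random.uniform(0.0, 10.0)
    y = 0.5 * x * x + random.gauss(0.0, 3.0)
    data.append((x, y))

# Small k = flexible (complex) model, large k = smooth (simple) model.
candidate_ks = [1, 3, 5, 10, 20, 50]
cv_scores = {k: cross_validated_mse(data, k) for k in candidate_ks}
best_k = min(cv_scores, key=cv_scores.get)
```

The value in best_k is simply the complexity level that predicts held-out points best; nothing in the procedure asks whether the fitted relationship is causal.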

Economists are often concerned with questions such as the effect of a price change or of a change in the minimum wage. A foundation of classical econometrics is that prediction is not causal inference. For example, prices and quantities are often found to be positively correlated: the cost of living is higher in higher-income cities, and as demand for an item increases, so does its price. That does not mean that raising a price causes people to buy more. If prices and quantities are positively correlated in the data, a model that captures the causal effect of price will not fit the data as well as a purely predictive model. The causal model, however, will do better at counterfactual predictions, such as what would happen if the price were changed.
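The gap between prediction and causal inference can be sketched with a small simulation. The linear demand and supply curves, their slopes and the sizes of the shocks are all assumptions chosen for the example: demand shifts a lot across markets (think of income differences across cities) while supply barely shifts, so the observed prices and quantities trace out the supply curve.

```python
import random

random.seed(1)
DEMAND_SLOPE = -1.0   # assumed causal effect of price on quantity demanded
SUPPLY_SLOPE = 2.0    # assumed slope of the supply curve

prices, quantities = [], []
for _ in range(2000):
    demand_shift = random.gauss(0.0, 2.0)   # large differences in demand
    supply_shift = random.gauss(0.0, 0.1)   # small differences in supply
    # Market equilibrium of
    #   demand: q = 10 + DEMAND_SLOPE * p + demand_shift
    #   supply: q = SUPPLY_SLOPE * p + supply_shift
    price = (10.0 + demand_shift - supply_shift) / (SUPPLY_SLOPE - DEMAND_SLOPE)
    quantity = SUPPLY_SLOPE * price + supply_shift
    prices.append(price)
    quantities.append(quantity)

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# What a purely predictive fit learns from the observed data.
predictive_slope = ols_slope(prices, quantities)
```

In this setup predictive_slope comes out positive, so the regression predicts observed quantities well, yet by construction a seller who raised the price by one unit would see demand change by DEMAND_SLOPE, which is negative.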

Regression trees are decision trees in which the target variable takes continuous values. They can be used, for example, to estimate how an individual would react to a price change, based on that individual's characteristics.
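A bare-bones regression tree can be sketched in a few functions. The data set below is made up: each row is a hypothetical (income in thousands, age) pair, and the target is a hypothetical percentage change in purchases after a price increase. Note that this is an ordinary predictive tree fitted to observed responses, not a method for estimating causal effects.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Return the (feature_index, threshold) that most reduces squared error, or None."""
    best, best_cost = None, sse(targets)
    for j in range(len(rows[0])):
        # Skipping the smallest value keeps both sides of every split non-empty.
        for threshold in sorted({row[j] for row in rows})[1:]:
            left = [t for row, t in zip(rows, targets) if row[j] < threshold]
            right = [t for row, t in zip(rows, targets) if row[j] >= threshold]
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (j, threshold), cost
    return best

def fit_tree(rows, targets, max_depth=2):
    """Recursively split the data; each leaf stores the mean target value."""
    split = best_split(rows, targets) if max_depth > 0 and len(targets) > 1 else None
    if split is None:
        return {"value": mean(targets)}
    j, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[j] < threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] >= threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": fit_tree([r for r, _ in left], [t for _, t in left], max_depth - 1),
        "right": fit_tree([r for r, _ in right], [t for _, t in right], max_depth - 1),
    }

def predict(tree, row):
    """Follow the splits down to a leaf and return its value."""
    while "value" not in tree:
        tree = tree["left"] if row[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["value"]

# Hypothetical individuals: (income in thousands, age).
rows = [(20, 25), (25, 40), (30, 35), (35, 50), (70, 30), (80, 45), (90, 38), (100, 55)]
# Hypothetical response: % change in purchases after a price increase.
targets = [-8.0, -7.5, -8.5, -8.0, -2.0, -2.5, -1.5, -2.0]

tree = fit_tree(rows, targets, max_depth=1)
low_income_response = predict(tree, (28, 33))
high_income_response = predict(tree, (85, 40))
```

With this made-up data the single split falls on income, so the tree predicts a stronger reaction for the lower-income individual than for the higher-income one.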

Unsupervised machine learning differs from supervised learning in that the inputs do not come with known outputs. These algorithms are effective at finding groups of similar items. They can be used to find news articles on the same topic; similarly, when you watch YouTube videos on a particular topic, you are shown a cluster of videos on similar topics. Empirical economics stands to improve a lot by adopting more sophisticated statistical techniques and more sophisticated statistical validation tests.
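The clustering idea described above can be sketched with k-means, one common unsupervised method (chosen here only as an example). The points are hypothetical word counts per article, (sports words, finance words), and no topic labels are given to the algorithm.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]

def k_means(points, k, n_iterations=20, seed=0):
    """Plain k-means: alternate between assigning points and moving centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start from k randomly chosen points
    for _ in range(n_iterations):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                      # keep the old center if a cluster is empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assign(points, centers), centers

# Hypothetical articles described by (sports words, finance words).
points = [(9, 1), (8, 2), (10, 0), (1, 9), (2, 8), (0, 10)]
labels, centers = k_means(points, k=2)
```

The first three articles end up in one cluster and the last three in the other, although the cluster numbers themselves are arbitrary.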

Inspired by the brilliant Susan Athey's answer on Quora.