# Applied Machine Learning

## K Nearest Neighbors

**Published:**

The *K Nearest Neighbors (KNN)* algorithm is part of a family of *classifier* algorithms that aim to predict the *class* or *category* of an observation. KNN works by calculating the distance, often the Euclidean (i.e., straight-line) distance, between observations. In this post, we walk through the application of the KNN algorithm and demonstrate the conditions under which the algorithm excels, does poorly, and is improved through feature engineering.
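The core of the algorithm, distances plus a majority vote, can be sketched in a few lines of plain Python (the toy data and function name below are our own, not from the post):

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    # Euclidean (straight-line) distance to every training observation
    dists = [math.dist(p, query) for p in train]
    # Indices of the k closest training points
    nearest = sorted(range(len(train)), key=lambda i: dists[i])[:k]
    # Majority vote over the neighbours' labels
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Two toy clusters: class "a" near the origin, class "b" near (5, 5)
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # -> a
print(knn_predict(X, y, (5.5, 5.5)))  # -> b
```

Because KNN relies entirely on distances, feature scaling matters: a feature measured in large units will dominate the Euclidean distance unless standardized, which is one reason feature engineering changes the algorithm's performance so markedly.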

## Decision Trees

**Published:**

The *Decision Tree* algorithm is part of a family of classifier and regression algorithms that aim to predict the class or value of an observation. Decision trees classify data by splitting features at specified thresholds such that, ideally, we can perfectly predict the observation’s label. At its core, features are split using two relatively simple measures: `entropy` and `information gain`. When deciding how to split a feature, a threshold is selected such that the information gain is highest, meaning more information is revealed and thereby our predictions of the dependent variable’s label are improved (or perfect).
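The two measures are easy to compute directly. A minimal sketch in plain Python (the toy labels are our own), showing that a perfect split of a maximally impure node yields an information gain of 1 bit:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Reduction in entropy from splitting `labels` into `left` and `right`."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

parent = ["yes"] * 5 + ["no"] * 5        # maximally impure: entropy = 1.0
left, right = ["yes"] * 5, ["no"] * 5    # a perfect split: both children are pure
print(entropy(parent))                    # -> 1.0
print(information_gain(parent, left, right))  # -> 1.0
```

In practice the tree-growing algorithm evaluates many candidate thresholds per feature and keeps the one with the highest information gain, then recurses on each child node.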

## Naive Bayes

**Published:**

The *Naive Bayes* algorithm is part of a family of *classifier* algorithms that aim to predict the *category* of an observation. It is a *generative* model, fit by maximum likelihood estimation (MLE), that models each class as generating its observed features. At its core, the algorithm uses Bayes’ theorem. In this post, we walk through the application of the *Naive Bayes* algorithm and demonstrate the conditions under which the algorithm excels, does poorly, and is improved through feature engineering.
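Both the MLE fit (plain counting) and the Bayes’-theorem prediction can be sketched compactly. The toy weather data and function names below are invented for illustration:

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """MLE estimates: count class frequencies and per-class feature-value frequencies."""
    priors = Counter(y)                  # class counts -> P(class)
    cond = defaultdict(Counter)          # (feature index, class) -> value counts
    for xs, label in zip(X, y):
        for i, v in enumerate(xs):
            cond[(i, label)][v] += 1
    return priors, cond

def predict_nb(priors, cond, xs):
    """Pick the class maximising P(class) * prod_i P(x_i | class) (Bayes' theorem,
    with the 'naive' assumption that features are independent given the class)."""
    n = sum(priors.values())
    def score(label):
        p = priors[label] / n
        for i, v in enumerate(xs):
            p *= cond[(i, label)][v] / priors[label]
        return p
    return max(priors, key=score)

# Toy data: (outlook, windy) -> decision
X = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
y = ["play", "play", "play", "stay"]
priors, cond = train_nb(X, y)
print(predict_nb(priors, cond, ("sunny", "no")))  # -> play
```

A practical caveat: raw MLE counts assign zero probability to unseen feature values, so real implementations typically add Laplace smoothing to the counts.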

## Support Vector Machines

**Published:**

The *Support Vector Machine (SVM)* algorithm is part of a family of *classifier* and *regression* algorithms that aim to predict the *class* or *value* of an observation. The SVM algorithm identifies the data points, called support vectors, that define the widest possible margin between two classes, yielding the best classification generalization. The SVM is made powerful by the use of kernels: functions that compute the dot product of two vectors in a transformed feature space, allowing us to effectively skip explicit feature transformations and consequently improve computational performance. In this post, we walk through the application of the *SVM* algorithm through linear and nonlinear modeling.
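The kernel claim can be verified numerically: a degree-2 polynomial kernel returns exactly the dot product you would get by first mapping both vectors through an explicit quadratic feature transform. A minimal sketch on invented toy vectors:

```python
import math

def poly2_features(x):
    """Explicit degree-2 feature map for a 2-d vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def poly2_kernel(x, z):
    """Polynomial kernel (x . z)^2 -- the same dot product, with no explicit transform."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))
print(explicit)            # -> 121.0
print(poly2_kernel(x, z))  # -> 121.0
```

The kernel evaluates two multiplications and a square, while the explicit route builds and dots the transformed vectors; for higher degrees and dimensions, the gap grows dramatically, which is the computational win the kernel trick delivers.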

## Dimensionality Reduction

**Published:**

The idea behind dimensionality reduction is simple: take a high dimensional feature space ($k$) and project it onto a lower dimensional subspace ($m$, where $m < k$). Dimensionality reduction has several appealing properties, such as mitigating the curse of dimensionality and overfitting, and it also allows us to visualize and compress high dimensional data. Data that would otherwise be too difficult for us to understand or interpret suddenly becomes much more salient when we collapse it down into two or three dimensions. In this post, we walk through the application of principal component analysis (PCA), a central dimensionality reduction algorithm.
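As a minimal sketch of projecting from $k = 2$ down to $m = 1$: for 2-d data, the first principal component can be found analytically from the covariance matrix. The toy data and function name below are our own, not from the post:

```python
import math

def pca_1d(points):
    """Project 2-d points onto their first principal component (k=2 -> m=1)."""
    n = len(points)
    # Centre the data at the mean
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / n
    b = sum(x * y for x, y in centred) / n
    c = sum(y * y for _, y in centred) / n
    # Angle of the direction of maximum variance for a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2 * b, a - c)
    v = (math.cos(theta), math.sin(theta))
    # Scores: each point's coordinate along the principal axis
    return [x * v[0] + y * v[1] for x, y in centred]

data = [(1, 1), (2, 2), (3, 3), (4, 4)]  # perfectly correlated: all variance on one axis
scores = pca_1d(data)
```

Here the points lie exactly on the line $y = x$, so the single retained component captures all of the variance and nothing is lost in the projection; with noisier data, the discarded components carry the residual variance.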

## Cluster Analysis

**Published:**

Cluster analysis is a form of *unsupervised* learning that aims to discover and explore the underlying structure of the data. The crux of a cluster analysis algorithm is its distance metric: the way you measure similarity or distance between observations. Unsupervised learning is often used when you do not have labelled data (perhaps labelling is expensive), or when you do not know the correct values for some of your data and therefore want to evaluate its underlying structure.
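As one concrete example of a distance-driven clustering algorithm, here is a minimal sketch of k-means (Lloyd’s algorithm) in plain Python, on invented toy data; the post itself does not prescribe this particular algorithm, and real implementations use smarter initialization (e.g. k-means++):

```python
import math

def kmeans(points, k, iters=10):
    """Lloyd's k-means: alternately assign each point to its nearest centroid
    (Euclidean distance) and recompute each centroid as its cluster's mean."""
    centroids = points[:k]  # naive initialization: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(pts, 2)
```

Swapping the Euclidean distance for another metric (Manhattan, cosine, etc.) changes which observations count as "similar", which is exactly why the choice of distance metric is the crux of any cluster analysis.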