Machine Learning

From Grundy

Intuition and Introduction

If you have just come across the thousandth article propounding how Machine Learning (often specifically Deep Learning), in conjunction with Big Data, is going to revolutionise every scientific, business and social sphere imaginable, it naturally makes sense to learn some more and get the facts straight. The number of articles that have popped up on the 'AI Revolution' would make for a 'Big' data-set in itself!

We need to dispel a few myths first. An important clarification is that we are not even close to building a 'Strong' AI, which is what the word AI colloquially refers to. What we do have are extremely powerful 'Weak' AIs that are very good at, and often outperform human beings in, narrowly defined tasks such as playing Chess and Go or recognising a person's face or voice.

For the purpose of this Wiki we shall use AI and Machine Learning interchangeably. This is not strictly accurate: Machine Learning is actually a subset of AI research. Typically, Machine Learning, which we shall henceforth call ML, is a process of 'training' an algorithm to improve over time at performing a designated task. A useful example (though limited in its applicability, as we shall see) is to think of a child learning something. Parents show the child a couple of trees. The next time the child sees a tree, they can identify it as a tree. This is learning by example, and a lot of ML models work this way.

The model basically comprises some parameters. The assumption is that the right choice of parameters will allow us to describe the system of interest correctly. To make things clear, consider Newton's law of gravity. The parameter in this model is the Gravitational constant G. This parameter can be determined by observing some examples of objects moving under the influence of gravity.
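
As a rough illustration of determining a parameter from examples, here is a minimal sketch that estimates G by least squares. The observations are synthetic, and the names `make_observations` and `fit_g`, the noise level and the value ranges are all assumptions made up for this example; no real experiment is being reproduced.

```python
import random

G_TRUE = 6.674e-11  # used only to generate the synthetic observations


def make_observations(n, noise=0.01, seed=0):
    """Generate (x, force) pairs with x = m1 * m2 / r**2 (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        m1 = rng.uniform(1.0, 100.0)   # mass of first object, kg
        m2 = rng.uniform(1.0, 100.0)   # mass of second object, kg
        r = rng.uniform(1.0, 10.0)     # separation, m
        x = m1 * m2 / r ** 2
        # Newton's law F = G * x, with a small multiplicative measurement error
        force = G_TRUE * x * (1.0 + rng.gauss(0.0, noise))
        data.append((x, force))
    return data


def fit_g(data):
    """Least-squares estimate of G for the one-parameter model F = G * x."""
    numerator = sum(x * f for x, f in data)
    denominator = sum(x * x for x, _ in data)
    return numerator / denominator


observations = make_observations(200)
g_hat = fit_g(observations)
print(f"estimated G = {g_hat:.4e}")
```

The model family (Newton's law) is fixed in advance; the data is used only to pick the single parameter that fits best.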

Note the extremely important assumption: there exists a model within the chosen parametric family that describes our system of interest with sufficient accuracy. This is why we must choose the model that we are trying to 'fit' very carefully. There are infinitely many families of models to choose from and there is often no 'right' answer. The trade-off is usually between simplicity and accuracy. More complex models may better describe the system (beware of overfitting!) but may take much longer to train. This is why Neural Nets and Deep Learning have suddenly gained prominence: with the advent of cheap hardware like General Purpose GPUs to perform the High Performance Computing required (and other factors like cheap storage and massive datasets), such complex models are now within reach.

Neural Networks themselves are a very small subset of what is now called 'Cognitive Computing'. These are paradigms of computing set to challenge the Von Neumann architecture, drawing inspiration from the most powerful, complex and efficient computing device known to man: the brain. Cutting-edge neuroscience and emerging hardware like memristors are letting us test more and more complex computing models inspired by the brain. Ideas like population encoding and phase-lock logic are set to change how computing is done yet again.

TL;DR version: Machine learning is nothing but glorified statistics.

The rest of the article describes some machine learning resources and pertinent details of this burgeoning field.

Various kinds of ML Models

There's actually more to the story. The above training paradigm, where we use 'labelled' examples (this is a tree; that is not a tree) to train our algorithm, is called Supervised Learning.
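
To make learning from labelled examples concrete, here is a small sketch of a 1-nearest-neighbour classifier, one of the simplest supervised methods. The two features (height in metres, and a 0/1 flag for "has leaves") and all of the example values are hypothetical, chosen only to mirror the tree example.

```python
import math


def nearest_neighbour_predict(examples, query):
    """Return the label of the labelled example closest to `query`.

    `examples` is a list of (features, label) pairs.
    """
    best_label = None
    best_distance = math.inf
    for features, label in examples:
        distance = math.dist(features, query)  # Euclidean distance
        if distance < best_distance:
            best_distance = distance
            best_label = label
    return best_label


# Hypothetical labelled examples: (height in metres, has leaves) -> label
labelled_examples = [
    ((12.0, 1.0), "tree"),
    ((8.0, 1.0), "tree"),
    ((0.3, 1.0), "not tree"),   # e.g. a small pot plant
    ((1.5, 0.0), "not tree"),   # e.g. a fence post
]

print(nearest_neighbour_predict(labelled_examples, (10.0, 1.0)))
print(nearest_neighbour_predict(labelled_examples, (0.5, 0.0)))
```

There is no training step here: the "model" is just the stored examples, which is why this method is a handy first illustration rather than a practical choice for large datasets.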

The paradigm that has been garnering the greatest attention of late is called Reinforcement Learning, which has been used in training AlphaGo and other game-playing and behavioural AI. Exciting ideas like Genetic and Evolutionary algorithms are often discussed alongside this paradigm. Check out this video where a bot learns to play Super Mario. The idea is to reinforce correct behaviour and deter incorrect behaviour, often via a stochastic reward mechanism. More technically, the algorithm must maximise its cumulative reward (equivalently, minimise a cost function) in a learning environment. Lots of fascinating math comes into play, and this field sees the direct application of Game Theory and Markov Decision Processes.
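
The reward-driven idea can be sketched with tabular Q-learning on a toy problem. Everything in this example is an assumption made for illustration: a five-state corridor where only reaching the right-most state gives a reward, a uniformly random behaviour policy for exploration, and the chosen values of `GAMMA` and `ALPHA`. It is not how AlphaGo or the Super Mario bot were trained.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
GAMMA = 0.9           # discount factor
ALPHA = 0.5           # learning rate


def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, max_steps=100, seed=0):
    """Learn a Q-table while acting uniformly at random (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = rng.randrange(2)  # explore: pick left or right at random
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best future value
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states 0..3
greedy_policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(greedy_policy)
```

The agent is never told that "right" is correct; the preference emerges because actions leading towards the reward accumulate higher values.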

The last broad flavour of ML is called Unsupervised Learning. The algorithm is not given any labels or reinforcement at all, only data. It must draw its own conclusions or observations. This is extremely challenging due to the open-ended nature of the problem and remains largely unsolved. Most of the work in this field has been based on Pattern Finding and Clustering.
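
As one example of clustering, here is a minimal k-means sketch on one-dimensional data. The data points, the starting centroids and the function name `kmeans_1d` are assumptions for illustration; practical implementations also need a sensible initialisation and a stopping criterion.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]   # unlabelled, hypothetical data
centroids, clusters = kmeans_1d(points, centroids=[0.0, 6.0])
print(centroids)
print(clusters)
```

No labels are supplied at any point; the two groups are found purely from the structure of the data.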

Basic Regressor-Classifier Models

Decision Trees & Random Forests

Unsupervised Models

Deep Learning

Neural Networks (Feedforward)

Convolutional Neural Networks

This video serves as a basic introductory guide to understanding the intuition behind convolutional nets.

Recurrent Neural Networks

Finally, the most awesome kind of networks which make use of memory and feedback. First up, a blog to boost your spirits - RNN Effectiveness.

Learning Machine Learning

There are tons of resources online. Here are our recommendations for deep learning -

Machine Learning Platforms

Machine learning today is powered by very efficient libraries that run on GPUs. Some of the exciting libraries to look at are:

  • TensorFlow - The tool powering many of your favourite Google products - Gmail, Google Translate, Google Search, Google Speech etc. TensorFlow was recently made open source on GitHub. TensorFlow has a Python API that makes machine learning easier and more efficient. Have a look at our TensorFlow tutorial to find a list of TensorFlow resources.
  • Keras
  • Torch - Have a look at our Torch guide.
  • Theano
  • scikit-learn

Motivation: Why Implement from Scratch?

It is always good practice to implement simple models like Neural Networks, CARTs (classification and regression trees), clustering etc. from scratch at least once. For real applications, libraries are preferable, as they have large teams working on them and have evolved with time, but understanding the underlying mechanism always helps in practice. There may also be times when no single library can help and you would have to get your hands dirty! Implementing from scratch is also worthwhile because:

  • it can help us to understand the inner workings of an algorithm
  • we could try to implement an algorithm more efficiently
  • we can add new features to an algorithm or experiment with different variations of the core idea
  • we circumvent licensing issues (e.g., Linux vs. Unix) or platform restrictions
  • we want to invent new algorithms or implement algorithms no one has implemented/shared yet
  • we are not satisfied with the API and/or we want to integrate it more "naturally" into an existing software library

Let us narrow down the phrase "implementing from scratch" a bit further in the context of the six points listed above, to make the question really tangible. Let's take a particular algorithm, simple logistic regression, and address the different points using concrete examples. I'd claim that logistic regression has been implemented more than a thousand times.

One reason why we'd still want to implement logistic regression from scratch could be that we don't feel we fully understand how it works; we read a bunch of papers and only roughly understood the core concept. Using a programming language for prototyping (e.g., Python, MATLAB, R, and so forth), we could take the ideas from the papers and try to express them in code, step by step. An established library, such as scikit-learn, can then help us double-check the results and see whether our implementation (our idea of how the algorithm is supposed to work) is correct. Here, we don't really care about efficiency; even though we spend so much time implementing the algorithm, we probably want to use an established library if we want to perform some serious analysis in our research lab and/or company. Established libraries are typically more trustworthy: they have been battle-tested by many people, who may have already encountered certain edge cases and made sure that there are no weird surprises. Furthermore, it is also more likely that this code has been highly optimized for computational efficiency over time. Here, implementing from scratch simply serves the purpose of self-assessment. Reading about a concept is one thing, but putting it into action is a whole other level of understanding, and being able to explain it to others is the icing on the cake.
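
The step-by-step prototyping described above might look like the following minimal sketch: binary logistic regression trained with batch gradient descent, written in plain Python with no attention paid to efficiency. The toy data, the function names and the default `learning_rate` and `epochs` values are assumptions chosen for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
            error = p - yi  # derivative of the log-loss w.r.t. the logit
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical one-feature dataset: small values are class 0, large are class 1
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

To double-check such a prototype against scikit-learn, the predictions are the easier thing to compare: scikit-learn's `LogisticRegression` applies L2 regularization by default, so its coefficients will not match an unregularized implementation like this one exactly.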

Another reason why we may want to re-implement logistic regression from scratch is that we are not satisfied with the "features" of other implementations. Let us naively assume that other implementations don't have regularization parameters, or don't support multi-class settings (i.e., via One-vs-All, One-vs-One, or softmax). Or, if computational (or predictive) efficiency is an issue, maybe we want to implement it with another solver (e.g., Newton vs. Gradient Descent vs. Stochastic Gradient Descent, etc.). Improvements in computational efficiency do not necessarily have to come from modifying the algorithm, though; we could also use lower-level programming languages, for example, Scala instead of Python, or Fortran instead of Scala. This can go all the way down to assembly or machine code, or to designing a chip that is optimized for running this kind of analysis. However, if you are a machine learning (or "data science") practitioner or researcher, this is probably something you should delegate to the software engineering team.
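
As a small example of the kind of feature one might add, here is a sketch of the softmax function used in the multi-class setting. It turns one score per class into probabilities that sum to one. The scores below are hypothetical, and subtracting the maximum score first is a common trick to avoid overflow in `math.exp`.

```python
import math


def softmax(scores):
    """Convert a list of class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max leaves the result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_scores = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probabilities = softmax(class_scores)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
print(probabilities, predicted_class)
```

In a multi-class logistic regression, each score would be a separate linear function of the input, and the class with the highest probability would be the prediction.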

Tips for applying ML

  • Use pickle to save trained models as objects on disk, so that they can be loaded again easily even after the kernel has been stopped.
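
A minimal sketch of the pickle tip follows. The `model` dictionary is a hypothetical stand-in for a trained model object, and a temporary directory is used only so that the example cleans up after itself.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model; a real model object
# (e.g. a fitted estimator) can usually be pickled the same way.
model = {"weights": [0.42, -1.3], "bias": 0.07}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Save the model to disk
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Later (for example after restarting the kernel), load it back
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

Only unpickle files that come from a source you trust, since loading a pickle can execute arbitrary code.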

See Also