Recently, there has been a lot of fuss about the word Learning. We always hear about machine learning, deep learning, learning algorithms, etc. But what does this really mean? Did scientists find a way to create a brain-like component and implement it inside machines? Is it just a marketing word used to sell software and services? Are machines going to take over the world? Or what on earth is this about?
Well… do not worry, we are not there… yet! In this article I will introduce what learning really is, and how scientists teach computers to perform human-like tasks and, in some cases, even outperform humans. This article avoids technical jargon. It is written so that anyone from any domain can grasp the idea, so the concepts are explained in a deliberately simplified way. More concept-heavy topics will be discussed in future articles.
Why do we need Learning?
Before learning came along, problem solving relied on writing algorithms. An algorithm is simply a set of rules that takes an input and returns an output representing a solution.
Consider the following: given a list of numbers, you are asked to sort them in increasing order. This problem is solved by algorithms. In fact, there are many algorithms for this task. They work by taking the list, applying some rules and manipulations, and returning the list in sorted order. This problem and similar ones were relatively easy for computer scientists: they only had to think and come up with an algorithm to solve the given task.
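For the curious, here is what such a set of rules can look like in Python. This is only a minimal sketch of one classic method (insertion sort), and the function name insertion_sort is my own choice for illustration.

```python
def insertion_sort(numbers):
    """Return a new list with the numbers in increasing order."""
    result = []
    for number in numbers:
        # Rule: walk back from the end of the already-sorted part
        # until we find the spot where this number belongs.
        position = len(result)
        while position > 0 and result[position - 1] > number:
            position -= 1
        result.insert(position, number)
    return result


print(insertion_sort([5, 2, 9, 1]))  # prints [1, 2, 5, 9]
```

Every step here is a rule a human wrote down in advance. Nothing is learned.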
On the other hand, some problems were not so easy to solve with algorithms. People started to ask more from computers. They wanted machines to solve very hard tasks, tasks that scientists had no idea how to program. For example: how is it possible to write an algorithm that takes an image of an animal and outputs its type? This is a very easy task for humans, but solving it with hand-written rules is extremely complex, if not impossible. Humans know how to classify animal photos, but they do not know how to describe the steps they take to reach the answer. Here an important question arises: how do we solve problems whose solution steps even humans cannot describe?
Learning comes to the rescue! Humans learn from experience and so do machines.
Imagine a room with a nice fireplace and a small child playing around. The child’s mind is completely fresh, and he has no idea what a fireplace is. He always wants to explore and learn new things. He sees a red flame; for him it is something new, fascinating, and interesting to investigate. In his mind it is like: let us go and touch it! Unfortunately, here comes the pain, and the child learns from the harmful experience not to touch fire in the future. I emphasize experience here, because this is what learning is about.
Types of Learning
The idea behind learning is very simple and intuitive. To make a computer learn a task, we give it a set of questions, each paired with its answer. Note that we do not know how to describe the steps that lead from a question to its answer, so we assign this job to our poor buddy, the machine. This category of learning is named Supervised Learning. Example: consider a program that takes an image and answers whether it contains a cat or not. We teach this program by giving it a lot of images labeled as “is cat” or “is not a cat”. The machine’s responsibility is to learn a way to go from an input to a label. How? We will see an example later.
There are other categories, such as Unsupervised Learning, where we do not give the answers to the questions and the model has to find patterns in the data on its own. There is also Reinforcement Learning, like the kind used in video games, where the model learns to reach a high score by choosing the best actions and learning from bad ones. To keep this article simple, I will not go into details, but I guess you get the general idea.
How Does It Work?
It is time to solve the mystery and reveal the magic behind learning. We will illustrate the idea with a very simple problem. Let us say you want to start a new career selling old cars. Unfortunately, your experience in this domain is very limited, which makes it difficult to assign prices.
To solve this issue, you go around and collect the following information from a fellow car dealer:
- fuel type,
- miles per gallon (mpg),
- number of doors,
- and, most importantly, the selling price.
This is your experience, or background knowledge, and your goal is to learn from it.
For the sake of simplicity, we will take the mpg first, and we will add the other features later. Now, let us start thinking. The price must be related to the mpg in some way. Let us draw a graph that shows the variation of price with respect to mpg:
The graph shows that the higher the mpg, the cheaper the car. Maybe this is because cars with a higher mpg are more economical, and thus less expensive than powerful cars.
Now we have a base solution: when a new car with a similar mpg comes in, we simply assign the same price as before. But what if we get a new car with an unseen value? Here is the main goal behind learning: we want to learn how the values are calculated…
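To see where this base solution falls short, here is a tiny Python sketch. The prices are made-up numbers, not a real dealer's data, and the names known_prices and lookup_price are only for illustration.

```python
# Made-up records: mpg -> selling price. Illustration only.
known_prices = {15: 8500, 25: 7500, 35: 6500, 45: 5500}


def lookup_price(mpg):
    """Return the recorded price for this mpg, or None if we never saw it."""
    return known_prices.get(mpg)


print(lookup_price(25))  # prints 7500
print(lookup_price(40))  # prints None, because no car with 40 mpg was recorded
```

A lookup can only repeat what it has already seen, which is why we need a model that can fill in the gaps.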
To do so, we start by suggesting a possible calculation model. We propose that there is a linear relationship between the price and the mpg. This relationship can be described as a straight line, as shown in the image below:
We use this line as our reference: when we get a new value, we map it to the line and read off our price. For example, when we get a car with 40 mpg, we assume that the price is around $6,000. Fair enough…
But how do we find this straight line? A straight line is represented by the following equation:
price = a + b * mpg
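We know the price and the mpg from our dataset, but what are a and b? In simple words, a and b are the parameters of the straight line: a is the price where the line starts (at 0 mpg), and b is how much the price changes for each additional mpg. Different values of a and b mean different straight lines.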
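Here is the same equation as a few lines of Python. The values of a and b below are hypothetical, picked only to show that each pair of values describes a different line. The function name line_price is my own.

```python
def line_price(mpg, a, b):
    """Price predicted by the straight line: price = a + b * mpg."""
    return a + b * mpg


# Two hypothetical settings of (a, b) give two different lines,
# and therefore two different prices for the same 40 mpg car.
print(line_price(40, a=10000, b=-100))  # prints 6000
print(line_price(40, a=12000, b=-200))  # prints 4000
```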
How to find which line is the best?
The best line is the one that best fits the data points. Therefore, the most important question is how to choose the values of a and b that give the best-fitting line. Why is this question so important? Well, because this is all that a machine has to learn… This is the very core of every learning model.
The process is like tuning a guitar. We keep turning the tuning pegs until we get the perfect tone.
In our example it goes like this… We ask the machine to try different values for a and b, then compare the prices we get with our previously known data. For example, we know from our data that when mpg = 30, the price = 1390. We calculate the predicted price for every car in our data and compare it with the actual one.
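One simple way to make this comparison, and only one of several possible choices, is to average the gap between the predicted and actual prices. The sketch below assumes made-up data and a hypothetical helper named average_error.

```python
# Made-up (mpg, actual price) pairs. Illustration only.
cars = [(15, 8500), (25, 7500), (35, 6500), (45, 5500)]


def average_error(a, b, cars):
    """Average gap between predicted and actual price for one choice of a, b."""
    total = 0
    for mpg, actual_price in cars:
        predicted_price = a + b * mpg
        total += abs(predicted_price - actual_price)
    return total / len(cars)


print(average_error(10000, -100, cars))  # prints 0.0, a perfect fit for this toy data
print(average_error(9000, -50, cars))    # prints 625.0, a worse line
```

The smaller this number, the better the line fits what we already know.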
If the predicted prices for a given a and b differ a lot from the actual ones, we change a and b to reach a better estimate. We keep going like this until the model finds the best settings. Fortunately, there are algorithms, such as Gradient Descent, that speed up the search by narrowing down where to look for the best parameters.
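For readers who want a peek under the hood, here is a minimal sketch of Gradient Descent in Python. Several things in it are my assumptions: the data is made up, the error being reduced is the average squared gap, and the learning rate and number of steps were picked by hand for this toy example.

```python
# Made-up data: (mpg in tens, price in thousands of dollars).
# Small units keep the update steps stable in this sketch.
cars = [(1.5, 8.5), (2.5, 7.5), (3.5, 6.5), (4.5, 5.5)]


def gradient_descent(cars, learning_rate=0.05, steps=5000):
    """Fit price = a + b * mpg by repeatedly nudging a and b downhill."""
    a, b = 0.0, 0.0  # start from an arbitrary guess
    n = len(cars)
    for _ in range(steps):
        # Slope of the average squared error with respect to a and to b.
        grad_a = sum(2 * ((a + b * x) - y) for x, y in cars) / n
        grad_b = sum(2 * ((a + b * x) - y) * x for x, y in cars) / n
        # Move a little in the direction that reduces the error.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


a, b = gradient_descent(cars)
print(round(a, 3), round(b, 3))  # close to 10.0 and -1.0 for this toy data
```

Instead of trying values at random, each step uses the current error to decide which way to adjust a and b.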
Fitting a straight line with a single feature is not that hard. It is like a blindfolded person on a mountain searching for a ball at the bottom of the hill, feeling the way down one step at a time. But consider adding the other features, along with a new one such as horsepower:
price = a + b * mpg + c * horsepower + d * fuel type + e * number of doors
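In code, the longer equation is still just a sum. The sketch below is illustrative only: the car, the parameter values, and the name price_from_features are hypothetical, and I assume the fuel type is encoded as a number (0 for gas, 1 for diesel) so it can be multiplied.

```python
def price_from_features(car, weights):
    """price = a + b * mpg + c * horsepower + d * fuel type + e * doors."""
    a, b, c, d, e = weights
    return (a
            + b * car["mpg"]
            + c * car["horsepower"]
            + d * car["fuel_type"]
            + e * car["doors"])


# Hypothetical car and parameter values. Illustration only.
# Assumption: fuel type is encoded as 0 = gas, 1 = diesel.
car = {"mpg": 40, "horsepower": 100, "fuel_type": 1, "doors": 4}
weights = (5000, -100, 30, 500, 100)
print(price_from_features(car, weights))  # prints 4900
```

The machine now has five values to tune instead of two.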
Here we are no longer in a 2-dimensional space, and our minds are not even able to picture the situation. And all of this while we are still using a linear representation for our model. What about more complex models, e.g. fitting a polynomial curve instead of a straight line? More complex models require fitting a huge number of parameters. The learning process in this case takes a lot of computation and time. Well, how complex does our model need to be? This depends entirely on the data and on the experience of the machine learning engineer.
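As a small illustration of how complexity adds parameters, here is a sketch of a polynomial model in Python. The coefficient values are hypothetical and the name polynomial_price is my own. Each extra term in the curve is one more value the machine has to learn.

```python
def polynomial_price(mpg, coefficients):
    """price = c0 + c1 * mpg + c2 * mpg**2 + ... (one parameter per term)."""
    return sum(c * mpg ** power for power, c in enumerate(coefficients))


# Hypothetical parameter values. Illustration only.
print(polynomial_price(40, [10000, -100]))     # straight line, 2 parameters: prints 6000
print(polynomial_price(40, [10000, -150, 1]))  # curve, 3 parameters: prints 5600
```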
I hope this article solved the mystery behind learning. Learning is all about discovering the best parameter values (a, b, c …) for a given model. These values enable the model to output good results based on previous experience.
Machine learning is now practical thanks to advances in computer hardware and the drop in its price. There are many machine learning models. They differ in their level of complexity and in the tasks they are able to solve. The one we introduced in this article is called Linear Regression, one of the simplest, yet very powerful, algorithms.