What is Machine Learning?

When I started getting in touch with Artificial Intelligence (AI), no one could give me a clear distinction between all those buzzwords such as “Artificial Intelligence”, “Pattern Recognition”, “Data Mining” and “Machine Learning” (ML). It took me a long time to actually be able to dintinguish the meaning of these, and, to say the truth, it is still not very clear to me how exactly the first three relate to each other. The meaning of “Machine Learning”, however, is very simple, and I believe it should have been made clearer from day one. This post has the goal of separating “Machine Learning” from this mess, making it very clear when something is ML and when something is not, and what the relation between ML and AI is.

That said, while I do expect you to have a perfect notion of what is and what is not part of the field after you read this blog post, I don’t intend to give you a better definition of the expression “Machine Learning” than the definitions you may have already found in other places in the web.

Why the confusion?

When I sat down to write this blog post, I thought of taking a look at how others have defined the field before (because, of course, I didn’t expect to come up with any magic new definition). I came across this video (from the amazing course on Machine Learning in Coursera by Andrew Ng):

And this video:

If you watch them, you will see these three field definitions:

Field of study that gives computers the ability to learn without being explicitly programmed.

A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

These first two definitions are good, because they make it clear that the programmer doesn’t tell explicitly what the machine should do: the behavior of the machine is completely dependent on the data it has previously seen (and of course the ways in which the machine learns).

"The extraction of knowledge from data"

A problem with this last definition is that it does not clarify who extracts the knowledge from the data. If the programmer writes rules that conform to the patterns in the data, does this count as Machine Learning? (more on this in the next section)

If you look back at these three definitions, they may cause you to be confused about what exactly the relation between Machine Learning and other fields is. Why is it “Machine Learning” and not, say, “Artificial Intelligence”? (and how is it related to AI, in the first place?) And when am I applying AI that is not ML? And can I apply ML that is not AI?

So… how do I explain this?

A simple story

As I said, I don’t actually intend to give any better definition. Any definition I could present here could be probably invalidated after any amount of some more “socratic” scrutinity. Instead, (and following the field itself,) my explanation is based on examples.

Say you are a photographer who takes two types of pictures: (1) landscapes; and (2) people’s faces (say, for their CV). Let’s assume you have them all stored in a folder in your computer. One fine day you decided to organize your pictures into two folders (landscapes and cv). You started dumping some of your stored photos into the two folders, but after some 50000 images you realized they were too many and thought it would be nice to have some automatic method to do that.

In the lack of a better idea, you decide that an easy way to check if the image is of a human is to count the ammount of pixels that have a color similar to a human skin color. You create a rule that says something like “if the image has more than 500 pixels of those colors, then it is a cv picture”.

You write a program that counts the number of pixels of those colors in your images and moves the files to the respective folder. You run your program and realize that you cv folder now has a lot of images of sandy landscapes. Your program did not do what you hoped it would.

You look better at the images you had already classified, trying to find patterns that could help you to identify the two types of images. You make, say, a histogram of the colors in the two types of images and realize that there is a set of rules that you could apply that would work in most of your cases. You implement those rules, run your program, and feel relatively satisfied: you did as well as you could.

Was this AI? Was this ML? (1)

Now… look back at the story and think about it: did you just make AI? I would say that the answer should probably be yes: if your program had worked, someone who has never seen what your program does would have most probably believed your program was “intelligent”. On the other hand, your rules were engineered by you, and the machine was only supposed to apply them to decide what to do. While you may feel you “taught” the machine what it should do in each case, all the teaching was done through “explicit programming” (see the first definition of Machine Learning in the previous section). In other words: it was not the machine who learnt; it was you.

Revisiting your story

You think you could achieve some better performance, but you are not sure exactly how. Instead of simply counting the colors of your pixels and using this to conclude the type of the image, you think it would be a nice idea to use some fancy image processing techniques. You recall something called the “Discrete Fourier Transformation”, that allows you to get a representation of the images in the “frequency domain”. You apply it to some of your already classified, and realize that, truly, the cv pictures have a very specific pattern of frequencies that you can develop some set of rules for… and anything that does not conform to that pattern seem to be a landscape. You implement your algorithm and feel a little more confident that this time there are fewer errors.

Was this AI? Was this ML? (2)

If you think the first version of your “image classifier” was AI, then you should probably take this second version as AI. (One common complaint among AI practitioners is that people stop taking something as AI as soon as they find out how it works).

On the other hand, could you say you just did ML? My answer is still no: it is not because you applied some fancy “feature extraction” algorithm to understand your data that you are now doing Machine Learning. Again, all the extraction of knowledge is done by you, and you just explicitly instructed the machine to follow what you considered to be the best set of rules you could think of.

So… well… then… actually… when is it ML?

Let’s look one last time to your story. You came a long way until here: used some exploratory statistics to find what the colors in each type of images can tell about them, and also what frequency patterns your images most commonly have. You implemented a rule-based classifier that decided in which folder to put each one of your images.

But you feel all of these rules you developed are not good enough. It would be great if the program could learn by itself what folder to put each image in, based on examples of images of each type. Enter Machine Learning!

When you apply Machine Learning, you don’t want to tell what rules to use: you give data, and you expect the program to figure out by itself what to do. For example, let’s say you still have those 50000 images you had manually classified in the beginning of our story. Let’s say you heard of some popular image classification model called Convolutional Neural Network for which you found some code in the internet and would like to try. In this case, for each one of your 50000 “training” images, you also tell the model what is the answer you expect it to give back (i.e., either cv or landscape). When you start, the model makes a lot of mistakes, outputting many times cv when it was supposed to output landscape, and vice-versa. But every time it makes a mistake, it updates its internal variables in a way that causes it to become more likely to answer right the next time. When the model is done training, you should expect it to answer right most of the times (at least for those 50000 images you were using to train it).

Now that it is done training, you put those 50000 images aside and use the already trained model to put all of your other images in their respective folder. Notice that you didn’t tell the model what to look for in each image. You may have told it how to update its internal variables, but you didn’t explicitly develop any rule. The “rules” were found by the model itself. It learned them!

Some final thoughts

Machine Learning is a huge field, and is going through some nice hype in the last few years. The example I gave in the previous subsections uses something called “Supervised Learning”, which is when you tell the model what is the expected answer for each training instance. There are other models for which you don’t have to explicitly point out the “right answer” every time (and that are useful when you don’t have those 50000 manually labeled examples you trained your Convolutional Neural Network with).

The models may also learn several different types of things. In the case of Convolutional Neural Networks, it is actually not 100% clear what the networks are learning; however, other types of models may learn, say, rules, just like those you were trying to develop manually in the beginning of our story.

Other resources

There are a lot of resources in the internet on both AI and ML. There are some, however, that are my favorites, which I thought of linking here.

On Artificial Intelligence:

On Machine Learning