AMA #9 Questions & Answers.

Your most burning questions from AMA #9.

There are several models on offer for machine learning, such as artificial neural networks, support vector machines, etc. How do we know which model to use given the myriad of choices?

Normally, we use the simplest reasonable model as the baseline, e.g. linear regression for a regression problem. Then, depending on how much time we have, we try more complex models to see whether they improve performance. More complex models usually come with a trade-off: they need more computational resources, or they are harder to explain.
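As a rough sketch of that workflow, the snippet below compares a linear regression baseline against one more complex model using cross-validation. The synthetic dataset, the choice of a random forest as the "more complex" model, and the R^2 metric are all assumptions made for illustration, not something prescribed in the AMA.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

models = {
    "linear regression (baseline)": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean 5-fold cross-validated R^2; higher is better.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```

If the more complex model does not clearly beat the baseline, the simpler and more explainable model is usually the better choice.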

Do we need to do EDA first to look at the training data before choosing the model?

Yes. It’s best practice to get a sense of your data before jumping into modelling.

I do not really understand the methods such as KMeans, Logistic Regression, … Are they all under the neural network algorithm or they are something different?

No, they are not neural networks. Each of these algorithms is a separate statistical method that has been nicely packaged into a library, so it can be called with a few lines of code. Each one is applied to a slightly different kind of data problem (KMeans groups unlabelled data into clusters, while logistic regression predicts a class from labelled examples), or solves the same problem in a different way.
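To make the difference concrete, here is a small sketch using scikit-learn on synthetic data (the dataset and parameter values are assumptions for illustration). Both models are called in a very similar way, but KMeans only sees the features, while logistic regression also needs the known labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic 2D data with three groups; y holds the true group of each point.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# KMeans is unsupervised: it is fitted on X alone and invents its own cluster ids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.predict(X)

# Logistic regression is supervised: it is fitted on X together with the labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
predicted_labels = clf.predict(X)

print(cluster_ids[:10])
print(predicted_labels[:10])
```

Note that the cluster ids from KMeans are arbitrary numbers, so they will not necessarily match the label values in y even when the grouping is the same.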

The DataCamp coursework is very strong on the hands-on aspect of using Python and the packages. However, I find that it is lacking in the theory and mathematics behind the methods and algorithms. Sometimes the course tweaks or introduces certain (hyper)parameters without much explanation of what they do under the hood. Are there any recommendations for good texts that cover the more theoretical side of what was taught? For a start, maybe resources on the major classification and regression methods? And also on the few common deep learning algorithms? A more-math-less-text kind of text?

Check out The Elements of Statistical Learning for anything not related to neural networks. For neural networks, try Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville, or Andrew Ng’s courses.

Is t-SNE also an unsupervised learning classifier? I did the lessons but I do not really understand it. t-SNE only has .fit_transform(); there is no .predict().

You’re right that there is no .predict(): t-SNE is an unsupervised method for visualization, not a classifier. Let’s say you have 100 data points and you run t-SNE on them. You’ll get a 2D map of where t-SNE has placed them. If you then have another 10 new data points, you’ll either need to rerun the whole thing on all the points, or build a separate model to predict where the new points would land on the old t-SNE map. t-SNE is a very good tool for quickly visualizing high-dimensional data after you have clustered it, so that you can see whether your clusters make sense.
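A minimal sketch of this with scikit-learn's TSNE, on random synthetic data (the data shape and the perplexity value are assumptions for illustration):

```python
import numpy as np
from sklearn.manifold import TSNE

# 100 synthetic points in 20 dimensions, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

tsne = TSNE(n_components=2, perplexity=30, random_state=0)

# fit_transform learns the map and returns the 2D position of each input point.
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (100, 2)

# There is no way to place unseen points on an existing map.
print(hasattr(tsne, "predict"), hasattr(tsne, "transform"))  # False False
```

Because the fitted object cannot place new points, adding 10 more rows means calling fit_transform again on the combined data, and the resulting map may look different from the first one.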

Something like a generative adversarial network was mentioned during the workshop?

You could use a GAN, but personally I would try to use human intuition to figure out what the visualized clusters mean, then hand-label my data with cluster 1/2/3 and use a simpler method like logistic regression to predict the labels from there.
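Here is a rough sketch of that simpler route. The data is synthetic, the hand_labels array is a hypothetical stand-in for labels you would assign yourself after looking at the plot, and the simple classifier used here is logistic regression. The classifier is trained on the original features, so it can label new points without rerunning t-SNE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Three well-separated synthetic groups in 10 dimensions, 40 points each.
centers = np.array([[0.0] * 10, [6.0] * 10, [-6.0] * 10])
X = np.vstack([c + rng.normal(size=(40, 10)) for c in centers])

# Hypothetical hand labels (cluster 1/2/3), as if assigned after inspecting the plot.
hand_labels = np.repeat([1, 2, 3], 40)

# A simple classifier learns to reproduce the hand labels from the raw features.
clf = LogisticRegression(max_iter=1000).fit(X, hand_labels)

# New, unseen points (one near each group) can now be labelled directly.
new_points = centers + rng.normal(size=(3, 10))
new_labels = clf.predict(new_points)
print(new_labels)
```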

In the videos, the lecturer said the axes plotted have no physical meaning. Doesn’t this make the plot meaningless?

It is true that the axes have no meaning in the sense that they do not have units; ultimately, you are squashing a high-dimensional vector into 2 dimensions. However, the plot does roughly preserve which points are close to each other, so it can be used as a visualization tool to give an intuition for how the data is clustered.

What is the difference between linear regression with polyfit() and the LinearRegression class in sklearn? They use the same least squares method, right?

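If you’re referring to numpy’s polyfit, then yes: polyfit with deg=1 is functionally the same as sklearn’s LinearRegression. Both fit a straight line by ordinary least squares. The main practical differences are that polyfit only handles a single input variable (but can fit higher-degree polynomials), while LinearRegression accepts any number of input features.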
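A quick way to check this yourself is to fit the same data with both and compare the results. The data below is synthetic, and the true slope and intercept are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noisy points around the line y = 3x + 2 (synthetic, for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

# numpy: coefficients come back highest degree first, i.e. [slope, intercept].
slope_np, intercept_np = np.polyfit(x, y, deg=1)

# sklearn: expects a 2D feature array, hence the reshape.
reg = LinearRegression().fit(x.reshape(-1, 1), y)
slope_sk, intercept_sk = reg.coef_[0], reg.intercept_

print(slope_np, intercept_np)
print(slope_sk, intercept_sk)
print(np.allclose([slope_np, intercept_np], [slope_sk, intercept_sk]))
```

The two sets of numbers should agree up to floating-point rounding.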

Watch AMA #9 here for a quick recap. 
