Logistic Regression in Python - Machine Learning Basics


Hello guys, welcome back to my channel. Today we'll implement the logistic regression algorithm from scratch using Python, then implement it again using the scikit-learn library, and we'll compare our results at the end of this video. Here are the timestamps to each section of the video; you can find the written version on Medium, as well as the entire code and explanation in a Google Colaboratory notebook, and I've linked these in the description. And please don't forget to subscribe to my channel and like this video, it really helps. So let's get started.

So what is logistic regression? In statistics, logistic regression is used to model the probability of a certain class or event. Logistic regression is similar to linear regression because both involve estimating the values of the parameters used in the prediction equation, based on some given training data. But linear regression predicts the value of some continuous dependent variable, whereas logistic regression is used to predict the probability of a class or an event. The output of logistic regression always lies between 0 and 1, and because of this property it is usually used for classification purposes.

So now let's look at the logistic model. Consider a model with features x1, x2, x3 up to xn, and let the binary output be denoted by Y, which can take the value 0 or 1. Let p be the probability of Y = 1. The mathematical relationship between these variables can be expressed as ln(p / (1 - p)) = b0 + b1x1 + b2x2 + ... + bnxn. Here b0, b1, b2 and so on are the parameters, or weights, that we will actually be estimating during training. The term p / (1 - p) is known as the odds, so ln(p / (1 - p)) is called the log odds, and it simply maps the probability, which lies between 0 and 1, to a range between minus infinity and plus infinity. This is the basic math behind what we are going to do. From this equation we will derive the value of p: the natural log on the LHS can be eliminated by exponentiating both sides.
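
Written out, the spoken equations above are:

```latex
\ln\!\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
\quad\Longrightarrow\quad
\frac{p}{1-p} = e^{\,b_0 + b_1 x_1 + \dots + b_n x_n}
```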

We are left with p / (1 - p) = e^(b0 + b1x1 + ... + bnxn), and now we can easily solve this for p: dividing the numerator and denominator by the exponential term, we get p = 1 / (1 + e^-(b0 + b1x1 + ... + bnxn)). If you are familiar with the sigmoid function that is commonly used in machine learning, you will notice that this equation has the same form, 1 / (1 + e^-x), where x is b0 + b1x1 + b2x2 and so on. So it always maps to a value between 0 and 1, just like a probability, and we'll be using this equation to make our predictions. The next step is to estimate the values of the weights b0, b1, etc. But before doing that we will define a loss function. The loss function is used to calculate the error in a predicted value; our goal is to find the optimum values of the weights so that this loss or error function is at a minimum for any predicted value. The loss function we'll be using is called the L2 loss function. Here yi is the actual value and ŷi is our predicted value; to find the error we take the difference between yi and ŷi and square it, and this is our error for any given value of x. To get the total error for our training data set, we simply sum this error over all the rows in the data set, which is why we have the sigma: L = Σ (yi - ŷi)². Our goal is to minimize this total loss for the given data set, and to do that we'll be using the gradient descent algorithm; this whole process is referred to as the training process. For a detailed explanation of how the gradient descent algorithm works, you should also check out my video on linear regression using gradient descent. You might know that the partial derivative of a function at its minimum value is equal to zero; gradient descent basically uses this concept to estimate the weights of our model by minimizing the error or loss that we defined in the previous section.
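
In code, the prediction equation and the L2 loss from this section might look like the following minimal sketch (the function names here are my own):

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)): maps any real z to a value between 0 and 1
    return 1 / (1 + np.exp(-z))

def l2_loss(y, y_pred):
    # Sum of squared differences between actual and predicted values,
    # over all rows in the data set
    return np.sum((y - y_pred) ** 2)

# Example: a large positive z gives a probability close to 1
print(sigmoid(5.0))  # ~0.993
```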

So for the rest of this video let us assume that we are predicting the value of y, and that we have a single feature x on which y depends. Here ŷi is the predicted value for each xi in our given data set; xi is the independent variable and ŷi is our prediction. All we have to do now is estimate the values of b0 and b1 so that our predictions are accurate. Initially, let the values of b0 and b1 both be 0, and let lr be our learning rate; the learning rate controls how much the weights are updated at each step of the learning process. Next we calculate the partial derivatives of the loss function with respect to b0 and b1; here D_b0 is the partial derivative with respect to b0 and D_b1 is the partial derivative with respect to b1. In case you're not familiar with calculating a partial derivative, let me quickly show you how we arrived at this. The partial derivative of the loss function with respect to b0 is ∂L/∂b0 = Σ 2(yi - ŷi) times the partial derivative of the term inside the bracket. Since yi is a constant, its partial derivative is 0, leaving minus the partial derivative of ŷi with respect to b0. So let's find the partial derivative of ŷi = 1 / (1 + e^-(b0 + b1xi)) separately. To do this we'll use the u/v (quotient) rule, which states that the derivative of a fraction u/v is the denominator times the derivative of the numerator, minus the numerator times the derivative of the denominator, all divided by the denominator squared. (You could also use the chain rule to find the derivative of this term, but I'll be using the quotient rule.) The numerator is the constant 1, so its derivative with respect to b0 is 0. For the denominator, the derivative of 1 is 0, plus the derivative of e raised to an exponent, which is e raised to that same exponent times the derivative of the exponent: the derivative of -b0 - b1xi with respect to b0 is -1, and the derivative of the -b1xi part is 0. That -1 and the minus sign in front cancel each other off, and we are left with ∂ŷi/∂b0 = e^-(b0 + b1xi) / (1 + e^-(b0 + b1xi))².
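
In symbols, writing z_i = b_0 + b_1 x_i, the quotient rule step above is:

```latex
\frac{d}{dx}\!\left(\frac{u}{v}\right) = \frac{v\,u' - u\,v'}{v^2}
\quad\Longrightarrow\quad
\frac{\partial \hat{y}_i}{\partial b_0}
= \frac{\partial}{\partial b_0}\!\left(\frac{1}{1 + e^{-z_i}}\right)
= \frac{e^{-z_i}}{\left(1 + e^{-z_i}\right)^2}
```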

This expression can be split into two factors: [1 / (1 + e^-(b0 + b1xi))] times [e^-(b0 + b1xi) / (1 + e^-(b0 + b1xi))], and the second factor can in turn be written as 1 - 1 / (1 + e^-(b0 + b1xi)). If you notice, 1 / (1 + e^-(b0 + b1xi)) is exactly ŷi — it is the equation we are using to predict — so the first factor equals ŷi and the second equals 1 - ŷi, and the whole derivative is ŷi(1 - ŷi). Bringing the negative sign outside, we get ∂L/∂b0 = -2 Σ (yi - ŷi) ŷi (1 - ŷi), which is the required derivative. Similarly, the partial derivative of the loss function with respect to b1 is the same thing except for an additional factor xi, because when we differentiate the exponent -b0 - b1xi with respect to b1 we get -xi instead of -1. The value of each partial derivative tells us by how much b0 and b1 should be updated so that our total error is reduced; ideally we want the error to be 0, which would mean our model is 100% accurate. So the next step is to update the values of b0 and b1 using the calculated derivative values: b0 = b0 - lr × D_b0, and similarly for b1. Every time we do this, the error is reduced and our predictions become more accurate.
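
Collected in one place, the derivatives and the update rules we just derived are:

```latex
\frac{\partial L}{\partial b_0} = -2\sum_i (y_i - \hat{y}_i)\,\hat{y}_i\,(1 - \hat{y}_i),
\qquad
\frac{\partial L}{\partial b_1} = -2\sum_i (y_i - \hat{y}_i)\,\hat{y}_i\,(1 - \hat{y}_i)\,x_i
```

```latex
b_0 \leftarrow b_0 - \mathrm{lr}\cdot\frac{\partial L}{\partial b_0},
\qquad
b_1 \leftarrow b_1 - \mathrm{lr}\cdot\frac{\partial L}{\partial b_1}
```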

Okay, so we repeat this process until our loss or error approaches zero; each repetition is known as an iteration, or an epoch. I ran this for 300 iterations, and we can see that after around 150 to 200 iterations our loss has dropped to nearly zero, so we will run this gradient descent algorithm for 150 epochs.

So now let's implement this. I have imported all the libraries that we need. You can download the data set using this command; I have taken the data set from Kaggle, and you can download it manually from there as well. Next we load the data into a variable called data and see what it looks like. This data set basically describes whether a product was purchased or not, depending on these features. We'll be predicting the value of purchased, which is a 0 or 1, and we'll be choosing age as our feature; so this will be X and this will be Y. We visualize the data set using matplotlib and also divide it into training and testing data using the train_test_split function. This is what it looks like: this axis is the age, and this is whether the product was purchased or not.

So now let's create the logistic regression model, and before that we will define a few helper functions. First, we shift the mean of the data to the origin; this normalization is done just because of the characteristics of the logistic function. Next we have the predict method, which takes X, b0 and b1 as arguments, plugs these values into our prediction equation, and returns the result. So for each value in X we are making a prediction and returning that value; the contents of this array will be our predicted values, and we'll convert it into a NumPy array so it is easier to work with. A sketch of this setup follows.
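
Here is a minimal sketch of that setup; the CSV file name and the column names 'Age' and 'Purchased' are my assumptions about the Kaggle data set:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Load the data; the file name here is a guess at the downloaded CSV
data = pd.read_csv("Social_Network_Ads.csv")

X = data["Age"]        # our single feature
Y = data["Purchased"]  # 0 or 1: was the product purchased?

X_train, X_test, y_train, y_test = train_test_split(X, Y)

plt.scatter(X_train, y_train)  # visualize age vs. purchased
plt.show()

def normalize(X):
    # Shift the mean of the data to the origin, as described above
    return X - X.mean()

def predict(X, b0, b1):
    # Plug each x into our prediction equation (the sigmoid) and
    # return the probabilities as a NumPy array
    return np.array([1 / (1 + np.exp(-(b0 + b1 * x))) for x in X])
```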

Now let's implement the logistic regression algorithm itself. First of all we need to normalize the values of X, then initialize our weights. We will run our training process for 150 epochs, so let's define a for loop and make the prediction inside it.

Let's call our variable y_pred: it is equal to predict(X, b0, b1) with the current values of b0 and b1. At this point, if you want, you can also calculate the loss by plugging the values into the loss function that we defined, but I'm not going to do that here. So let's directly compute the partial derivatives of the loss function with respect to b0 and b1. We'll call the partial derivative with respect to b0 D_b0 and the one with respect to b1 D_b1, and we simply convert the equations we derived into code. D_b1 is actually the same thing; the only difference is that there is an extra x term. The sum function acts as a substitute for the sigma in our equation, and everything else is just the same math converted to code. Next, let's update the values of b0 and b1, and finally we return the final values of b0 and b1. So now that we have created the model, the next step is to train it using our training data. Running this throws an error, so let's find out what's wrong. I found the problem: I hadn't defined any value called e, so instead we'll use the function exp, which can be found in the math library (from math import exp) and which raises e to the power of any value. Now let's run this again. Okay, so we have the values of b0 and b1, and we are ready to make predictions with our testing values. We need to normalize the testing values of X first, and then we can predict. So now we have our predicted values in y_pred, but as I said in the beginning, logistic regression is used for classification, so if we check y_pred now it will contain values between 0 and 1: the probabilities returned by our predict function.
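
Putting the pieces together, here is a minimal sketch of the training loop and test-time prediction described above, reusing the normalize and predict helpers from the earlier sketch; the lr and epochs values are illustrative choices:

```python
def logistic_regression(X, Y, lr=0.001, epochs=150):
    # Convert to NumPy arrays so the elementwise math is easy
    X, Y = np.array(X), np.array(Y)
    b0, b1 = 0.0, 0.0  # start with both weights at zero
    for _ in range(epochs):
        y_pred = predict(X, b0, b1)  # current predictions
        # Partial derivatives of the L2 loss derived earlier;
        # np.sum substitutes for the sigma in the equations
        D_b0 = -2 * np.sum((Y - y_pred) * y_pred * (1 - y_pred))
        D_b1 = -2 * np.sum((Y - y_pred) * y_pred * (1 - y_pred) * X)
        # Move each weight against its gradient
        b0 = b0 - lr * D_b0
        b1 = b1 - lr * D_b1
    return b0, b1

# Train on the normalized training data, then predict on the test set
b0, b1 = logistic_regression(normalize(X_train), y_train)
y_pred = predict(normalize(X_test), b0, b1)  # probabilities in [0, 1]
```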

So to convert these probabilities into 0 or 1 values, we define a threshold, say 0.5: all values above 0.5 will be considered a 1, and everything else will be considered a 0. You can actually change the threshold to any suitable value you like, for example 0.7 or 0.8, depending on your use case and your model. Now if we check, all the values have been converted into zeros and ones, so we are ready to plot this and check; these are our predicted values. You can also find the accuracy: what I'm basically doing here is that every time my predicted value is equal to the actual value, I increment this variable accuracy, so my total accuracy is the value of this variable divided by the total number of predictions, and it turns out to be 76%. Now let's implement the same algorithm using the sklearn library, which has an inbuilt class called LogisticRegression that does the same thing we just did. This is the code for that: from sklearn.linear_model we import the LogisticRegression class, we create an object of the class, and we simply call the fit function on the training data. We're also reshaping the values here, just because the fit function expects the values in a certain format. Let's run this. Now we are ready to make a prediction, so we call the predict function on the LogisticRegression object, lr_model, and we pass in X_test. Next let's plot this and also find the accuracy: this will plot the values, and to find the accuracy sklearn has an inbuilt function called score; we pass in the test values and it will automatically calculate the predictions and compare them against the actual test values. So this is the graph, and our accuracy is around 73%. That is actually kind of surprising, because the accuracy of the model we created from scratch turns out to be higher than the accuracy of the inbuilt model. These values can change depending on the variables we use for training.
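
Sketched in code, the thresholding, the accuracy check, and the scikit-learn comparison might look like this, again reusing names from the earlier sketches:

```python
from sklearn.linear_model import LogisticRegression

# Hard 0/1 predictions from the probabilities, using a 0.5 threshold
y_pred_class = np.array([1 if p >= 0.5 else 0 for p in y_pred])

# Accuracy: the fraction of predictions that match the actual test values
accuracy = np.mean(y_pred_class == np.array(y_test))
print(f"From-scratch accuracy: {accuracy:.2f}")

# The sklearn version; fit() and predict() expect 2-D feature arrays,
# hence the reshape(-1, 1)
lr_model = LogisticRegression()
lr_model.fit(np.array(X_train).reshape(-1, 1), y_train)
sk_pred = lr_model.predict(np.array(X_test).reshape(-1, 1))

# score() computes predictions internally and compares them to y_test
print("sklearn accuracy:", lr_model.score(np.array(X_test).reshape(-1, 1), y_test))
```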

So that's it, thank you so much for watching. If you have any suggestions or questions, please leave them in the comments section below, or you can reach out to me through email. And please don't forget to like this video and subscribe to my channel for more such content. I'll see you in the next one.