Logistic Regression | Logistic Regression in Python | Intellipaat
Hey guys, welcome to this session by Intellipaat. In today's session we are going to discuss one of the most commonly used machine learning algorithms, logistic regression. Let's have a quick glance at the agenda: I'll start off with a quick look back at the linear regression algorithm, then we'll comprehensively understand the concept of logistic regression with a spam email classifier example, then we'll understand what a confusion matrix is, and finally we'll implement a demo with the logistic regression algorithm.

Let me give you a quick recap of linear regression, as we will be needing it for logistic regression. Let's start the recap with this example. This is Lauren; she is looking for a property to buy, but she is confused about how to start. So she goes to one of her friends, Josh, and asks him if he can help her find a property with a bigger garden area for X bucks. Josh agrees to help, but he himself doesn't know how, so he goes to another friend of his, explains the whole situation, and asks if he can do anything about it. The friend immediately says yes and starts doing some calculations. Then he tells Josh that spending X bucks can get her a property of area Y. Josh is confused and asks how he found that out, and the friend goes, "simple linear regression."

Now let's see how exactly he used simple linear regression to solve this. Here we have a dependent variable and an independent variable: property size is the dependent variable and money is the independent variable. What this guy wants to do is find a relation between property size and money. It's like this: if you want a house with a bigger property size, you have to spend more money, so the two are directly proportional. In this case we get a positive linear regression line, meaning spending more money gets her a bigger property.

In the other case, what Lauren actually wants is a bigger garden area, right? So imagine the scenario: keeping the property area constant, the house area is inversely proportional to the garden area. Suppose you have already fixed the property and you want to construct a house with a garden in it, but now you want a bigger garden; if the size of the property is fixed and you increase the size of the garden, then obviously you have to reduce the size of the house. In this case, if you try to plot a regression line, you get a negative regression line: a bigger garden area means a smaller house.

Now let's take an example to see how exactly he predicted the value. To find the regression line, he took the historical data of property areas sold at particular prices and plotted it on a graph; through those plotted points he drew a regression line. To find out what property area Lauren can buy with X bucks, he plots X on the independent-variable axis, projects it up to the regression line, and against that point reads off the area Y. This is how he predicted that Lauren can buy Y area of property for X bucks.

Now let's see what he can and what he cannot say from this. He can say that if Lauren spends X amount of money, she can buy a property of area Y; a quick sketch of that idea follows.
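To make the recap concrete, here is a minimal sketch of that idea in Python, assuming made-up (money, property area) pairs; scikit-learn's LinearRegression plays the role of the friend's calculation.

```python
# A minimal sketch with made-up (money, property area) pairs.
import numpy as np
from sklearn.linear_model import LinearRegression

money = np.array([[100], [150], [200], [250], [300]])  # amount spent (hypothetical units)
area = np.array([1000, 1450, 2050, 2480, 3100])        # property area bought at that price

model = LinearRegression().fit(money, area)
print(model.predict([[220]]))  # projected area Y for spending X = 220
```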
But what he cannot say is whether the property will have a good neighborhood, or whether the location will be a quiet suburb or a bustling city. These are questions he cannot answer using this graph. Questions like "will the property have a good neighborhood?", "will it rain tomorrow?", or "is this mail spam or not?" all fall under a particular category of problems in machine learning known as classification problems. With the linear regression algorithm we cannot answer these problems, and that is where logistic regression comes into the picture.

Now let's see where this logistic regression algorithm lies in the machine learning algorithm tree. In machine learning we use two traditional learning techniques to build a predictive model: supervised learning and unsupervised learning. Within supervised learning there are two categories, regression and classification. In regression we have linear regression, and in classification we have logistic regression and SVM. So today's topic of discussion, logistic regression, comes under the category of classification.

Now that we have got a little bit of an idea about logistic regression, let's go a little deeper and discuss what exactly logistic regression is and why we use it. So what is logistic regression? Logistic regression is a statistical classification model that deals with a categorical dependent variable. You must be wondering what a categorical dependent variable is: it is a discrete variable that has two or more categories without any kind of natural order, for example gender. Logistic regression is generally used where the dependent variable is binary, that is, where only two outcomes are possible: yes/no, true/false, 1/0, and so on. Also remember that you can use both continuous and discrete input data with logistic regression.

Before moving ahead, let's look at this graph. Here we have two different variables, hours of studying and probability of passing the exam. Can you figure out which one of them is dependent and which one is independent? If you guessed that hours of studying is the independent variable and probability of passing the exam is the dependent variable, I'd say you are 100% correct.

So now that you know what exactly logistic regression is, let's see why we use it. Logistic regression can be used as a tool for applied statistics and discrete data analysis because it gives its output in the form of probabilities, which helps us easily classify the given data; the toy sketch below illustrates this on the studying-hours example.
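As a hedged illustration of that graph, here is a toy sketch with invented hours-studied data; `predict_proba` is what returns the probability just mentioned.

```python
# Toy hours-studied vs. passed data (invented for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])  # 1 = passed the exam

clf = LogisticRegression().fit(hours, passed)
print(clf.predict_proba([[2.8]])[:, 1])  # probability of passing after 2.8 hours of study
```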
So now that we have successfully established the basics of logistic regression by understanding the what and the why of it, let's go ahead and see how logistic regression can be applied for classifying data with the help of an example. Here we are using a spam email classifier: we need to build a predictive model that classifies whether a mail is spam or not. Let's look at the approach we'll take while building this model: first we'll try to understand the variables on the basis of which we are classifying the mail, next we'll plot the labelled data, then we'll draw the regression curve, and finally we'll find the best-fitted curve using the maximum likelihood estimator.

So let's get started. Step one is defining the variables. Let's start by understanding what the independent variable is in our case: it is the count of spam words. Here are some examples of commonly used spam words: words like "buy", "get paid", "guaranteed", "winner", or "unlimited". These are the kinds of words which, when found in a mail, suggest the mail is spam; if the number of such words in a mail is high, then that mail is very likely spam. Just for a better representation, let me put them in a bag of spam words, one by one: buy, get paid, guaranteed, winner, and unlimited.

Now what about our dependent variable? Our dependent variable is going to be the probability of the mail being spam: if the probability is 1, the mail is spam; if it's 0, it's not spam. In general, a mail with few words from the spam list will not be treated as spam, while a mail with five or more spam words will be treated as spam. But there can be cases where you find mails with few spam words that are still spam, and cases where mails with many spam words are not spam. So our aim here is to build a predictive model that classifies mails with minimum error.

Our next step is plotting the labelled data. Let's say this is the data set we'll be using to build the model. It's a very small data set; just remember that when you are using logistic regression in practice, make sure you use a large amount of data. Logistic regression works pretty well with large data sets and not that well with small ones; here, just for the purpose of understanding, we are using a small one. So we have two variables: the number of spam words in each mail, and whether that mail is spam.

Next, as step three, we plot our data set with the independent variable on the x-axis and the dependent variable on the y-axis: the number of spam words in a mail is the independent variable, and the probability of that mail being spam is the dependent variable, since it depends on the number of spam words in the mail. Let's plot the points one by one. First we have a mail with one spam word and probability 0, so it is plotted down here; next a mail with five spam words that is spam, probability 1, plotted up here; next three spam words, a spam mail; two words, not spam; seven words, spam; four words, not spam; nine words, spam; and one more that is not spam. Once we are done with the plotting, this is how the plotted data looks; the sketch below reproduces it.
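Here is a small sketch of that plot; the (spam-word count, label) pairs are transcribed from the narration, and the last pair is an inference since the audio garbles there.

```python
# Plot of the toy labelled data; pairs transcribed from the narration,
# the last pair (6 words, not spam) is inferred where the audio is garbled.
import matplotlib.pyplot as plt

spam_words = [1, 5, 3, 2, 7, 4, 9, 6]
is_spam    = [0, 1, 1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = not spam

plt.scatter(spam_words, is_spam)
plt.xlabel("Number of spam words in the mail")
plt.ylabel("Probability of the mail being spam")
plt.yticks([0, 1])
plt.show()
```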
Now let's say we have a new mail, and we want to figure out whether it's spam or not. Before moving ahead, let me repeat that in a real-world scenario, to perform logistic regression you need a large data set, and you may well find cases where a spam mail contains only two spam words, or where a mail has more than five spam words and still is not spam. So here we are building a predictive model whose primary aim is to reduce that error.

Now, how do we classify the new mail? First of all, we need to plot the regression curve that fits best, and that curve will be our logistic regression curve. But how do we find the best regression curve? This takes three steps. First, convert the y-axis from the scale of probability, confined between 0 and 1, to a scale of log-odds, and draw a candidate regression line through the data we already have. Second, with the help of the sigmoid function, convert the log-odds back into the probability of each mail being spam, and plot each mail at its new probability value; this forms our regression curve. Finally, from this plot, find the individual likelihood of each mail, and take logs to get the log likelihood of the regression curve.

Now, what do terms like log-odds and log likelihood mean? Let's discuss that before moving ahead. What does log of odds mean? Let's explore it with the help of an example, but first let me clarify one thing: probability and odds are not the same thing. Suppose this guy goes fishing five times a week. Out of those five times, he catches a fish two times and fails to catch three times. In this case, what are the probability and the odds of getting a fish for dinner? Let's first calculate the probability: probability is chances for divided by total chances, so the probability of catching a fish is how many times he caught a fish, two, divided by the total chances he had, five. So the probability of getting a fish for dinner is 2/5. Next come the odds: chances for divided by chances against, that is, the ratio of how many times he caught a fish to how many times he failed. He caught the fish two times and failed three times, so the odds of getting a fish for dinner are 2/3.

So now that we know odds, let's see what log of odds and the log odds ratio are; for your information, log-odds is also called the logit function. The quick check below runs these numbers.
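A quick check of those numbers in Python:

```python
# Probability vs. odds for the fishing example: 2 catches, 3 misses.
import math

caught, failed = 2, 3
probability = caught / (caught + failed)  # chances for / total chances = 2/5
odds = caught / failed                    # chances for / chances against = 2/3
log_odds = math.log(odds)                 # the "logit"
print(probability, odds, log_odds)
```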
Now, in our previous example where the fisherman was catching fish, let's add another factor to his fishing: the weather. We can recreate the entire scenario as follows: on sunny days he was successful two times and failed three times, while on rainy days he was successful three times and failed two times. So the odds of catching a fish on a sunny day are 2/3, and the odds of catching a fish on a rainy day are 3/2. Now the log of the odds on a sunny day is just log(2/3), and similarly the log of the odds on a rainy day is log(3/2). Next, the log odds ratio: the log odds ratio is the log of the ratio of the odds on a sunny day to the odds on a rainy day, that is, log((2/3) / (3/2)) = log(4/9) ≈ log(0.44). So here we can see that odds and the odds ratio are different things.

Now let us go back to our steps. With an understanding of log-odds, we are ready to perform the conversion of the 0-to-1 probability axis into a minus-infinity-to-plus-infinity log-odds axis. For the log-odds we have the formula log(P(spam) / (1 − P(spam))). For a mail that is spam, the probability of being spam is 1, so we get log(1 / (1 − 1)) = log(1/0), which is positive infinity. How did we get that? log(1/0) is nothing but log(1) − log(0), and log(0) is minus infinity, so minus of minus infinity is plus infinity.

But why is log(0) minus infinity? In general, if log base b of 0 equals c, then converting to exponential form gives 0 = b^c. If the base b is greater than 1 (as with base 10 or base e), the value of c has to go to minus infinity for b^c to approach 0: for example, 10^(−10) is much smaller and closer to 0 than 10^(−1), so the smaller we make c, the closer b^c gets to zero. (If the base were between 0 and 1, the opposite would hold: 0.1^1000 is smaller than 0.1^100, so c would have to go to plus infinity instead.) Here log(1/0) uses an ordinary base greater than 1, so log(0) is minus infinity, and therefore log(1/0) = 0 − (−∞) = +∞. I hope it is clear how we got the value of the log-odds as plus infinity; we plot this mail at +∞ on the log-odds axis. Next, let's find the log-odds of a non-spam mail.
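A small sketch of the probability-to-log-odds conversion, including the edge cases just discussed (NumPy returns ±inf rather than raising an error):

```python
# Probability -> log-odds, log(p / (1 - p)); p = 0 maps to -inf, p = 1 to +inf.
import numpy as np

p = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
with np.errstate(divide="ignore"):
    print(np.log(p / (1 - p)))
```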
For a non-spam mail, the probability of being spam is 0, so the same formula gives log(0 / (1 − 0)) = log(0/1) = log(0) − log(1), which tends to minus infinity; it's the same concept as before. Now we have our data on the log-odds axis, so we assume one candidate regression line and project our data onto that line.

Now let's go back to the step where, with the help of the sigmoid function, we convert the log-odds back into the probability of a mail being spam. But what does this sigmoid function mean? The sigmoid function is the standard logistic function. The logistic function is defined as L / (1 + e^(−k(x − x₀))), where L is the curve's maximum value, k is the steepness of the curve, and x₀ is the x-value of the sigmoid's midpoint. The standard sigmoid takes k = 1, x₀ = 0, and L = 1, which gives e^x / (1 + e^x). This mathematical sigmoid function forms an S-shaped curve that is confined between 0 and 1.

Let's try to understand how the logistic function works with the help of an example. Say we have a set of unclassified data points, and on this data we apply the sigmoid function: we plot the data and find the respective y for each point. What the sigmoid function is doing can be visualized from the graph: you give it some values on the x-axis, and using the sigmoid function you can read off a probability on the y-axis. This is the reason the sigmoid function is so useful when solving classification problems: it takes any real-valued number and maps it onto a value between 0 and 1.

Now that we have an idea of how the sigmoid function works, let's move ahead with our spam email classifier. We are ready to perform the step of converting the log-odds graph into a sigmoid graph, and then we have to find the best maximum likelihood estimate (MLE). We are going to take the log-odds value of each mail and convert it back to the probability of that mail being spam using the formula: probability = e^(log-odds) / (1 + e^(log-odds)). One by one, we will place the log-odds of each mail into this formula and calculate the probability for each mail; see the sketch below.
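A minimal sketch of the sigmoid as just defined:

```python
# The logistic function L / (1 + e^(-k (x - x0))); with L = 1, k = 1, x0 = 0
# it reduces to the standard sigmoid e^x / (1 + e^x), confined between 0 and 1.
import numpy as np

def sigmoid(x, L=1.0, k=1.0, x0=0.0):
    return L / (1 + np.exp(-k * (x - x0)))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # maps any real number into (0, 1)
```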
For example, take a mail which, after projecting onto the regression line, gives a log-odds value of −3.2. We place −3.2 into the formula, e^(−3.2) / (1 + e^(−3.2)), and get a probability of about 0.03. So we plot it accordingly on the new graph of probability-of-spam versus spam-word count: with that probability, the mail lands somewhere near zero, so according to the prediction this mail is not spam. Again, another mail projects onto the regression line at a log-odds value of 5.6; putting 5.6 into the formula gives a probability of 0.99, so the probability of this mail being spam is 0.99, and we put it on the graph too. Similarly, one by one, we calculate for each mail: this next mail projects to a log-odds value of −4.5, which the formula turns into a probability of 0.01, so this mail is predicted not spam, which is the same as the actual label, and again we plot it on the graph. You can repeat this step for the rest of the mails as well, and finally we get the S-curve. That is our regression curve, but you must be wondering: is this the best-fitted curve, and how do we find out whether it's the best or not? This is where the concept of maximum likelihood comes into the picture. The sketch below checks these probability conversions.
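The conversions above, checked in a few lines (the printed values match the quoted ones up to rounding):

```python
# Log-odds -> probability, p = e^z / (1 + e^z), for the three mails above.
import numpy as np

for z in [-3.2, 5.6, -4.5]:
    p = np.exp(z) / (1 + np.exp(z))
    print(f"log-odds {z:5.1f} -> probability {p:.2f}")  # ~0.04, 0.99, 0.01
```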
So now that we have a regression curve, let's find the likelihood of this curve. First, find the individual likelihood of each mail. You must be wondering how we get the likelihood value: the likelihood of each mail comes from its predicted probability, namely the probability the curve assigns to what actually happened. For a mail that actually is spam, it is the predicted probability of being spam; for a mail that actually is not spam, it is one minus that probability. So the first mail, with predicted spam probability 0.01 and actually not spam, contributes a likelihood of 0.99; the second, again with 0.01, contributes 0.99; similarly for the third (0.03) and fourth (0.05); and so on up to the eighth mail, an actual spam mail with predicted spam probability 0.99, which contributes 0.99. Once you have the individual likelihood of each mail, multiply them together to get the likelihood of the entire curve, and then take the log to get the log likelihood. Instead of taking the log of the product, you can add up the individual logs, because log(a × b) = log(a) + log(b). For this curve, the log likelihood comes out to −0.084.

Now let us rotate the line to find the best-fitted regression line. Again we calculate the individual log likelihoods of each mail for the rotated line; for this one, say we get the log likelihoods shown on your screen, with a final value of −0.207. Now if we compare the log likelihood values for these two regression lines, we see that line A has the bigger value: line A's log likelihood is −0.084 whereas line B's is −0.207, and −0.084 is bigger than −0.207. Therefore line A has the better likelihood value. We keep on rotating the line until we get the maximum value of the log likelihood, and finally we choose the line with the maximum log likelihood as the best-fitted regression line; a small sketch of this computation follows. I hope the concept of logistic regression is clear to you guys; that was all about the theoretical and mathematical side of logistic regression.
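A sketch of the log-likelihood computation; the predicted probabilities here are illustrative stand-ins, not the exact values from the slides:

```python
# Log likelihood of one candidate curve, as a sum of logs.
import numpy as np

p_spam  = np.array([0.01, 0.01, 0.03, 0.05, 0.95, 0.97, 0.99, 0.99])  # predicted P(spam)
is_spam = np.array([0, 0, 0, 0, 1, 1, 1, 1])                          # actual labels

likelihoods = np.where(is_spam == 1, p_spam, 1 - p_spam)  # probability of what happened
log_likelihood = np.sum(np.log(likelihoods))
print(log_likelihood)  # about -0.20 for these made-up values; the curve with the largest value wins
```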
Next, the confusion matrix. The confusion matrix shows the ways in which your classification model is confused when it makes predictions; it is basically a summary of the prediction results on a classification problem. The key to a confusion matrix is that it summarizes the counts of correct and incorrect predictions. The matrix shown on your screen represents a confusion matrix; let's see what exactly it means, but before that, let me tell you how to create a confusion matrix, which should make things clearer. For creating a confusion matrix, you need a test data set or a validation data set with expected outcome values; then make a prediction for each row in your test data set; then, from the expected outcomes and the predictions, count the number of correct predictions and the number of incorrect predictions for each class, organized by the class that was predicted.

Let's see what that means with an example. We have some expected outputs and the predicted outputs for them. All the red results are the incorrect predictions and the green ones are the correct ones; in total we have seven correct predictions out of ten, so from here you can see that the accuracy of your model is 70%. Now, men classified as men: three; women classified as women: four; men classified as women: two; and women classified as men: one. If you create a confusion matrix out of this, you get something like: men classified as men 3, men classified as women 2, women classified as men 1, and women classified as women 4. From here you can say that the total actual men are 3 + 2 = 5, the total actual women are 1 + 4 = 5, and the total of correct values, men classified as men plus women classified as women, is 3 + 4 = 7. You can also see that there are more errors predicting men as women than predicting women as men. So that was how you can calculate a confusion matrix.

Now let's come back and see how to interpret a given confusion matrix. Here is a sample: a confusion matrix for a fire alarm, where the actual outcome is whether there is a fire and the prediction is whether the alarm goes off. If the alarm goes off and there is a fire, it's a true positive event; if the alarm goes off and there is no fire, it's a false positive event; if there is no alarm in case of fire, it's a false negative event; and if there is no alarm and no fire, that means it's a true negative event. Let me explain with this example, which should make things clearer. We have 40 true positive events and 10 false negative events in total, so the total number of actual fires was 40 + 10 = 50. We have 5 false positive events and 95 true negative events, so the total number of times there was no fire was 5 + 95 = 100. On the prediction side, the number of times fire was positively predicted, true positives plus false positives, is 40 + 5 = 45, and the number of times fire was not predicted is 10 + 95 = 105. The total number of events is 50 + 100, or equivalently 45 + 105, which is 150; that's why n = 150 is mentioned up here, the total number of events. So this is how you interpret a confusion matrix.

Let's move ahead. Now let me show you in my Jupyter notebook how you can create a confusion matrix in Python. The very first thing I'll be doing is importing the required library: from sklearn.metrics I'll import confusion_matrix. Next, let's create some expected values, say expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0], and some predicted values to go with them. Now let's calculate the confusion matrix: results = confusion_matrix(expected, predicted), and print the result. Executing it, we get the result [[4, 2], [1, 3]]. So what does it mean? First we have 4: 0 predicted as 0 happened 4 times. Then 0 predicted as 1 happened 2 times; 1 predicted as 0 happened just once; and 1 predicted as 1 happened 3 times. That is our confusion matrix, and what can we say from it? The total number of correct predictions made by the machine is 4 + 3 = 7, that is, 0 classified as 0 plus 1 classified as 1, and the total number of incorrect predictions is 2 + 1 = 3. So we have 7 correct predictions and 3 incorrect predictions, and there are 4 + 2 = 6 actual zeros and 1 + 3 = 4 actual ones in the expected values. From here we can say that the machine predicted the result 7 times correctly and three times wrongly, so the accuracy of the machine is 70 percent. That was all about how you can create a confusion matrix in Python; a reconstruction of the cell is below.
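A reconstruction of that notebook cell; the expected list is as dictated, and the final value of `predicted` is inferred so that the output matches the printed result:

```python
# Reconstruction of the notebook cell; the last value of `predicted` is
# inferred so the output matches the result read out in the video.
from sklearn.metrics import confusion_matrix

expected  = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]

results = confusion_matrix(expected, predicted)
print(results)  # [[4 2]
                #  [1 3]]  rows = actual 0/1, columns = predicted 0/1
```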
Next, a demo of logistic regression with the help of the scikit-learn package; we are going to build the logistic regression algorithm on top of a heart disease data set. So let's quickly go to the Jupyter notebook and start with the demo. This is the Jupyter notebook, guys, and the first task is to load the heart disease data set. For that we have to import the pandas package, so I'll type import pandas as pd, and I'll use the read_csv method from the pandas package: pd.read_csv, passing in the file name, which is heart.csv, and I'll store the result in this dataset object. Now let me have a glance at the first few records of this data set: it comprises all of these columns, and we're going to build the logistic regression algorithm with this "target" column as the dependent variable and the rest of the columns as the independent variables. The target column has values 1 and 0: the value 1 means the patient has heart disease, and 0 means the patient does not have heart disease.

Now let me also have a glance at the shape of this data set: I type print(dataset.shape), and this gives 303 and 13, so there are 303 records in this data set and 13 columns. Next, let me look at the value counts of the target column: value_counts tells me the frequency of the two values in this column, and there are 165 records where the value is 1 and 138 records where the value is 0. So in this data set there are 165 patients who actually have heart disease and 138 patients who do not. I'll go ahead and visualize this: I load up the matplotlib and seaborn packages, pass the target column onto the x-axis, with the data being our heart disease data set, and build a count plot. This gives one bar for the value 0 and one bar for the value 1, and it tells us the same thing: 165 patients who have heart disease, and the other bar for all the patients who do not.

Now let me go ahead and divide the data set into features and labels. I store all the features in this X object, so these twelve feature columns are my features, and the target column is my label, the dependent variable. This is how I divide the data set: I extract all of the columns except the last column and store them in X, and I take only the last column and store it in y. These cells are reconstructed below.
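A hedged reconstruction of those first cells, assuming heart.csv sits in the working directory with the binary target as its last column:

```python
# Loading and inspecting the heart disease data.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

dataset = pd.read_csv("heart.csv")
print(dataset.head())                    # first few records
print(dataset.shape)                     # (303, 13) in the video
print(dataset["target"].value_counts()) # 165 ones, 138 zeros

sns.countplot(x="target", data=dataset)  # one bar per class
plt.show()

X = dataset.iloc[:, :-1]  # every column except the last: the features
y = dataset.iloc[:, -1]   # the last column: the target label
```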
X.head() shows all of the independent variables, and y.head() shows the target. So now that we have our independent variables and the dependent variable, let me go ahead and divide this data set into a training set and a testing set. For that I have to load the train_test_split method from sklearn.model_selection, and over here I'm setting the test size to 0.2, which means that 20% of the records go to the test set and the remaining 80% to the training set. All right, I click on run, and now we have divided the data set into training and testing sets.

Now it's finally time to build the model. For that I'll be importing LogisticRegression from sklearn.linear_model, and I'm going to create an instance of it, which I'll name log_model. Then I fit this model on the training set, passing X_train and y_train as the parameters. I click on run, and we have successfully built the model on the training set.
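A sketch of those two cells; the exact split differs run to run since no random_state is mentioned, and max_iter is raised here as an assumption, to help the solver converge:

```python
# Train/test split and model fitting; X and y come from the previous cell.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

log_model = LogisticRegression(max_iter=1000)  # higher max_iter to ensure convergence
log_model.fit(X_train, y_train)
```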
Now we are going to go ahead and predict the values on the test set: I type log_model.predict, pass X_test as the parameter, and store the result in y_pred. So we have predicted the values; now it's time to calculate the accuracy. I type log_model.score and pass in X_test and y_test, since I want the accuracy of the predictions on the test set, and the accuracy comes out to be about 73%, which is actually not that bad. Let me also build the confusion matrix, which gives me a table of values comprising the correctly predicted values and the misclassified values. I have to import confusion_matrix from sklearn.metrics, and again I pass y_test and y_pred as the parameters inside this function and print it out. The main diagonal you see represents all of the values which have been correctly classified, and the off-diagonal entries represent all of the values which have been misclassified. If you want to get the accuracy, all you have to do is add up the diagonal, 20 plus 25, and divide by all of the values, and you get the same accuracy. Let me add a new cell and calculate the accuracy from this confusion matrix: (20 + 25) divided by (20 + 25 + 10 + 6) gives a value of 73.77, which is the same as I got from score, so the accuracy is about 73%.

So we have built the confusion matrix; now let me also go ahead and build the ROC curve. The ROC curve shows the trade-off between the true positive rate and the false positive rate. Let me go ahead and plot it: on the y-axis we have the true positive rate, and on the x-axis we have the false positive rate. A sketch of these evaluation steps follows.
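A sketch of the evaluation cells; predict_proba feeds the ROC curve here, which the video plots without showing the exact plotting code:

```python
# Predictions, accuracy, confusion matrix, and ROC curve on the test set.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_curve

y_pred = log_model.predict(X_test)
print(log_model.score(X_test, y_test))  # ~0.7377 in the video

cm = confusion_matrix(y_test, y_pred)
print(cm)                                # main diagonal = correct classifications
print((cm[0, 0] + cm[1, 1]) / cm.sum())  # accuracy recomputed from the matrix

y_prob = log_model.predict_proba(X_test)[:, 1]  # probability of class 1
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label="logistic regression")
plt.plot([0, 1], [0, 1], "r--", label="chance (~50% accuracy)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```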
You can understand this plot this way: the closer the curve hugs the top-left corner, that is, the greater the area it covers, the better the model. The red line you see represents a classifier that would give you around 50% accuracy, and your model is better the farther the blue curve sits from that red line, toward the top-left corner. And this is how we can implement a logistic regression model with the help of scikit-learn. So guys, this brings us to the end of the session; do subscribe to Intellipaat's YouTube channel for more such informative content.