Machine Learning Algorithms | Data Science Algorithms | Intellipaat
Hey guys, welcome to another session by Intellipaat. Driving a data-driven business using machine learning is considered an important aspect of today's world: top companies such as Amazon, Facebook, Apple and many more use machine learning to perform advanced analytics and drive their business to success. In today's session, we're going to have a quick look into the world of machine learning algorithms. Before we begin, do subscribe to the Intellipaat YouTube channel so that you never miss any of our upcoming videos. Now let's have a look at the agenda for today's video. First, we'll understand why we need machine learning algorithms; then we'll understand what these algorithms actually are; after that, we'll take a quick dive into the world of machine learning algorithms; and finally, we'll do a couple of demos using these algorithms. Also, if you're looking to get certified in data science,
Intellipaat provides data science certification training courses; for more details, check out the description. Without further delay, let's get started. What do you think is the need for something called an algorithm? Consider this situation: let's say you're baking a cake or driving your car, or even walking or singing. Your body is continuously executing a set of steps that you have already trained it to do, and this is what we call an algorithm. When you're driving your car, your brain is already programmed to do all the tasks required to help you drive. And when you're walking, how do you maintain your balance? As a toddler, maintaining your balance was very difficult, but you trained yourself every day and now you can walk very easily. So this process, which involves learning and then repeating, can also be termed an algorithm. If you've been wondering whether algorithms are a new concept, they're not; algorithms have been used for a very long time. Take the person on the screen, Alan Turing. Here's a good fact: his work is widely credited with shortening World War II. He was one of the code breakers who cracked the famous encrypted Enigma messages from Germany. So the point is that algorithms are an age-old tradition, and these days we've brought them into the field of computer science and are making full use of them. And why do we need them? Think of the huge amount of data being generated these days, and think of the methods we need to understand that data, process it, and
clean it up and work with it. For all of these, we have algorithms. So on that note, what are algorithms, and what is the formal definition of an algorithm? Algorithms are as simple as this: they are a set of rules, or processes, to be followed in calculations or other problem-solving operations, especially by a computer. How simple is that? That is exactly what an algorithm means. The symbol on the left-hand side is what a flowchart looks like; don't worry, we'll check out flowcharts on the next set of slides. Right now, I want to tell you that you are using algorithms as we speak. Step one: you're looking at your screen. You've programmed yourself to look at the screen, and that's an algorithm. YouTube is running a recommendation algorithm: let's say you search for Python tutorials or anything else, and Intellipaat videos show up. How does YouTube know that it should recommend Intellipaat's videos to its learners? Again, an algorithm is at work there. Every time you check your mail, your mail is filtered into your inbox or your spam folder. How does Gmail know which mail is spam and which is not? That again is an algorithm. And no matter what operating system you're on right now, whether Windows, iOS, macOS or Android, all of these operating systems are using algorithms. On that note, let's break it down into simple terms and check out the relationship between pseudocode and a flowchart. Let's continue
the session. Here is a very simple piece of code. This is what we call pseudocode: it is close to high-level language code and reads almost literally, so you can figure out what the code is doing even if you're not a programmer. One part is the code, and the other part is the algorithm, shown as the flowchart alongside it. We input a variable A and give it the value 10, we input a variable B and give it the value 20, and we add them, so C now has the value 30 (A + B is 10 + 20). Then we output C. The words Start and Stop are also part of the pseudocode-flowchart relationship, and on the right side you can see what the flowchart of this exact pseudocode looks like. That was very simple, so let me step it up one notch with another pseudocode-flowchart example. Here again we input A and B, and as long as A has not gone past B, we print the value of A and then increase A by one. Right now A is 10, so it checks whether 10 has gone past 20; it hasn't, so it prints 10, and it keeps going until A reaches 20. The output is 10, 11, 12, 13, 14, all the way up to 20. This is an iteration in a loop. The diamond box is called the decision box; it has two tracks, which can be a true/false track or a yes/no track, and in this case we have a yes/no track. On that note, we need to understand why we need all these algorithms in machine learning. But before that, why do we even need machine learning? Machine learning can be defined as the ability of a machine to learn something without being explicitly programmed for that particular thing.
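The two pseudocode examples walked through above translate almost line for line into Python. This is a sketch: the function names `add_two` and `count_up` are made up for illustration, and the loop condition `a <= b` is an assumption chosen so that the output runs from 10 through 20, as described.

```python
def add_two(a=10, b=20):
    # Start -> input A -> input B -> C = A + B -> output C -> Stop
    c = a + b
    return c


def count_up(a=10, b=20):
    # Decision box: "is A <= B?"  Yes track: print A, add 1, loop back.  No track: Stop.
    printed = []
    while a <= b:
        print(a)
        printed.append(a)
        a = a + 1
    return printed


if __name__ == "__main__":
    print(add_two())  # prints 30
    count_up()        # prints 10, 11, ..., 20, one per line
```

Each `while` check plays the role of the diamond-shaped decision box in the flowchart.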
Learning without being programmed for it: how cool is that? It is the field of study where computers use a massive amount of data and apply these algorithms to train themselves (here's the keyword: training themselves) and then make predictions. Training in machine learning entails feeding a lot of data into the algorithm and allowing the machine itself to learn more about the information being processed. You show the machine the basics, or maybe just one iteration, and the machine goes on to figure out the next nine or ten iterations on its own: it learns on its own and processes on its own, and you can work with that data later on. So we can call this a process of converting raw data into useful information, but we're doing it with the help of the algorithms we're about to learn. On that note, let's check out the types of machine learning. There are three main types of learning when we talk about machine learning: supervised learning, unsupervised learning and reinforcement learning. I'd suggest you take a minute and pause on this slide to note these three types. If you're already familiar with these concepts, let's move on to what supervised learning actually means. Supervised learning, as the name suggests, requires some sort of supervision. Let's talk in terms of variables so we can understand it easily. In supervised machine learning algorithms, we have input variables and output variables: the input variables are denoted by X and the output variables are denoted by Y. So X is the input.
Y is the output. The goal of any supervised learning system is to understand how the output variable Y changes with respect to changes in the input variable X. Here we approximate the mapping function from X to Y well enough that, when new input data comes in that the machine hasn't seen, it can predict the new output values Y for that new X data.
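To make "approximating the mapping function" concrete, here is a minimal sketch of supervised learning with a single input variable: fit a straight line to labelled (X, Y) pairs by ordinary least squares, then predict Y for an X the model has never seen. The training data and the function names `fit_line` and `predict` are hypothetical; in practice you would typically reach for a library such as scikit-learn's `LinearRegression`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(X, Y) / variance(X)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping Y = slope * X + intercept to a new X."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labelled training data: every X comes with its known Y (here Y = 2X + 1)
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
print(model)                 # (2.0, 1.0)
print(predict(model, 10.0))  # 21.0, a prediction for an X the model never saw
```

Because the output here is a continuous number, this is also the simplest example of regression.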
In other words, the machine is trained on a particular set of X values, then it sees new input values and gives us new Y outputs. How cool is that? We also need to know the concepts of dependent and independent variables. Our aim here is to understand how the dependent variable changes with respect to the independent variables: the dependent variable goes hand in hand with what we call the independent variables, and we need to understand the changes in the dependent variable when it is mapped against, or compared with, the independent variables. To make sure you're getting the concept, here's a very simple example. In our case, the independent variable is the gender of the student: we have a girl and a boy. The dependent variable is the outcome of the students' examination, that is, whether each student passed or failed. So the independent variable is gender, and the dependent variable is the outcome of what the student is trying to do. What we're trying to do is determine whether a student would pass the exam or not based on the student's gender. Let's say we're doing a survey to find out how many girls passed and how many boys passed: the gender is the independent variable, and the outcome, pass or fail, depends on it. So the dependent variable here would be
the outcome, as I've already mentioned, and the independent variable is the gender. Is there anything more in terms of supervised learning? Yes: supervised learning is further classified into something called classification and something called regression. Let's check out what regression is first, and then we can talk about classification. Regression is a type of supervised learning where the output variable is a continuous numeric value. What do we mean by a continuous numeric value? Let me take another quick example to make sure you understand this better.
Here I have images of two apples: one apple costs four dollars and the other costs three dollars. The output variable is the cost of the apple. It is a numeric value, and you can predict it. Is the apple ripe? If yes, it's costly; if it's not yet ripe, it's cheap. Is it a Shimla apple, a Kashmiri apple or a Washington apple? You can keep adding factors around this apple and come up with one particular outcome out of it, which is the price. The price depends on all of these factors, and in our case the price is the output variable: we're trying to predict the cost of the apple with respect to all these other factors. Doing this in a real-world, mathematical setting is what we call regression. With respect to regression, there is another type called logistic regression. This is a technique where the dependent variable, instead of being a continuous numerical value, is a categorical value. What do we mean by this? Take the example on your screen: we're trying to predict whether or not it's going to rain on a particular day, and this is being done with respect to two independent variables. How do we check for rain? Usually by checking the temperature and the humidity, and maybe by going out and looking at the sky for clouds. Coming back to logistic regression, the dependent variable is categorical, so
in this case it can take only two values: it is binary, either 0 or 1. In the logistic regression model, depending on all of these attributes, we get a probability, and our final answer is either yes or no. If you ask someone "Is it going to rain?", the answer is either yes or no; it's a binary answer, and it's the same here. If we graph it out, this model, logistic regression, gives us an S-shaped curve. On the left we have a linear relationship between the dependent and independent variables, which is just a straight line; on the right, since the outcome we're looking at is binary, the curve looks like an S. Take a moment and pause on this slide to understand what a linear regression graph looks like versus what a logistic regression graph looks like. On that note, let's come back to the next subdivision under supervised learning, which is called classification. As the name suggests, you might already know what classification means in literal terms.
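The S-shaped curve comes from the logistic (sigmoid) function. Here is a minimal sketch of how logistic regression turns the rain example's two independent variables, temperature and humidity, into a probability and then a yes/no answer. The weights, the bias and the 0.5 threshold are assumptions made up for illustration, not values learned from any data.

```python
import math


def sigmoid(z: float) -> float:
    """The logistic function: squashes any real number into (0, 1), giving the S-curve."""
    return 1.0 / (1.0 + math.exp(-z))


def rain_probability(temperature, humidity, w_temp, w_hum, bias):
    # A straight-line combination of the independent variables, passed through the S-curve
    z = bias + w_temp * temperature + w_hum * humidity
    return sigmoid(z)


def will_it_rain(p, threshold=0.5):
    # Turn the probability into the binary (categorical) answer
    return "yes" if p >= threshold else "no"


# Hypothetical weights for illustration only (not fitted to real weather data)
W_TEMP, W_HUM, BIAS = -0.2, 0.1, -1.0

p = rain_probability(temperature=20.0, humidity=85.0, w_temp=W_TEMP, w_hum=W_HUM, bias=BIAS)
print(round(p, 3), will_it_rain(p))
```

Training a real model means finding the weights from labelled examples; the sigmoid and the threshold stay the same.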
In classification, the output variable is categorical in nature; in this example it is binary. Have a look at the picture on your screen: we can categorically analyze whether that person is a male or a female. Here the outcome is the gender of the person, man or woman, so the output variable is a categorical value, and we are trying to classify this person into a specific gender based on other factors. How do we know? We can see the beard on the face, it looks like a man, so our brain tells us it's a man. Simple as that. Now that we've checked out what supervised learning is, what is unsupervised learning? In unsupervised learning algorithms, the input data has no labels. When the data does not have any labels, there is nothing the machine can map to understand the data offhand. If we look at the raw data ourselves, we can tell that there are a couple of fishes and a couple of birds in there; we know that because we have trained ourselves. When the machine sees this, there is no label telling it that this is a fish or this is a bird. So our unsupervised learning algorithm runs through the data and, through what we call clustering, separates all the fishes and all the birds on its own. The input data has no class labels; the algorithm doesn't know what a fish or a bird is. Building an unsupervised model on top of this kind of input data is very interesting and fun. In this case, it gives us two clusters: the first consisting of all the fishes and the
second consisting of all the birds. Coming to clustering, which is a major part of unsupervised learning: one of the simplest and most widely used clustering algorithms is k-means clustering. K-means clustering is an unsupervised machine learning algorithm where the aim is to group similar data points, just like the fishes and the birds, into clusters. There must be high intra-cluster similarity and low inter-cluster similarity. What do we mean by that? All the data points within a cluster should be as similar as possible, and data points in two different clusters should be as different as possible. That's k-means clustering in a sentence. So what does the K stand for in k-means clustering? K is the number of clusters you want the output to be in. In our case, we have cluster A, cluster B and cluster C, so the value of K is three, because we have three different clusters. Very simple. The next type of learning is reinforcement learning. In reinforcement learning, there is something called an agent, and this agent returns the most effective actions for us by mapping its state at every single moment. To give you better clarity, I hope you've all played Pac-Man in your childhood. In this video game, the space around the figure is what we call a 2D game space. You have pac-dots, you have enemies, you have walls and so much more. The goal is to move around, avoid the bad
guys, and finish the level. How do you know who the bad guys are, where you need to move, and how to avoid getting caught every single time? You played this game for hours, maybe days, in your childhood, and you realized how the game actually works; that is exactly reinforcement learning. To give you another example, reinforcement learning is how a dog or a cat is trained in real life. Let's say we're training a dog to give a handshake: if the dog gives a handshake, you'll see the trainer feed it a biscuit that instant. So the dog knows that giving a handshake is the right thing to do, because there is a biscuit at the end of it; the animal is hunting for the reward. To put it all in one picture, this is what a reinforcement learning environment looks like. We have an agent who performs an action in an environment, and there are two tracks: if the agent performs the task right, there is a reward, and everyone's happy. If there is no reward, something went wrong, and the agent ends up in a state where it isn't rewarded. Let's say the dog did not give you a handshake: if you still gave it a biscuit at that moment, it would not realize whether it was doing the right thing or the wrong thing. That state is what s_t means, and the reward is r_t. This keeps going in an iteration, where you train your model better and better to earn more rewards. The more
rewards it earns, the more the machine knows it is doing the right thing. It's as simple as that. On that note, I have two very simple demos in Python that I want to run by you to show the use of machine learning algorithms, so let me quickly jump into Google Colab.
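Before the Colab demos, the agent, action and reward loop described above can be sketched in a few lines of plain Python using the dog-training example. This is not one of the video's demos, and it makes simplifying assumptions: the actions, the environment and the reward values are hypothetical, and there is only a single state (a setup usually called a multi-armed bandit), so it shows rewards and exploration but not state transitions from s_t to s_(t+1).

```python
import random

ACTIONS = ["sit", "bark", "handshake"]  # hypothetical actions available to the agent


def environment(action: str) -> int:
    """Hypothetical environment: the trainer gives a biscuit (reward 1) only for a handshake."""
    return 1 if action == "handshake" else 0


def train_dog(episodes: int = 500, epsilon: float = 0.1, seed: int = 0) -> dict:
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # the agent's estimate of each action's reward
    counts = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        # Explore a random action now and then; otherwise exploit the best one so far
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: value[a])
        reward = environment(action)  # r_t: biscuit or no biscuit
        counts[action] += 1
        # Incremental average: move the estimate toward the reward just observed
        value[action] += (reward - value[action]) / counts[action]
    return value


if __name__ == "__main__":
    learned = train_dog()
    print(learned)
    print("best action:", max(learned, key=learned.get))
```

The occasional random action (`epsilon`) matters: without exploration, the agent would keep repeating its first choice and might never discover that a handshake earns the biscuit.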
Google Colab is basically a Python (Jupyter) notebook hosted on Google's cloud, and I use it for most of my Python coding. Coming back to it, here's the first example we'd like to discuss with you. Just give me a second while the runtime connects; it's initializing, and there it is, connected. First, let's look at a k-means clustering demo. We import a couple of packages: NumPy, pandas, matplotlib to give us the output in terms of graphs, and scikit-learn, from which we import KMeans to work with. Let me quickly import all the libraries we'll be using. To generate our own data instead of picking up an existing dataset, we'll use something called make_blobs. We'll have 300 samples here and four clusters.
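The demo relies on scikit-learn's `make_blobs` and `KMeans` (whose `inertia_` attribute is the WCSS). As a self-contained look at what happens under the hood, here is k-means written directly in NumPy. It is a sketch, not the code from the video: the helper `make_toy_blobs` is a hypothetical stand-in for `make_blobs`, and the k-means++ seeding and five restarts are my choices (they mirror the idea behind scikit-learn's `n_init` parameter). The loop at the end computes WCSS for k = 1 to 6, which is what the elbow method plots.

```python
import numpy as np


def make_toy_blobs(n_per_cluster=75, spread=0.6, seed=0):
    """Hypothetical stand-in for make_blobs: 4 Gaussian groups, 300 points in total."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0], [6.0, 0.0]])
    groups = [rng.normal(loc=c, scale=spread, size=(n_per_cluster, 2)) for c in centers]
    return np.vstack(groups)


def nearest(X, centroids):
    """Closest centroid index for every point, plus the squared distance to it."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def init_plus_plus(X, k, rng):
    """k-means++ seeding: spread the starting centroids out across the data."""
    centroids = X[[rng.integers(len(X))]]
    while len(centroids) < k:
        _, d2 = nearest(X, centroids)
        idx = rng.choice(len(X), p=d2 / d2.sum())
        centroids = np.vstack([centroids, X[idx]])
    return centroids


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's algorithm with several restarts; returns (centroids, labels, wcss)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = init_plus_plus(X, k, rng)
        for _ in range(n_iter):
            labels, _ = nearest(X, centroids)  # assignment step
            # Update step: move each centroid to the mean of its points (keep it if empty)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        labels, d2 = nearest(X, centroids)
        wcss = float(d2.sum())  # within-cluster sum of squares
        if best is None or wcss < best[2]:
            best = (centroids, labels, wcss)
    return best


X = make_toy_blobs()
wcss_by_k = {k: kmeans(X, k)[2] for k in range(1, 7)}
for k, w in wcss_by_k.items():
    print(k, round(w, 1))  # the "elbow" should appear at k = 4

centroids, labels, _ = kmeans(X, 4)
print(np.round(centroids, 2))  # roughly the four group centers
```

WCSS always shrinks as k grows, so the elbow method looks for the point where adding another cluster stops buying a large drop.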
So n_samples is 300: we have 300 dots on the screen right now, divided into four clusters. Next, we use something called the elbow method, which plots the WCSS, the within-cluster sum of squares. I'd recommend googling WCSS if you want to know more, because it is a more involved part of the k-means algorithm and beyond the scope of this tutorial. We use that method and train the model. Look at this: the optimal number of clusters for us is somewhere around three or four, and the WCSS axis runs from about 2,500 down to 0. This graph just tells us what the data might look like. Next, we need to find the centroid of each cluster, as it's called in the k-means clustering algorithm, and mark that center; the red dots you see are exactly those centroids. So we've found that four clusters exist, and we've marked the centroids of the four clusters using k-means clustering. It's as simple as that. That was a very simple first demo. For the second scenario, we'll be checking out logistic regression on a heart disease prediction dataset: we'll use machine learning to predict whether a person is going to have heart disease or not, and we'll do it entirely with logistic regression. Again, we import a couple of libraries: pandas to handle the data, NumPy for mathematical operations,
SciPy for our computations, then matplotlib and seaborn to give us visualizations, and scikit-learn, which is a very important machine learning library for Python. We import all of these. Before that, we need the dataset file, which is called the Framingham dataset; the data comes from the town of Framingham in Massachusetts. Let me quickly upload the file; it's a small file, so it takes just a second, and as you can see, it's been uploaded. Now I can run this code, and this is what our dataset looks like. The male column is binary: if male equals 1, the person is male; if male equals 0, the person is female. It has the age, whether the person is a current smoker, how many cigarettes per day they smoke, BPMeds (whether they're on blood pressure medication), whether they've had a stroke, whether they're diabetic, their total cholesterol, systolic blood pressure, diastolic blood pressure, body mass index, heart rate and glucose, then the ten-year CHD (coronary heart disease) column, and so much more. So this is the dataset we're working with. In this step, we're just replacing the male column with a sex column; that's all we're doing here. Then we need to find out how many missing values there are in this dataset, and there are quite a few: about 388 missing values for glucose, 50 for cholesterol and so on. So let's go on and remove all the rows with missing values, and it found roughly
500 rows with missing values in total. That's fine in our case, because it's only about 12% of the entire dataset, so we can drop those rows without hurting our analysis. To begin with, we perform some exploratory analysis to show how the data is distributed: we dig into the data to find out what it's telling us. Here are a few quick charts that show all of our numerical data as graphs: the sex distribution, the age distribution, current smokers, BP medications, cigarettes per day, diabetes, total cholesterol, BMI, systolic and diastolic blood pressure, and so much more. So we're performing some quick exploratory analysis on it. This is the ten-year CHD column that I'm printing out, and we need to find out whether each person is at risk of heart disease or not. Here we can check out the count: there are roughly 500 to 600 people at risk of heart disease, while around 3,500 people are healthy. This is what exploratory analysis helps us do: it gives us numbers on how many people might suffer from heart disease in the near future, and so much more. Let's quickly plot that. As you can see, that took about a minute of processing, because it has to plot so many values. Let me scroll down so we get a better view. This is a seaborn grid plot, and you can see the concentration of
all the values for every single attribute we're comparing. Let's quickly use describe() to summarize what we're looking at: it shows a count of 3,751 entries for columns such as male, along with statistics on age, cigarettes per day, BPMeds, prevalent stroke and so much more. Coming to logistic regression: from all this data, we need to draw an inference. To do that, we run a couple of functions, one of which uses a lambda function, and we get this nicely formatted output printed for us. As it says here, the ten-year CHD column is our dependent variable for logistic regression. The output gives us the standard errors, the z values and the corresponding probability for every variable we're checking. For backward elimination, we use feature selection, and at the end we get a nice-looking summary printed for us. The summary looks nice, but we need to make more sense out of it: here we have the p-values, the odds ratios and the 95% confidence intervals, so we can analyze what actually drives the outcome, heart disease. Next, let's split our single dataset into a training dataset and a testing dataset and let the model give us the answer. So,
checking the model accuracy using the scikit-learn library, we find that our model is about 88% accurate: 88.14% is a big number, and the model wasn't trained for many iterations. Here's an axes subplot where you can compare the actual and predicted outcome values, predicted 1 and predicted 0 against the actual values, and the colors let you see what's going on. The next step prints out the true positive rate, the true negative rate and so much more, all in one print statement so that it looks nice.
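The print statement described above reports standard confusion-matrix metrics. Here is a plain-Python sketch of how each one is computed from the counts of true/false positives and negatives. The function names and the ten labels are hypothetical and are not the Framingham results; scikit-learn's `confusion_matrix` produces the same four counts.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = heart disease, 0 = healthy)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def report(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
        "true_positive_rate": tp / (tp + fn),         # sensitivity
        "true_negative_rate": tn / (tn + fp),         # specificity
        "positive_predictive_value": tp / (tp + fp),  # precision
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical labels for ten patients
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

for name, value in report(*confusion_counts(actual, predicted)).items():
    print(f"{name}: {value:.2f}")
```

On imbalanced data like this dataset, where healthy people far outnumber those at risk, accuracy can look high even when the true positive rate is very low. That is why sensitivity is worth reporting separately.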
The accuracy of our entire model is about 88 percent. The misclassification rate is simply 1 minus the accuracy, so we got roughly 12 percent of the predictions wrong. The true positive rate is only about 4 percent, the true negative rate is around 99 percent, the positive predictive value is 80 percent and the negative predictive value is around 88 percent, and so much more, right? So look at the amount of information our machine learning algorithm is giving us. Keep in mind, though, that most people in this data set are healthy, so the high accuracy comes mostly from the true negatives. A true positive rate of 4 percent means the model catches very few of the actual heart disease cases. That is the number you would want to improve before putting a model like this to use in medicine, where it could help a lot of people, right? So that was a quick walkthrough of how you can go about using the k-means clustering and logistic regression algorithms. All right guys.
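All of the rates read out above come from the four cells of the confusion matrix. Here is a small pure-Python sketch of that arithmetic. The counts are hypothetical and were chosen only to make the arithmetic easy to check; they are not the video's numbers. With scikit-learn, confusion_matrix(y_test, y_pred).ravel() returns these cells in the order tn, fp, fn, tp for 0/1 labels.

```python
import math


def ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def confusion_rates(tn, fp, fn, tp):
    """Summary rates for a binary classifier from its four confusion-matrix cells."""
    total = tn + fp + fn + tp
    return {
        "accuracy": ratio(tp + tn, total),
        "misclassification": ratio(fp + fn, total),  # equals 1 - accuracy
        "tpr": ratio(tp, tp + fn),                   # true positive rate (sensitivity)
        "tnr": ratio(tn, tn + fp),                   # true negative rate (specificity)
        "ppv": ratio(tp, tp + fp),                   # positive predictive value
        "npv": ratio(tn, tn + fn),                   # negative predictive value
    }


# Hypothetical counts for 1,000 people, 100 of whom actually have the disease.
example = confusion_rates(tn=880, fp=20, fn=90, tp=10)

# A "model" that always predicts "no disease" for the same 1,000 people.
always_no = confusion_rates(tn=900, fp=0, fn=100, tp=0)

for label, rates in [("example", example), ("always no", always_no)]:
    print(label)
    for name, value in rates.items():
        print(f"  {name:<18}{value:.3f}")
```

The always-no model reaches 90 percent accuracy while catching none of the actual cases. On an imbalanced data set like this one, the true positive rate therefore matters as much as the accuracy.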
I hope this video was helpful to you. If you have any further queries, do let us know in the comment section below and we'll reach out to you immediately. So guys, thank you so much for watching this video and giving us your precious time.