Linear Regression from Scratch in C++ (2020)
In today's video you'll learn linear regression, a simple machine learning tool for fitting a line to data points. As you may have guessed, we have a bunch of points and we're trying to find the trend. You could do it by eye: just pick a line that looks like it runs through the middle of the data. But what if you have tons and tons of data and you want to fit it without even looking at it? Then you need to do it programmatically, and we're going to do that in C++. So what do we have in a linear regression? This is the equation: y = a0 + a1·x, where a0 is the intercept and a1 is the slope. Given a bunch of data points, the x's and y's, we want to find those two parameters; that's basically it. How do we do this? We minimize a cost function. We try a line and it really sucks; we try another one that sucks a bit less; and we keep moving the line until we hit the threshold where we have the best-fitting line. To know which line is best, we use the cost function, which here is the mean squared error: MSE = (1/n) · Σ (ŷᵢ − yᵢ)². You take your predicted ŷᵢ, subtract the real yᵢ, raise the difference to the power of two, sum over all your data points, and divide by n. We're just trying to minimize this value. So how do we fix the weights? We use gradient descent, something you will see quite often: it tells us how to change the weights to reduce the cost function, here the mean squared error. To update the weights, we start from the mean squared error, substitute the prediction ŷᵢ = a0 + a1·xᵢ, and calculate the gradient; it's that gradient we're going to subtract. To do that, we need the partial derivatives of this function with respect to a0 and a1.
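To make that concrete, here is a minimal sketch of the cost function in C++. The function name and the raw-pointer interface are my assumptions for illustration, not necessarily the exact code from the video:

```cpp
#include <cstddef>

// Mean squared error: MSE = (1/n) * sum((y_pred[i] - y_true[i])^2).
double mean_squared_error(const double* y_pred, const double* y_true, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double residual = y_pred[i] - y_true[i];
        sum += residual * residual;
    }
    return sum / n;
}
```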
Now, if you fast-forward through the algebra, we get to the update rules. It's just linear algebra and calculus; it's nice to know how you got there, but that's not really the point here. This is how we update: the new a0 equals the old a0 minus the learning rate times 2/n times the sum of the residuals, a0 ← a0 − α · (2/n) · Σ (ŷᵢ − yᵢ), and for a1 it's almost the same thing, except each residual is multiplied by xᵢ: a1 ← a1 − α · (2/n) · Σ (ŷᵢ − yᵢ) · xᵢ. We're going to iterate many times, each step taking us toward the next point downhill, until we get to the global minimum. This is super simple when you have a computing library like NumPy, but we're going to do it in pure C++, so it's a bit more low-level. One more thing I should say: how do you know the line we fit here is good enough? For that we can compute the R² score, the coefficient of determination, which is a way to read how much we messed up. The most general formula is R² = 1 − SS_res / SS_tot. Just look at what those actually are and it reads naturally. The residual sum of squares takes the actual true value minus what you predicted, raises each to the power of two, and sums them all up: SS_res = Σ (yᵢ − ŷᵢ)². And for the total it's almost the same thing: the true value minus the mean of the true values, squared and summed: SS_tot = Σ (yᵢ − ȳ)². That ratio tells us whether we're doing well or doing worse.
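Here is a sketch of both pieces in C++: one gradient-descent step on the two weights, and the R² score. Again, names and signatures are assumptions for illustration:

```cpp
#include <cstddef>

// One gradient-descent step: a0 -= lr * (2/n) * sum(residual),
//                            a1 -= lr * (2/n) * sum(residual * x[i]).
void update_weights(const double* x, const double* y, const double* y_pred,
                    size_t n, double lr, double& a0, double& a1) {
    double grad_a0 = 0.0, grad_a1 = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double residual = y_pred[i] - y[i];
        grad_a0 += residual;         // contributes to d(MSE)/d(a0)
        grad_a1 += residual * x[i];  // contributes to d(MSE)/d(a1)
    }
    a0 -= lr * (2.0 / n) * grad_a0;
    a1 -= lr * (2.0 / n) * grad_a1;
}

// Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
double r2_score(const double* y_true, const double* y_pred, size_t n) {
    double mean = 0.0;
    for (size_t i = 0; i < n; ++i) mean += y_true[i];
    mean /= n;
    double ss_res = 0.0, ss_tot = 0.0;
    for (size_t i = 0; i < n; ++i) {
        ss_res += (y_true[i] - y_pred[i]) * (y_true[i] - y_pred[i]);
        ss_tot += (y_true[i] - mean) * (y_true[i] - mean);
    }
    return 1.0 - ss_res / ss_tot;
}
```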
So, to the code. The first thing the main does is load the data: we read a CSV file, and this is what we're trying to fit. These are my x's and these are the y's, generated as y = 1 + 1·x, so fitting should give us back exactly that line, increasing like this with an intercept of 1 and a slope of 1. Then we call the regression (fit) function. You give it your x's and your y's, and you have to pass the length everywhere because these are raw pointers, C-style arrays. We also declare a bunch of stuff over here: the weights, the prediction array, the learning rate. The intercept and slope can start at anything; it doesn't matter what they are at the beginning. Then we just iterate a thousand times to train the weights. What do I do on each iteration? First, given the x's and the current weights, I compute the predictions and get back ŷ. Then, given those predictions and the x's and y's, I update the weights, so the weights change. That's all that's happening here; that's how I fit. Now over to update. How do we update? The new a0 equals the old a0 minus the learning rate times the multiplier, which is 2/n, times a sum function; and I do the same for a1 with its own sum, exactly the update rules from before. I do both of those things and that's it, there is nothing else. Now if we look at those helper functions, there are a bunch of them that call each other: the mean squared error, the intercept sum, and the slope sum. The intercept sum is literally the sum of the residuals: for each data point I take the prediction minus the actual value, which is the residual, and I sum them.
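Here's a sketch of those helpers and the training loop. The names intercept_sum, slope_sum, and fit follow the video's description, but the exact signatures are my assumptions:

```cpp
#include <cstddef>

// Sum of residuals: feeds the intercept update.
double intercept_sum(const double* y_pred, const double* y, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += y_pred[i] - y[i];
    return s;
}

// Sum of residuals weighted by x: feeds the slope update.
double slope_sum(const double* x, const double* y_pred, const double* y, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += (y_pred[i] - y[i]) * x[i];
    return s;
}

// Training loop: predict, then update, for a fixed number of iterations.
void fit(const double* x, const double* y, double* y_pred, size_t n,
         double lr, int iterations, double& a0, double& a1) {
    for (int it = 0; it < iterations; ++it) {
        for (size_t i = 0; i < n; ++i)
            y_pred[i] = a0 + a1 * x[i];                     // predict
        a0 -= lr * (2.0 / n) * intercept_sum(y_pred, y, n); // update intercept
        a1 -= lr * (2.0 / n) * slope_sum(x, y_pred, y, n);  // update slope
    }
}
```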
The slope sum is the same thing, except each residual is multiplied by xᵢ before summing. And if we look at the mean squared error, it's just the residual sum of squares divided by the length: square each residual, sum them, divide by n, and that's it. So MSE = SS_res / n; you calculate your residuals once over here and reuse them. Now, is this good code? No, this is a bad way of writing it. Why? Because it's not stateful: there are a lot of variables being passed around that shouldn't be. They should be grouped inside a class, a regression class for instance; right now everything gets passed around C-style, and there's no added value to that. So we're going to refactor, and I'm going to show you a new version of this which is a bit cleaner. Okay, so we did some cleanup, and now it's actually looking pretty good. We have the main over here with the variable initialization like before, and the variables that used to live in the function are now members of the model. And this is how we train: you make a model, give it some data, and then you train it; that's how you start the model. Over here, next, I try to predict what the y should be given some x. That's all that's left in your main. Below that is the class-based model: x, y, length, and the weights are now member variables. This is the constructor: it takes your x's, your y's, and the length, and it copies the x's and y's so the caller can't change them out from under you. Then over here we set some initial weights, and that's that. This is the constructor, this is the destructor, and this is a helper function that sets the weights. And this is where the training happens; it's the same thing as before, but with way less clutter.
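Here's what that class-based version might look like as a self-contained sketch. The class name LinearRegression, the member names, the learning rate, and the main below are my assumptions; the video's actual code uses raw pointers and a length, while this sketch uses std::vector to stay short and safe:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

class LinearRegression {
public:
    // Copy the data in so the caller can't change it underneath us.
    LinearRegression(const std::vector<double>& x, const std::vector<double>& y)
        : x_(x), y_(y), a0_(0.0), a1_(0.0) {}

    // Gradient descent: predict, then update, for `iterations` steps.
    void train(double lr, int iterations) {
        const double n = static_cast<double>(x_.size());
        for (int it = 0; it < iterations; ++it) {
            double sum_a0 = 0.0, sum_a1 = 0.0;
            for (size_t i = 0; i < x_.size(); ++i) {
                double residual = predict(x_[i]) - y_[i];
                sum_a0 += residual;          // intercept gradient sum
                sum_a1 += residual * x_[i];  // slope gradient sum
            }
            a0_ -= lr * (2.0 / n) * sum_a0;
            a1_ -= lr * (2.0 / n) * sum_a1;
        }
    }

    // The fitted line: intercept + slope * x.
    double predict(double x) const { return a0_ + a1_ * x; }

private:
    std::vector<double> x_, y_;  // training data, hidden from the user
    double a0_;                  // intercept
    double a1_;                  // slope
};

int main() {
    // Toy data on the line y = 1 + 1 * x, as in the video.
    LinearRegression model({1.0, 2.0, 3.0}, {2.0, 3.0, 4.0});
    model.train(0.05, 1000);
    std::cout << "prediction at x = 4: " << model.predict(4.0) << "\n";
}
```

Compiled with something like g++ -std=c++11 main.cpp -o regression and run, this toy setup should land close to an intercept of 1 and a slope of 1, so the prediction at x = 4 comes out near 5.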
With the class, nothing is getting passed around anymore and the whole algorithm looks leaner. And this is predict, which just returns intercept + slope times x. Everything we use internally to make the calculation lives in the private section of the class, invisible to the user. So that's it; the last thing to do is take this class, ship it into another file, and never look at it again unless we need to tweak it. Alright, let's see the output of this. Let's compile it, thank you GCC, and then let's just run it. Don't forget, at the end we should get y = 1 + 1·x, because it's a line with an intercept of 1 and a slope of 1. And look at that: on the first iteration the mean squared error is around 11, because the weights started at 0, 0; then it goes down, and we could have stopped around the 100th iteration, but by the time we run through the thousand iterations we get to 1·x + 1. And this is actually good, because the x's were 1, 2, 3, so the outputs should be 1·1 + 1, 2·1 + 1, 3·1 + 1, which is 2, 3, 4. So it's a success. Perfect. I hope that was helpful; the code will be available on the GitHub page, and let me know if you have any questions.