DAX Fridays! #135: Linear Regression in Power BI
So hello everybody, how are you today, this Friday? It's time for another DAX Fridays. In today's DAX Fridays we're actually going to continue with the statistical series, and we're going to talk about linear regression: very, very useful and very interesting.

Okay, so first of all, linear regression is used to predict one variable based on the value of another one. So based on the value of x, we try to predict y. For example, the data I have now in Power BI is from my YouTube channel: the number of views and the number of subscribers. What I would like to know is whether I can predict the number of subscribers based on the number of views. You're probably going to say, well, that's easy, because you can see that channels that have a lot of views are going to have a lot of subscribers. Not always, but that's most often the case. But how about if we had a formula that would allow us to predict how much that would be, based on the number of views, based on, you know, previous performance? So this is what we're going to do today. I'm going to demonstrate that.

To be able to see the relationship between two variables, it is very useful to plot them as a scatter plot. So that's what we're going to do. We're going to go here to Power BI, and I'm going to throw views and subscribers into a scatter plot. Scatter plots are really, really useful. We want to have every single point plotted, okay? We don't want any summarized data; we just want to see the points.

So this is how the data looks, and as you can see, it has a specific pattern; in this case it has a line pattern. If we tried to fit the pattern with a line, it would look more or less like this. So that would be the line that most of the points are close to, something like that, and that is the regression line. You can actually get that line in Power BI from the Analytics pane: Trend line. Let's put it in and see how well we did. Oh, good.
Do you see the black line in there?
So you can actually get the trend line, the regression line, in Power BI, but it will not give you the formula. So if you would like to predict future performance, you would have to do it on paper, for example. Let me see: if I would like to predict how many subscribers I would get with 11,000 views, I would go in there. Let me grab the pen, it will go faster. I would go in there and extend this, and that might not even be necessary, as the slope is not that big. So here we have eight, that's about nine, ten, eleven thousand. I would have to go up here, extend my line, and somewhere in here would be the predicted number of subscribers. I don't know, it's like 90 or 85 subscribers. So if I get 11 thousand views, I would get somewhere between 85 and 95 subscribers.

Okay, but obviously we don't want to do it with pen and paper. We want to be able to calculate future performance, and for that you need the equation, you need the formula. The formula for a line is y = mx + b. Y in our case is the number of subscribers and x is the number of views. M, which we need to calculate based on that line, is the slope, and b is the y-intercept. This is the y-axis, and the y-intercept is the point where the line crosses the y-axis. I'll tell you what those values, which are very important, tell us once we have calculated them.

Right, so now we're going to find out what the equation is for our line of views and subscribers, and once we have that we will be able to predict future performance. How cool is that? Let me delete what we see on the screen.

The formulas for linear regression are all over the internet. You're going to get this file to download, and you will have everything in there, okay? So you're going to get all the calculations. I'm going to go through them very quickly because I don't want to bore you by writing everything out. So let's see here. We have the views and the subscribers.
What you need to calculate is views times subscribers, then views squared and subscribers squared, and then the sum of all of those: the sum of views, the sum of subscribers, the sum of x squared, the sum of y squared. You also need the number of points in your data set, and once you have that, you calculate the intercept, which is calculated here. You're going to get the values, easy peasy. I actually wrote the definition of the intercept here. You don't need to come back to the video; when you download the file, you'll see it there. But the intercept, as we talked about: this is the y-axis, and this is where the line crosses the y-axis. It basically says, at zero views, so when views are zero, how many subscribers you would expect to get. And in this case, once we've done all the calculations, let me show you.

So this is how our equation looks based on the data. If you remember, the formula was y = b + mx. So this is b, and this is 2, so it is intercepting at 2. If x equals 0, which means no views whatsoever, I would get two subscribers. How likely is that? I don't know, but it doesn't sound so unlikely. Actually, there are channels that have, I don't know, zero views, or very few views, and still have subscribers. Maybe if you are the first one to subscribe, and your mom, that would be two. Okay, so that is b.

And then you have m. M is the slope, and what it tells you is how much our subscribers are determined by the number of views. So if we get a certain number of views, I will get 0.01 times views subscribers. Let's say I get a hundred views: I'll get one subscriber, plus the intercept, so three. Okay, so that's the way it actually works.
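As a rough cross-check of the calculation just described, here is a minimal sketch in Python rather than in DAX. It is not the contents of the download file. The function name `fit_line` and the four data points are made up for illustration: the points are chosen to lie exactly on y = 2 + 0.01x so the result matches the rounded values quoted in the video, and they are not the channel's real data.

```python
def fit_line(xs, ys):
    """Least-squares slope (m) and intercept (b) of the line y = b + m*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")

    n = len(xs)                                   # number of points
    sum_x = sum(xs)                               # sum of views
    sum_y = sum(ys)                               # sum of subscribers
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of views * subscribers
    sum_x2 = sum(x * x for x in xs)               # sum of views squared
    # The sum of y squared is not needed for m or b;
    # it is used for the correlation coefficient instead.

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical data lying exactly on y = 2 + 0.01 * x (not the real channel data).
views = [100, 200, 300, 400]
subscribers = [3, 4, 5, 6]

slope, intercept = fit_line(views, subscribers)
print(f"m = {slope:.2f}, b = {intercept:.2f}")  # m = 0.01, b = 2.00
```

With real data the points will not sit exactly on a line, and the same sums give the best-fitting slope and intercept in the least-squares sense.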
It's actually quite useful. Now, with the equation, you can start predicting performance, so predicting subscribers. One way to do that would be to create a parameter. So we're going to do a what-if scenario: New parameter, and this is Number of views. The minimum is zero, the maximum is, I don't know, 100,000, and the increment is a thousand. Okay, so now we have a parameter here. If you remember, a parameter just creates a table with all these numbers inside, and it creates a measure, which is the selected value. And there you have it; this is the thing that we can play with.

Now we need to create the regression formula once more. So let's go in here: New measure. Predicted subscribers, let's call it that. Predicted subscribers is b, which is our intercept, plus m times the number of views, so the parameter that we just created. And there we have it. Go away! The editor just drives me nuts nowadays; this is so annoying.

Okay, so here we have the parameter, and this is the predicted subscribers. Pretty cool. Now you can start predicting based on the number of views, and you can pick any value that you want here. If I were to get 11,000, as we said, we were not that bad: 82 subscribers. If we get eighty thousand views, 600 subscribers. You see?

So this is linear regression. You see how it's calculated, you see how you create scenarios for it, and you see why it's useful: it is very, very useful for determining the value of one variable based on the value of the other one.

Okay, so this is it for today. I hope you are enjoying the statistical series; I'm definitely enjoying it. I'll see you again on Monday, as always, with another video. I hope you're enjoying your holidays. Bye bye.
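The what-if scenario above can be sketched outside Power BI as well. This is a minimal Python illustration, not the DAX measure from the video. The function name `predicted_subscribers` is hypothetical, and the intercept 2 and slope 0.01 are the rounded values quoted earlier, so the results will not reproduce the 82 and 600 read off the report, presumably because the report uses the unrounded slope.

```python
def predicted_subscribers(views, intercept, slope):
    """Regression line y = b + m*x: subscribers expected for a given number of views."""
    return intercept + slope * views


# Rounded values quoted in the video (assumption: the report uses more decimals).
INTERCEPT = 2.0
SLOPE = 0.01

# Same range as the what-if parameter: 0 to 100,000 views in steps of 1,000.
view_options = range(0, 100_001, 1_000)

# One prediction per selectable parameter value, like moving the slicer.
scenarios = {
    views: predicted_subscribers(views, INTERCEPT, SLOPE)
    for views in view_options
}

# The worked example from the video: 100 views gives 1 subscriber plus the intercept.
print(predicted_subscribers(100, INTERCEPT, SLOPE))  # 3.0
```

Passing the intercept and slope as arguments keeps the prediction separate from the fit, which is the same split the video makes between the regression calculations and the Predicted subscribers measure.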