Quantitative data analysis


Hello class, welcome to Sir Researcher TV, where learning research is made easy. This is your Sir Researcher, Teacher Ian, and let's learn research for our daily lives. Today we will be learning about quantitative data analysis. This part of the research process is done towards the end, when you have already completed your data collection and are approaching the conclusion of your research. Like every other step in the research process, data analysis is crucial, because it leads us to the findings and, finally, to the conclusions that answer our research problem. Quantitative data analysis has two parts: descriptive analysis and inferential analysis. This lecture video covers the basic process, descriptive analysis, and a separate video will cover inferential analysis. So let's get started.

Remember that all the varied analytical work you pour into your research becomes significant only if, before you finalize your plans for these activities, you have already identified the measurement level or scale of your quantitative data. That is why it is important to know what kind of measurement scale your gathered data use: your data analysis procedures depend on it. Check whether your study measures data on a ratio or interval scale rather than on a nominal or ordinal scale, because these last two levels of measurement describe qualitative (categorical) data. What happens, for instance, if we have qualitative data but are conducting a quantitative study? Is that possible, Sir Researcher? Yes, it is very possible, and we will see shortly how to process qualitative data for use in quantitative research.

Now let's talk about the steps in quantitative data analysis. There are two basic steps in analyzing your collected data quantitatively. The first is data preparation. Keep in mind that no data organization means no sound data analysis. That is why your methodology or data analysis plan should already list the procedure for organizing and preparing your data, so that you will not find it difficult to organize the data once they are available. You prepare the data for analysis through two preparatory substeps.

The first substep, class, is the coding system. Coding quantifies the data, changing verbally expressed data into numerical information. Once words, images, or pictures are converted into numbers, they become fit for any analytical procedure that requires arithmetic and mathematical computation. We cannot add up words or summarize pictures, unless we want to identify themes, which is a qualitative approach. To have uniform data processing in quantitative research, we have to code our non-numerical data.

So how do we do this? Let me give you an example. Let's use biological sex as our variable, with two categories, male and female. How can we quantify male and female, given that this is a dichotomous nominal variable? We assign a number to each category: say, 1 for male and 2 for female. But remember, these numbers carry no value. Assigning 2 to female does not mean that female has more value than male, which is assigned 1. They are just numerical representations, so that later, when we want to count the frequency of males and of females, the categories are easier for us to identify and easier for software to count. Likewise, if a category is quite long, such as a particular phenomenon or a long statement, coding shortens it, so that encoding it in a spreadsheet for data processing is easy to do.

The second substep of data preparation is data tabulation. From the name itself, tabulation leads us to a table. For easy classification and distribution of numbers based on a certain criterion, you collate them with the help of a table. For data tabulation we use frequencies and percent distribution. Frequency means the number of occurrences of a particular value of a variable, and percentage, which we are all familiar with, is a part-to-whole representation. A table is an excellent data organizer that researchers find indispensable. Aside from helping you organize your data, it also gives a summary of your collected data. Because the table shows frequencies and percentages, we can see right away which category has the highest frequency and how the data are distributed.

Let me give you an example of a table. This is just one way of presenting it. A table has a minimum of two columns. The leftmost column usually holds the variable, and the columns to its right depend on how you want to represent that variable. In this example, the variable in the first column is gender, male and female are its sub-variables, 11 and 13 are the frequencies, and 46 percent and 54 percent are the percentages. Alternatively, you could make one column for male and female, another for the frequency, and another for the percentage. This is a minimalist take on the table. As long as all pertinent information is present, the way you present your table does not really matter. But here is a pro tip: make your table as simple as possible. Don't make it too complex, because your readers look at the table precisely because they don't want to read long paragraphs; they just want to see how the data go. That is data tabulation.

Now let's proceed to step two. You have finished preparing your data: you are done with the coding, you are done with the tables, and you have recorded your frequencies and percent distribution. It is time to analyze the data. Remember, this lecture video covers only the descriptive statistical techniques. The second video will cover inferential statistics, because that topic is bigger and a little more complex, so it gets a separate, dedicated video.

Descriptive statistics, the technique that beginning researchers often use, tells us about the aspects or categories of the data. Descriptive statistical techniques cover the frequency distribution, the measures of central tendency, and the standard deviation. However, remember that a descriptive statistical technique does not give you information about the population from which the sample came. It is like an overview, or a preview, of the data you have gathered. Inferential statistics, on the other hand, makes inferences and leads us to further conclusions. Inferential statistics suits a higher level of research competence, because it involves complex statistical analysis that requires a good foundation in and thorough knowledge of statistics. But you're lucky, because here on Sir Researcher TV we will cover both descriptive and inferential statistical techniques. Yay!

Now let's start with the descriptive statistical techniques. As I mentioned, they provide a summary of the orderly or sequential data obtained through the data gathering instruments. They include three essential parts: frequency distribution, measures of central tendency, and measures of variability.

The first is frequency distribution. We have already talked about tables, and this is really the same thing. If you accomplished your table during data preparation, well and good. A frequency distribution gives the frequency and percentage of the occurrence of an item in a set of data. In other words, it gives you the number of times a response was given for one question. Let me show you another way of laying out your frequencies. The first column holds the variables. Then there is a code column, because these variables are words or phrases, so we have to code them. Next is the frequency distribution, for example how many answered "strongly agree", and then the percent distribution. That is another way of laying out your tabulation.

Next up are the measures of central tendency. This is a review, because you might have covered it in your Statistics and Probability class, but let me walk you through it one more time. The measures of central tendency indicate the central position of your data set. They are the mean, the median, and the mode.

The mean, class, is the average of all the items or scores. I don't want to put up a statistical formula for it, but to get the average you add all of the scores and then divide the sum by how many scores there are.

The median, on the other hand, is the score in the middle of a data set. To identify the median, first arrange your data in an array, from highest to lowest or vice versa. If a score is repeated, say there are five 3s, write all five, because every occurrence counts toward the middle position. The median is easy to identify if the array has an odd number of scores: you just look at the central score. For example, say your array is 1, 7, 10, 12, 14, 18, 20. You count seven scores, so the fourth one, 12, is the median. If the array has an even number of scores, say we add 9, the array becomes 1, 7, 9, 10, 12, 14, 18, 20.

Okay, please take note: the array must be in order, so it is 12 and then 14, not 14 and then 12. Counting the scores, one, two, three, four, five, six, seven, eight, you have a total of eight. So you look at the two scores in the middle, add them, and divide by two. The two middle scores are 10 and 12, and 10 plus 12 is 22, divided by 2 is 11. So 11 is your median.

The mode is the most frequently recurring score. A data set can be bimodal, meaning it has two modes, or trimodal, with three. For example, if 7 appears five times in our array and every other score appears fewer than five times, then our mode is 7.

Now let's go to the measures of variability. What do we mean by measures of variability? Please take note: they describe the amount of difference and spread in a data set. There are three of them: the range, the standard deviation, and the variance. The range is the difference between the highest and the lowest score: you identify the highest score and subtract the lowest score from it. The standard deviation, on the other hand, shows the extent to which the data differ from the mean. Likewise, the variance is a measure of how far a set of numbers is spread out from the mean.
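The mean, median, and mode described above can be sketched in Python with the standard `statistics` module. This is only an illustration, not part of the lecture: the two arrays are the ones read out in the lecture, written in ascending order, while `mode_scores`, `bimodal_scores`, and the helper `describe_center` are hypothetical names and values made up for the example.

```python
from statistics import mean, median, multimode

# The two arrays from the lecture, written in ascending order.
odd_array = [1, 7, 10, 12, 14, 18, 20]       # seven scores
even_array = [1, 7, 9, 10, 12, 14, 18, 20]   # eight scores

# Hypothetical scores (not from the lecture) in which 7 occurs five times.
mode_scores = [3, 5, 7, 7, 7, 7, 7, 9, 10]
# Hypothetical scores with two modes (bimodal).
bimodal_scores = [2, 4, 4, 6, 6, 9]


def describe_center(scores):
    """Return the mean, median, and list of modes for a list of scores."""
    ordered = sorted(scores)  # keep every score, including repeats
    return {
        "mean": mean(ordered),        # sum of scores / number of scores
        "median": median(ordered),    # middle score, or average of the two middle scores
        "modes": multimode(ordered),  # every score tied for the highest frequency
    }


odd_summary = describe_center(odd_array)
even_summary = describe_center(even_array)
mode_summary = describe_center(mode_scores)
bimodal_summary = describe_center(bimodal_scores)

print("medians:", odd_summary["median"], even_summary["median"])
print("modes:", mode_summary["modes"], bimodal_summary["modes"])
```

For the eight-score array, `median` averages the two middle scores, 10 and 12, which is the same calculation done by hand in the lecture. `multimode` is used instead of `mode` so that a bimodal data set reports both of its modes.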
The variance is the square of the standard deviation, which means the standard deviation is the square root of the variance. Do not forget that. Let's have a quick run-through of how to solve for the standard deviation and the variance. The first step is to solve for the mean. Notice that the data items here are in an array, and you have to include every number, whether it is repeated or not: 1, 2, 6, 6, and so on. For step two, you compute the deviation, the difference between each respondent's score and the mean. The mean here is 7, so for the first score, 1 minus 7 gives you negative 6. You do that for the rest. Step three is to square each deviation, so that none of the values is negative. Once you have all the squared deviations, you add them to get the sum of squares, which here is 196.
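The steps above, together with the two remaining steps that the lecture describes next (divide the sum of squares by the number of items, then take the square root), can be sketched in Python. One loud assumption: the lecture does not read out the full data set, so the nine scores below are hypothetical, chosen only so that they start with 1, 2, 6, 6 and reproduce the stated mean of 7 and sum of squares of 196.

```python
import math

# Hypothetical data set (assumption): nine scores that start 1, 2, 6, 6
# and match the lecture's mean (7) and sum of squares (196).
scores = [1, 2, 6, 6, 6, 6, 6, 14, 16]

# Step 1: solve for the mean.
mean_score = sum(scores) / len(scores)

# Step 2: compute each score's deviation from the mean.
deviations = [score - mean_score for score in scores]

# Step 3: square each deviation, then add them up (sum of squares).
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

# Step 4: variance = sum of squares divided by the number of data items.
variance = sum_of_squares / len(scores)

# Step 5: standard deviation = square root of the variance.
standard_deviation = math.sqrt(variance)

print("mean:", mean_score)
print("sum of squares:", sum_of_squares)
print("variance:", round(variance, 2))
print("standard deviation:", round(standard_deviation, 2))
```

Dividing by the number of items, as the lecture does, gives what Python's `statistics.pvariance` and `statistics.pstdev` compute. `statistics.variance` and `statistics.stdev` divide by one less than the number of items, so they return slightly larger values for the same data.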

After that, you get the variance. How do we get the variance? We divide the sum of squares by the number of data items. We have a total of nine items, so this is 196 divided by 9, which gives 21.78. To come up with the standard deviation, we take the square root of 21.78, and that is 4.67. So that's it.

Now I'm going to discuss the relevance of the measures of variability and why it is important to identify them. Let's look first at the range. We know that the range, class, is the difference between the highest score and the lowest score. However, the range of a data set can be affected by what we call outliers. What do we mean by outliers? Say that in a 20-item test the lowest score is 2 and the highest score is 17. The range is 17 minus 2, which is 15. That is quite large, and from a range of 15 we could say that the scores are very spread out. But the range might be influenced by an outlier. What do we mean by this? Say you have a total of 10 students. Nine of them got scores between 2 and 12, and only one student got a score of 17. As you can see, most of the students' scores fall within 2 to 12, and only one student got 17.
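A small Python sketch of this 10-student example shows how much a single high score stretches the range. The individual scores are hypothetical: the lecture gives only the lowest score (2), the highest score (17), and the fact that the other nine scores fall between 2 and 12. The helper name `score_range` is also made up for the illustration.

```python
# Hypothetical scores for the 10 students (assumption): nine scores
# between 2 and 12, plus the single score of 17.
scores = [2, 5, 6, 7, 8, 9, 10, 11, 12, 17]


def score_range(values):
    """Range = highest score minus lowest score."""
    return max(values) - min(values)


# Range with all ten scores.
full_range = score_range(scores)

# Range after setting aside the single highest score.
range_without_top = score_range(sorted(scores)[:-1])

print("range with all scores:", full_range)
print("range without the top score:", range_without_top)
```

With these assumed scores, one student accounts for a third of the range, which is why the range alone says little about how the other nine scores are distributed.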
So this 17 is an outlier, a score that deviates a lot from the rest of the scores. The range is not a very good indicator of the distribution, because, as I said, it can be influenced by that outlier. That is why we solve for the standard deviation, so that we can see how much outliers influence the distribution. Common sense tells us that if your range is big then your scores are dispersed, but that is not true in all cases, because, as I said, there may be outliers, scores that stand apart from the rest.

Now, what is the implication of your standard deviation, or SD? A high SD means there is a wide data spread, and most likely your data are heterogeneous: the data vary and are not very similar to each other. A low SD, on the other hand, means a more cohesive spread, which gives us the impression that the data are homogeneous, or somewhat similar to each other. Here you may even have multiple modes, because many scores are similar to each other, and the median would probably be close to the mean, because the data are not too spread out.

Let us put this into a more contextualized perspective. Say we have class A and class B, and the mean score for both classes is 27.81. Just by looking at the mean, we could say that both classes performed the same, because their means are the same. However, suppose the standard deviation for class A is 1.82, while the standard deviation for class B is 3.47. This tells us that even though the mean scores are the same, there is a difference between the classes. Why? As we said, the lower the SD, the more homogeneous, or unvarying, the scores are. If the standard deviation is small, the students' scores are close to the mean, so most of them are performing at about the same level. There are fewer outliers, fewer extreme scores that deviate from the mean. So we can probably conclude that class A performed more consistently than class B, because its scores are closer to the mean. It might be that in class B there are very good students with really high scores who pull the mean up, offsetting other students with low scores, and that is how class B arrived at that mean. I think that makes sense, and I hope you have understood it.

I do hope, class, that we have learned something today. If you have any questions about the measures of central tendency and the measures of variability, or if you dispute anything in our lecture discussion, please feel free to comment down below and I will look into it. If there are lapses, points needing clarification, content that was not properly discussed, or content you would like added, just comment and I will create a dedicated video for it. Don't forget to like this video, subscribe, and ring the notification bell for new content weekly. This is once again your Sir Researcher, Teacher Ian, saying: class dismissed. Thank you, and see you next time.
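As a postscript to the class A and class B comparison, here is a minimal Python sketch of two classes with the same mean but different standard deviations. The scores are hypothetical and are not meant to reproduce the lecture's figures (27.81, 1.82, and 3.47); they only illustrate the same pattern.

```python
from statistics import mean, pstdev

# Hypothetical scores (assumption): both classes average 28,
# but class B's scores are far more spread out.
class_a = [26, 27, 28, 28, 29, 30]
class_b = [20, 24, 28, 28, 32, 36]

mean_a, mean_b = mean(class_a), mean(class_b)

# pstdev divides by the number of scores, as the lecture does.
sd_a, sd_b = pstdev(class_a), pstdev(class_b)

print("class A: mean", mean_a, "SD", round(sd_a, 2))
print("class B: mean", mean_b, "SD", round(sd_b, 2))

# Same mean, so the mean alone cannot tell the classes apart;
# the smaller SD marks the class whose scores are more alike.
more_consistent = "A" if sd_a < sd_b else "B"
print("more consistent class:", more_consistent)
```

The comparison uses only the standard deviation to decide which class is more consistent, since the two means are equal by construction.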