Bibliometrics (20): Factor Analysis And K-means Clustering Using Co-Citation Matrix


Hello, and welcome to another video by Research Hub. In this video I'll be doing two multivariate analyses based on a co-citation matrix file: factor analysis and cluster analysis. In the previous video, I showed you how you can use BibExcel to make the co-citation matrix file. In this video I'll be using SPSS with the same co-citation matrix file.

First, we need to load the co-citation matrix file into SPSS. Previously I saved the co-citation matrix file as an Excel file, so you import data, choose Excel, select the co-citation matrix file and import it into SPSS. As I also mentioned in the previous video, the value for a pairing of an author with themselves should be set to 1 for multivariate analysis, so make sure that all the pairings with themselves are set to 1. You should also check that there are no extra columns, rows or variables, or else something might go wrong. For now, it seems fine.

Next, we need to transform these numbers into correlations. The way you do it is simply to run a factor analysis (Analyze > Dimension Reduction > Factor) to obtain the Pearson correlations. Select all the variables except for variable 1, because it contains the author names, and under Descriptives make sure that Coefficients is checked so that you get the correlation matrix. The rest is not as important here. When you've done that, click OK. What you now receive is a correlation matrix computed from the numbers in the co-citation matrix. We should now save this correlation matrix as a separate file, here as an Excel worksheet. Then we open Excel, paste it in, make sure it's formatted correctly for importing into SPSS again, and save it. I'll put it in the same folder as before and call it "national culture 500 correlation matrix". And it's done.
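If you want to check this step outside SPSS, the calculation it performs can be sketched in Python. This is a minimal sketch assuming NumPy; the function name cocitation_to_correlation and the toy counts are hypothetical and only illustrate the idea: set each reference's pairing with itself to 1, then compute Pearson correlations between the columns of the co-citation matrix, mirroring how SPSS correlates the selected variables across cases.

```python
import numpy as np


def cocitation_to_correlation(counts):
    """Turn a square co-citation count matrix into a Pearson correlation matrix.

    counts[i][j] is the number of citing documents that cite both reference i
    and reference j. As in the video, each reference's pairing with itself
    (the diagonal) is set to 1 before correlating.
    """
    m = np.array(counts, dtype=float)  # copy, so the caller's data is untouched
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        # A non-square matrix usually means an extra row, column or label variable.
        raise ValueError("co-citation matrix must be square")
    np.fill_diagonal(m, 1.0)
    # Columns are the variables (references) and rows are the cases,
    # so we correlate columns (rowvar=False).
    return np.corrcoef(m, rowvar=False)


# Toy 4 x 4 matrix with made-up counts, for illustration only.
toy = [[12, 5, 0, 1],
       [5, 9, 1, 0],
       [0, 1, 7, 4],
       [1, 0, 4, 6]]
print(np.round(cocitation_to_correlation(toy), 2))
```

Note that a reference that is never co-cited with anything gives a constant column, for which a Pearson correlation is undefined (NumPy returns nan), so such rows are worth removing before this step.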
We need to make a new dataset for the correlation matrix numbers. Again, import the correlation matrix that we saved as an Excel file and click OK. Make sure we don't have any unnecessary variables that we're not going to use, and confirm that there are no extra columns or rows in the correlation matrix file. In this case, we had an extra row. We have now transformed the raw co-citation counts into correlations, and this will be the basis for your multivariate analysis.

First, I'll demonstrate how you can use this correlation matrix for a factor analysis. In a similar manner, we again go to Dimension Reduction > Factor. Mark all the items except for the variable with the names. Under Descriptives you can choose Coefficients, KMO and Bartlett's test, and Determinant, but for this case we want Coefficients, so make sure the correlation matrix is switched on. Under Extraction, if interested, you can also request a scree plot, and you can base the extraction on eigenvalues greater than 1. Under Rotation, you can use a normal varimax rotated solution. You might save the scores, but it's not needed. Under Options, you should sort the coefficients by size, and if you have a loading criterion you can already suppress small coefficients here. However, the cut-off is applied to absolute values, so strong negative loadings will still be shown. For now, I will leave this option off. Then click OK.

You now receive several outputs, but the most interesting one is the Total Variance Explained table. Out of this factor analysis we receive seven components with eigenvalues greater than 1, with a cumulative variance explained of 95%. Component 1 explains 51%, the next one 25%, then 7%, 4%, and so on. This indicates that there are seven factors within the hundred most cited references in my retrieved collection.
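For readers who want to see what the factor analysis computes, here is a rough Python sketch, assuming NumPy; the names extract_factors and varimax are hypothetical. It extracts principal components from a correlation matrix, keeps those with eigenvalues greater than 1, reports the percentage and cumulative variance explained, applies a varimax rotation, and, as one common convention, assigns each reference to the factor with its highest absolute rotated loading. Two assumptions to flag: this varimax has no Kaiser normalization (SPSS applies it by default), so loadings may differ from SPSS output; and because in the video SPSS recomputes correlations from the imported correlation-matrix dataset, the matching input here would be np.corrcoef(dataset, rowvar=False).

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loadings matrix (no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        lam = loadings @ rotation
        # Gradient of the varimax criterion, mapped back to an orthogonal matrix via SVD.
        target = lam ** 3 - (gamma / p) * lam @ np.diag(np.sum(lam ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break  # the criterion is no longer improving
    return loadings @ rotation


def extract_factors(corr, min_eigenvalue=1.0):
    """Principal components of a correlation matrix, Kaiser criterion, varimax."""
    corr = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # largest first, like the SPSS table
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pct = 100.0 * eigvals / eigvals.sum()          # trace = number of variables
    cumulative = np.cumsum(pct)
    k = max(1, int(np.sum(eigvals > min_eigenvalue)))  # keep at least one component
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])    # unrotated component loadings
    rotated = varimax(loadings) if k > 1 else loadings
    primary = np.argmax(np.abs(rotated), axis=1)   # factor with the highest |loading|
    return {"eigenvalues": eigvals, "pct_variance": pct,
            "cumulative_pct": cumulative, "n_factors": k,
            "rotated_loadings": rotated, "primary_factor": primary}
```

The eigenvalues-greater-than-1 rule is used here only because it is the setting chosen in the video; the scree plot is an alternative way to pick the number of factors.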
This can also be seen in the scree plot, although with so many components it can be hard to read from the table alone, and you can decide to cut off at a specific component number. If you want to see which articles fit into each factor, look at the Rotated Component Matrix. In our case, the majority of the publications fit into factor 1, because their loadings are highest on component 1 (factor 1). The next block of references, further down the sorted matrix, loads on factor 2, then factor 3, and so on. You should copy this rotated component matrix for further analysis, for example by importing it into Excel or similar software. This is how you obtain your factors from a multivariate analysis of a co-citation matrix.

Now I will show you how to do this through cluster analysis, specifically connectivity-based hierarchical clustering with Ward's method, followed by a centroid-based procedure, k-means. This may sound complicated, but I will show you exactly how. We will use the same correlation matrix as before, but instead of going to Dimension Reduction, we go to Classify, and first choose Hierarchical Cluster. Select the hundred references, click on Statistics, and tick Proximity matrix as well. Leave Cluster Membership set to None, because we want to see all possible solutions. We also want to see a dendrogram for further analysis. Under Method, make sure you select Ward's method; you can leave the other settings as they are. Then click OK.

In contrast to factor analysis, which is based on eigenvalues and variance explained, in hierarchical cluster analysis you need to look at the dendrogram and get a feeling for how many clusters or groups you want to use. You can look at the agglomeration schedule, which shows each of the stages for all hundred articles, but what's most important is the dendrogram. There you see different levels, or rescaled distances, each with its own number of clusters. At distance 0 each of the hundred references is its own cluster, but as the distance increases the number of clusters becomes smaller and smaller. In the first solution we can count 1, 2, 3, 4, 5, 6, 7, 8 possible clusters.
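The hierarchical step can be sketched in Python as well. This is a minimal sketch assuming NumPy, with hypothetical names ward_linkage and cut_tree. It clusters the rows of the correlation matrix with Ward's method on squared Euclidean distances (SPSS's default interval measure) using the Lance-Williams update, records an agglomeration schedule, and cuts the tree into a chosen number of clusters. The merge distances are on a different scale from SPSS's coefficients and from the dendrogram's rescaled distances, so compare cluster structure rather than the raw numbers.

```python
import numpy as np


def ward_linkage(data):
    """Agglomerative clustering of the rows of `data` with Ward's method.

    Returns the agglomeration schedule as (kept_id, absorbed_id,
    merge_distance, new_size) tuples; ids are original row indices.
    """
    x = np.asarray(data, dtype=float)
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sum(diff ** 2, axis=2)          # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)            # never merge a cluster with itself
    sizes = np.ones(n)
    active = list(range(n))
    schedule = []
    while len(active) > 1:
        # Closest pair among the clusters that still exist.
        sub = dist[np.ix_(active, active)]
        a_pos, b_pos = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = sorted((active[a_pos], active[b_pos]))
        d_ij = dist[i, j]
        # Lance-Williams update for Ward: cluster i absorbs cluster j.
        for k in active:
            if k in (i, j):
                continue
            ni, nj, nk = sizes[i], sizes[j], sizes[k]
            new = ((ni + nk) * dist[k, i] + (nj + nk) * dist[k, j]
                   - nk * d_ij) / (ni + nj + nk)
            dist[k, i] = dist[i, k] = new
        sizes[i] += sizes[j]
        active.remove(j)
        schedule.append((i, j, float(d_ij), int(sizes[i])))
    return schedule


def cut_tree(schedule, n, n_clusters):
    """Replay the first n - n_clusters merges and label rows 0..n_clusters-1."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for i, j, _dist, _size in schedule[: n - n_clusters]:
        parent[find(j)] = find(i)
    roots = {}
    return [roots.setdefault(find(r), len(roots)) for r in range(n)]
```

For a 100 x 100 correlation matrix, cut_tree(ward_linkage(corr), 100, 8) would give an eight-cluster membership; this simple loop is cubic in the number of references but is fast enough at that size.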

You can also count them by looking at the dashes (branches) coming out of these groups at the first level. At the next level we can count the bigger groups: 1, 2, 3, 4, so four possible bigger groups, and as you increase the distance you get fewer and fewer clusters. If we compare this to the factor analysis, where we got seven components with a high percentage of variance explained, here we similarly received eight clusters, so the two results seem similar. Now that the hierarchical cluster method has suggested up to eight clusters, we can use this number for the k-means method, because in k-means you need to specify the number of clusters in advance.

So click on Analyze > Classify > K-Means Cluster. Select the hundred references again, but make sure you don't select any extra variables. Then enter the number of clusters. Let's choose eight, as eight was the first number we saw, at the first level. Under Save, we want to save the cluster membership, so that we can track how references move between clusters if we run multiple analyses with different numbers of clusters. Then click OK. You will see the final cluster centers, which are hard to interpret. What's most interesting is the number of cases in each cluster. Here we can see that there is a large group of 39 articles, followed by smaller groups. If you go back to your original SPSS dataset, that is, the correlation matrix file, a new variable has been saved at the end showing which cluster each reference falls under. The idea is that you can run several k-means analyses and compare the differences in the cluster memberships.
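A bare-bones version of the k-means step, again assuming NumPy, is sketched below; the function name kmeans is hypothetical. It runs Lloyd's algorithm on the rows of the correlation matrix for a fixed number of clusters and returns the membership of each reference, the final centers and the number of cases in each cluster. One loud assumption: the initial centers are chosen by a deterministic farthest-point rule, which is not how SPSS picks its initial centers, so memberships can differ from the SPSS output.

```python
import numpy as np


def kmeans(data, k, max_iter=100):
    """Lloyd's k-means on the rows of `data` with a fixed number of clusters k."""
    x = np.asarray(data, dtype=float)
    n = x.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    # Farthest-point seeding (an assumption, not SPSS's method): start from
    # row 0, then repeatedly add the row farthest from the chosen centers.
    centre_idx = [0]
    for _ in range(1, k):
        d = ((x[:, None, :] - x[centre_idx][None, :, :]) ** 2).sum(axis=2)
        centre_idx.append(int(np.argmax(d.min(axis=1))))
    centres = x[centre_idx].copy()
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Assign every reference to its nearest center (squared Euclidean).
        dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break  # memberships are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = x[labels == c]
            if len(members):  # keep the old center if a cluster becomes empty
                centres[c] = members.mean(axis=0)
    sizes = np.bincount(labels, minlength=k)  # "number of cases in each cluster"
    return labels, centres, sizes
```

With the correlation matrix as input, kmeans(corr, 8) would return eight memberships and cluster sizes comparable in spirit to the saved SPSS membership variable.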

So let's say we want to explore a seven-cluster solution, because we found seven factors in the factor analysis. Make sure you have the hundred articles selected, don't add any of the extra variables such as the saved cluster membership, and click OK. In our case, we now have 40 articles in the largest group, and a few of the smaller groups have grown. You can now compare, for each reference, which cluster it belonged to with eight clusters and which with seven. These types of multivariate analysis can help you factor or group specific co-cited references for your co-citation analysis. So this was how you can do multivariate analysis with factor and cluster analysis. I hope you enjoyed the video, thank you.
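Because cluster numbers from k-means are arbitrary labels, comparing the eight- and seven-cluster runs reference by reference is easiest with a cross-tabulation, much like running Crosstabs on the two saved membership variables in SPSS. Below is a small standard-library Python sketch; the function names and the toy labels are hypothetical. It counts how many references fall into each pair of clusters and lists, for each cluster in the second run, which clusters of the first run its members came from. Since k-means solutions are not nested, expect some references to move between clusters rather than a clean merge of two clusters.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Count references per (cluster in run A, cluster in run B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same references")
    return Counter(zip(labels_a, labels_b))


def sources_per_cluster(labels_a, labels_b):
    """For each run-B cluster, list the run-A clusters its members came from."""
    sources = {}
    for a, b in zip(labels_a, labels_b):
        sources.setdefault(b, set()).add(a)
    return {b: sorted(a_set) for b, a_set in sorted(sources.items())}


# Made-up memberships for eight references, for illustration only.
eight = [0, 0, 1, 1, 2, 3, 3, 4]
seven = [0, 0, 0, 0, 1, 2, 2, 3]
print(crosstab(eight, seven))
print(sources_per_cluster(eight, seven))
```

A run-B cluster fed by more than one run-A cluster marks where groups were merged or where references shifted when the number of clusters changed.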