Basics of Scientific Literature Analysis, Part 4: Network analysis/visualization with Gephi
Hi, in this video I will show you how to load the network data from a literature review into Gephi and make a network visualization of the citation network. Now, when you open up Gephi, you will usually see this screen first. Here we can just click New Project, and now it's all empty.

We can go to the Data Laboratory, where we can see a button saying Import Spreadsheet. We click that. First we need to load a nodes table, so let's change that to nodes table and find the file. In the output of the review there will be lots of files. The ones we want to use here are citation nodes and citation edges. These two files contain the information on the citation network. There are also author nodes and author edges in case you want to make a collaboration network of the authors. So let's choose citation nodes, open that, and press Next. Then we could change year published, which is a numeric value, to Integer. I don't know if it really matters, but let's do it anyway and hit Finish. If the network is large, it will take a little while for Gephi to load, but OK, here we can see all the nodes.

Then we can go straight ahead and also load the edges table, so we choose edges table here, and now Gephi is complaining that the currently selected file is not in a correct format. It's okay: we go here and find citation edges, open it, and the error goes away and we can press Next. We can again change year published to Integer and hit Finish, and Gephi will also load the information on the edges. So now here we have the nodes, and if we click this we can see all the edges. They are all directed, because citations go only one way.

Now we're going to go to Overview, and here is our network. It looks quite horrible at the moment. There's just way too much stuff here. You can see there are about 17 thousand nodes and 20 thousand edges, but it's okay. We can filter it, but before doing that, let's move to the statistics.
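For readers who want to inspect the two tables outside Gephi, the import step can be sketched roughly as below. This is only an illustration: the column names `Id`, `Label`, `YearPublished`, `Source` and `Target` and the tiny inline tables are assumptions made for the example, not the actual contents of the review output.

```python
import csv
import io

# Hypothetical miniature versions of the two tables; the real files
# are much larger and their column names may differ.
NODES_CSV = """Id,Label,YearPublished
p1,Paper one,1999
p2,Paper two,2005
p3,Paper three,2010
"""

EDGES_CSV = """Source,Target
p2,p1
p3,p1
p3,p2
"""


def load_nodes(text):
    """Return {node id: attributes}, with the year parsed as an integer."""
    nodes = {}
    for row in csv.DictReader(io.StringIO(text)):
        row["YearPublished"] = int(row["YearPublished"])
        nodes[row["Id"]] = row
    return nodes


def load_edges(text, nodes):
    """Return directed (source, target) pairs whose endpoints are known nodes."""
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["Source"] in nodes and row["Target"] in nodes:
            edges.append((row["Source"], row["Target"]))
    return edges


nodes = load_nodes(NODES_CSV)
edges = load_edges(EDGES_CSV, nodes)
print(len(nodes), "nodes,", len(edges), "directed edges")
```

Each edge is kept as an ordered pair, citing paper first and cited paper second, which mirrors the directed edges shown in the Data Laboratory.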
We could calculate, for example, modularity, which tries to group the nodes together based on their connections: the more connections two nodes have together, the more likely they are to be in the same group. And then we could also run PageRank, which is the famous Google algorithm for finding important nodes.
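To give a feel for what the PageRank statistic measures, here is a minimal power-iteration sketch on a toy citation graph. It is not Gephi's implementation; the damping factor of 0.85, the fixed number of iterations and the four-paper example are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                new_rank[target] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank


# Toy graph: papers b, c and d all cite a, and d also cites c.
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "c")]
scores = pagerank(edges)
best = max(scores, key=scores.get)
print("highest PageRank:", best)
```

In this toy graph the paper that everyone cites ends up with the highest score, which is the intuition used later when coloring nodes by PageRank.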
We will use this information later when we visualize the network more. But next, after doing those calculations, we probably should filter the network a bit, because it's way too big. We can go to Filters and there choose Topology and, for example, filter by In Degree Range, which is the number of incoming connections for nodes. Maybe we only want to see nodes that have at least two incoming connections. Let's see what we get. Then we can hit Filter. Okay, now we only have 1600 nodes visible, and that might be a much easier network to visualize. We can already see a little bit of structure.

Then, to make things a little bit more orderly, here's the Layout menu, where we can choose a layout. There are quite many to use nowadays, but ForceAtlas 2 is usually a good one to go with. Then we can just hit Run. We can see the network starting to take form in front of our eyes. There are also settings you can tune to the characteristics of the network. For example, if you use Dissuade Hubs, it tries to put the hubs farther away, or we can use Prevent Overlap so all the nodes will be separated and it tries to keep things from getting in the way. OK, the network starts to look as good as it gets, so we can press Stop.

Then here we can color the nodes and edges if we want. When we press Refresh, we will get the new things that we calculated over here, and we can also use them. So we can choose Modularity Class and then Apply, and now we can see the different groupings that Gephi has found. Then we can go to Ranking. We can change color or size or weight here, but maybe it's better to change size, because we already used color. We could use, for example, in-degree to change sizes, so now the nodes that have lots of incoming connections are large and we can start to see which papers are important here. And again, we can see a little bit of overlap there, so we could run the layout algorithm a little more to get rid of the overlap.
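The in-degree filter used here can be sketched in a few lines of Python. This is only an approximation of what the Gephi filter does, written under the assumption that edges are (citing, cited) pairs; the function name `filter_by_in_degree` and the toy edge list are made up for the example.

```python
from collections import Counter


def filter_by_in_degree(edges, minimum=2):
    """Keep nodes cited at least `minimum` times, plus the edges between them."""
    # In-degree of a node = number of edges pointing at it.
    in_degree = Counter(target for _, target in edges)
    kept = {node for node, count in in_degree.items() if count >= minimum}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges, in_degree


# Toy citation list: (citing paper, cited paper).
edges = [
    ("b", "a"), ("c", "a"), ("d", "a"),
    ("d", "c"), ("b", "c"), ("c", "e"),
]
kept, kept_edges, in_degree = filter_by_in_degree(edges, minimum=2)
print(sorted(kept), kept_edges)
```

The same `in_degree` counts could also drive node size, which is what the Ranking step does when in-degree is chosen as the size attribute.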
Okay, and now we have a nice network that shows how some of the papers are connected. Again, this is quite heavily filtered. There's only nine percent of papers visible, but it still might show us a little bit of the structure.

Another thing we could do here is change the color. If we are not interested in the groupings but in importance, we could change color according to PageRank score, and when we hit Apply we will see the more important papers by PageRank in darker colors. And if we click this little symbol here, we can now go and check what these papers actually are. For example, this one is, let's see the whole ID: Bonabeau 1999, Swarm Intelligence. It's the reference to this paper.

And one last thing that we could do is use origin for coloring the papers. So here we could see the papers that were originally in the downloaded data set colored in blue, and the papers that were only mentioned in the references of those papers colored in red. If we apply that, we can see that most of the important papers were actually not ones that I had downloaded but were only mentioned in the references. Okay, so this concludes our short tutorial on Gephi. I hope you enjoyed it. Bye.