Danyliw Seminar 2019: Stephan de Spiegeleire


Thank you, Dominique. Like I mentioned, this is a bit different from the other presentations, but we did want to present you two different presentations, actually. The first one is about a larger project, though "project" may not be the right name for it; it's more a program of research, which we call RuBase. I'll talk more about that, and then we'll give a specific example of how we've applied some of the material from RuBase to the case of Russian coercion towards Ukraine over the past decade. It's really all about this evidence base that we talked about before and how we can make it better, at least better than what it used to be.

I often show this slide: Russia is back. There he is, and he says that he's willing to use all means necessary to get what he wants. And then on our side, the Western Russia-specialist side, this is what it mostly looks like: people my age, with my stomach, with my gray hair, still pretty much doing what we did 20 or 25 years ago.

Frankly, I started my career as a Soviet specialist. I was at the RAND Corporation and UCLA, and in those days we didn't have any of the tools that we have available now. After the collapse of the Soviet Union I worked for think tanks for most of my life, and the work of think tanks reflects the priorities of our governments, so for most of my career I've been doing defense work, which has of course become much more quantitative. Then after 2014 our governments woke up again to the fact that Russia is back, so I re-entered the field, and it's pretty much the same people doing the same type of research with the same type of tools, despite the fact that the world out there has been transformed in revolutionary ways.

I talk about three different modes of doing research. We are still very much in the mode we call 1.0, which is the prima donna mode [unclear]: these experts who have devoted their lives to
a certain topic. They've become almost the gurus of their particular field. You have to listen to them: they know a lot about the topics they work on. But of course, growing up in the field, they also tend to have their priors, their ideologies, their beliefs, their ideological and theoretical predilections. So you definitely want to listen to them, but you don't want to listen to just one of them. You want to try to bring them together, to have discussions, focus groups, conferences where different people from different parts come together to exchange views, often not very elegantly. I think it's great that the Danyliw Seminar really tries to have more interaction than most seminars, but it's still very dissatisfying. So what we are pushing for is 3.0, which is a bit like Web 1.0, 2.0, 3.0.

So the idea is this. The progress that has been made in natural language processing and natural language understanding over even just the past two years has been phenomenal. As you all know from your mobile phones, when you do a search query, the quality is much better. So how could we use some of these new tools to collect not only textual data but also more numerical data, and try to see if it can help us in our stories? Irrespective of whether we think of ourselves as qualitative or quantitative people, how can we get more evidence than we've had?

Data comes in all sorts of different forms. In our project we focus primarily on numbers and text. I think it's great that others also focus on video; that's just a step too far for us right now. Data by themselves don't say anything. You want to put them into a certain context, you want to organize them, you want to validate them; then they become information, and ultimately you hope that leads to a level of knowledge. By the way, this is a graph from the real IR. International relations people think that IR stands for international relations; the much bigger field, of course, is information retrieval, which is where this is from.

But the knowledge layer is what we mostly have now. Because of the almost unbearable lightness of the data that we have and of the information that we've extracted from it, we have to base decisions on knowledge that comes mostly from these prima donnas, mostly working by themselves. Christina will show some evidence of how that's the case in our field.

So we've constructed this program, which we call RuBase. "Ru" stands, of course, for Russia, and "base" has a dual meaning. It's a base in the sense of a knowledge base: how can we bring together all the evidence that's out there, textual evidence and data evidence, put it in a data warehouse, and offer it to our colleagues to use? And that brings us to the second meaning of the
word "base": base as a foundation for new forms of collaboration. We have been a particularly uncollaborative bunch, and you'll see some of the evidence afterwards. How can we improve that?

What you see here is the base. You might think of a base as the foundation of a house, something static. Our base is much more dynamic: it constantly ingests new data, both text and other elements. We recently started to make knowledge graphs out of those kinds of things. We can still write our articles and our books, but increasingly we can also move to the right, to interactive visualizations where we can start exploring what we know about what happened. Take your case, for instance: imagine that we could create a real ontology in which the actors are coded and the validity of the agency of certain actors can be tested against a variety of different sources. Then we can do all sorts of different things with that.

So there are three main lines of development, and the first is text. Incidentally, this work is funded by the Carnegie Corporation of New York, $700,000 for two years, and we just got two million dollars from the Pentagon, from the Minerva Research Initiative, for the next three years. So we're going to be doing serious work on this for the next three years, and we're doing it together with Georgia Tech, the US-based university. Their university library has incredible coverage of almost all the academic sources that are out there. We're doing quite a bit of bibliometric work, which is very low-hanging fruit for most researchers; Christina will show you some examples as well.

On the Russian side we have bought access to East View, which is a US-based aggregator. Unfortunately there's no equivalent for Ukraine; it would be great to have one. But this is really nice, because it has very good coverage of central newspapers, regional newspapers, official publications, press
statements, and academic journals from across Russia.

What we do is use various search query algorithms. How do we find the right types of documents that we're looking for? We put a lot of time into this. A typical search query is sometimes five words or so; we have a search query that takes about two pages, in which we try to identify what Russian agency is. You could just search for "Russia", but you'll miss a lot of elements of Russian agency that, in our particular case, you want to look for. So: the actor, who is Russia? Then the subject: the project is about coercion, so what kinds of verbs and nouns would you look for if you want to create a corpus about coercion? And how do we focus on international coercion? Because we are not interested in the North Caucasus or in domestic coercion, of which there is of course a lot in Russia; we're really interested in international coercion.

Out of this we create a number of different text corpora, which can consist of a few hundred to multiple thousands of documents, which we then start coding. We're trying to extract an ontology out of the documents, and there are some new programs out there that allow you to do that. If you code about a hundred documents, saying "this is an agent of Russia", and you have a corpus of 10,000 documents, a program such as Prodigy will try to look for similar contexts and then ask you, "Is this guy a Russian agent?" You as an expert say yes, no, yes, no, and in an hour or two of additional coding you can create a model that will help you identify these kinds of things. That allows you to do all sorts of visualizations.

These are some of the corpora that we have now. You see, for instance, that for coercion we have corpora in both English and Russian. It's probably not exhaustive, but it's much more than the typical 35 or so bibliographical references that people cite on average in their papers. And how do you know, when you cite, that you have really
covered the entire field? You couldn't do this five years ago; you can now. There's a fantastic new database called Lens.org, which anybody can use. It's freely available, and it's much bigger than Google Scholar or Web of Science or Scopus.

All right. On the data side, we've also collected all the data sets that have relevant information about international coercion and put them together in a data warehouse. We also have some Russia-specific ones, ranging from public opinion surveys to elite surveys to economic data to open data from the Russian government. As a whole, we try to put all of this together in a data warehouse.

And then, finally, we're building some knowledge graphs. We don't have much time, and I want to leave enough time for Christina, so I'll skip over that. But knowledge graphs are the new technology that the Googles and the Facebooks and the Baidus and Tencents of this world are using to extract knowledge from the academic literature. It's almost a shame: knowledge building used to take place in academia.
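To make the knowledge-graph idea a little more concrete, here is a minimal sketch, assuming a graph is stored as subject-predicate-object claims that each carry the source they were coded from. The class name, the predicate label, the actors and the source identifiers are all hypothetical placeholders, not the project's actual schema or data; the point is only to show how a coded claim about an actor's agency could be checked against several independent sources.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One coded statement extracted from one document (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str
    source: str


def build_graph(claims):
    """Map each (subject, predicate, object) edge to the set of sources asserting it."""
    graph = defaultdict(set)
    for c in claims:
        graph[(c.subject, c.predicate, c.obj)].add(c.source)
    return graph


def corroborated(graph, min_sources=2):
    """Keep only the edges asserted by at least `min_sources` distinct sources."""
    return {edge: srcs for edge, srcs in graph.items() if len(srcs) >= min_sources}


# Placeholder claims for illustration only.
claims = [
    Claim("Actor A", "acts_on_behalf_of", "State X", "newspaper_1"),
    Claim("Actor A", "acts_on_behalf_of", "State X", "official_statement_7"),
    Claim("Actor B", "acts_on_behalf_of", "State X", "newspaper_1"),
]

graph = build_graph(claims)
strong = corroborated(graph)
for (s, p, o), srcs in strong.items():
    print(f"{s} --{p}--> {o}  (sources: {sorted(srcs)})")
```

Keeping the source on every claim is the design choice that matters here: it lets an edge be accepted, questioned, or dropped depending on how many different kinds of evidence stand behind it.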

These big companies have now used modern technologies to do this for the entire field of knowledge. Microsoft Academic you may know, Google Scholar you may know, and from those they're starting to create these knowledge graphs, and we scholars are barely even aware that these things are occurring. I think we still have an advantage over the Googles, because Google wants to do the whole world and we are just interested in Ukraine or in Russia or in other things. But we have to do this work, and it's not being done very much yet.

All right, the final slide I want to show you: this is really constructed as an open project. We have a core, which consists of The Hague Centre for Strategic Studies, where I work, a small Dutch think tank, but most of our team is based in Ukraine. Christina has a new entity within the Kyiv School of Economics, which we call StratBase, which is doing most of the work for this particular thing. This combines two great strengths of Ukraine. One is its data science specialists, which certain parts of Ukraine are really quite good at. The other, and I know Ukrainians don't like hearing this, is that they still know Russia better than we do. In fact, I think Ukrainian knowledge about Russia is not as great as it's sometimes made out to be, but it's still much better than ours. Our knowledge of Russia in Western Europe is dismal, and in the United States, of course, it is much worse still. So there is still a comparative advantage that Ukrainians have in helping us Westerners, who are trying to make sense of this country.

We're looking for students who are willing to learn these things. We have meticulously documented all these data sets: how do you use them, where do you find them, what kinds of search queries can you run when you search. We cannot make the corpora available, because they are of course copyrighted, but the entire process that lets you create your own corpora is documented. So if you're interested in this, come and see us afterwards and we'll give you access to all these things, because that's what we got money from Carnegie to do.

Then we have a sounding-board group of some more senior Russia experts; PONARS, the Program on New Approaches to Russian Security, plays a big role in that. And finally we have a broader ecosystem in which we're trying to incentivize people to at least share their PowerPoints. Take people who teach Russian foreign policy: there are hundreds of them, and we don't even share our PowerPoints. So there's so much low-hanging fruit where we could collaborate more.

That's a bit of the background, and Christina will now show you some examples. Mind you, we only got accepted to the conference, how many weeks ago? Two weeks ago? Three weeks ago? OK, so we didn't have very much time, and the paper isn't fully finished yet. Some of the slides that you'll see we're still double-checking, but it's already quite an impressive presentation, I would submit to you.
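As a rough illustration of the kind of corpus-building search query described in the talk (an actor facet, a coercion-vocabulary facet, an international-scope facet, and an exclusion for domestic cases), here is a minimal sketch. It assumes a generic Boolean AND/OR/NOT syntax, which is not necessarily what any particular press database accepts, and the term lists are short illustrative stand-ins rather than the project's actual two-page query.

```python
def or_group(terms):
    """Join terms into a parenthesised OR group, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(actor, coercion, international, exclude):
    """Require one hit from every facet and none from the exclusion list."""
    must = " AND ".join(or_group(g) for g in (actor, coercion, international))
    return f"{must} NOT {or_group(exclude)}"


def matches(text, actor, coercion, international, exclude):
    """Naive local check: case-insensitive substring match on each facet."""
    lowered = text.lower()

    def has_any(terms):
        return any(t.lower() in lowered for t in terms)

    return all(has_any(g) for g in (actor, coercion, international)) and not has_any(exclude)


# Illustrative term lists (assumptions, not the real query).
ACTOR = ["Russia", "Kremlin", "Gazprom"]
COERCION = ["sanction", "threat", "blockade"]
INTERNATIONAL = ["Ukraine", "Kyiv"]
EXCLUDE = ["North Caucasus"]

query = build_query(ACTOR, COERCION, INTERNATIONAL, EXCLUDE)
print(query)

docs = [
    "The Kremlin issued a threat to cut gas supplies to Ukraine.",
    "Russia tightened a blockade in the North Caucasus, Ukraine media said.",
    "Russia and Ukraine held a football match.",
]
kept = [d for d in docs if matches(d, ACTOR, COERCION, INTERNATIONAL, EXCLUDE)]
print(kept)
```

The actor facet is where searching for "Russia" alone falls short: widening that one list to cover state companies, officials and proxies is what lets the query pick up Russian agency that never mentions the country by name. Substring matching is deliberately crude here; a real pipeline would need stemming and Russian-language variants.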