# R Programming for Beginners | R Programming for Data Science | Intellipaat

Hey guys, welcome to this session by Intellipaat. R is a programming language developed by statisticians for statisticians. So, if you are interested in any sort of statistical analysis, R should be your go-to language. R is also versatile: it is a great visualization tool, and it provides multiple libraries for machine learning algorithms. So, in today's session, we are going to learn about R programming. Before going ahead, do subscribe to Intellipaat's YouTube channel so that you never miss any of our upcoming videos, and if you are interested in an end-to-end certification course on Data Science with R, Intellipaat provides just the right course for you.

Now, let's go through the agenda for today's session. We'll start off with a quick introduction to R programming and then learn about the RStudio GUI. We will also learn about the different data structures in R: vectors, lists, matrices, and data frames. Going ahead, we will work with some inbuilt functions in R. You can also put all your queries in the comment box; we would love to help you. So now, without any further delay, let's get started.

R is an open-source language. It is available free of cost for anyone to learn and apply. R is a little different from traditional software-building languages like C, C++, or Java. It was meant specifically for, or rather customized for, statisticians who would rather not do much coding but spend more time on understanding data patterns. So most of the functions are pretty straightforward, easy to learn, and easy to remember as well. R is a language for data analysis and statistical analysis, and it is also a visualization tool. Before R, there was a language called S, and R is derived from it. If you read the history you'll get to know more, but that's not very important here.
It's an open-source cross-platform compatible software. It is a Turing complete language.

I'm not sure what this is. I'll go back and check what it means. So, installing R: if you want to install R, you can go to this link and copy it. All you have to do is open the link and click on Download R for whichever operating system you have: Linux, Mac, or Windows.

Abdul is asking: can you give an example of model building? That is what we are going to learn in this session. A quick example of model building is: how do you predict data? We talk a lot about sales. What is sales dependent on? Sales is dependent on stock, price, quantity, and quality. Let's take only a few variables. (Before we jump onto R: all of you can look at the link and search for the download of R for Windows, Mac, or Linux. This is where you have to click to download the software.)

So, say sales is dependent on your stock (and when I say stock, I mean the stock of goods, not the stock price), the price of every product, and maybe whatever discount you offer. Let me call them x1, x2, and x3, respectively. Sales is Y. Y is what you're going to predict. So what is model building here? If I want to predict sales, I multiply each variable by a coefficient. Say 2 times x1 plus 2 times x2 plus 3 times x3 is my sales. Under this assumption, given a stock of 10 units, a price of $5 per unit, and a discount of 10%, your total sales will be 20 plus 10 plus 30, that is 60. So, given these values of the x variables, the sales will be 60, maybe in dollars or rupees or whatever. What do we need to find here? When we build a model and want to predict something, what is missing? The coefficients. We know x1, x2, and x3, and we want to predict Y, so the idea of model building is solving this equation. This is just one example.
Solving this equation to find out these parameters, and these parameters are represented by beta 1, beta 2, and beta 3.
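As a rough sketch of the sales example above, the snippet below first plugs the assumed coefficients (2, 2, 3) into the equation, and then goes in the other direction, estimating beta 1, beta 2, and beta 3 from data with R's built-in `lm()`. The five rows of data are made up purely for illustration and are generated from the same equation without any noise, so this is a toy setup rather than a realistic model.

```r
# Prediction: coefficients and inputs are known
beta  <- c(2, 2, 3)        # coefficients assumed in the example
x     <- c(10, 5, 10)      # stock = 10 units, price = 5, discount = 10 (%)
sales <- sum(beta * x)     # 2*10 + 2*5 + 3*10
print(sales)               # 60

# Model building: inputs and sales are known, coefficients are not.
# Hypothetical, noise-free data generated from the same equation.
d <- data.frame(
  x1 = c(10, 12, 8, 15, 20),
  x2 = c(5, 6, 4, 5, 7),
  x3 = c(10, 5, 0, 15, 20)
)
d$y <- 2 * d$x1 + 2 * d$x2 + 3 * d$x3

# "0 +" fits the model without an intercept, matching the equation above
fit <- lm(y ~ 0 + x1 + x2 + x3, data = d)
print(round(coef(fit), 6))   # estimated beta 1, beta 2, beta 3
```

With real data the fit would not be exact, and an intercept term would usually be included.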

So, if you're able to solve these equations and find the values of beta, the predictions will be straightforward. It's just like replacing the variables. This is the exercise of model building: what are the methods, the algorithms, and the statistical processes you can apply in order to find the values of beta 1, beta 2, and beta 3?

I will take up the next question. Is model building dependent on categorical data or business requirement? Of course, both. Whatever the business wants is the end goal. You start from the vision, and based on that you look at the data, whether categorical, numerical, or anything else, and then try to build the model. What building a model means, I think I have explained now.

Nishan is asking why only 30 percent for testing data, in the train and test split which I talked about. That is something you can experiment with. The standard is 30 percent, but people also do 60:40 or 80:20 depending on how much data you have. If you have sufficient data, even 20 percent will do, because you want to keep aside some data for testing, and it should be a good amount so that the test sample is sufficient to confidently validate the model.

Moving on: once you are able to install R, the next thing to install is RStudio. R is the programming language, and once you install R you get an interface you can work in, but it's not really user-friendly. That's why we recommend installing RStudio, which is the commonly used, industry-wide IDE or front end for working with R. The prerequisite for RStudio is that you have R installed on your system.
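The train and test split mentioned above can be sketched in a few lines of base R. This is only one common way to do it; the built-in `mtcars` dataset stands in for real data, and the 70:30 ratio and the seed value are arbitrary choices for the illustration.

```r
set.seed(42)                        # make the random split reproducible
n <- nrow(mtcars)                   # mtcars ships with R; used only as sample data

# Pick 70% of the row numbers at random for training
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

train <- mtcars[train_idx, ]        # rows used to build the model
test  <- mtcars[-train_idx, ]       # remaining rows kept aside for validation

cat("train rows:", nrow(train), " test rows:", nrow(test), "\n")
```

Changing `0.7` to `0.6` or `0.8` gives the 60:40 and 80:20 splits.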

So do all of you have the RStudio link? You have it in the links, and this is what you have to click on: RStudio Desktop. Once you install R and RStudio, you'll have a screen something like this. Typically an RStudio screen has four windows, so let us understand them. This layout is pretty common and famous in the analytics space; even if I'm working in Python, there are IDEs which support an RStudio kind of layout for Python, because this is what everybody in analytics is used to.

The first window, on the top left, is where you write your code. These are the R files which you are going to go through when you start learning the syntax. Let's say you are executing the line of code `y <- seq(1, 5, by = 0.5)`. Don't worry about what this means; it creates a sequence of numbers from 1 to 5 with a step of 0.5. How do I execute it? Either I select the whole line and click Run, or I bring the cursor anywhere on the line and use Ctrl+Enter as a shortcut. As soon as you run it, you see a replica of the code here. This is your console. So you write the code in the editor, and it runs in the console.

Now there are times when you don't want to retain your code; you just want to use some code for testing. Say you want to look at what is in y: y has 1, 1.5, 2, 2.5, and so on. This is a sequence. If you just want to test something and not retain it, you can use the console. Of course, both are running in the same R session. If you want to retain the code, write it in the editor and do a File > Save; you also have a Save As option. If you don't want to retain the code, you can just type in the console and look at the results. The top right window is where you have the environment.
There you can look at which variables you have created, which tables you have created, and what the contents of these tables are as well.
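For reference, the sequence command from the walkthrough looks like this when typed out. The `ls()` call is an addition for illustration: it lists the objects in the current session, which is roughly what the Environment pane shows.

```r
y <- seq(1, 5, by = 0.5)   # numbers from 1 to 5 in steps of 0.5
print(y)                   # typing y in the console shows the same values
print(length(y))           # how many values the sequence holds
print(ls())                # objects in the session, as listed in the Environment pane
```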

For example, this is one table; it has 40 observations and 10 variables. You can click on it to see the contents, but this is not the best way of looking at it. I'll talk about that when we run the datasets. For now, you can look at the contents at a high level; this gives you a summary of the data. All the variables which you create will be available in this window. In the last window, bottom right, you can look at the plots. Any plot which is generated will be available here. I can export the plot as well using Save as Image or Save as PDF. There are a lot of other options; as and when the requirement comes, I'll share those. For now it is enough to understand that we have four windows.

RStudio is a set of integrated tools designed to help you be more productive in R. It includes a console, a syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing history, debugging, and managing your workspace.

The first and foremost thing you should know is your working directory. As soon as you open RStudio, you should find out what your working directory is. How do I find it? I just use `getwd()`. This is an inbuilt function in R, not a custom function, which helps you locate your working directory. By default your files are imported from this path. If you want to import your files from some other path, you can set your working directory using `setwd()`. Say, for example, I want the Intellipaat folder on the D drive. I just type in that path and press Enter. To check, I can use `getwd()` again and see that my working directory has changed.
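A minimal sketch of the working-directory steps described above. Since the folder on the D drive only exists on the presenter's machine, the example switches to R's temporary directory instead; on your own system you would pass a real path (for instance something like `"D:/Intellipaat"`, where the exact spelling is an assumption).

```r
old_dir <- getwd()     # where R currently reads and writes files
print(old_dir)

new_dir <- tempdir()   # stand-in for a real folder on your machine
setwd(new_dir)         # change the working directory
print(getwd())         # confirm that it changed

setwd(old_dir)         # roll back to the previous working directory
print(getwd())
```

On Windows, paths passed to `setwd()` need forward slashes or doubled backslashes.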

Now I'll get back to my previous one; I will roll it back for now. We will look at this again when we start importing files. So I have changed it back to my previous working directory. That was `setwd()` and `getwd()`.

RStudio options are accessible from the Options dialog: Tools > Global Options. These are not mandatory, but they are good to have. You should know what they are and what the capabilities of RStudio are. Under General R Options you have the default CRAN mirror, initial working directory, and workspace and history behavior. Then there are Source Code Editing, Appearance and Themes, and Pane Layout. These have more to do with appearance and default settings. All you need to do is open RStudio and start working; no customization is required. If you want a black screen with white text, you can do that under Tools > Global Options, but it is not mandatory.

No, we are not going to use a GUI. A point-and-click GUI for analysis is not something R itself offers. There are different startups which build GUIs on top of R, and those are not free of cost. I'm not sure what exactly R GUI is, but this is the industry-wide accepted format, and RStudio is widely used everywhere.

RStudio also makes it very easy to install packages. So what are packages in R? As soon as you start working, you will need different packages. dplyr is one package which you are going to learn in detail; it helps in data manipulation and data wrangling with a very easy-to-understand syntax. For example, if you want to filter data, summarize, add columns, or do a sum and group by, you can use dplyr. Now, not all packages are installed by default in R. What you have to do is go to Packages: under Tools, go to Install Packages. It will connect to the CRAN repository. CRAN is a global repository for R which is maintained by a certain group of individuals. Although it's open source, there has to be some regulation.
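To give a feel for the filter, group by, and summarize operations mentioned above, here is a small hedged sketch using dplyr on the built-in `mtcars` data. The dataset and the threshold of 100 horsepower are arbitrary choices for illustration, and the code only runs the dplyr part if the package is already installed.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  result <- mtcars %>%
    filter(hp > 100) %>%                # keep cars with more than 100 horsepower
    group_by(cyl) %>%                   # one group per cylinder count
    summarise(mean_mpg = mean(mpg),     # average mileage per group
              cars     = n())           # number of cars per group

  print(result)
} else {
  message("dplyr is not installed yet; install it first to run this example.")
}
```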

That is the website which I shared for the R installation. It is the CRAN website, and it maintains all the packages contributed by different programmers and associations. So, if you want to install the dplyr package, type dplyr and click Install. I am not doing it now, but as soon as I do, it will start running a command in the console and install it for you.

dplyr will not take more than a minute, but another package called ggplot2, which is used for data visualization, will take some time before it installs all its themes and other information. When you do that, you'll observe that this is what runs in the console: `install.packages`, and in brackets the package name, something like "dplyr". It would be good if these came packaged with the default installation, but packages like dplyr and ggplot2 are not included by default. I would recommend you know how to install packages, because we will need to install a lot of new packages. For example, for people who prefer SQL, we have the sqldf package, which can be used to write SQL-like syntax. So you may not be an expert in R syntax, but you can use SQL syntax to do the data wrangling in R. That requires a custom installation of the sqldf package, which I'll show when we learn classification. Next, for people who are comfortable with pivot tables in Excel: you can do the same drag-and-drop and pivoting operations in R using the rpivotTable package. That's why it is important to learn how to install a package.
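The console equivalent of the Install Packages dialog can be sketched as follows. The check-then-install pattern and the `interactive()` guard are choices made for this illustration, so that the snippet does not trigger downloads when run as a non-interactive script.

```r
pkgs <- c("dplyr", "ggplot2")   # packages mentioned in the session

# TRUE for packages that are already available in the library
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

missing_pkgs <- pkgs[!installed]

# Install only what is missing, and only when running interactively
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)   # same effect as Tools > Install Packages
}
```

After installation, a package is loaded into the session with `library(dplyr)`.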

The other things are not very important. Git/SVN and Publishing: please ignore these for now; they are rarely used. Git and SVN, if you know what they are, are version control systems where you can maintain the versioning of all of your code, but you will barely use them here. Of course, there are ways to install packages from outside CRAN as well, which is not recommended. It is just like the Play Store or the App Store: on Android devices especially, you can install apps from other places, but that's not recommended, because they may not be secure and they may not be stable. So it's good to install from CRAN. If required, you can also contribute; in an open-source setup, anybody can contribute free packages. Let us say there is a complex operation for which you need to write a lot of lines of code: you can package it and release it on CRAN. There's a separate process for that which you can read about, and if you are interested you can do it in both R and Python.

So far we have seen the RStudio GUI: the script window, console window, environment window, and the plots. We have also looked at R packages. Formally, packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages; others are available for download and installation.

Abdul is asking: can we use R for the data science life cycle? Definitely, that is why we are using it. For the complete life cycle, yes, you can, because R will help you import data from different sources, do all kinds of data manipulation, run all kinds of algorithms for model building, and then also do data visualization. But companies tend to use a combination of different tools; sometimes they spend a lot of money on dedicated data visualization tools as well.

Yes, you can use it. Startups, though, sometimes rely only on R and Python, either of these, for the complete end-to-end setup. The problem comes when the data is at scale: then you need some distributed system, because R is an in-memory kind of setup. Let's say you have four million customers. As soon as you do an import (this `read.csv` is doing an import), all 4 million records get loaded in memory, that is, in your RAM. That's why you would rely on some other scalable system, like big data platforms; R alone is not sufficient. On top of this, the cloud is useful.

So I think we have covered the important ones. `install.packages()` is used for installing a package; you can also use the RStudio GUI. If you want help for a package, you can use `library(help = "package_name")`, or alternatively go to the Help window at the bottom right, click on Help, and search for a topic, say ggplot2. Do we have the help here? Oh, this is the search window, sorry. Let me type it in and check. Yes, so for ggplot2 you get the full details about who is maintaining it and who the contributors are. You also get the GitHub link, so you can look at the source code. That's the beauty of it: if you want to add more to it, say one fancy chart which is not available in ggplot2, like a donut chart or some animation, you can always contribute. You can report bugs, and you can read more about it.

A participant is asking: what is the data limit R can handle or manage easily? I would say not more than 300 MB, because that is only the size of the data; after loading it you do a lot of data manipulation, and when you do that, the data gets replicated in your RAM. So a data size of more than 300 MB is not recommended, and structured data will usually not be more than that. It also depends on your RAM, for example if you have a laptop with 64 GB of RAM.
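To get a feel for the in-memory point above, the sketch below writes a small CSV file, reads it back with `read.csv()`, and asks R how much memory the loaded table occupies. The customer table is randomly generated and purely hypothetical, and 100,000 rows is an arbitrary size chosen to keep the example fast.

```r
set.seed(1)
csv_path <- tempfile(fileext = ".csv")   # temporary file, removed at the end

# Hypothetical customer data, generated only for this illustration
customers <- data.frame(
  customer_id = 1:100000,
  income      = runif(100000, min = 20000, max = 90000)
)
write.csv(customers, csv_path, row.names = FALSE)

loaded <- read.csv(csv_path)             # the whole file is loaded into RAM
print(nrow(loaded))
print(format(object.size(loaded), units = "Mb"))   # memory used by the table

unlink(csv_path)                         # clean up the temporary file
```

Every copy or intermediate table created during data manipulation takes additional memory on top of this.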

Who knows, people are buying those GPU-based machines these days; on such a machine you can just as well run something like 4 GB of data in your RAM locally.

Another participant asks: if the data is in huge volume, which tool is preferred instead of R? It's not about which tool is preferred over R. You do all the things in R and you leverage distributed systems. It is an architectural layering: R sits on top of Spark or Hive, or cloud services like Amazon Web Services, Azure, or Google Cloud Platform. It has to sit on top of one of these in order to handle so much data. Just as I have installed R on my laptop, you install R on these big systems.

R is not integrated with any database, which is what the participant is asking about. It's not integrated, but you can always have ODBC connections. R has its own format for storing data in memory, in the RAM, and that's all it does.

Is it better to have the working directory in a cloud folder like a OneDrive folder? Yes, you can, if you have enough space in your OneDrive folder or any other cloud. I'm not exactly sure about OneDrive, but I know it maps to your local system, so you can use the path if you have enough space.

What is ODBC? It's a kind of connector to different databases. ODBC is a protocol or an API which helps in connecting to different databases. If R has to connect with Oracle or Teradata, the back-end systems (the data cannot always be in flat files), then ODBC is used. All these setups will be done by the admin guys, so don't worry about the setup. As data science enthusiasts and data scientists, you should concentrate on what you do once you get the data and how you get insights out of it. So all you do is import data from CSV.

CSV is the most commonly used data format in the industry, and most data scientists will get CSV data for data exploration and model building. Once they are done with model building, engineers will help scale the model in the production systems, connecting it to databases through ODBC, JDBC, and so on. Don't worry about that. I think Sai has given the full form of ODBC.

How do we decide whether to go for R or Python? I think both are pretty much capable of doing everything. Python is a general-purpose programming language. I don't want to give a biased review, but Python has wider coverage because you can do everything: design an API, run networking protocols, use it for data science, and do scripting as well. But the issue is that although Python is pretty famous, you will realize when you talk to full-stack software engineers that it is not very fast, and the learning curve is also slow. Its main implementation is written in C. In contrast, R is very easy to use and the syntax is pretty simple. A simple copy of one table from another table is pretty complicated in Python; you cannot just say table 1 is equal to table 2.

R is also largely built on C. What I'm trying to highlight here is that Python is marketed as a general-purpose and very robust language. I do not completely agree with that, although I use Python a lot. What I'm trying to say is that R is equally good, and something in Python makes it very complicated when you want to do trivial stuff like copying a table. There is an indexing concept in Python.

It was meant to make things faster, but it makes things much more complicated. So for any data science work, I think R is equally good. I would say R has to mature more on the deep learning side, the neural networks and AI stuff, where the open-source contributors are not very active. That said, to date I have seen that pretty much every algorithm which is available in R is also in Python, and vice versa. There are a few statistical functions or statistical tests which are not readily available in Python. For example, I was looking for the Mann-Whitney U test and some other hypothesis tests which I did not find in Python. So it's difficult to decide between R and Python. I don't have a conclusion; both are equally good, and it depends on the requirement.

One complaint I have, or rather one observation which data engineers share with data scientists, is this: when we develop some algorithm and share it with engineers to scale it (of course we also sit with them and do that), it's difficult to debug the R code when it is integrated with cloud systems. The logs are not elaborate enough to understand where the system is failing, while with Python all the systems have good compatibility, so it's easier to debug. That's one difference which I have heard of lately. I think we can take up these kinds of questions; I am fine with it. We will slowly increase the pace.

A participant is asking how data is handled if it is in a language other than English. This is an interesting question, thanks for this. Let's talk about something like machine translation. Let's say you want to do some recommendations and you have all the data in the form of text, say in Chinese or some other language, and there is a product name written in Chinese characters; say the product is bread. Number one, there are different language APIs which understand these characters.

You provide a string, and each character is assigned an ASCII code or some other number, so you can uniquely identify a character. If not that, there are many interesting algorithms, for example word2vec. What this essentially does is take all your data, irrespective of whatever language it is in, and based on the context window in which each word appears, that is, the co-occurrences of different words, it can identify which other words are similar to bread, maybe synonymous with bread, just based on the co-occurrences. That is the beauty of data science: irrespective of the language in which the text is written, you can leverage the maths behind it, and one of the features is co-occurrence to understand synonyms. There are different APIs which do this. As for the encoding format, we have UTF-8, which I think is compatible with most of the common languages. I hope that answers the question.

So that's all about the main things, the basics of R. We will learn the functions in a bit. Installation-wise, I hope you are all able to install it, so we can move on to variables.

When we start learning R, we should understand what variables are. A variable is a temporary storage space where you can keep changing values. Like any other programming language, R has this concept of variables, and it is as easy as typing x is equal to 2; then you type x and you get 2. Or, if you want to do some operation: x is equal to 2 plus 3, and then x. Do you observe something? This is the first time I'm using x, and I never declared whether it's an integer or a float or a character or whatever. The type is dynamic, based on the data you store in the variable. Of course it has a data type; every programming language has data types, otherwise it cannot work as expected. But here it is dynamic. We can compare this with C and C++, where you have to say `int x` first and then say x is equal to a value; you declare the variable with its data type.
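The variable steps just described can be typed into the console as shown below. The session uses `=` for assignment, so the sketch does too; `<-` is the more common convention in R code and behaves the same way here. The `class()` calls are added to make the dynamic typing visible.

```r
x = 2            # no type declaration needed
print(x)

x = 2 + 3        # the same variable can hold the result of an operation
print(x)
print(class(x))  # R decided the type from the value

x = "hello world"   # the same name can later hold a different type
print(class(x))
```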

So R is user-friendly. Of course, a lot of operations happen in the back end. As soon as you say x is equal to 2, the back end reads 2 as a number and assigns that type as a property of the variable x. So yes, it adds an overhead in the back end, which has to do a lot of operations, but for the user it is friendly.

Data types in R are numeric, character, logical, and complex. Numeric all of us know: any number, be it positive, negative, or even a decimal, is of the numeric data type. If you store x is equal to 5, a whole number, or x is equal to 5.5, a decimal, both are numeric. Then character: you can also store x is equal to "hello world". Let's try it out. It is as simple as that; I do not need to explicitly convert x into a character. I can directly write x is equal to "hello world", and now x becomes a character. Let's say I want to check the type. How do we do that? I'm not sure about the command... okay, it is `class`. I sometimes get confused between R and Python, and that happens when you use these two languages interchangeably. So `class(x)` gives you character. If you now assign a number to x once again and then say `class(x)`, it becomes numeric.

So we have numeric and character, and then we have logical. Logical is TRUE or FALSE, so it is more like 1 and 0. In an if-else condition you do a test and you get TRUE or FALSE as the two possible outputs, and based on this you can design your code and develop some logic. We'll look at it when we run some functions, like checking if a substring or a character is present in a string. The last one is complex. It is rarely used, but all of you know what a complex number is: it has a real and an imaginary component, like 30 minus 2i. Here 30 is the real component and minus 2i is the imaginary component. You can think of it like the x and y axes, the two coordinates (30, -2).

I'm not sure how to use it, because I have never used complex numbers; if you can think of a use case for it, we can try that out. So now x is equal to 30 minus 2i, and if I say `class(x)`, it is complex.

Okay, I'm going to take a few questions. Is there an order in which we should take the self-paced courses? I would say this is the order, because I start teaching data science from the basics, to the best of my knowledge. I think this is the perfect place to start, because data science doesn't have much of a prerequisite. We will leverage elementary mathematics and knowledge about simple mean, median, mode, all those things from your high school mathematics. So you can start with this, and on top of it you can build up your knowledge if you want to learn big data and other stuff, and lastly visualization. But I think this is where you should start.

Someone is mentioning Excel. Yes, Excel is a platform, but what I was trying to highlight is that you should know what the mean is, what the average is, and then why we need the median; what is the need for even having a concept called median? I'm talking about the basic statistics; that is where we should start.

Do we need to brush up elementary maths, and if so, from where can we do it, Nikhil is asking. Don't worry about brushing it up now. If, during the sessions, you realize that something is going over your head and I am talking at a really high level, then do that. I make sure that what we learn is at the very basics, so that any school student can understand it. We are not going to do any rocket science here, just basic data, and we try to learn the art of how to leverage that data and how to treat it. So don't worry much about the mathematics.
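The four data types discussed above (numeric, character, logical, complex) can be checked with `class()` as in the sketch below. The variable names are made up for the illustration; `Re()` and `Im()` are base R functions that extract the real and imaginary components of a complex number.

```r
num  <- 5.5             # numeric: whole numbers and decimals alike
txt  <- "hello world"   # character
flag <- 5 > 3           # logical: the result of a test is TRUE or FALSE
z    <- 30 - 2i         # complex: real part 30, imaginary part -2

print(class(num))
print(class(txt))
print(class(flag))
print(class(z))

print(Re(z))            # real component
print(Im(z))            # imaginary component
print(as.numeric(flag)) # TRUE behaves like 1, FALSE like 0
```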

Whatever mathematics is required, I am going to discuss in a little detail. Sai, you can declare strings with both single quotes and double quotes.

We will do the theory and the implementation in R side by side. The R files which you see on the screen, two or three R files, are what we are going to discuss as part of the introduction to R, and then we'll move on to data manipulation and data visualization. We will work on a dataset which has rows and columns in a structure, and we will try to do some summarization like sum, group by, mean, all those things. So first we'll learn some data wrangling, then we'll learn about some charts, and we'll come back to R to implement those charts. We'll follow that sequence.

Which is more complex, R or Python? Python is more complex, so we can get started with the concepts, the syntax, and the semantics in R. We will not cover everything in R, and that's fine; what we cover is the important part to start with, and it will help you get on board and leverage R for any of the basic statistical functions.

The first part is data exploration, where we'll cover objects in R. Any programming language has some objects, in the form of placeholders where you can store your data. Most of the objects are similar to those in traditional programming languages. So we'll go through objects, then flow control statements. Flow control is something like if, case when, for loops, and while loops. You will not need loops a lot for statistical analysis, but they are good to have if you want to automate your reports or automate your data science. Then we'll cover a few inbuilt functions and move on to how we can define our own user-defined functions.
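As a preview of the flow control statements and user-defined functions listed above, here is a minimal sketch. The function name `describe_number` and the values it is applied to are hypothetical, chosen only to show an if/else chain, a for loop, and a while loop in one place.

```r
# A user-defined function built around an if / else if / else chain
describe_number <- function(n) {
  if (n > 0) {
    "positive"
  } else if (n < 0) {
    "negative"
  } else {
    "zero"
  }
}

# for loop: run the function over a few values
for (v in c(-2, 0, 3)) {
  cat(v, "is", describe_number(v), "\n")
}

# while loop: repeat until the condition becomes FALSE
i <- 1
while (i <= 3) {
  cat("iteration", i, "\n")
  i <- i + 1
}
```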

After that we will cover data manipulation, and if time permits we will also jump onto data visualization. So that's the agenda for today. So, objects in R. You see this tree diagram; I think it's a pretty exhaustive one, which helps you understand all the objects. They are broadly classified into one-dimensional and multi-dimensional. What is one-dimensional? Say we just want to store the color of all the apples in your database; that is like one column, so one dimension. Then we can have multi-dimensional, where you want color and size and something else. Try to relate it to the diagram. Under one-dimensional we have homogeneous and heterogeneous, and the same for multi-dimensional. I will explain what homogeneous means in a moment. One note on versions: it's an open-source world, so there are times when a few functions may not work in the latest version but work in an older version, and vice versa. So what companies do is stick to one version of R which is recent but may not be the very latest; it could be one or two months old, one for which everything is well tested and all the packages are stable. You need not worry about that now, but that is what happens in the industry. So maybe you install 3.6.1 and stick to that for at least six months, unless you see a major release coming. OK, so homogeneous and heterogeneous. In one dimension, the homogeneous example is a vector and the heterogeneous example is a list. In multi-dimensional we have the matrix, which is homogeneous, and the data frame, which is heterogeneous. So what are vectors? A vector is a linear object which contains homogeneous elements, so it is a collection of values that all have the same data type. So, number one, it is a linear object, and second, all its values are of the same data type. If we look at c(1, 2, 3), all the contents are numbers. In the second one, c(TRUE, FALSE), all the contents are logical.

Logical values are either TRUE or FALSE, right? You cannot mix 1 and TRUE and 2 and FALSE like that; a vector holds a single data type. So that is how it is. Let me just go to the chat window. The easiest way to check if R is working: once you open R, you will get a console like this. Type something like x = 2 and then x, and you should get 2 if R is working. If R is not working, you might consider installing R or RStudio once again, but based on whatever you are getting, or whatever issue you are facing, you can write to support. The presentation is available for everyone; you can just log in to the portal and download it from there. I believe we just started, Krishna, so you have not missed anything. Someone is asking: what do you mean by a linear object? OK, so when I say linear object, as I said, it is one-dimensional. We don't have two dimensions to it, no rows and columns; it is just one row or one column, that's it. Going back to the presentation: when I say linear object, that means either one row or one column. Think of a regular data set in any of the ERPs or back-end systems. What does it look like? It has rows and columns, right? Say you have a customer table: each row represents one customer, and each column is a customer attribute, such as the customer ID, the customer's age, the customer's income, and so on and so forth. So when we say linear object, you can think of just one column, say customer age, and that is what a vector can represent. It cannot mix anything; it is just one single column with only one data type. That is what we call homogeneous: c(1, 2, 3), c(TRUE, FALSE), or, in terms of age, c(20, 30, 40). OK, look, most of the operations and most of the analysis you are going to do will be on something called a data frame. You may not even use vectors for any analysis. It is good to know all the data objects which are available in R, but most of what we do will not be on plain vectors.
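To make the "one data type per vector" point concrete, here is a minimal sketch. The variable names are illustrative assumptions, not the ones in the instructor's file. It also shows what R actually does if you try to mix types: it does not raise an error, it converts every element to one common type.

```r
# Each vector holds values of a single type.
nums  <- c(1, 2, 3)        # numeric
flags <- c(TRUE, FALSE)    # logical
ages  <- c(20, 30, 40)     # numeric, e.g. a "customer age" column

print(class(nums))   # "numeric"
print(class(flags))  # "logical"

# Mixing numbers and logicals is silently coerced to one type:
# TRUE becomes 1 and FALSE becomes 0, so the result is still homogeneous.
mixed <- c(1, TRUE, 2, FALSE)
print(class(mixed))  # "numeric"
print(mixed)         # 1 1 2 0
```

So the vector stays homogeneous either way; the mixed values simply do not survive as separate types.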

Most of it will be on top of a data frame, okay? Someone is asking: if only one row is present, can we call it a vector, because the data set has only one row or one column? If one row is present, it is one row; there is no confusion. Can we use a vector for a single sample? Not really. What we are doing now is only introducing these terms. Vectors can definitely be leveraged when you want to automate. I will give you a scenario and not show the solution for now. The scenario is: you have a thousand columns in a data frame, and you want to impute, that is, fill in, all the missing values with some number, say zero, for now (although there are different strategies for filling missing values). When you want to do that, you can leverage vectors. OK, fine. So, creating a vector. It is pretty simple: c is the function. If you want to create a numeric vector, you just use c and then 1, 2, 3, 4, 5. Every element separated by a comma is one element. Prakash, impute means replacing a value with something. Say you have a lot of missing values, NAs, and you would like to replace all the NAs with a zero; that is called imputing. More on that later. So that's where you can apply a vector. So creating a vector is pretty simple: you just use the function c, and within brackets you pass the elements in a comma-separated format. This is one way of doing it, the most basic way. The second way, say if you want a sequence, is num1 <- c(10:20). What this does is create a sequence over the range 10 to 20: 10, 11, 12, and so on till 20. Now, if you want a character vector, remember that a vector is homogeneous: you cannot have a number in the same vector. All the elements have to be character.

For a character vector the format remains the same: c("a", "b", "c"). As it is a character, it is important to enclose it in double quotes. You can also extend this to words: anything enclosed within double quotes is a character, whether it is one letter or a combination of letters, like a string. So the second vector is char2, assigned c("this", "is", "Sparta"): "this" is the first element, "is" is the second element, and "Sparta" is the third element. So guys, c here stands for combine, if you read the documentation. And the arrow which you see here is the assignment operator; it is just like equal to. You can either use the equal sign or an arrow pointing towards the vector name. char2 is the vector name, so this arrow is pointing towards the vector name. So let's really create these vectors; I have similar examples here. Look at it: vector1, arrow, c(1, 2, 3, 4, 5, 6). You can either type it or copy this command and run it in your console. What this does is create a vector with the elements 1, 2, 3, 4, 5, and 6. And how do you check it? Yesterday we learnt this: class(vector1). What happens is, when you're working in R and you have, say, a complex program of a thousand lines, and you want to edit it or do some enhancements, it's not like you will always get finished code. So this is how you debug and check each and every object in R: you use the class command. Here it says numeric; as soon as it says numeric, you should understand it is a numeric vector. We will look at what it reports for other objects: for a list it will give you list, for a matrix it will give you matrix, and for a data frame it will give you data.frame. We'll check that in some time. I'm running this line now: num1 <- c(10:20). So this creates num1.
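The vector-creation steps described above can be sketched as follows. The names vector1, num1, and char2 follow the lecture as best they can be heard; char1 is an assumed name for the "a", "b", "c" example.

```r
# Numeric vector: one element per comma-separated value.
vector1 <- c(1, 2, 3, 4, 5, 6)

# A sequence from 10 to 20 using the colon operator.
num1 <- c(10:20)

# Character vectors: every element is enclosed in quotes.
char1 <- c("a", "b", "c")
char2 <- c("this", "is", "Sparta")

# class() reports what kind of object you are holding.
print(class(vector1))  # "numeric"
print(class(char2))    # "character"

# Typing the name prints the contents: 10 11 12 ... 20
print(num1)
```

Checking class() on an unfamiliar object is a quick first step when reading or debugging someone else's script.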

What is num1 now? You can check here: 10, 11, 12, 13, 14, 15, till 20. What I'm doing here, if you observe, is that the code I want to retain is written in this file. If I only want to do some debugging, I don't want a line that prints num1 to sit in the file; I just want to check what num1 contains. So I did not run it from the file; I typed it directly in the console. This gives you the same output as running the second line; the only difference is that this code will not be retained, it is only for your debugging. OK, num1 is not an array, guys; num1 is still a vector. And the equal sign and the arrow sign can be used interchangeably, to the best of my knowledge. The only advantage of the arrow sign is that you can write the contents of the vector first, whatever you want, and then change the arrow direction and say this is my num1. So this just gives you the comfort of choosing where to place the contents and where to give the variable name: contents on the left, name on the right. That's it. If you're getting some errors, I'm not sure what you're getting. Yes, "unexpected comma" between values: you just need to copy this line; you should have the file with you. Be careful about how many commas you have and how many elements you have. But guys, I recommend you do not practice while we do this class. I really don't recommend that, because you waste a lot of time. These are pretty simple functions; don't feel insecure, you can always practice at any time. It is a very simple file, so don't feel insecure about the syntax. The syntax is pretty easy; R is a very easy-to-use language. You can see these are simple functions: run them one time and you will remember. OK, so did we create a character vector? We don't have one; OK, that's fine, I need not create it.
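Here is a small sketch of the three assignment forms mentioned above. num2 is an assumed name, used only to show the right-pointing arrow.

```r
# Equal sign and left arrow both put the value on the right into the name on the left.
num1 = c(10:20)
num1 <- c(10:20)

# The right arrow lets you write the contents first and the name last.
c(10:20) -> num2

# Both names now hold the same vector.
print(identical(num1, num2))

# Typing just the name (here via print) shows the contents without changing anything.
print(num1)
```

Most R style guides prefer the left arrow for assignment, but all three forms shown here create the same object.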

So that is the character vector. OK, the next one is creating a logical vector. It is very similar: you just say c(TRUE, FALSE, TRUE, FALSE), whatever you want. TRUE represents 1 and FALSE represents 0, and these are keywords. You cannot say "TREU" or something; it has to be TRUE, spelled exactly and in capitals. And T, F, T, F also represents true and false; T is a short form of TRUE. One thing we can check quickly is that I am able to create these vectors. So I have one vector which says TRUE, FALSE. Oh, I'm sorry, I just messed it up. OK, so vector2 has TRUE and FALSE, two elements. Now you can ask me, where can you use this TRUE and FALSE? You can leverage this for any kind of automation. Say you are attempting a very complex programming challenge where you have to automate all your data manipulation; this TRUE/FALSE will really help. You may want to store, say, whether a value is null or not for all the records; then you can store that in a logical vector and pass it on to some other system. These are the scenarios, but I cannot define a specific scenario right now. Once you understand what kinds of tasks you have in analytics, it will be easier for me to explain where you can use a logical vector to automate your data science and data wrangling work. Now, say you want to impute your data. Impute means replacing an element. Let's say your vector has some null values. In any database we have something called null, which means nothing: a missing value or a blank. A missing value in R is represented by NA. So let's say we have a vector with a few NAs. If you look at it, this vector has NA, 1, 2, NA, 3, 5, NA. It has seven elements, and we have three missing values. How do you tell? Checking by eye takes very long for a huge vector. So you can just say is.na, and it returns a logical vector.

The output of is.na is a run of TRUE and FALSE values: TRUE means yes, the value is NA; FALSE means the value is not NA. This function is.na will stay with you for a long time. It is helpful in imputing missing values, which is one of the most important data science activities, so we'll spend some time on this topic. Now let's say you want to impute and replace all these NAs with zero. Just use a simple if-else statement. ifelse is a function, again an inbuilt function: ifelse, and within brackets we have three parameters. The first one is the condition, is.na. The second one is what you want to impute with if the condition is satisfied, and the third is what you want if it is not. That's the way you write the statement. So let me run this and look at the vector: 0, 1, 2, 0, 3, 5, 0. All the NAs are imputed by zero. On the earlier question: the arrows are just for direction. If you write the definition first, you use the second option, and if you have the vector name first, you use the first one. It just gives a direction: the processing part happens first, and then the result is assigned to some variable. Either way is the same. So this is the basic data object in R. Moving on. You can also find the length: if you say length of the vector name, you get the number of elements in the vector. A few functions you can try yourself; they are pretty straightforward. Accessing the elements of a vector: how do you access just one of these? You give the vector name, and within square brackets, which element you want to access: 1, 2, 3. Let me find where I have it. Yes, I created a few vectors. So length(num1): num1 has five elements, so we get 5. Then we take length(char1).
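The missing-value steps described above can be sketched like this. The vector contents follow the example read out in the lecture; the names vec_na and imputed are assumptions.

```r
# A vector with seven elements, three of them missing.
vec_na <- c(NA, 1, 2, NA, 3, 5, NA)

# is.na() returns a logical vector: TRUE where the value is missing.
missing_flags <- is.na(vec_na)
print(missing_flags)       # TRUE FALSE FALSE TRUE FALSE FALSE TRUE

# Because TRUE counts as 1, sum() gives the number of missing values.
print(sum(missing_flags))  # 3

# ifelse(condition, value_if_true, value_if_false) works element by element.
imputed <- ifelse(is.na(vec_na), 0, vec_na)
print(imputed)             # 0 1 2 0 3 5 0

# length() gives the number of elements.
print(length(imputed))     # 7
```

Replacing with zero is only the simplest choice; as the lecture notes, there are other strategies for filling missing values.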

char1 has three elements, so it gives us 3, right? Then, if you want to access the second element of char2 (this is char2, right, and we want its second element), all you have to do is say char2 and pass 2 in brackets: char2[2]. This will give you "is", the second element. If you want the first element, you just say char2[1]. Basically it works this way. Additionally, you can ask for more than one element to be returned. For example, from the logical vector you want the first and third elements. What do we do? It basically expects a vector of 1 and 3, c(1, 3), so we get the first and third elements: TRUE and TRUE. Indexing starts with 1 here, you'll observe: when we ask for element 1, it gives me the first element. OK, I think we have seen imputing missing values. The next one is the list. "A list is a linear object which contains heterogeneous elements. A list allows you to gather a variety of objects under one name. A list may contain a combination of vectors, matrices, data frames, and even other lists." So it is pretty powerful. What it's trying to say is that it is again a linear object, although I would say it is not strictly linear; we'll see how. It is one single list, and it can contain heterogeneous elements as well. The syntax is: you just say list, and within brackets, say, a number, 1, and then a string, "Sparta". These are two different data types, but they can be packaged or zipped into one single list. So that's the advantage of having a list: you can have multiple data types in one single object. Now what does the last sentence mean, that it may contain a combination of vectors, matrices, data frames, and even other lists? It means that within a list you can have, let's say, one number and one string, and you can also have other lists, matrices, and data frames.
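Going back to vector indexing from a moment ago, here is a minimal sketch. char2 follows the lecture; log_vec is an assumed name and assumed contents for the logical vector, chosen so that its first and third elements are TRUE as described.

```r
char2   <- c("this", "is", "Sparta")
log_vec <- c(TRUE, FALSE, TRUE)   # assumed contents

# Single element: vector name, then the position in square brackets.
print(char2[2])   # "is"
print(char2[1])   # "this"  (positions start at 1, not 0)

# Several elements at once: pass a vector of positions.
print(log_vec[c(1, 3)])   # TRUE TRUE
```

The same square-bracket idea carries over to matrices and data frames, where you give a row position and a column position.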

The first element could be just a vector of length one, the second element could be another vector of length one, the third element could be a whole data frame, meaning a thousand rows and n columns, and the fourth element could be a matrix. That's the beauty of a list. How we apply it depends on the requirement. Frankly, I have not used such a complex list so far, because most of the operations are on a data frame, but you can think of ways to leverage the most complex form of a list. So, creating a list: using the list function, you just say list and within the brackets you pass the elements. You can observe here that the first list has all elements of the same length: 1, "a", and TRUE. These are different types (the first one is numeric, the second is character, the third is logical), but all of the same length. The same goes for list2: we have list(c(1, 2), c("a", "b"), c(TRUE, FALSE)), all of the same length, two values in every element of the list. Accessing the elements of a list: within double brackets, you give the number of the element you want to access. So the second element is mylist1[[2]]. What happens in a list is that we have multiple levels: within an element I can have multiple elements, so I might want to access an element within an element of a list. For that we have double brackets followed by single brackets. Say I want the second value of the third element in the list: I say mylist2, then 3 within double brackets, and then 2 within single brackets, mylist2[[3]][2]. The third element's second value is FALSE, so this will give me FALSE. OK, let us execute these. So mylist1: you can always print it for your reference: 1, "a", and TRUE. And then mylist2: here I have tweaked the lengths, so the third element has three sub-elements. You can also do that; it need not always be the same length. The first element has two values and the second has two.
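The list creation and double-bracket access described above can be sketched as follows. The names my_list1 and my_list2 approximate the ones in the lecture, and the third value of the third element is an assumption (the lecture only says it has three values and that the second is FALSE).

```r
# A list can hold elements of different types.
my_list1 <- list(1, "a", TRUE)

# Each element can itself be a vector, and lengths may differ.
my_list2 <- list(c(1, 2), c("a", "b"), c(TRUE, FALSE, TRUE))

# Double brackets pull out one element of the list.
print(my_list1[[2]])        # "a"

# Double brackets, then single brackets: a value inside an element.
print(my_list2[[3]][2])     # FALSE

# Each element keeps its own type.
print(sapply(my_list1, class))   # "numeric" "character" "logical"
```

Note the difference from a vector: combining 1, "a", and TRUE with c() would convert everything to character, while list() keeps each type intact.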

The third one has three, okay? And you'll observe I am using c, the c syntax, within a list. Within a list we can have a vector; these are all vectors. So if you want the second element of list1, what is it? Just to make it more interactive: mylist1, what is the second element? It's "a", right? The second element is "a". So if we do mylist1[[2]], this will be "a". OK, let's tweak it a little bit. I'll say list2, and 2 and 1. What is the expected output, if you guys are following me? I am accessing list2, and I passed 2 within double brackets and 1 within single brackets: mylist2[[2]][1]. Let's see. Perfect, this is what we expected. Now we'll go back. We can also name the elements of a list. With a complex list, you want to make it more user-friendly to access, so you can give a label to every element. The syntax almost remains the same: we can say list, where the first element is 85, the second is 45, and the third is 100, but you specifically provide a label. You can say apple is 85, most likely the price of apples per kg, and likewise banana per kg and the third fruit per kg. So labeling the individual elements of the list looks like list(label = element, label = element), like that. What this does is make it easy to access the list elements: we just give the list name, then a dollar sign, then the element label, and it returns 85. So I created this list and I can access 85. If you want banana, just replace apple with banana, and the output should be 45, right? Now I'm just scrolling through the chat window to check if there are some queries. Nanda is asking: is an object always associated with storage only? Yes, it's about storage and how you store the data.
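Here is a minimal sketch of the labeled list described above. The name fruit_list is an assumption, and the label mango is a placeholder: the third fruit's name is not recoverable from the recording, only its value of 100.

```r
# A named (labeled) list: label = value for every element.
# "mango" is a placeholder label; only the value 100 is given in the lecture.
fruit_list <- list(apple = 85, banana = 45, mango = 100)

# The dollar sign pulls out an element by its label.
print(fruit_list$apple)    # 85
print(fruit_list$banana)   # 45

# The same element can also be reached by name inside double brackets.
print(fruit_list[["banana"]])   # 45

# names() lists all the labels.
print(names(fruit_list))
```

Labels make the code easier to read than positions, because fruit_list$banana says what you are asking for while fruit_list[[2]] does not.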

I'm not sure what else it could be used for: an object is about how you store the data, and you can then leverage it while data wrangling. So yes. Then, can we define an element with just data within an object? We can do that. A list is like one row of data; it's one-dimensional. If I happen to miss some question in the chat window, please copy-paste it and put it once again. Yes, the dollar sign is used to get the value only when you have a labeled list. The names of the elements should be unique; that is true. So if you want to use the names within the list, you have to use the dollar operator with the name, apple or banana. It is very difficult to relate this to real data sets; the only object which is close to real data sets is a data frame, and that is what you will work with most of the time. But it's my job to explain all of the data types before jumping on to the data frame. OK. c(1, 2) is given so that we can return multiple elements in one go. If I just ask for element 1, it will give me the first element, and element 2 will give me the second element, but if I want both the first and second, I have to pass c(1, 2). c creates a vector, so it is a vector of all the indices you want. And yes, there are always guidelines for naming variables; you need to follow proper guidelines based on the domain you are working in. These are just examples; when we work with real data today you'll see how it's done. For the duplicate-name case, can you try it at your end? I really need to cover other things, so you can try and let me know if you get some error. If we get one, we learn something new. OK, fine. Remember, these are all just an introduction to these objects. Most of the time, I would say 99% of the time, you'll be working with a data frame, so that is where we will spend a lot of time. There are a lot of objects which are seldom used.

You also have the matrix, for that matter. The next one is the matrix. I don't remember having used it any time in the past, not until you go for something like recommender systems, where you convert everything to numbers. For example, we have thousands of movies, and every movie has to be represented by a vector of numbers, so that is where this may be useful. But even that can be done using a data frame. A data frame is a one-stop shop for you: all the data manipulation can be done using a data frame. But let's see what a matrix is. A matrix is a 2D object which contains homogeneous elements. Again, two-dimensional means rows and columns. If you see the output here, we have two rows and four columns. The easiest way to create it is matrix(c(1:8), nrow = 2): we give a sequence, 1 to 8, as the elements (1, 2, 3, 4, 5, 6, 7, 8), and we want it in the form of two rows, so it automatically splits your data into two rows. So, creating a matrix: the syntax is just matrix. For creating a vector you have c; for a list, you say list and within the brackets you pass the elements. How do you create a matrix? You just say matrix, and within the brackets we give the elements. But because a matrix is a two-dimensional object, we need to be careful about how we want the shape to be. The first parameter says what elements you want. Let's say I want 1, 2, 3, 4, which is a vector; wherever you see this c, it means it is creating a vector. So I want the vector of these elements, 1, 2, 3, 4, in my matrix. And how do I want it? I want two rows. If you have four elements and two rows, you'll automatically have two columns. So we'll have 1, 2, 3, and 4 in a 2-by-2 layout.
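As a minimal sketch of the matrix() calls described above (the names m8 and m4 are assumptions):

```r
# Eight elements in two rows: R works out that this needs four columns.
m8 <- matrix(c(1:8), nrow = 2)
print(m8)
print(dim(m8))   # 2 4  (rows, then columns)

# Four elements in two rows gives a 2-by-2 matrix.
m4 <- matrix(c(1, 2, 3, 4), nrow = 2)
print(dim(m4))   # 2 2

# A matrix is homogeneous, so class of its contents is a single type.
print(class(m4[1, 1]))   # "numeric"
```

By default matrix() fills the values down the columns; the byrow argument changes that.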

And the third parameter is byrow = TRUE. That means the elements are assigned to the placeholders by filling the rows first: 1, 2, and then 3, 4. Then a slight variation: you can say c("a", "b", "c", "d"). Again we have four elements and nrow equal to 2. Here we don't have any variation; it is only filling characters instead of numbers. If you use byrow = FALSE, it will first fill the first column with "a" and "b", and then the second with "c" and "d". That's the only difference. You can also do it for logical elements, T, F, T, F, and you get the same kind of output. Now, how do you access the matrix? You need to pass two things: the row number and the column number. If you say mat1[2, 1], 2 represents the row number and 1 represents the column number: the first one is the row number, the second the column number. What happens if you just say mat1[1, ], that is, 1, comma, nothing? That means the first row and all the columns. And if you say mat1[, 1], nothing, comma, 1, that means all the rows and the first column. So this is like x and y coordinates: you access different elements using these two parameters, the matrix name and, within square brackets, row number, column number. You may remember the transpose. If you want to exchange the rows and columns in a matrix, you can use the transpose function. mat1 is 1, 2, 3, and 4, so let us see the transpose: you just say t(mat1). What will this do? 1 and 4 will remain the same, because they were at (1, 1) and (2, 2), so exchanging the row and column numbers changes nothing. But the other two elements, 3 and 2: 3 was at (2, 1) and 2 was at (1, 2), so they just exchange. All the elements except the diagonal elements get exchanged. A matrix can hold numbers, but not a mix of different data types. If I go back to the slide: a matrix is homogeneous, and whenever we have homogeneous, only one data type is allowed.
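The fill order, the row/column access, and the transpose described above can be sketched as follows. mat1 follows the lecture; mat_bycol is an assumed name for the column-filled variant.

```r
# byrow = TRUE fills the first row, then the second row.
mat1 <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
print(mat1)
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4

# byrow = FALSE (the default) fills the first column, then the second.
mat_bycol <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = FALSE)
print(mat_bycol)

# Access: matrix[row, column]; leave a position empty to mean "all".
print(mat1[2, 1])   # 3      (second row, first column)
print(mat1[1, ])    # 1 2    (first row, all columns)
print(mat1[, 1])    # 1 3    (all rows, first column)

# t() swaps rows and columns; the diagonal (1 and 4) stays in place.
print(t(mat1))
```

For this small example, transposing the row-filled matrix gives exactly the column-filled one, which is why the two can look alike.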

If you need mixed types, that's where you can use a data frame, which is more realistic. OK, I'm just going through the chat. "We did not get byrow." OK, we'll see that now. Let me open this up and get this matrix. This is my mat1. We passed four elements, 1, 2, 3, 4, to the matrix function, we said we want two rows, and we also said byrow = TRUE. What this did is fill the elements 1, 2, 3, 4 by row: fill everything in the first row, then go to the second row. Let me try byrow = FALSE. So it did work, and what is happening now? We are not filling by row; we're filling by column. Fill the first column, 1, 2, then go to the next column, 3 and 4. Did we miss any question? Manish is asking: you do not use c in the fruit list? Yes, the fruit list is the labeled list; there c is not required, because for every element we have a label. That's the only difference. And don't worry about the syntax; data science is not about memorizing syntax. Don't worry about the coding. What you can do is use this PDF or this R file as a ready reference for the syntax if you don't remember something. That's all you should use it for; don't try to learn it by heart. Nobody expects that. Even if you go for interviews, they just want to know whether you know R and can leverage all these things. That's it; nobody is expecting you to do a lot of coding. "When the data frame is available, why would one need a matrix?" I can give an example. A matrix is faster when you have homogeneous elements and you want to do something like cosine similarity. You might not know what cosine similarity is; if you know, well and good. It applies to things like vector multiplication, I mean the multiplication of two matrices.

That is easier when you have a matrix object instead of a data frame. A data frame is much more complicated in terms of how it stores data in the back end, because it has to be in the form of rows and columns with the columns being heterogeneous: one column will be a number, another column could be a date, and so on. So what I'm trying to say is that a matrix is useful when you want to do a lot of mathematical operations on bigger matrices. Remember the mathematics where you would do matrix multiplication. It is difficult to explain now; once you understand how a recommender system works and how cosine similarity is used, I can give an example of how a matrix is used. My second answer to this question is that a programming language has to have all the objects. The data frame is the most evolved object; all these other things came first. Not all of them may be used, but they are there in the programming language. A data frame is enough for most operations, but as a programmer, if you have something in your armory, you should check how you can really leverage it. One thing I remember is that a matrix gives you better performance in a homogeneous scenario where you have to multiply two matrices, as in cosine similarity. Let me give you a little bit of detail, again with the movie example. Every movie can be represented in the form of a matrix, and by matrix I mean its features: a movie belongs to a genre, a movie has a length (how long the movie is), a movie has some actors, and so on. All these features of a movie can be expressed as numbers, and that is nothing but a matrix. If you want to understand the similarity between two movies, that is nothing but finding the similarity between two matrices.

If you go by the geometry and calculus we learnt in high school, you can multiply these two matrices and find one single number which gives you the angle between the two. It might go over your head for now, but bear with me; I am giving a high-level example. That angle tells you how different these two movies are, based on all the features embedded in the matrix. In these cases a matrix might be a little faster compared to multiplying two data frames. And by the way, there is no need to remember all the syntax, not at all. Someone is asking: with byrow = FALSE, it already looks like a transposed matrix, right? Yes, but once you have already filled a matrix with, say, a thousand rows and a thousand columns, you do not refill the matrix. byrow applies only while filling the matrix the first time; if you want to transpose it afterwards, that's where transpose is used. And by the way, transpose has a lot of other applications; it's not only this. You might need the transpose to find matrix inverses, if you go into linear algebra concepts. Say matrix A multiplied by X gives me a matrix B; if you want to find matrix X, that's where you will need a lot of transpose operations. More on that later. If you want to go deep, you can start learning linear algebra concepts and then linear programming concepts on top of matrices. So transpose is a useful function for that. OK, so did we create this one? Yes, mat1 is created. We also created mat2, which is all characters, "a", "b", "c", and "d", and then mat3, where all the elements are logical: TRUE, FALSE, TRUE, FALSE. You can print mat1, then mat2, then mat3. Then you can access the elements. If you want the first row and all columns, you say mat1[1, ] and you get 1 and 3. Then all the rows and the first column, mat1[, 1], and you get 1 and 2. Then the second row and first column, mat1[2, 1], gives 2.
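Returning to the movie example from earlier in this part, here is a rough sketch of cosine similarity between two feature vectors. Everything specific here is an assumption made for illustration: the feature values, the names movie_a and movie_b, and the helper cosine_similarity, which is not a built-in R function. It only illustrates the "multiply and get one number that corresponds to an angle" idea, not how a real recommender system is built.

```r
# Hypothetical numeric features for two movies:
# (genre code, length in minutes, number of lead actors)
movie_a <- c(1, 120, 3)
movie_b <- c(1, 95, 5)

# Cosine similarity: dot product divided by the product of the lengths.
# %*% is R's matrix multiplication; for two vectors it gives the dot product.
cosine_similarity <- function(a, b) {
  dot <- drop(a %*% b)
  dot / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

sim <- cosine_similarity(movie_a, movie_b)
print(sim)

# Convert to an angle in degrees; min() guards against tiny rounding above 1.
angle_deg <- acos(min(1, sim)) * 180 / pi
print(angle_deg)
```

A similarity close to 1 (an angle close to 0) means the two feature vectors point in nearly the same direction; in practice the features would usually be scaled first so that one large-valued feature such as length does not dominate.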

If we do a transpose, it gives us 1, 2, 3, and 4; the diagonal elements, 1 and 4, remain the same, and only the non-diagonal elements change. And suppose you want to find the average of all the rows in only the second column. What do we have in the second column of mat1? The second column is 3 and 4, right? So what this function is saying is mean(mat1[, 2]). What is there in mat1[, 2]? We have 3 and 4: all the rows, only the second column. And what would the mean be? It is the average of 3 and 4: 3 plus 4 is 7, and 7 divided by 2 is 3.5. In the same way we can do mean(mat1[2, ]), which is 3. So observe the original matrix and the transposed matrix: it is just like reversing the coordinates, row and column, that's it. Now for the non-square matrix, which I think I just executed here. What I mean by a non-square matrix is that the number of rows and the number of columns are not equal; here you have two rows and three columns. And on top of that we do a transpose, t(mat). What it does is just exchange the coordinates. For example, for the element 1, what was the row number and column number? It was (1, 1), so if we exchange them it is again (1, 1). How about 3? 3 was at (1, 2), so it will become (2, 1). See, (2, 1), right? How about 5? 5 is at (1, 3) in the original matrix; now, if you just exchange the coordinates, it becomes (3, 1).
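The column mean and the non-square transpose can be sketched as follows. mat1 here is the column-filled version used at this point in the lecture; the names wide and tall, and the use of 1:6 for the 2-by-3 matrix, are assumptions consistent with the positions of 1, 3, and 5 mentioned above.

```r
# Column-filled 2-by-2 matrix: first column 1, 2; second column 3, 4.
mat1 <- matrix(c(1, 2, 3, 4), nrow = 2)

print(mean(mat1[, 2]))   # 3.5  (all rows, second column: 3 and 4)
print(mean(mat1[2, ]))   # 3    (second row, all columns: 2 and 4)

# A non-square matrix: 2 rows, 3 columns.
wide <- matrix(1:6, nrow = 2)
print(wide)

# Transposing swaps the coordinates of every element.
tall <- t(wide)
print(dim(tall))    # 3 2

print(wide[1, 2])   # 3
print(tall[2, 1])   # 3  (moved from (1, 2) to (2, 1))
print(wide[1, 3])   # 5
print(tall[3, 1])   # 5  (moved from (1, 3) to (3, 1))
```

So a 2-by-3 matrix becomes 3-by-2 after t(), with each value landing at its swapped row and column position.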

I'm not sure what this is. I'll go back and check what this means. So, installing R: if you want to install R, you can go to this link and copy it. All you have to do is open this link and click on "Download R" for whichever operating system you have: Linux, Mac, or Windows.

Abdul is asking: can you give an example of model building? That is what we are going to learn in this session. When I say model building, a quick example is: how do you predict data? We talk a lot about sales. Sales is dependent on what? Sales is dependent on stock, price, quantity, and quality. Let's take only a few of these variables. I'll give you a feel for it before we jump onto R. Meanwhile, all of you can look at the link and search for the download of R for Windows, Mac, or Linux; this is where you have to click to download the software.

So, sales is, say, dependent on your stock (and when I say stock, it's the goods in stock and not the stock price), then the price of every product, and the discount you offer. Let me call them x1, x2, and x3, respectively. Sales is Y; Y is what you're going to predict. So, what is model building here? If I want to predict sales, I multiply each variable by a number: say, 2 times x1 plus 2 times x2 plus 3 times x3 is my sales. Assume a stock of 10 units, a price of $5 per unit, and a discount of 10%. Your total sales will be 20 plus 10 plus 30, that is 60. So, given these values of the x variables, the sales will be 60, maybe in dollars or rupees or whatever.

What do we need to find here? When we are building a model and want to predict something, what is missing? The coefficients. We know x1, we know x2, we know x3, and we want to predict Y, so we first have to estimate these coefficients. So, the idea of model building is solving this equation. This is just one example.
Solving this equation to find out these parameters, and these parameters are represented by beta 1, beta 2, and beta 3.
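As a rough illustration of the sales example, the sketch below first plugs the values from the session into the equation, then recovers the betas from three made-up observations by solving the linear system. The matrix X and the names beta, x, and sales are assumptions for illustration only, not data from the session; real model building would estimate the betas from noisy data, for example with lm.

```r
# Coefficients from the spoken example: sales = 2*x1 + 2*x2 + 3*x3
beta <- c(2, 2, 3)

# One observation: stock of 10 units, price of 5, discount of 10
x <- c(stock = 10, price = 5, discount = 10)
y <- sum(beta * x)
y                      # 20 + 10 + 30 = 60

# "Model building" in miniature: pretend the betas are unknown and
# recover them from three made-up observations (rows of X).
X <- rbind(c(10, 5, 10),
           c(20, 4, 5),
           c(15, 6, 0))
sales <- as.vector(X %*% beta)   # sales generated by the assumed equation
beta_hat <- solve(X, sales)      # solves X %*% b = sales for b
beta_hat                         # recovers 2, 2, 3 up to rounding
```

With exactly three equations and three unknowns the system can be solved directly; with more observations than unknowns and some noise, a regression fit plays the same role.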

So, if you're able to solve these equations and find the values of beta, the predictions will be straightforward; it's just like substituting the variables. This is the exercise of model building: what are the methods, the algorithms, and the statistical processes you can apply in order to find the values of beta 1, beta 2, and beta 3.

I will take up the next question. Is model building dependent on categorical data or on the business requirement? Of course, both. Whatever the business wants, that is the end goal. What is the vision? Based on that, you look at the data, categorical, numerical, or whatever it is, and then try to build the model; and building a model I think I have explained now.

Nishan is asking why only 30 percent for testing data, the train and test split which I talked about. That is something you can experiment with. The standard is 30 percent, but people also do 60:40 or 80:20 depending on how much data you have. If you have sufficient data, even 20 percent will do, because you want to keep aside some data for testing, and it should be a good amount so that the test sample is sufficient to confidently validate the model.

So, moving on: once you are able to install R, the next thing to install is R studio. R is the programming language, and once you install R you get an interface you can work in, but it's not really user friendly. That's why we recommend installing R studio, which is the commonly used, industry-wide IDE, or front end, for working with R. The prerequisite for R studio is that you have R installed on your system.

So do all of you have the R studio link? You have it in the links, and this is what you have to click on: R studio Desktop. Once you install R and R studio, you'll have a screen something like this. Typically an R studio screen will have four windows, so let us understand these four windows. This layout is pretty common and famous in the analytics space; even when I'm working in Python, there are IDEs which support an R studio kind of layout for Python, because this is what everybody in analytics is used to.

The first one, on the top left, is where you write your code. These are the R files which you are going to go through when you start learning the syntax. So, let's say you are executing a line of code: y <- seq(1, 5, by = 0.5). Don't worry about what this means; it is simply creating a sequence of numbers with a step of 0.5. So, I execute this, and as soon as I execute it, the same code is replicated here. How do I execute? Either I select the whole line and click on Run, or I place the cursor anywhere on the line and use Ctrl+Enter as a shortcut. As soon as you run it, you see a replica here. This is your console. So you write the code in the editor, and it runs in the console.

Now there are times when you don't want to retain your code; you just want to use some code for testing. Say you want to look at what is in y. So y has 1, 1.5, 2, 2.5, and so on; this is a sequence. If you just want to test something and not retain it, you can use the console. Of course, both of these run in the same R session. If you want to retain the code, write it in the editor and do a File > Save; you also have a Save As option. If you don't want to retain the code, you can just type in the console and look at the results. The top right one is where you have the environment.
So you can look at which variables you have created, which tables you have created, what are the contents of this tables as well.
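The sequence line from the editor walkthrough, together with a console-style check of what the environment pane would show, can be sketched like this. The variable name y follows the session; the extra checks are only illustrative.

```r
# Create a sequence from 1 to 5 in steps of 0.5
y <- seq(1, 5, by = 0.5)

y            # 1.0 1.5 2.0 2.5 ... 5.0
length(y)    # 9 values
class(y)     # "numeric"

# ls() lists the objects in the current session, which is what the
# environment pane displays graphically.
ls()
```

Typing y on its own line in the console prints the values without keeping that line in a script file.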

For example, this is one table, this has 40 observations and 10 variables. Click on it. I am not able to see; you can click on the content here, but this is not the best way of looking at it, so if it is a data frame, if you can also find it somewhere here. So I'll talk about that when we run the datasets. So for now, you can look at the contents at a high level. This will give you a summary of data. All the variables which you create will be available in this window. The last one, bottom right, you can look at the plots. So, any plot which is generated will be available here. I can export the plot as well using the save as image or save as PDF. There are a lot of other options. As and when the requirement comes, I'll share those. So you should understand, we have four windows, that is good for now. So R studio is a set of integrated tools designed to help you be more productive in R. It includes a console, syntax highlighting editor that support direct code execution and a variety of robust tools for plotting, viewing history, debugging, and managing your workspace. So, the first and foremost thing which you should know here is your working directory. As soon as you open your R studio, you should find out what is your working directory. So how do I find my working directory. I'll just use getwd. This is a function in R which helps you locate your working directory. So getwd. So this is an inbuilt function not a custom function. It is available in R. This is what your working directory is. Now let's say you want to import all your files from this path this file, but if you want some other path from where you want to import your files, you can set your working directory, set using setwd. You can just change your path. Say for example I want only D Drive Intellipaat. So I'll just type in D Drive Intellipaat and click enter. Now to check that I can use get working directory and see that my working directory has changed.
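The working directory steps above can be sketched as below. Here tempdir() stands in for a path such as the D drive Intellipaat folder mentioned in the session, so that the sketch runs on any machine; the name old_wd is made up for illustration.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()
old_wd

# Point R at another folder; tempdir() is a placeholder for a real
# path such as "D:/Intellipaat" (that folder is assumed, not checked)
setwd(tempdir())
getwd()          # now reports the new folder

# Roll back to the previous working directory
setwd(old_wd)
getwd()
```

Saving the old path first makes the roll-back a single call, which is the same round trip that is done by hand in the session.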

Now I'll go back to my previous one; I will roll it back for now. We will look at it again when we start importing files. So I changed it back to my previous working directory. So: setwd and getwd.

R studio options are accessible from the Options dialog: Tools, then Global Options. These are not mandatory, but they are good to know; you should know what they are and what the capabilities of R studio are. So, General R options: default CRAN mirror, initial working directory, workspace and history behavior. Then Source Code Editing, Appearance and Themes, Pane Layout. These are more to do with appearance and default settings. All you need to do is open R studio and start working; no customization is required. If you want a black screen with white text, you can do that under Tools > Global Options, but it is not mandatory.

No, we are not going to use a GUI. A GUI is something R does not offer. There are different startups which build GUIs on top of R, and those are not free of cost. I'm not sure about R GUI, what exactly that is, but this is the industry-wide accepted format, and R studio is widely used everywhere.

R studio also makes it very easy to install packages. So what are packages in R? As soon as you start working, you will need different packages. dplyr is one package which you are going to learn in detail; it helps in data manipulation and data wrangling with a very easy-to-understand syntax. For example, if you want to filter data, summarize, add columns, or do a sum and group by, you can use dplyr. Now, not all packages are installed by default in R. What you have to do is go to Packages: under Tools, go to Install Packages. It will connect to the CRAN repository. CRAN is a global repository for R which is maintained by a certain group of individuals; although it's open source, there has to be some regulation.
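The menu route through Tools and Install Packages has a console equivalent, sketched below. The install calls are left commented out because they download from CRAN; the check that follows only reports whether the two packages named in the session are already present, and the variable names pkgs, installed, and pkg_info are made up for illustration.

```r
# Run once per machine; these download from CRAN, so they are
# left commented out in this sketch.
# install.packages("dplyr")
# install.packages("ggplot2")

# Check which of the packages are already available
pkgs <- c("dplyr", "ggplot2")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)   # TRUE or FALSE per package, depending on the machine

# Help for a package: library(help = ...) returns an overview object.
# The stats package ships with R, so this works without installing anything.
pkg_info <- library(help = "stats")
class(pkg_info)    # "packageInfo"
```

After a package is installed, library(dplyr) loads it into the session; installing is needed once, loading is needed in every new session.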

So that is the website which I shared for the R installation. That is the CRAN website, and it maintains all the packages contributed by different programmers and associations. So, if you want to install the dplyr package, type dplyr and click Install. I am not doing it now, but as soon as I do, it will start running a command in the console and install it for you. It will not take more than a minute for dplyr, but the other package, ggplot2, which is used for data visualization, will take some time before it installs all the themes and other information. When you do that, you'll observe that what runs in the console is install.packages, with the package name in brackets, something like install.packages("dplyr").

It is good when commonly used packages like dplyr and ggplot2 come bundled with a default setup, but I would recommend you know how to install packages, because we will need to install a lot of new packages as we go. For example, for people who prefer SQL, we have the sqldf package, which can be used to write SQL-like syntax. You may not be an expert in R syntax, but you can use SQL syntax to do the data wrangling in R. That requires a custom installation of the sqldf package, which I'll show when we learn the classification algorithms. The next one is for people who are comfortable with and used to pivot tables in Excel: the same drag-and-drop and pivoting-like operations you can do in R using the package rpivotTable. So that's why it is important to learn how to install a package.

The other options are not very important: Git/SVN and Publishing, please ignore them for now; they are rarely used here. If you know what Git and SVN are, these are like central repositories where you can maintain the versioning of all of your code, but you will barely use them in this session. Of course, there are ways to install packages from outside CRAN as well, which is not recommended. It is just like the Play Store or the Apple App Store: especially on Android devices you can install apps from other sources, but that's not recommended, because they may not be secure and they may not be stable. So it's good to install from CRAN.

And if required, you can also contribute; this being an open-source setup, anybody can contribute packages for free. Let us say there is a complex operation for which you need to write a lot of lines of code; you can package it and release it on CRAN. There is a separate process for that which you can read about, but if you are interested you can do that, in both R and Python.

So we have seen the R studio GUI: the script window, the console window, the environment window, and the plots. We have looked at R packages. Formally, packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages; others are available for download and installation.

Abdul is asking: can we use R for the data science lifecycle? Definitely, that is why we are using it. For the complete lifecycle, yes you can, because R will help you import data from different sources, do all kinds of data manipulation, run all kinds of algorithms and model building, and then also do data visualization. So yes, you can use it, but companies tend to use a combination of different tools; sometimes they spend a lot of money on dedicated data visualization tools as well.

But yes, you can use it. Startups, though, sometimes rely only on R or Python, either of these, for the complete end-to-end setup. But then the problem is that when the data is at scale you will need some distributed system, because R is an in-memory kind of setup. What happens is, let's say you have four million customers: as soon as you do an import, and read.csv is doing an import, all the 4 million records get loaded in memory, that is, in your RAM. That's why you would rely on some other system built for scale, like a big data platform; R alone is not sufficient. On top of this, the cloud is useful.

So I think the important ones we have covered. install.packages is used for installing a package; you can also use the GUI in R studio. If you want help for a package, you can use library(help = package name), or alternatively you can go to the Help window in the bottom right pane, click on Help, and find a topic, say ggplot2. Do we have the help here? Oh, this is the search window, sorry; let me type it in and check if we are able to search. Yes. So for ggplot2 you get the full details about who is maintaining it and who the contributors are. You also get the GitHub link, so you can look at the source code. That's the beauty: if you want to add more to it, say in ggplot2 one fancy chart you want is not available, like some diagram or a donut chart or whatever, or some animation, you can always contribute. You can report bugs, and you can read more about it.

A participant is asking: what is the data limit R can handle or manage easily? I would say not more than 300 MB. Beyond 300 MB of data size, you do a lot of data manipulation, and when you do that the data gets replicated in your RAM. So more than 300 MB is not recommended, and structured data will usually not be more than that. It also depends on your RAM, for example if you have a laptop with 64 GB of RAM.

With a high-end machine (people are buying those GPU-based machines these days), you can as well run something like 4 GB of data in your RAM itself.

The next question: if the data is in huge volume, which tool is preferred instead of R? It's not about which tool is preferred over R. You do all the things in R, and you leverage distributed systems, which is an architecture layering: R sits on top of Spark or Hive, or cloud services like Amazon Web Services, Azure, or Google Cloud Platform. It has to sit on top of these in order to handle so much data. Just like we have installed R on my laptop, you can install R on these big systems.

Is R integrated with any DB? That is what another participant is asking. It's not integrated, but you can always have ODBC connections. There is no integration; R has its own format to store data in memory, in the RAM, and that's all it does.

Is it better to have the working directory as a cloud folder, like a OneDrive folder? Yes, you can, if you have enough space in your OneDrive folder or any other cloud. I'm not exactly sure, but you should be able to do it in OneDrive, because OneDrive, as far as I know, maps to your local system, so you can use that path if you have enough space.

And what is ODBC? It's a kind of connector which connects to different databases. ODBC is a protocol, or an API, which helps in connecting to different databases. If R has to connect with Oracle or Teradata, the back-end systems (it cannot always be flat files), then ODBC is used. All these setups will be done by the admin guys; you don't need to worry about the setup. As a data science enthusiast or data scientist, you should concentrate on this: once you get the data, what do you do with it, and how do you get insights out of it? So all you do is import data from CSV.
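A CSV import of the kind described can be sketched as follows. The file is created on the fly in a temporary location so the example is self-contained; the column names and values are made up for illustration and do not come from the session.

```r
# Write a tiny CSV to a temporary file (hypothetical data, for this sketch only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("customer_id,income",
             "1,52000",
             "2,61000",
             "3,48000"),
           csv_path)

# read.csv loads the whole file into memory as a data frame
customers <- read.csv(csv_path)

str(customers)     # structure: 3 observations of 2 variables
nrow(customers)    # 3

# Everything read this way lives in RAM; object.size gives a rough idea
format(object.size(customers), units = "Kb")
```

Because the entire table is held in memory, the same call on millions of rows needs correspondingly more RAM, which is the scaling concern raised in the session.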

CSV is the most commonly used data format in the industry, and most data scientists will get this CSV data for data exploration and model building. Once they are done with model building, the engineers will help you scale your model in the production systems, like connecting it to ODBC, JDBC, or whichever databases are needed. All these things will be done for you; don't worry about that. I think Sai has given the full form of ODBC in the chat.

How do we decide if we have to go for R or Python? I think both are pretty much capable of doing everything. Python is a general-purpose programming language. I don't want to give you a biased review, but Python has a wider coverage, because you can do everything: if you want to design an API, or run some networking protocols, or use it for data science, you can do everything in Python; you can do scripting as well. But the issue, although Python is pretty famous, is something you would realize when you talk to full-stack software engineers: it is not very fast, and the learning curve is also slow. It is built on top of C and C++. Contrary to that, R is very easy to use and the syntax is pretty simple. A simple copy of a table from another table is pretty complicated in Python; you cannot just say table1 is equal to table2.

R is also built on C and C++. What I'm trying to highlight here is that Python is marketed as a general-purpose and very robust language, and I do not completely agree with that, although I use Python a lot. What I'm trying to say is that R is equally good, and Python makes things very complicated when you want to do trivial stuff like copying a table. There is an indexing concept there.

That indexing was meant to make it faster, but it makes things very complicated. So for any data science work, I think R is equally good. I would say R has to mature more on the deep learning side, the neural networks and the AI stuff; there, the open-source contributors are not very active. Although, from what I've seen lately, each and every algorithm which is available in R is in Python and vice versa. And there are a few statistical functions or statistical tests which are not really covered in Python; for example, I was looking for a Mann-Whitney U test and some other hypothesis tests which I did not find in Python. So it's difficult to decide between R and Python; I don't have a conclusion. Both are equally good; it depends on the requirement.

One complaint I have, I would not say complaint but one observation which data engineers share with us, is this: when we develop some algorithm and share it with engineers to scale it (of course we also sit with them and do that), it is difficult to debug the R code when it is integrated with cloud systems. The logs are not elaborate enough to understand where the system is failing, while with Python all the systems have good compatibility, so it's easier to debug. That's one difference which I have heard of lately. I think we can take up these kinds of questions; I am fine with it, and we will slowly increase the pace.

Someone is asking how data is handled if it is in a language other than English. This is an interesting question, thanks for this. So let's talk about machine translation. Let's say you want to do some recommendations and you have all the data in the form of text, some of it in Chinese. Say there is a product whose name is written in Chinese characters; say the product is bread, written in Chinese. Number one, there are different language APIs which understand these characters.

You provide a string, it assigns a code number to each character, as with ASCII, and you can uniquely identify a character. Beyond that, there are many interesting algorithms, for example word2vec. What this essentially does is read all your data, irrespective of whatever language it is in, and based on the context window in which each word appears, the co-occurrences of different words, it can identify the other words which are similar to bread, maybe synonymous with bread, just based on the co-occurrences. That is the beauty of data science: irrespective of the language in which the text is written, you can leverage the maths behind it, and one of the features is co-occurrence, to understand the synonyms. There are different APIs which do this. And for the encoding format we already have UTF-8, which I think is compatible with most of the common languages. I hope that answers the question.

Okay, so that's all about the main thing, the basics of R; we will learn the functions in a bit. Installation-wise, I hope you are all able to install. So we can move on to variables. When we start learning R, we should understand what variables are. A variable is a temporary storage space where you can keep changing values. Like any other programming language, we have this concept of variables, and it is as easy as typing x = 2; then you type x and you get 2. Or say you want to do some operation: x = 2 + 3, and then x. Do you observe something? This is the first time I'm using x, and I never declared whether it's an integer or a float or a character or whatever. So it is dynamic, based on the data you store in it. The data type is still there, of course; every programming language has data types, otherwise it cannot work as expected. But here it is dynamic, unlike C and C++, where you have to say int x first, declaring the variable with its type, and only then say x is equal to something.

So it is user-friendly. Of course, there are a lot of operations which happen in the back end: as soon as you say x = 2, in the back end it reads the 2 as a number and then assigns that type as a property of this x variable. So it adds an overhead in the back end, which has to do a lot of operations, but for the user it is user friendly.

So, data types in R. The data types are numeric, character, logical, and complex. Numeric all of us know: any number, be it positive, negative, or even decimal, is the numeric data type. If you store x = 5, which is a whole number, or x = 5.5, which is a decimal, both are numeric. Then character: you can also store x = "hello world". Let's try it out. Here it is as simple as that; I don't have to say that I want to convert x into a character. I can directly write x = "hello world", and now x becomes a character. Let's say I want to check the type. How do we do that? I'm not sure about the command... okay, it is class. I also get confused between R and Python, and that happens when you use these two languages interchangeably. So use class: class of x gives you character. And if you now say x = 1 once again and then say class of x, it becomes numeric.

So numeric, character, and then we have logical. Logical is TRUE or FALSE, so it is more like 1 and 0. When you evaluate an if-else condition, you get TRUE and FALSE as the two outputs, and based on this you can design your code and develop some logic. We'll look at it when we run some functions, like checking if a string is contained in another string, or if a character is present in a string.

The last one is complex. It is rarely used, but all of you know what a complex number is, right? It has a real and an imaginary component, like 30 - 2i: 30 is the real component and -2i is the imaginary component. You can think of it like the x and y axes, the two coordinates 30 and -2. I'm not sure how to use it in practice.
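The four data types walked through above can be checked in the console with class, as in this short sketch. The variable names flag and z are made up for illustration; the values follow the ones used in the session.

```r
# No declaration needed: the type follows the value that is stored
x <- 2
x <- 2 + 3
x                 # 5
class(x)          # "numeric"

x <- "hello world"
class(x)          # "character"

x <- 1
class(x)          # back to "numeric"

# A comparison produces a logical value
flag <- 3 > 2
flag              # TRUE
class(flag)       # "logical"

# Complex numbers carry a real and an imaginary part
z <- 30 - 2i
class(z)          # "complex"
Re(z)             # 30
Im(z)             # -2
```

The same variable x changes class three times here, which is the dynamic typing described in the session; the arrow is the usual assignment operator in R, and an equal sign works as well.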

Let's see how to use it, because I have never used complex numbers. Let us say x = 30 - 2i; if you can think of a use case for it, we can try that out. So now x is 30 - 2i, and if I say typeof, I'm sorry, class of x, it is complex.

I'm going to take a few questions. Is there an order in which we should take the self-paced courses? I would say this is the order, because I start teaching data science from the basics, to the best of my knowledge. So I think this one is perfect to start with, because data science doesn't have much of a prerequisite. We will leverage elementary mathematics and knowledge of simple mean, median, mode, all those things from your high school mathematics. There is not much beyond that. So you can start with this, and on top of it you can build up your knowledge if you want to learn big data and other things like visualization. But I think this is where you should start.

Someone is mentioning Excel. Yes, Excel is a platform, but what I was trying to highlight is that you should know what these mean. You know what an average, a mean, is; then why do we need a median? What is the need for even having a concept called median? I'm talking about the basics of statistics; that is where we should start.

A participant is asking: do we need to brush up elementary maths, and if so, from where can we do it? Don't worry about brushing it up now. If during the sessions you realize that something is going over your head and I am talking at a really high level, then do that. Although I make sure that what we learn is at the very basics, so that any school student can understand. We are not going to do any rocket science here: just the basic data, and we try to learn the art of how to leverage that data and how to treat it at a very basic level. So don't worry much about the mathematics.

Whatever mathematics is required, I am going to discuss in a little detail. Sai, yes, you can declare a character with both single quotes and double quotes. We will do theory and implementation side by side. The R files which you see on the screen, two of the three R files, are what we are going to discuss as part of the introduction to R, and then we'll move on to dplyr and visualization. We'll work on a data set which has rows and columns in a structure, and we will try to do some summarization, like sum, group by, and mean, all those things. So first we'll learn some data wrangling, then we'll learn some charts, and we'll come back to R to implement those charts. So we'll follow that sequence.

Which is more complex, R or Python? I would say Python is more complex, so we can get started with the concepts, syntax, and semantics in R. We will not cover everything, such as classes, because core data science and data handling activities do not need all of that, and that's fine. This is pretty important to start with; it will help you get on board and see how you can leverage R for any of the basic statistical functions.

So the first one is data exploration, where we'll cover objects in R. Any programming language has some objects; they are placeholders where you can store your data. If you come from traditional programming languages, most of the objects are similar. So we'll go through objects, then flow control statements. Flow control is about something like if, else, case when, and maybe for loops and while loops. Although you would not need loops a lot for statistical analysis, it's a good-to-have thing: if you want to automate your reports or automate your data science work, this helps you do that. Then we'll cover a few inbuilt functions and move on to how we can define our own user-defined functions.

Then we will cover data manipulation, and if time permits we'll also jump onto data visualization. So that's the agenda for today.

So, objects in R. You see this tree diagram; I think it's a pretty exhaustive one which helps you understand all the objects. They are broadly classified into one-dimensional and multi-dimensional. What is one-dimensional? Say we just want to store the color of all the apples in your database; it's like one column, so one dimension. Then we can have multi-dimensional, where you want color and size and something else. Try to relate it to data. Under one-dimensional we have homogeneous and heterogeneous, and the same for multi-dimensional: homogeneous and heterogeneous. I will explain what homogeneous means in a moment.

It's an open-source world, so there are times when a few functions may not work in the latest version but work in an older version, and vice versa. So what companies do is stick to one version of R which is recent but may not be the very latest; it could be one or two months old, one for which everything is well tested and all the packages are stable. You need not worry about that now, but that is what happens in the industry. So maybe you install 3.6.1 and stick to that for at least six months or so, unless you observe a major release coming.

Okay, so homogeneous and heterogeneous. In one dimension, the homogeneous example is a vector and the heterogeneous example is a list. In multiple dimensions, we have the matrix as homogeneous and the data frame as heterogeneous. So what are these vectors? A vector is a linear object which contains homogeneous elements; it is a collection of values that all have the same data type. So number one, it is a linear object, and second, it contains values of the same data type. If we look at c(1, 2, 3), all the contents are numbers. In the second one, c(TRUE, FALSE), all the contents are logical, either TRUE or FALSE.

You cannot mix them, like 1 and TRUE and 2 and FALSE, and keep them as they are; R will convert them all to one common type. So that is how it is.

Let me just go to the chat window. The easiest way to check if R is working: once you open R, you'll get a similar console. Type something like x = 2 and then x, and you should get 2 if R is working. If R is not working, you might consider installing R and R studio once again, but based on whatever error you are getting or the issue you are facing, you can write to support. The presentation is available for everyone; you can just log into the Intellipaat portal and download it from there. We just started, Krishna; nothing has been missed.

A participant is asking: what do you mean by linear object? When I say linear object, as I said, it is one-dimensional. We don't have two dimensions to it, no rows and columns; it's just one row or one column, that's it. Going back to the presentation: when I say linear object, that means either one row or one column. Think of a regular data set in any of the ERPs or back-end systems. What does it look like? It has rows and columns. Say you have a customer table: each row represents one customer, and each column is a customer attribute, like the customer ID, the customer income, and so on and so forth. When we say linear object, you can think of just one column of that table, and that is what a vector can represent. It cannot mix anything: it's one single column with only one data type. Next, why do we call it homogeneous? c(1, 2, 3), c(TRUE, FALSE), or, in terms of age, c(20, 30, 40).

Okay, look: most of the operations and most of the analysis you are going to do will be on something called a data frame. You may not even use vectors directly for analysis, but it is good to know all the data objects which are available in R.

Okay, Sai is asking: if only one row is present, can we call it linear, because the data set has only one row or one column? If one row is present, it is one row, okay, so there is no confusion. And is there any sample where we can use a vector? Okay, not really for now. We are only introducing these terms, right? Vectors can definitely be leveraged when you want to automate. So I will give you a scenario and not show the solution for now. The scenario is: let's say you want to impute the missing values. You have a thousand columns in a dataset, in a data frame, and you want to fill all the missing values with some number, say zero, for now, although there are different strategies for how to fill the missing values. When you want to do that, you can leverage vectors. Okay, fine.

So, creating a vector. It's pretty simple: c is the function. If you want to create a numeric vector, you just say c and then 1, 2, 3, 4, 5. Every value separated by a comma is one element.

Prakash, impute means replacing a value with something. Let's say you have a lot of missing values, nulls, NAs, and you would like to replace all the NAs with zero, right? That is called imputing. More on that later; that's a simple example of where you can apply a vector.

So creating a vector is pretty simple: you use the function c, and within brackets you pass the elements in a comma-separated format. This is one way of doing it, the most basic way. The second way is, say you want a sequence: you just do num1 <- c(10:20). What this will do is create a sequence over the range 10 to 20: 10, 11, 12 and so on till 20. Okay, and if you want to go with a character vector, remember it's homogeneous, right? You cannot have a number in the same vector, so all the elements have to be character.
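The two ways of creating a numeric vector described above look like this in the console. This is a minimal sketch; the name vec is an assumption, while num1 is the name used in the session.

```r
# Way 1: list the elements explicitly, separated by commas.
vec <- c(1, 2, 3, 4, 5)
vec

# Way 2: build a sequence with the colon operator.
# c(10:20) and 10:20 give the same vector; the c() wrapper is optional here.
num1 <- c(10:20)
num1
```

Note that 10:20 produces integers, while c(1, 2, 3, 4, 5) produces doubles; both report as numeric vectors for everyday purposes.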

For a character vector the format remains the same: c("a", "b", "c"). As it is character, it is important to enclose each element in double quotes. You can also extend this to words: anything enclosed within double quotes is a character value, whether it is one letter or a combination of letters, like a string. So the second vector is char2, assigned c("this", "is", "sparta"): "this" is the first element, "is" is the second element and "sparta" is the third element, right?

So guys, c here stands for combine: if you read the documentation, c combines values into a vector. And the arrow which you see here is the assignment operator; it's just like equal to. You can use either the equals sign or an arrow pointing towards the vector name. char2 is the vector name, right? So this arrow is pointing towards the vector name.

So let's really create these vectors; I have similar examples here. You look at it: vec1 <- c(1, 2, 3, 4, 5, 6), right? You can either write it here, or copy this command and run it in your console. What this will do is create a vector with the elements 1, 2, 3, 4, 5 and 6. And how do you check it? Yesterday, I believe, we learnt this: class, right? class(vec1). What happens is, when you're working in R and you have, say, a complex code of a thousand lines, and you want to edit it or do some enhancements, it's not like you'll always get finished code, right? So this is how you debug and check each and every object in R: you use the class command. Now here it says numeric. As soon as it says numeric, you should understand it's a numeric vector. We will look at what it reports for other objects: for a list it will give you list, for a matrix it will give you matrix, and for a data frame it will give you data.frame. We'll check that in some time. I'm running this line now: num1 <- c(10:20). So this gives you num1.
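Here is a short sketch of the character vectors and the class() check just described. The names char1, char2 and vec1 follow the session as closely as the recording allows, and the list and data frame at the end are throwaway examples added only to show what class() reports for them.

```r
# Character vectors: every element goes in double quotes.
char1 <- c("a", "b", "c")
char2 <- c("this", "is", "sparta")

# A numeric vector, then class() to check what kind of object we have.
vec1 <- c(1, 2, 3, 4, 5, 6)
class(vec1)    # "numeric"
class(char2)   # "character"

# class() works the same way on other objects.
class(list(1, "a"))        # "list"
class(data.frame(x = 1))   # "data.frame"
```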

What is num1 now? You can check here: 10, 11, 12, 13, 14, 15, till 20. What I'm doing here, if you observe: the code snippets which I want to retain are written in this file. If I'm just doing some debugging, I don't want num1 to be written in the file like this; I just want to check what num1 contains, right? So I don't run it from here; I type it directly in the console to check. This gives you the same output as what you got for the second line. The only difference is this code will not be retained; it's only for your debugging. Okay.

num1 is not an array, guys; num1 is still a vector. And the equals sign and the arrow sign can be used interchangeably for assignments like these, to the best of my knowledge. The only advantage of the arrow sign is this: I can type the definition of the vector first, whatever contents you want, and then change the arrow direction and say this is my num1. So it just gives you the comfort of choosing where to place the variable name and where to give the contents: contents on the left, name on the right. All right, that's it.

Sai, if you're getting some errors, I'm not sure what you're getting... yeah, "unexpected comma" between values. You just need to copy this; you should have the file with you. Just be careful how many commas you have and how many elements you have. But guys, I recommend, please do not practice while we do this class. I really don't recommend that, and the reason is you waste a lot of time. These are pretty simple functions. Don't feel insecure; you can always practice at any time. It is a very simple file, so don't feel insecure about the syntax. The syntax is pretty easy. R is a very easy-to-use language; you can see these are simple functions. Run them just one time and you will remember. Okay, so did we create a character vector? We don't have one here. Okay, that's fine, I need not create it.
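The three assignment forms mentioned above can be compared side by side. This is a minimal sketch; num_eq and num_right are made-up names. One caveat worth knowing: inside a function call, = names an argument rather than creating a variable, so the two operators are only interchangeable for plain top-level assignments like these.

```r
num1 <- c(10:20)         # arrow pointing left: name first, contents second
num_eq = c(10:20)        # equals sign: same effect for a top-level assignment
c(10:20) -> num_right    # arrow pointing right: contents first, name second

# All three variables hold the same vector.
identical(num1, num_eq)
identical(num1, num_right)
```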

So that is character. Okay, the next one is creating a logical vector. It is very similar: you just need to say c of TRUE, FALSE, TRUE, FALSE, whatever you want. TRUE represents 1, FALSE represents 0, and these are keywords. Okay, these are keywords: you cannot say T-R-E-U or something; it has to be T-R-U-E, in capitals. And T, F, T, F also represents TRUE and FALSE; it is a short form. Okay.

So one thing we can check quickly is that I am able to create these vectors. I have one vector which says TRUE, FALSE... oh, I'm sorry, I just messed it up. Okay, so this vector has TRUE and FALSE, two elements. Now you can ask me, where can you use this TRUE and FALSE? This is where you can leverage it for any kind of automation. Say you are attempting a very complex programming challenge where you have to automate all your data manipulation; this TRUE/FALSE will really help. You may want to store, say, whether a value is null or not for every record, then you can store it in a vector and pass it on to some other system. These are the scenarios, but I cannot define one specific scenario right now. Once you understand what kind of tasks you have in analytics, it will be easier for me to explain where you can use a vector to automate your data science stuff, your data wrangling stuff. Okay.

Now, if you want to impute your data. Impute means replacing an element. Let's say your vector has some null values. In any database we have something called null, and null means nothing: a missing value or a blank. A missing value in R is represented by NA. So let's say we have a vector which has a few NAs. If you look at it, this vector has NA, 1, 2, NA, 3, 5, NA. How many elements does it have? Seven, right, and we have three missing values. How do you tell where they are? Checking by eye takes very long for a huge vector, right? So you can just say is.na(), and it returns a logical vector.
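To make the logical vector and the missing-value check concrete, here is a small sketch. The names log1, log_short and vec_na are assumptions; the seven-element vector with three NAs is the one read out above.

```r
# Logical vectors: TRUE/FALSE are keywords; T/F are the short forms.
log1 <- c(TRUE, FALSE, TRUE, FALSE)
log_short <- c(T, F, T, F)
identical(log1, log_short)

# TRUE counts as 1 and FALSE as 0 in arithmetic.
sum(log1)

# A vector with missing values, and is.na() to locate them.
vec_na <- c(NA, 1, 2, NA, 3, 5, NA)
is.na(vec_na)        # TRUE FALSE FALSE TRUE FALSE FALSE TRUE
sum(is.na(vec_na))   # how many values are missing
```

The short forms T and F are ordinary variables that can be overwritten, so spelling out TRUE and FALSE is the safer habit in code you keep.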

TRUE, FALSE, FALSE, TRUE and so on, right? So TRUE means yes, the value is NA; FALSE means the value is not NA. This function, is.na(), will stay with you for a long time. It is helpful in imputing missing values, which is one of the most important data science activities, so we'll spend some time on this topic.

Okay, now let's say you want to impute and replace all these NAs with zero. Just use a simple if-else statement. ifelse is a function; it's again an inbuilt function. You say ifelse, and within brackets we have three parameters: one, two and three. The first one is the condition, is it NA or not. The second one is what you want to impute with if the condition is satisfied, and the third is what you want if it is not. That's the way you write the statement. So let me run this and let us look at the vector: 0, 1, 2, 0, 3, 5, 0. All the NAs are imputed with zero.

And the arrows are just for direction. If you write the definition first, you use the second option which you have given, and if you write the vector name first, you use the first one. It just gives a direction: the processing part happens first, and then the result is assigned to some variable. So either way it's the same. So this is a basic data object. Moving on.

Okay, you can also find out the length: if you say length of the vector name, you get the number of elements in the vector. A few functions you can try yourself; they are pretty straightforward. Next, accessing the elements of a vector. How do you access just one of these? You give the vector name, and within square brackets which element you want: 1, 2, 3. Let me just find where I have this. Yeah, I created a few vectors. Okay, so length of num1: num1 is having five elements, right, so we get 5. Then length of char2.
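The imputation step described here can be sketched as follows. The vector is the one from the session; the names vec_na and imputed are assumptions. The last part is an extra illustration, not shown in the session, of one common way to apply the same idea to every column of a small made-up data frame, which is the "thousand columns" scenario mentioned earlier.

```r
vec_na <- c(NA, 1, 2, NA, 3, 5, NA)

# ifelse(condition, value if TRUE, value if FALSE), applied element by element.
imputed <- ifelse(is.na(vec_na), 0, vec_na)
imputed            # 0 1 2 0 3 5 0
length(imputed)    # number of elements is unchanged

# Assumption: a tiny hypothetical data frame to show the same idea across columns.
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))
df[is.na(df)] <- 0   # is.na() on a data frame gives a logical matrix used as the index
df
```

Filling with zero is only a placeholder strategy here; the mean, the median or a model-based value are other options depending on the data.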

char2 has three elements, so it gives us 3, right? Then, if you want to access the second element of char2... this is char2, right, and we want the second element. All you have to do is say char2 and pass 2 in square brackets, char2[2]. This will give you "is", the second element, right? If you want the first element, you just say char2[1]. That's basically the way. Additionally, if you want more than one element to be printed or returned, for example for my logical vector you want the first and third elements, what do we do? It basically expects a vector of 1 and 3, c(1, 3). This says we want the first and third elements: so TRUE and TRUE, first and third. Indexing starts with 1 here, you'll observe, right? When we say char2[1], it gives me "this".

Okay, I think we have seen imputing missing values. The next one is the list. A list is a linear object which contains heterogeneous elements. A list allows you to gather a variety of objects under one name. A list may contain a combination of vectors, matrices, data frames and even other lists. So it's pretty powerful. What it's trying to say is that it's again a linear object, although I would say it's not strictly linear; we'll see how that is. It is like one single list of items, and it can contain heterogeneous elements as well. The syntax is: you just say list, and within brackets, say, a number, 1, and then a string, "sparta". These are two different data types, but they can fit into one single list; they can be packaged or zipped into one single list, right? So that's the advantage of having a list: you can have multiple data types in one single object.

Now what does the last sentence mean, "it may contain a combination of vectors, matrices, data frames and even other lists"? This means within a list you can have, let's say, one number and one string, and you can also have other lists, matrices and data frames.
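The vector indexing and the first list example can be tried like this. It is a minimal sketch: my_log and my_list are made-up names, and the logical vector's contents are an assumption chosen so that its first and third elements are TRUE, as in the session.

```r
char2 <- c("this", "is", "sparta")
length(char2)     # 3

# Indexing starts at 1 in R.
char2[1]          # "this"
char2[2]          # "is"

# Pass a vector of positions to get several elements at once.
my_log <- c(TRUE, FALSE, TRUE)
my_log[c(1, 3)]   # TRUE TRUE

# A list can hold different data types side by side.
my_list <- list(1, "sparta")
class(my_list[[1]])   # "numeric"
class(my_list[[2]])   # "character"
```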

So the first element could be just a vector of length one, the second element could be another vector of length one, the third element could be a whole data frame, meaning a thousand rows and n columns, and the fourth element could be a matrix. So that's the beauty of the list. How we apply it depends on the requirement. Frankly, I have not used such a complex list so far, because most of the operations are on a data frame. You can think of some way to leverage the most complex form of the list.

So, creating a list using the list function: you just say list, and within the brackets you pass the elements. You can observe here that the first list has all the elements of the same length: 1, "a" and TRUE. These are different types, the first one is numeric, the second one is character, the third one is logical, but all are of the same length. The same goes for list 2: we have list(c(1, 2), c("a", "b"), c(TRUE, FALSE)), all of the same length; there are two elements in every element of the list.

Now, accessing list elements. Within double brackets you give the element number, which element you want to access. So the second element is my_list1, and within double brackets, 2. What happens in a list is that we have multiple levels, right? Within an element I can have multiple elements, so I might want to access an element within an element of a list. For that we have this double-bracket notation. Say I want the second element of the third element in the list. I say my_list2, then within double brackets 3, and then within single brackets 2. This will give me FALSE: the third element, and its second element is FALSE, right?

Okay, let us execute these; let me see if I have this handy. Yeah, so my_list1: you can always print it for your reference: 1, "a" and TRUE. And then my_list2. In my_list2 I have tweaked the length: the third element has three sub-elements, right? So you can also do that; the elements need not always be of the same length. The first element has two elements and the second element has two elements.
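Here is a short sketch of the two lists from the slide and the double-bracket access just described. The names my_list1 and my_list2 are a best guess at how "my list one" and "my list two" were spelled in the session file.

```r
my_list1 <- list(1, "a", TRUE)
my_list2 <- list(c(1, 2), c("a", "b"), c(TRUE, FALSE))

# Double brackets pull out one element of the list.
my_list1[[2]]      # "a"
my_list2[[3]]      # TRUE FALSE

# Double brackets, then single brackets: an element within an element.
my_list2[[3]][2]   # FALSE
my_list2[[2]][1]   # "a"
```

Single brackets on a list, such as my_list1[2], return a smaller list rather than the element itself, which is why the double brackets are needed here.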

The third one has three, okay? And you'll observe I am using c within a list. So within a list we can have a vector, right? These are all vectors.

So, just to make it more interactive: what is the second element of my_list1? The second element is "a", right? So if we do my_list1[[2]], this will be "a". Okay, let's tweak it a little bit. I'll say my_list2, I'm sorry, with 2 and 1. What am I doing? my_list2[[2]][1]: what is the expected output, if you guys follow me? I am accessing my_list2; I passed 2 within double brackets and 1 within single brackets. Okay, let's see. Perfect, so this is what we expected. Now we'll go back.

We can also name the elements of a list. What happens in a complex list is that you want to make it more user-friendly to access, right? You can give a label to every element. The syntax almost remains the same: we say list, and say the first element is 85, the second is 45 and the third is 100. But if you want to specifically provide a label, you can say apple is 85, most likely the price of apples per kg, and then also banana per kg, and the third fruit per kg, right? So to label the individual elements, you write list, and then label equal to element, label equal to element, like that. What this does is make it easy to access the list elements: we just say the list name, then dollar, then the element label, and you get 85. So I create this list and I can access 85. If you want the value with the banana label, just do the same and replace apple with banana; the output should be 45, right?

Now, I'm just scrolling through the chat window to check if there are some queries. Nanda is asking, is an object always associated with storage only? Yes, it's about storage and how you store it.
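The labeled list described above might look like the sketch below. The list name fruit_prices is an assumption, and so is the third label, guava: the recording does not make the third fruit's name clear, so treat it as a placeholder.

```r
# label = element, label = element, ...
# "guava" is a hypothetical label; the third fruit is unclear in the recording.
fruit_prices <- list(apple = 85, banana = 45, guava = 100)

# The dollar operator pulls an element out by its label.
fruit_prices$apple     # 85
fruit_prices$banana    # 45

# Labels and positions both work.
names(fruit_prices)
fruit_prices[[1]]      # 85
```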

I'm not sure what else; it could be about how you store it, and then while data wrangling you can leverage it as well. So, yes. Then: can we define an element with just data within an object? We can do that. Linear is like one row of data; it's one-dimensional. Okay, if I happen to miss some question in the chat window, please copy-paste it and put it once again.

Yeah, dollar is used to get the value when you have a labeled list, only in the case of a labeled list. And the names of the elements should be unique; that is true. So if you want to use the names within the list, right, you have to use the dollar operator: the list name, dollar, banana, this way. It is very difficult to relate this to real datasets. The only object which is close to real datasets is the data frame, and that is what we'll work with most of the time, but it's my job to explain all the data types before jumping on to the data frame. Okay.

Yeah, so c(1, 2) is given so that we can return multiple elements in one go. If I just say my_log[1], it gives me the first element; my_log[2] gives me the second element; but if I want both the first and the second, I have to pass c(1, 2). c builds a vector, so it is a vector of all the indices you want.

And yes, there are always guidelines for naming variables. You need to follow proper guidelines based on the domain in which you are working. These are just examples; when we work with real data, you'll see how it is done. Can you try the duplicate-name one at your end? I really need to cover other things, so you can try it and let me know if you get some error. If we do get one, we learn something new. Okay, fine.

Remember, these are all, I would say, just an introduction to these objects. Most of the time, or I would say 99% of the time, you'll be working with a data frame, so that is where we will spend a lot of time. There are a lot of objects which are seldom used.

You also have the matrix, for that matter; the next one is the matrix. I don't remember having used it any time in the past, not until you go for something like recommender systems, where you have only one single data type, numbers, and you convert everything to a vector. For example, say we have thousands of movies, right? Every movie has to be represented as a vector in the form of numbers. That may be where this will be useful, but even that can be done using a data frame. So the data frame is a one-stop shop for you; all the data manipulations can be done using a data frame.

But let's see what the matrix is. A matrix is a 2D object which contains homogeneous elements. Again, two-dimensional meaning rows and columns. If you see the output here, we have rows and columns: two rows and four columns. The easiest way to create it is matrix(c(1:8), nrow = 2). We give a sequence, 1 to 8, as the elements, 1, 2, 3, 4, 5, 6, 7, 8, and we want it in the form of two rows, so it automatically splits your data into two rows. Okay.

So, creating a matrix: the syntax is, just say matrix. For creating a vector you have c, right? How do you create a list? You say list, and within the brackets you pass the elements. How do you create a matrix? You just say matrix, and within the brackets you give the elements. But because a matrix is a two-dimensional object, we need to be careful about how we want the matrix to be laid out. The first parameter says what elements you want. Let's say I want 1, 2, 3, 4, which is a vector. Wherever you see this c, it means it is creating a vector. So a vector of these elements, 1, 2, 3, 4; I want this in my matrix. And how do I want it? I want two rows. Automatically, if you have four elements and two rows, you'll have two columns, right? So we'll have 1, 2, 3 and 4.
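A minimal sketch of the first matrix call follows; the name m is made up. With only nrow given, R fills the matrix column by column, so the first column holds 1 and 2, the second 3 and 4, and so on.

```r
# Eight elements arranged in two rows; R works out that four columns are needed.
m <- matrix(c(1:8), nrow = 2)
m

dim(m)     # rows, then columns: 2 4
nrow(m)    # 2
ncol(m)    # 4
```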

And the third one is byrow = TRUE. That means, while assigning the elements to the placeholders, fill the rows first: so 1, 2 and then 3, 4. Okay. Then a slight variation: you can say c("a", "b", "c", "d"). Again we have four elements and nrow equal to 2. Here we don't have any real variation; it's only filling characters instead of numbers. If we do byrow = FALSE, it will first fill the first column, "a", "b", and then "c" and "d". That's the only difference. You can also do it for logical elements, T, F, T, F, and you get the same kind of output.

Now, how do you access the matrix? Basically, you need to pass two things here: the row number and the column number. If you say mat1[2, 1], 2 represents the row number and 1 represents the column number. So the first one is the row number, the second is the column number. Now what happens if you just say mat1[1, ] with nothing after the comma? That means the first row and all the columns. And if you say mat1[, 1], that means all the rows and the first column. So this is like coordinates, x and y coordinates. You access different elements using these two parameters, row number and column number: the matrix name, and within square brackets, row number, column number.

If you remember the transpose: if you want to exchange the rows and columns in a matrix, you can use the transpose function. So mat1 is 1, 2, 3 and 4; let us see its transpose. You just say t(mat1). What will this give? See, 1 and 4 will remain the same, because they were at (1, 1) and (2, 2), so exchanging the row and column numbers changes nothing, right? But look at the other two elements, 3 and 2: 3 was at (2, 1) and 2 was at (1, 2), so they just exchange. All the elements except the diagonal elements get exchanged.

Numbers, yes, but not a combination of numbers and some other type. If I go back to the slide, a matrix is homogeneous, so whenever we have homogeneous, right, only one data type is allowed.
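The byrow option, the row/column access and the transpose can be checked together in a small sketch. mat1 matches the slide (filled by row); mat_bycol is a made-up name for the column-filled variant.

```r
# Fill by row: first row is 1 2, second row is 3 4.
mat1 <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
mat1

# Fill by column: first column is 1 2, second column is 3 4.
mat_bycol <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = FALSE)
mat_bycol

# Access is always [row, column]; leave one side empty to mean "all".
mat1[2, 1]   # 3
mat1[1, ]    # first row, all columns: 1 2
mat1[, 1]    # all rows, first column: 1 3

# Transpose swaps rows and columns; the diagonal (1 and 4) stays put.
t(mat1)
identical(t(mat1), mat_bycol)
```

For this small square example the transpose of the row-filled matrix equals the column-filled one, which is why the two can look alike at first glance.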

So that's where you can use a data frame, which is more realistic. Okay, I'm just going through the chat. We did not cover byrow? Okay, we'll see that now. Let me open up and get this matrix. This is my mat1. We passed four elements, 1, 2, 3, 4, to the matrix function, we said we want two rows, and we also said byrow equal to TRUE. What this did is fill the elements 1, 2, 3, 4 by row: first row, fill everything, then go to the second row. Let me try byrow equal to FALSE. So it did work, and what is happening now? We are not filling it by row; now we're filling it by column. Fill the elements in sequence: fill the first column, 1, 2, then go to the next column, 3 and 4.

Did we miss any question? Manish is asking: you do not use c in the fruit list? Yeah, the fruit list is the labeled list, right? There c is not required, because for every element we have a label. So that's the difference.

And don't worry about memorizing the syntax. Data science is not about remembering syntax; don't worry about the coding. What you guys can do is leverage this PDF or this R file as, I would say, a ready reference for syntax if you don't remember something, right? That's all you should use it for. Don't try to learn it by heart; nobody expects that. And if you go for interviews, in a one-hour interview, I guess only a small part will be coding. They just want to know if you know R and can leverage all these things, that's it. Nobody is expecting you to do a lot of coding.

When the data frame is available, why would one need a matrix? So, I can give an example. A matrix is faster when you have homogeneous elements and you want to do something like cosine similarity. You might not know what cosine similarity is; if you know it, well and good. So, things like multiplying vectors or multiplying two matrices.

That is easier when you have a matrix object instead of a data frame. A data frame is assumed to be much more complicated in terms of how it stores data in the back end, right? Because it has to be in the form of rows and columns with the columns being heterogeneous. They are not all alike: one column will be a number, another column would be a date format, right? So what I'm trying to say is, a matrix will be useful when you want to do a lot of mathematical operations on bigger matrices. Remember the mathematics where you would do matrix multiplications. It is difficult to explain now; once you understand how a recommender system works and how cosine similarity is used, then I can give an example of how a matrix is used.

And my second answer to this question: this programming language has evolved, right? It has to have all the objects, and then it gets to the data frame. The data frame is the most mature object. So initially all these things came, and not all of them may be used, but they are there in the programming language. The data frame is enough for most of the operations. And as a programmer, if you have something in your armory, you have to check how you can really leverage it. One thing I remember is that a matrix gives you better performance in a homogeneous kind of scenario, where you have to multiply two matrices, as in cosine similarity.

Let me give you a little bit of detail. Again, the movie example: every movie can be represented in the form of a matrix. When I say matrix, it will be in the form of, say, features, right? A movie could be from a genre, a movie has a length, like how long the movie is, a movie has some actors or whatever. All these features of a movie could be a set of numbers, and that will be nothing but a matrix. If you want to understand the similarity between two movies, that is nothing but finding the similarity between two matrices.
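The movie-similarity idea can be sketched in a few lines, under stated assumptions: each movie is reduced to a numeric feature vector, the feature values below are invented purely for illustration, and cosine_similarity is a hypothetical helper written here, not a function from base R or from the session.

```r
# Hypothetical feature vectors: (genre code, length in minutes, number of lead actors).
movie_a <- c(1, 120, 3)
movie_b <- c(1, 95, 5)

# Cosine similarity: dot product divided by the product of the vector lengths.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

sim <- cosine_similarity(movie_a, movie_b)
sim

# The angle between the two vectors, in degrees (pmin guards against rounding above 1).
angle_deg <- acos(pmin(sim, 1)) * 180 / pi
angle_deg
```

A value close to 1 (a small angle) means the two feature vectors point in nearly the same direction. In practice the features would usually be scaled first, since a large-valued feature such as length in minutes otherwise dominates the result.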

If you go by the geometry and calculus we learnt in high school, you can multiply these two matrices and find one single number which will give you the angle between the two. It might go over your head for now, but bear with me; I am giving a high-level example. That angle will tell you how different these two movies are, based on all the features which are embedded in the matrix. In these cases a matrix might be a little faster compared to multiplying two data frames. Okay, and by the way, there is no need to remember all the syntax, not at all.

Okay, Sai is asking: if you set byrow equal to FALSE, it already looks like a transposed matrix, right? So why do we even need a transpose? Okay: once you have already filled a matrix with, say, a thousand rows and a thousand columns, you do not refill the matrix, right? byrow applies only while filling the matrix the first time. After that, if you want to transpose it, that's where transpose is used. And by the way, transpose has a lot of other applications; it's not only this. You might need a transpose when finding matrix inverses, if you go with linear algebra concepts. Say a matrix A multiplied by X gives me a matrix B, and you want to find the matrix X: that's where you will need a lot of transpose operations. More on that later. If you want to go deep, you can start learning linear algebra concepts and then linear programming concepts on top of matrices. So transpose is a useful function for that.

Okay, so did we create this one? Yeah, matrix 1 is created. We also created matrix 2, which is all characters, so we filled it with "a", "b", "c" and "d". Then matrix 3, where all of these are logical elements, TRUE, FALSE, TRUE, FALSE. You can print mat1, then mat2, then mat3; all these you can print. Then you can access the elements. If you want the first row and all columns, you can say mat1[1, ], and you get 1 and 3. Then all the rows, first column: you get 1 and 2. Then the second row and first column, mat1[2, 1]. And then we do a transpose.

The transpose gives us 1, 2, 3 and 4, where the diagonal elements, 1 and 4, remain the same; only the non-diagonal elements change. And suppose you want to find the average of, say, all the rows but only the second column, right? What do we have in the second column of mat1? Let's see: the second column is 3 and 4, right? So what this function is saying is mean of mat1[, 2]. What is there in mat1[, 2]? We have 3 and 4: all the rows, only the second column. And what would the mean be? It is the average of 3 and 4, which is 3.5: 3 plus 4 is 7, and 7 divided by 2 is 3.5. You can do the same thing for mean of mat1[2, ], which is 3. Okay.

So observe that the original matrix and the transposed matrix differ just by reversing the coordinates, row and column, that's it. Okay, now for the non-square matrix; I think I just executed it here. What I meant by a non-square matrix is that the number of rows and the number of columns are not equal: we have two rows and three columns here, right? And on top of that, if we do a transpose, t of the matrix, what it does is just exchange the coordinates. For example, for the element 1, what was the row number and column number? It was (1, 1), right? So if we exchange them, it will again be (1, 1). How about 3? 3 was at (1, 2), so it will become (2, 1). See, (2, 1), right? How about 5? 5 is at (1, 3) in the original matrix. Now if you just