Data Analytics For Beginners | Introduction To Data Analytics | Intellipaat
Hey guys, welcome to this session by Intellipaat. Data analytics is considered an important aspect of any data-driven business today. Top companies like Google, Facebook, Amazon and many more use data analytics to drive their business to success. In today's session we'll have a quick introduction to the world of data analytics. Before we get started, please do subscribe to Intellipaat's YouTube channel to get instant notifications about our upcoming videos.

Now let's take a look at the agenda for the day. We'll start off by understanding what data analytics is. Then we'll move on to the need for data analytics in an organization. Thirdly, we'll find out who data analysts are and what roles, responsibilities and tasks they handle on a daily basis. Fourthly, we'll dive into the types of data analytics and its life cycle. Finally, we'll check out use cases of data analytics. If you have any questions, clarifications or concerns, please put them down in the comment section below; we are happy to help you out. Also, guys, if you are interested in an end-to-end course in data analytics, please do check out Intellipaat's data analytics training course, which will help you master the concepts thoroughly. Details are available in the description box below. Without any further delay, let's get started.

We've all heard this term somewhere; it rings a bell every time you think about it. So, analytics with respect to data: what are we trying to analyze? The formal definition is basically extracting meaningful information from raw data. It's as simple as that. Consider anything to be data: if it is not useful to you at that point, it is just data. But if you perform some operation on this data and make it useful for your organization, that is when the data becomes information, and this information is what is usable for you. So the process of converting raw data into information by performing certain analysis on it can be termed data analytics, guys.
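To make the raw-data-versus-information idea concrete, here is a minimal Python sketch. It is not from the session itself: the order totals and the `summarize` helper are made-up assumptions for illustration, and a real analysis would go much further than a handful of summary numbers.

```python
from statistics import mean

# Hypothetical raw data: daily order totals (illustrative numbers only).
raw_orders = [120.0, 95.5, 130.25, 80.0, 150.75]


def summarize(values):
    """Turn a raw list of numbers into a small summary a person can read."""
    return {
        "count": len(values),
        "total": sum(values),
        "average": mean(values),
        "highest": max(values),
        "lowest": min(values),
    }


# The raw list is "data"; the summary is closer to "information".
summary = summarize(raw_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The list on its own tells a business audience very little, while the summary answers simple questions such as how many orders there were and what a typical order looks like.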
This is just a rough definition of what goes on in the industry. To give you better insight into what it means: it is the pursuit of extracting meaning from raw data using specialized computer systems. These systems transform, organize and model the data, and the end goal is to draw conclusions from all of it. As I mentioned with the raw data example, our goal here is to draw conclusions, to identify patterns, and to make use of this raw information in a better way. Sure, data can sometimes be used in its raw form, but not in every case.

To give you an example, let's say you're looking at a huge data set that contains thousands of values, and those are just numbers. As a data analyst you will understand what you're doing with these numbers. But let's say you need to explain it to a person who doesn't know what data analytics is, or to a peer, or to someone above you, say in a business meeting. Just giving them the numbers will not really be that great.
It would not make a very good presentation. So converting all of this numerical data into graphs, presenting it in a way that everyone understands, and showing graphically whatever insights you can generate from the data can be considered data analytics as well.

Today the field of data analytics is growing rapidly. Why? Because the market demand for it is that high. Pretty much every company these days, startups included, has a requirement for big data. To dumb down the definition, big data means a huge amount of data that cannot be handled by just one machine. If your data is stretched across multiple nodes and is coming in from various sources, you need a method to handle all of it, to process it, and then to perform analysis on it. This has been in demand in the market for a while, and for the last couple of years data analytics has been booming because it provides a straight answer to the question of how all this can be handled. And of course we're going to need people with the skills to work with the data and its queries, to translate all these numbers into graphs, and to make insightful analysis. That is the key role of a data analyst.
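As a rough illustration of translating numbers into something visual, here is a small Python sketch that draws a text bar chart. The month names, sales figures and the `text_bar_chart` helper are all hypothetical; in practice an analyst would more likely use a charting library or a business intelligence tool, which this sketch only stands in for.

```python
def text_bar_chart(data, width=20):
    """Return one line per label, with a bar scaled to the largest value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value in the data.
        bar_length = round(value / largest * width) if largest else 0
        lines.append(f"{label:<10} {'#' * bar_length} {value}")
    return lines


# Hypothetical monthly sales figures, used only for illustration.
monthly_sales = {"Jan": 40, "Feb": 55, "Mar": 25}
for line in text_bar_chart(monthly_sales):
    print(line)
```

Even this crude chart makes it easier to see at a glance which month did best than reading the three numbers on their own.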
So what is the need for data analytics? We can break it down into three points. You already know that it is on the rise, so I can go ahead and tell you it is not that it will soon be an integral part of an organization; it is already an integral part of every organization out there.

The first need: it is a top priority for organizations, ranging from small ones all the way to the big guys, because it is needed for good decision making. If you are confused about a decision, looking at the analytics makes the decision a little easier, and it will be better validated as well.

The second is to acquire new revenue. Let's say you have data that only a couple of people can make sense of, or your market is very niche. Sure, you'll still be making money. But if you want to reach a broader audience, you need to make sure your data is understood by that broad audience. That is the second important need for data analytics.

The third one is to decrease operational costs. Converting raw data into information, processing it, and building good visualizations out of it can be a very demanding task. If you have the workforce for it, then
sure, you can do it. But if not, remember that until a couple of years ago this surely was a manual task, whereas these days you can use machine learning and data analytics to automate much of it and make your job easier. It speeds up your work and keeps you efficient, so that your data is processed and visualized effectively. Time is money when it comes to big organizations. So it helps decrease the operational costs of waiting: waiting in the data pipeline, waiting to process, waiting to publish, waiting to do analytics, waiting for visualization. You need not worry about these waits anymore, because they will be reduced to almost nothing.

On that note, you might be wondering who data analysts are. Data analysts are the people who sit at the end of the chain; I'll explain the chain shortly. They form a valuable part of the company's workforce: they deliver value by taking in all the data that a data scientist or a data engineer gives them, using it to answer questions, and communicating the results back. Those results are used to make good business decisions. The work might include analysis of past trends, predictions of what the future looks like, and much more.

The common tasks done by a data analyst are data cleaning, performing analytics, and creating visualizations. Data cleaning is basically simple. Data arrives raw, and to make sense of it we need to pick out only the information we require. Efficiency is the game here. If you are processing data that you will not require, then
it is just a waste of time and resources. So cleaning the data and making sure it is fit for processing is a very important first step; it is the spearhead of a data analyst's role. Then, as soon as your data is clean, you need to perform good analytics on it and create good visualizations.

The four designations you see on the screen right now are other names a data analyst is known by: business analyst, operations analyst, business intelligence analyst and database analyst.

Coming to the tasks they do. The first important task, which we already spoke about, is cleaning and organizing raw, unstructured data. This is a very overlooked concept; once you get your hands dirty with the data itself, you will realize that cleaning and organizing it is extremely important. Your data might be unstructured, semi-structured or structured, but that will not matter if the data you are processing is of no use in the end.

The second task is the analysis of hidden trends found in the data. That could mean predictions about the future or a look at past trends: generating the sort of information you cannot figure out just by looking at the data up front. Let's say you find a trend in your data which shows that you can access a new side of the market with your sales in the next five or ten years, and then you
were not told this, but your data was telling you this. Making sense of data in which the trend was not visible up front is a very important skill for a data analyst.

The third important task is getting the big-picture view using descriptive statistics. Sure, we know what the past trends of the company are, and we know how the company is doing now. But you may need to summarize all of this, and even add some predictions for the next 5 or 10 years: a big-picture view of what has happened, what is happening now, and what will happen in the future. That means not just using very simple statistics, but using descriptive statistics to pick up each aspect of what's going on, perform analytics on it, and find out where we are right, where we are wrong, how things can be improved, how we can reach the clients better, and so much more. Getting this big-picture view is an important task for a data analyst.

The fourth one, probably the most important thing a data analyst does, is the creation of dashboards and visualizations. As I told you in the introduction, we start out with raw information, which is a lot of numbers. If you have to show these numbers at a business meeting, well, numbers are numbers: some people like them, but a majority of people might not understand them. So you take these numbers as input and create good visualizations, give them a user interface, and give your customers, your peers, your superiors and your business meeting members a good experience of the data. In the end this adds up to better business methods and helps you make better business-driven decisions. And again the same goes
for the presentation of results to your clients and your internal teams.

Coming to the chain of how the data moves around in a firm, there are three important roles that I want to walk you through quickly. The first is the data engineer, the person who spearheads the data for an organization. By spearhead I mean he is the first one to look at the data, and he is responsible for bringing in data from various sources. Think about all the data you can get from Twitter if you're performing sentiment analysis, or the data that comes from big data sources, such as the various nodes in Hadoop. You can have data coming from your own network and from outside your network. The data engineer handles bringing in the data and making it understandable by the organization.

As soon as the data engineer finishes his part, the data moves on to the data scientist. The data scientist is responsible for working on all of this raw data and converting it into valid information. I said the same thing about the data analyst, but the data scientist uses machine learning algorithms and deep learning: binary classification, naive Bayes, and many other concepts that the data analyst might not know. And 99% of the time, the information the data is converted into is numbers.

After the data scientist works his magic on the data, the data analyst steps in. As we've been saying, this person is responsible for predicting what happens in the future, that is, finding trends, and for the presentation of all this information
to your peers, your clients, your superiors and so on. That is the basic gist of how data moves around. The data is first seen by the data engineer, who does his job. It then comes to the data scientist, who works his magic with all these fancy algorithms and pushes the data to the data analyst. The data analyst performs predictions, sees trends, visualizes the data and makes it presentable for everyone to understand.

On that note, we can check out the types of data analytics that can be done in today's business. There are a few types, and I'll walk you through them.

The first type is descriptive analytics. The quick, one-phrase answer to what descriptive analytics is: picking out data from a source and summarizing it so that it can be understood by everyone. That means picking out good insights from all the past events in your data, perhaps doing some predictive analytics on that as well, and keeping it ready, so that your data is descriptive at the end of the day. The question that helps you understand descriptive analytics is: what happened in my business? Descriptive analytics gives the answer to that question. Most of the time, the data generated by descriptive analytics is comprehensive and accurate, and the visualizations are effective as well. Here is a very simple scenario. Consider
a scenario where an e-learning website decides to focus on a trending course, based on analytics of the search volume for the course content, that is, how much users search for it, and of the revenue generated by the course in the past few months. We live in a world full of trends. Let's say data science is booming right now. Any e-learning company, or any company for that matter, is going to find these trends, see that data science is sitting in the top tier, and realize it needs to do something about it. If they have students in data science, most of these companies aim to make the students' lives better. Here at Intellipaat, that is exactly what we do as well. We get insight from all of our learners for any particular course, and we combine that feedback with what is trending and what is latest in the market. We use all of this to perform analytics, and it helps us put out a better course. So you can already see how descriptive analytics helps our business.

The next type of data analytics is diagnostic analytics. With descriptive analytics we checked out what is happening in my business; with diagnostic analytics you check out why this particular thing is happening in my business. To simplify, it gives you the ability to drill down to the root cause of why something is behaving the way it is. Drilling down into the data to identify the set of problems present in your data, or in the trends the data shows, is a vital part of diagnostic analytics. It helps answer the question of why the issue occurred. So it just
takes a look at the data, or at the analytic result of the data, to understand the cause behind a problem, if one exists.

To give you a very simple scenario of how diagnostic analytics would work, consider a company where sales went down for a month; they were not doing as well as in previous months. To diagnose this particular problem, consider the situation where a number of employees were quitting their jobs and so weren't bringing in sales. The number of people quitting could directly impact the sales brought into the company. So sales went down this month because the company did not have those people, and so did not have their sales. Hunting for that root cause is basically a treasure hunt with your data: you go on taking insights from the data until you find the why. That part is extremely important when you want to know what is happening to the data and why it is behaving this way. This is a vital part of diagnostic analytics.

The third type of data analytics is predictive analytics. As the name suggests, the question you'd be asking is: what will happen in the future, based on past trends? Hunting out historical patterns and using them to predict specific outcomes, with, let's say, machine learning algorithms, deep learning and many more concepts, is an important niche of data analytics. Predicting future trends and possibilities in the market based on current trends is self-explanatory. It also helps in optimizing business plans for the future, because now your business has a direction to head in. So you're
predicting certain aspects of the future, and you know which path leads to better business strategies. This will give you an edge over other businesses as well.

To give you a nice example, think of the Netflix recommendation system. It basically uses statistical modeling to analyze the content being watched by audiences across the world. Let's say there are a couple of TV shows that are very famous in India, while a couple of other TV shows are trending in the United States, Australia, the United Kingdom and so on. Making sure that the right users in the right geographical area get the right recommendations is extremely important for Netflix as a business, and predictive analytics does them a lot of good there. It provides a prediction of the upcoming content and shows that should be watched by each class of audience. Now let's say someone sitting in the United States is very interested in watching an Indian television series that is not trending there. Netflix finding out that this person exists, and recommending an Indian TV show to them, is another very good business opportunity. If they go about nailing their recommendation system for everyone, it makes for a better business model and a better environment for users to watch videos in.

The next type of data analytics is prescriptive analytics. When we think of a prescription, we think of a doctor or a medical appointment, and here it is something similar. The question asked here is: what should be done? Applying advanced analytical algorithms to ensure that you make very good recommendations and that you are putting out very
good strategies that help the business is a vital part of prescriptive analytics. It basically involves breaking down all of this complex information into a simple set of steps. These steps are like the prescription we were handed when we were at the doctor's, and they are precautionary as well: they are the precautions used to head off future problems that might occur. It also helps in performing predictive analysis, which helps predict outcomes and eventually helps us optimize the business.

This demands the use of artificial intelligence and big data. The data analyst will always be in touch with the data engineer and the data scientist, because he is going to need all the help he can get with artificial intelligence from the data scientist and with big data from the data engineer. All three working in coordination make up a business, as I've already told you.

To give you a common scenario, consider the prescriptive analytics that Google Maps does. Every day we commute between our offices and our homes, or let's say we're going on a vacation and need to get out of the city as quickly as possible. Google Maps has a wonderful API that maps out the best possible route considering live traffic conditions, weather conditions, road closures and so much more. It considers your distance and traffic constraints, and, as I told you, even whether it's raining. If you're walking, it finds the best route to walk; if you're in a car, the best route to drive; and now they've come up with bike routes as well, so if you're on a moped or a bike you can take a different route from the person commuting by car. So knowing
this kind of prediction about the future, and giving you a sort of prescription to ensure you don't run into any problems along the way, is a very important part of prescriptive analytics.

On that note, let's quickly check out what the life cycle of data analytics is like. The data analytics life cycle defines the analytics process and the best practices that apply from the discovery of the data, or the project, until the completion of the project. There are a few steps involved in this life cycle.

The first one is business understanding. Understanding the purpose and all of the requirements that come from your business, and understanding them from the business viewpoint, is vital. This step also consists of an introductory plan, a decision plan and, let's say, a formal to-do list for the business to go on and achieve its target. So the first important thing in the life cycle is understanding what is going on around you.

The second one is data understanding. This mainly involves collecting the data and processing it in a way that leads to analytics. After the analysis is done, we need to pick out insights from the data that we can go on to use. Extracting these meaningful insights is a vital step in the life cycle.

The third part is data preparation. Data
preparation is, let's say, converting the data from an unstructured form to a structured form. This involves constructing a data set, which is then fed into a model. A machine learning algorithm uses this model to train, to understand what is going on and to perform predictions, and the output is then given to the data analyst to be visualized using Tableau or any other business intelligence tool. So data preparation is an important phase where the data is actually transformed, and at this stage the cleansing of the data is performed as well. And then comes modeling.
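Before moving on to modeling, here is a minimal Python sketch of what the data preparation phase described above can look like. It is only an illustration under assumptions: the raw text rows, the `course, revenue` layout and the `prepare` helper are hypothetical, and real projects typically rely on dedicated data tooling rather than hand-written loops.

```python
def prepare(raw_rows):
    """Clean loosely formatted 'course, revenue' text rows into structured records."""
    cleaned = []
    seen = set()
    for row in raw_rows:
        parts = [part.strip() for part in row.split(",")]
        if len(parts) != 2:
            continue  # skip rows that do not have exactly two fields
        course, amount = parts
        if not course or not amount:
            continue  # skip rows with a missing value
        try:
            revenue = float(amount)
        except ValueError:
            continue  # skip rows whose revenue is not a number
        key = (course.lower(), revenue)
        if key in seen:
            continue  # skip duplicates after normalizing the course name
        seen.add(key)
        cleaned.append({"course": course.lower(), "revenue": revenue})
    return cleaned


# Hypothetical messy input, used only for illustration.
raw = ["Data Science, 1200", " data science ,1200", "Python,", "Tableau, n/a", "Tableau, 300"]
for record in prepare(raw):
    print(record)
```

The output is a small structured data set, with duplicates and unusable rows removed, which is the kind of input the modeling phase expects.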
This modeling step is very important because it involves selecting various modeling techniques, applying them, and making sure the parameters and all the readings are right, so that the raw data being converted into information is within the optimal tolerance for your business usage.

As soon as the modeling is done, the techniques are applied and all the parameters are set, then comes evaluation. Evaluation is a very important phase where the model that has been built is tested rigorously, based on what you built in the initial stages, and many tests are performed on the data as well. Evaluating something you have generated and performing various tests on it is extremely important. It used to be overlooked most of the time, but these days everyone knows the value of evaluating what they have built. Evaluation also involves reviewing all of the steps that were carried out to construct this particular model. Performing tests on the model is very important.
That is exactly what we do in evaluation.

The next life cycle concept is deployment. Deployment is the last step in the data analytics life cycle, because deploying is when you send your model out into the world, and by the world I mean your team, your peers, or even your clients and customers. So you go from just using the data yourself all the way to sharing it with your clients, your peers or anyone else. You can perform more tests after deployment as well. If something is wrong, you can go back to modeling, then perform evaluation and deployment again. The thing you need to know here is that deployment is the final phase of the data analytics life cycle.

On that note, we can quickly check out the roles of analytics in various industries around us. Data analytics has had a big impact on the telecom industry. If you've been observing, the prices of calls, messages and internet packs have been coming down and down, and there was a time when they were exorbitantly high. The telecom industry probably realized that if you keep prices very high to make more profit, eventually your customers will not come; they will not buy internet packs. So they decided, let's say: we're having a bad impact right now, so let's drop our prices and see if it works. And it did, I guess. This has helped the telecom industry bring in better business, and it has made things a bit more economical for us as users.

Then there is the retail banking industry. Data analytics has a huge impact on the retail banking industry as well, to know what the customer wants,
because, as with the telecom industry, banks have a huge number of customers to deal with. They need to understand the view and the requirements of each customer, and find out whether the customers are in line with what is being supplied by the bank or are against something the bank is offering. That is a very important thing to check here.

Then, with respect to the e-commerce industry, analytics might be used for recommendations, or for the very big sales run by some of the big names in the industry such as Amazon, Flipkart, Myntra and so on. These companies have to perform extremely heavy analytics on their data, based on the products they sell, the city in which a product sells, or whether people are unhappy with the price. Performing analytics has changed the e-commerce industry, guys.

And the last, most important industry where data analytics has had an impact is the healthcare industry. This is where analytics has had the most impact, in so many respects, mainly in finding out what medication is required by which countries, what amount of medication works for which population, and so much more.

I could probably talk about the roles of analytics in these industries for days together and we would still have a good discussion about how important analytics has become these days. With insurance, for example: which geographical locations require insurance, who the audience is, and so much more. As I've said, we could go on talking about this, but to keep it within the scope of this tutorial we can quickly brief through each of these. With respect to the healthcare industry, this is
the formal way of going about analytics. The first use is to analyze disease patterns and disease outbreaks. We had outbreaks a couple of years back such as Ebola and H1N1, and keeping track of all these outbreaks improves health surveillance and gives the emergency sectors a chance to respond well. Then there is the development of better targeted preventive techniques, and the development of vaccines and making sure these vaccines reach your customers, or in this case the patients. Next is identifying the consumers at greatest risk: identifying the customers or patients who might be developing adverse health outcomes, then developing welfare programs to track their health on a daily, weekly or monthly basis, and performing analytics on all the parameters you are tracking for those patients. And lastly, to reduce readmissions, because you might know what the cause of an adverse effect is. If there are ten patients with very similar symptoms, you can perform analysis, filter these out, and find that all ten patients have one common symptom associated with them, and that this might be the cause. Mapping that for every patient is again very important, guys. Coming to the telecom industry, any company pretty much goes about using predictive analytics to gain the insights it needs to make better decisions, faster decisions and more effective decisions. Again, as we were talking about
the internet pack example, this was very key in that. By learning more and more about their customers daily, their preferences and their needs, these telecom companies can be more successful in this extremely competitive industry. It's good for them with respect to business, and it is good for us as customers because it brings down prices. So it is used for analytical customer relationship management, fraud reduction, bad debt reduction, price optimization, call center optimization and so much more. Now that you're looking at data analytics this way, you realize that it has a big role to play in any of these business models, right? Coming to banking, as I've already mentioned, analytics is making banks smarter day by day, guys. It helps manage the plethora of challenges a bank faces. Going from basic reporting all the way to descriptive analytics is a must for every single bank, and even advanced or prescriptive analytics and more. Banks have started to realize that this helps generate very good insights, which results in extremely good business impact. So let's check how data analytics is helping the banking industry: it is used to acquire and retain customers, to detect fraud, which is extremely important in banking, to improve risk control, to find new sources of growth for the bank, and to optimize products and portfolio models. Now that we have checked out the banking sector, on to the e-commerce industry. This is a market that has been exploding for the last couple of years, I could say even the last decade, right? eBay came up,
Flipkart came up, Amazon is taking over everything. Guys, there are so many e-commerce portals today, and making sure we perform very good analytics in the e-commerce industry is vital. So how is data analytics used here? It is used to improve user experience, enhance customer engagement, customize offers and promotions, maintain effective supply chain management, optimize pricing models, minimize the risk of fraud, show customers good advertisements that help them pick products, and power good recommendation systems, where they pick up another product after the first one. If you've just bought an iPhone, you will be recommended a couple of cases that other people bought right after buying an iPhone. You might be that person: you might like the case and pick it up. At the end of it, you have a case to protect your phone and the business has made more money out of it, right? So the role of analytics in the e-commerce industry is extremely vital, and people in the e-commerce industry have known this for a while, guys. Coming to analytics in the insurance industry, here it is used to enhance customer engagement, acquire new customers, retain existing customers and make sure they don't leave, prevent fraud, and prioritize the claims that need to be handled. You have medical and health insurance, you have life insurance, and so much more, and all of these directly impact the user. So taking feedback from the user, working on it, and creating some analytics out of it is again very vital, guys. On that note, we can quickly come to the case study I was talking about. This case study is a very famous one: it's the house prices case study, and here all we'll
be doing is predicting house prices, guys. We'll have a certain set of data which we will use to predict the prices of the houses. So how can we predict the price of a house? There are so many things you need to know, right? You'd be looking at the locality in which the house is present, the amenities, the number of bedrooms, the living space, the number of floors, the number of cars that can fit in the garage, the size of the garage, the quality of construction, whether the house has a swimming pool or a spa, and so many more things. If we had to list everything for this use case it would be extremely tough, because each of us has our own judgment of how to value a house; a house is something very personal to us. Then there are the materials used to build the house, the style in which it is built, the number of floors, whether the house has an elevator, how convenient it is for disabled people, and so much more. So basically we'll be performing exploratory analysis on this, guys. Exploratory data analysis is used to find hidden trends in your data by performing analysis on it. At the end of it the trends show up as numbers, but since we already know visualization, we're going to use these numbers to visualize the data, and we'll be doing it step by step. We'll be finding correlations in the data as well. Correlation is basically a check of how one variable is linked to another, and how changing one variable directly
changes the other variable as well. So how two variables hang together, and how changing one changes the other, can be known using correlation, guys. A couple of steps are generally followed when performing exploratory analysis. First we'll be visualizing our data, finding the missing values, and looking for correlations. After this is done we'll be cleaning the data, checking whether any issues are fixed and whether the data we have is being used fully. Then we would go about building a model, which gives us results to visualize: diagnostics, residual diagnostics, ROC curves, charts, graphs, tables and so much more, guys.
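To make the idea of correlation concrete, here is a minimal sketch in plain Python. It is not the notebook from the video: the function name `pearson` and all the numbers are made up for illustration. It computes the Pearson correlation coefficient, which is close to 1 when two variables rise together and close to 0 when there is no linear link between them.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values, for illustration only (not taken from the housing data set).
living_area = [1000, 1500, 2000, 2500, 3000]
sale_price = [100_000, 150_000, 200_000, 250_000, 300_000]
record_number = [4, 1, 3, 5, 2]  # an arbitrary identifier with no link to price

r_area = pearson(living_area, sale_price)      # price rises in step with area
r_record = pearson(record_number, sale_price)  # no linear relationship

print(f"living area vs price:   {r_area:.2f}")
print(f"record number vs price: {r_record:.2f}")
```

In this toy data the price moves in lockstep with the living area, so the coefficient is at its maximum, while the arbitrary record number tells us nothing about the price. On a real data set the values fall somewhere in between, which is why they are worth ranking.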
I do not want to overwhelm you with the use case, so to keep it very simple we will just perform exploratory data analysis at this stage to find correlations in the data. As we progress with our data set you will understand how beautiful data analytics is, guys. So let me quickly jump into Google Colab, which is basically a Jupyter notebook hosted on Google's cloud, and here we can perform our data analytics on the use case I just walked you through. We need a file to run our use case; actually we need just one file, which is our training data set file. Let me quickly add the files to our Google Colab and then we can go about performing our analytics. It will just take a second to upload, give me a second. We actually need just one file from here, but it doesn't harm to upload the rest and keep them in your runtime. Just take note of the message, though: all of your files are recycled as soon as the runtime changes. The first step for our use case is to load all the libraries we require, guys. We'll be using pandas to handle all of our data, and Seaborn and matplotlib to perform plotting operations and visualizations on this data. The style we'll be using is called bmh. BMH stands for Bayesian Methods for Hackers, and it is a plotting style that gives us nicer-looking graphs and helps us perform analysis better on linear data, guys. The second thing we'll be doing is loading the necessary files. The one important file we need is the training data set, which is this. Here you have the ID of the houses, you have the subclass in which the house is
present, you have the zone in which the house is present, the lot frontage, the lot area, the street, the alley, the lot shape, the lot contour, the year the house was built, the year it was remodeled, the roof, and so many conditions out here: what the foundation is made of, the quality of the basement, the condition of the basement, the exposure of the basement, the finish type of the basement. Guys, just look at how expansive this data set is. This is a data set which is extremely popular among us analysts. We like to work on it because it comes with everything, and you can do so much on this single data set. On that note, let's quickly find out the information for all the variables present. Again: the heating of the house, the quality of the heating, whether there is central air conditioning, the condition of the electricals, the square footage of the first floor, the square footage of the second floor, the low-quality finished square feet, the living area, how many full bathrooms we have, how many half bathrooms we have, the kitchen quality. Guys, this goes on, right? All this data is what we need to check out. If you see here, these are the counts of values that are present, which will help us map something. But Alley doesn't have many values; you can check that right here. For Alley it is mostly NaN, which is basically 'not a number', so there are not many alley details we can make an analysis out of, and we will not require Alley. Coming down, there is not much fireplace quality either, we have almost no pool quality values, miscellaneous features are very few, and even fence has very few. So the average is somewhere around 400, right? When we are cleaning up our data, a very important part
of it is how we go about doing it. I will just remove all the columns that have less than 30 percent of their values present, so less than 30 percent of 1,460, so that we have enough data to give us accurate results, right? As already checked, Id is removed, and not every house is mapped to an alley, pool quality is mostly not there, fence is mostly not there, and miscellaneous features are very few, so we have dropped all of these columns and will not be using these to perform our analytics. Next, to describe the distribution: I hope you guys know the concept of a normal distribution and the details we can get out of it when we perform the math, guys. Basically we can get the count, that is the total number of values present, the mean sale price, and the standard deviation. Let's say the mean of the distribution is right at the center, somewhere here, guys. So the mean is somewhere around 180,000; if you trace down from the center point, this is where 180,000 sits. The standard deviation is basically the deviation from the mean, because there are house values which deviate from this mean as well. Then you have the 25th, 50th and 75th percentiles, and the maximum sale price of a house as well. All of these can be read from this particular graph, guys. And you can already observe, even if you're not familiar with a normal distribution, that this starts out as a very steep curve but then tails off with a lot of data out here; it goes on until about 800,000. So probably from 500,000, or even 400,000, all the way to 800,000, we call these outliers. These are called outliers because these are
very far away from the bulk of our distribution. They might not be useful for us, and they have a large impact when we are performing analytics with respect to the mean, the standard deviation, or anything for that matter. So you will have to remove them and not consider them in order to perform very accurate analysis, guys. On that note, we can go on to finding the types of data in the data set and picking the type we will consider. In this particular case, since we are playing with numbers, it has to be the numeric data types, right? You can see we have integers, floating-point numbers and so on. So let's print what it looks like after we have dropped the values we are not using: we're not using Id, we're not using Alley, and so on. These are all the numerical values that we'll be considering. The year built is a numerical value; 2003 is a year. Overall condition and overall quality are rated on a particular scale. Square footage is a number as well; the second floor of this particular house has 854 square feet, and so on. As soon as we check that all these numbers are present, we can start performing our analysis, guys. Before that, we need to plot all of these to check what they look like as graphs, because seeing numbers is one thing and seeing graphs is something else. So we'll be drawing histograms and checking them out. Let me just scroll down a little. With respect to first-floor square footage, the mean sits somewhere around here, let's say 500 to a thousand square feet. Then look at the second-floor square footage, the bedrooms above ground, the basement finish quality, the garage cars, that is the number of cars that you
can park in the garage. Look at the values here: the majority of the houses have two spaces to park your cars. Then the year the garage was built, the above-ground living area, and how many half bathrooms; you can see one half bathroom here. At this point you have certain values that are zero as well. Sure, we can consider values of zero if they are very important, but since we're talking about sale price, what is present is more important than what is absent, right? We need something descriptive for our analytics methods to work. For that we'll have something called the golden features list. This variable will contain all the features that we associate with why the sale price is as high as it is, guys. So basically we're creating a variable where we find the correlations, and then we can already check out the top 10 values which are strongly correlated.
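A minimal sketch of how such a golden features list could be built with pandas. This is not the notebook shown in the video: the tiny table, its values, and the 0.5 cutoff for "strongly correlated" are assumptions for illustration. Only the idea follows the walkthrough: correlate each numeric feature with the sale price, keep the strong ones, and sort them in descending order.

```python
import pandas as pd

# Tiny invented table standing in for the training file. The column names
# mirror features mentioned in the walkthrough; the values are made up.
df = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9, 4],
    "GrLivArea": [1100, 1400, 1700, 2100, 2600, 900],
    "MoSold": [6, 2, 9, 4, 7, 8],
    "SalePrice": [120_000, 150_000, 190_000, 250_000, 320_000, 100_000],
})

# Keep numeric columns only, since correlation needs numbers.
numeric = df.select_dtypes(include="number")

# Correlation of every feature with the target, excluding the target itself.
corr_with_price = numeric.corr()["SalePrice"].drop("SalePrice")

# Assumed cutoff: treat |correlation| above 0.5 as "strongly correlated".
STRONG = 0.5
golden_features_list = (
    corr_with_price[corr_with_price.abs() > STRONG]
    .sort_values(ascending=False)
)

print(golden_features_list)
```

In this toy table the month of sale has almost no linear link to the price, so it drops out, while quality and living area stay in the list. On the real data set the same three steps would run on all numeric columns at once.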
Let me give you a quick recap. We have sorted this in descending order, as you can already see. The first thing here is overall quality. To put this in literal terms, the overall quality of the house, whatever rating was given in our data set, matters the most for how the house is priced. Then comes the living area. The number of cars you can park in the garage has almost a 64% impact on why the price is what it is, garage area has a 62% impact, basement square footage has a 61% impact, and the year it was remodeled has a 50% impact on the sale price, guys. (Strictly speaking these figures are correlation coefficients, about 0.64 and so on, rather than shares of the price.) Looking at a couple of linear relationships, you can see that a lot of values are zeros, as I walked you through a few seconds ago. Look at the ground floor living area: we have very few zeros against the sale price. Now check out the basement surface area, and the total basement surface area as well. Look at all these tiny dots sticking to zero against the sale price, right? These have no impact on our sale price because they are not giving us any valid linear relationship. Look at this one at zero: the price rises all the way up to about $600,000. So all these are not adding any meaning to our data. If x is equal to zero, this might indicate that the house does not have that feature at all. A lot of houses here do not have pools, so their pool area is zero. In that case we need to remove all of these zeros so that we can find more correlated values that we can actually work with, guys. Here are all the correlated values that we found. As soon as we run this command, we're sorting it again in descending order to find
out what helps most, and you can see that it's almost 80% for the overall quality of the house, which is what matters most for the price when you're buying. People also clearly value living area, and the second-floor surface area has a 67% chance of affecting the price, and so on. Now check out what the golden features list looks like with all of the strongly correlated values we found: year remodeled, year built, total surface area, number of full bathrooms, first-floor surface area, garage area, total basement square footage, the number of cars you can park, the square footage of the second floor, the living area, and the overall quality. This particular order, from the least to the highest, is exactly what we're trying to find with exploratory data analysis, guys. Just by looking at the data set you could never figure out why the list price of a house was what it was. Once we break it down into simple terms like this, we find that there is an 80% impact from the overall quality of the house, which the buyer uses to decide whether to consider the house or not. If the quality is low he will not pick the house; if the quality is higher then sure, he will pick it. Knowing that 80% of the reason the price is set like that comes from quality is why this house prices case study is important, guys. This has been a very nice data set to work with, and there is a lot of analytics that can be done using this particular data set. A quick info, guys: in case you are interested in doing an end-to-end course in data analytics, please do check out Intellipaat's data analytics training course. It will help you master the concepts of data analytics thoroughly. Details for the same are available in the description box below.
Hey guys, hope you all liked the video. If you have any questions, clarifications or concerns, please put them down in the comment section below; we are happy to assist you. Thank you so much.
This information is what is usable for you, right? So the process of converting raw data into information by performing certain analysis on it can be termed data analytics, guys.
I mean, this is just a very rough definition of what goes on in the industry. To give you a better insight into what it means: it is the pursuit of extracting meaning from raw data using specialized computer systems, guys. These systems transform, organize and model the data, and the end goal is to draw conclusions from all of the data. As I've mentioned with the raw data example, our goal here is to draw conclusions, to identify patterns, and to make use of this raw information in a better way. Sure, data can be used in its raw form, but not in every case, right? To give you an example, let's say you're looking at a huge data set which contains thousands of values, and those are just numbers. As a data analyst you will understand what you're doing with these numbers, but let's say you need to explain it to a person who does not know what data analytics is, or to your peer, or to someone above you, say at a business meeting where you have to present this. Just giving the numbers will not really be that great.
It would not make a very good presentation. So converting all of this numerical data into graphs, presenting it in a way where everyone understands it, and showing graphically whatever insights you can generate from the data can be considered data analytics as well, guys. Today the field of data analytics is growing rapidly. Why is it growing so rapidly? Because the market demand for it is that high. Pretty much every startup these days has a requirement for big data as well. Big data, to simplify the definition, means a huge amount of data that cannot be handled by just one machine. If you're stretching your data across multiple nodes and you have data coming in from various sources, then you need a method to handle all of this incoming data, to process it, and then to perform analysis on it, guys. This has been in demand in the market for a while, and for the last couple of years data analytics has been booming, because it provides a straight answer to the questions being raised about how all this can be handled. And of course we're going to need people who have the skills needed to manipulate data, write queries, translate numbers into graphs, and make insightful analysis, right? That is the key role of a data analytics person.
So what is the need for data analytics, if we break it down into three points, guys? You already know that it is on the rise, right? So I can step out into the open and tell you that it is not that it will soon be an integral part of an organization; it is already an integral part of every organization there is, guys. A quick info, guys: in case you are interested in doing an end-to-end course in data analytics, please do check out Intellipaat's data analytics training course. It will help you master the concepts of data analytics thoroughly; details are available in the description box below. So the first important need: we already know that analytics is a top priority for all organizations, ranging from the small ones all the way to the big guys, and it is needed for good decision making. If you are confused about a decision, looking at the analytics will make your decision making a little easier, and the decision will be more valuable and more validated as well, guys. The second need is to acquire new revenue. Let's say you have data which only a couple of people can make sense of, say your market is very niche. In that case you'll still be making revenue, but at the end of the day, if you have to reach out to a broader audience, you need to make sure your data is understood by this broad audience, right? That forms the second very important need for data analytics, guys. And the third one is obviously to decrease operating costs for every organization. It can be a very demanding task to convert raw data into information, process the data, and make good visualizations out of it. If you have the workforce for it then
sure, you can do it. But if not, well, until a couple of years ago it surely was a manual task, and these days you can implement machine learning and data analytics to automate it and make your job easier as well. It speeds up your work and keeps you efficient, in such a way that your data is being processed and visualized very effectively. Time is money when it comes to big organizations, right? So on that note, it helps decrease all the operating costs that come from waiting in the data pipeline: waiting to process, waiting to publish, waiting to do some analytics, waiting for visualization. All of these waits need not be considered anymore, because they will pretty much be reduced to nothing, guys. Now you might be wondering who data analysts are. Data analysts are the people who sit at the end of the chain; I'll explain the chain in a moment. These guys form a very good workforce for the company: they deliver value by taking in all the data that a data scientist or a data engineer might give them. The data analyst then uses all of this to answer a set of questions and communicates the results back, and these results are used to take good business decisions, guys. This might include analysis of past trends, predictions of what the future looks like, and so much more. The common tasks done by a data analyst are data cleaning, performing analytics, and creating visualizations. Data cleaning, to come to it, is basically simple. Data is raw, and to make sense of this raw information we need to pick out only the information we require. Efficiency is the game here: if you are processing data that you will not require, then
it's just a waste of time and resources. So cleaning the data, making sure that your data is just right for processing, is a very important first step, guys. This is the spearhead of a data analyst's role. The second one is obviously that, as soon as your data is clean, you need to perform good analytics on it and create good visualizations, guys. The four designations you see on the screen right now are the designations a data analyst is also known by. The first is business analyst; a data analyst can also be an operations analyst, a business intelligence analyst, or a database analyst as well, guys. Coming to the tasks they do, the first important task, as we already spoke about, is cleaning and organizing raw, unstructured data. This is a very overlooked concept in today's world. Only when you get your hands dirty with the data itself do you realize that cleaning and organizing the raw data is extremely important. Your data might be unstructured, semi-structured or structured; this will not matter if the data you're processing is of no use at the end, guys. So cleaning and organizing raw data is very important. The second task is the analysis of the hidden trends found in the data: making sense of something, maybe predictions for the future or a look at past trends, generating the sort of information you cannot figure out upfront just by looking at the data. This is the umbrella under which the hidden-trends part of a data analyst's work falls. You'll be looking at the data and you will find something very interesting. Let's say you find a trend in your data showing that you can access a new side of the market with respect to your sales in the next five or ten years, and you
were not told this, but your data was telling you this. Making sense of data in which the trend was not visible upfront is again a very important skill of a data analyst, guys.

The third important thing is the big-picture view, or using descriptive statistics. Sure, we know what the past trends of the company are and we know how the company is doing now. But if you need to summarize all of this, and say even add some predictions for the next 5 or 10 years, you need a big-picture view of what's happened, what's happening now, and what will happen in the future. That means not just using very simple statistics but using descriptive statistics, where we pick up each aspect of what's going on, perform analytics on it, and find out where we are right, where we are wrong, how this can be improved, how we can reach the clients better, and so much more. Getting this big-picture view is again a very important task a data analyst goes about doing, guys.

The fourth one, probably the most important thing a data analyst does, is the creation of dashboards and visualizations, guys. As I already told you in the introduction part of the video, we start out with raw information, which is a lot of numbers. If you have to show these numbers at a business meeting, well, numbers are numbers: some people like them, but a majority of people might not understand them. So taking these numbers as input, creating very good visualizations, giving them a user interface, and giving your customers, your peers, your superiors, and the people in your business meeting a very good experience with all of this data will add up to very good business methodologies, and it will help you take better data-driven business decisions. And again, the same goes
for the presentation of results to your clients and your internal teams as well, guys.

So, coming to the chain of how the data moves around in a firm: there are three important roles that I want to walk you through, guys. The first is the data engineer, the person who is the spearhead of the data for an organization. By spearhead I mean he's the first one who looks at the data; he is responsible for bringing in all of the data from various sources, guys. Think about all the data you can get from Twitter if you're performing sentiment analysis, or all the data that can come from big data sources. You can have data coming from your various Hadoop nodes, from your own network, from outside your network, and so much more. The data engineer, as the spearhead, handles bringing in the data and making it understandable by the organization, guys.

As soon as the data engineer finishes his part, the data moves on to the data scientist. The data scientist is responsible for working on all of this raw data and converting it into valid information. I already said this about the data analyst as well, but the data scientist uses machine learning algorithms and deep learning; he uses, let's say, binary classification or Naive Bayes, and he makes use of many concepts that the data analyst might not know, to convert the raw data into valid information. 99% of the time, the information it is converted into is numbers, guys.

After the data scientist works his magic on the data, the data analyst steps in. As we've already mentioned, he is responsible for the prediction of what happens in the future, that is, finding trends, and then the presentation of all this information
to your peers, to your clients, to your superiors, and so much more. This is the basic gist of how data moves around. The data is first seen by the data engineer, who does his job; it then comes to the data scientist, who works his magic with all these fancy algorithms and whatnot and pushes the data to the data analyst. The data analyst performs predictions, sees trends, visualizes the data, and makes it presentable for everyone to understand, and that is how the data eventually gets analyzed, guys.

On that note, we can check out the types of data analytics that can be done in today's business, guys. There are a couple of types, and I'll be walking you through them. The first type is descriptive analytics. The quick one-phrase answer to what descriptive analytics is: it's basically picking out data from a source and summarizing it so that it can be understood by everyone, that is, picking out very good insights from all the past events and keeping them ready, which can later feed predictive analytics as well. The question that helps you understand descriptive analytics is: what happened in my business? The answer to this question is given by descriptive analytics, guys. Most of the time, the data generated by descriptive analytics is very comprehensive and accurate, and the visualizations are very effective as well. So here's a very simple scenario that you guys can consider. Consider
a scenario where an e-learning website decides to focus on a trending course, based on analytics of the search volume for the course content, which is pretty much what the users go on searching for, and of all the revenue generated by the course in the past few months. We live in a world full of trends, right? Let's say data science is booming right now. Any e-learning company, or any company for that matter, is going to find these trends; they're going to know that data science is sitting in the top tier and that they need to do something about it. If they have students in data science, most of these companies aim at making the students' lives better. Even here at Intellipaat, that's exactly what we do: we get insights from all of our learners for any particular course, and we use those insights, all of the feedback that we get, and what's trending and latest in the market to perform analytics, and all of this helps us put out a better course as well, guys. So you can already see how descriptive analytics is helping our business, right?

The next type of data analytics we can check out is diagnostic analytics, guys. With descriptive analytics we checked out what is happening to my business; with diagnostic analytics you will check out why this particular thing is happening to my business. To simplify, it gives you the ability to drill down to the root cause of why something is behaving the way it is. Drilling down into the data to identify a certain set of problems present in your data, or in the trends that the data shows, is a very vital part of diagnostic analytics, guys. It helps answer the question of why the issue has occurred. So it just
takes a look at the data, or at the analytic results of the data, to understand the cause behind a problem, if one exists, guys.

Again, to give you a very simple scenario of how diagnostic analytics would work, consider a company where the sales went down for a month; let's say they were not doing as well as in the previous months. To diagnose this particular problem, consider the situation where a number of employees were quitting their jobs and so were no longer bringing in sales. The number of people quitting the company could directly impact the sales brought into the company, so the sales went down this month because the company did not have the people to make those sales. Finding out why this is happening and hunting for that root cause is basically a treasure hunt with your data, where you keep taking insights from your data. Finding that "why" is extremely important when you are working out what's happening to the data and why it's behaving this way, guys. So this again is a very vital part of diagnostic analytics.

Coming to the third type of data analytics, it is predictive analytics, guys. As the name suggests, the question you'd be asking is: what will happen in the future, based on the past trends? Hunting out historical patterns and using them to predict specific outcomes, using let's say machine learning algorithms, deep learning, and many more concepts, is again a very important niche of data analytics, guys. Predicting the future trends and possibilities in the market based on the current trends is self-explanatory. It helps in optimizing all of the business plans for the future, because now your business has a direction to head in, right? So you're
predicting certain aspects of the future, and you know which path leads to better business strategies, right? This will give you an edge over the other businesses as well. To give you a very nice example, think of the Netflix recommendation system, guys. It basically uses statistical modeling to analyze all the content that is being watched by audiences across the world. Let's say there are a couple of TV shows which are very famous in India, while there might be a couple of other TV shows trending in the United States, Australia, the United Kingdom, and so on. Making sure that the right users in the right geographical area get the right recommendations is extremely important for Netflix as a business, right? So predictive analytics does them good here, guys. It provides predictions of the upcoming content and shows that are likely to be watched by all the different classes of audiences. Let's say someone sitting in the United States is very interested in watching a trending Indian television series. Netflix finding out that this person exists, and recommending an Indian TV show to this person sitting in the United States, is another very good business opportunity for Netflix. If they go about nailing their recommendation system for everyone, it makes them a better business and it makes for a better user experience for the people watching videos there, guys.

The next type of data analytics is prescriptive analytics. As soon as we think of a prescription, we think of a doctor or a medical appointment, right? Here it's something similar. The question asked here is: what should be done? Applying advanced analytical algorithms to ensure that you make very good recommendations, making sure you are putting out very
good strategies that help the business, and so much more, is a very vital part of prescriptive analytics, guys. It basically involves breaking down all of this complex information into a very simple set of steps, and these steps are like the prescription we are handed when we visit the doctor, right? These prescriptions are precautionary as well: they are precautions used to remove any future problems that might occur. Prescriptive analytics also builds on predictive analysis, which helps predict outcomes, and eventually that helps in optimizing the business as well, guys. This again demands the use of artificial intelligence and big data, so the data analyst will always be in touch with the data engineer and the data scientist, because he's going to need all the help he can get with artificial intelligence from the data scientist and with big data from the data engineer, right? All three of these guys working in collaboration make up a business, as I've already told you.

Again, to give you a common scenario of how this would happen, consider the prescriptive analytics that Google Maps goes about doing. Every day we commute from our office to our home, or let's say we're going on a vacation and need to get out of the city as quickly as possible. Google Maps has a wonderful API where they map out the best possible route considering live traffic conditions, weather conditions, road closures, and so much more. It considers your distance and traffic constraints and, as I already told you, it even considers whether it's raining. If you're walking, what's the best route to walk? What's the best route to take in a car? And now they've come up with the bike route as well, so if you're on a moped or a bike you can take a different route compared to the person who commutes by car. So knowing
this kind of prediction about the future and giving you a sort of prescription to ensure you don't run into any problems along the way is a very important part of prescriptive analytics, guys.

On that note, let's quickly check out what the life cycle of data analytics is like. The data analytics life cycle basically defines the analytics process and all of the best practices that apply from the discovery of the data, or the project, until the completion of the project, guys. There are a couple of steps involved in this life cycle, and the first one is business understanding. Understanding the purpose and all of the requirements that come from your business, and understanding them from the business viewpoint, is very important and vital. This step also consists of a very good introductory plan, a decisions plan, and, let's say, a formal to-do list for the business to go on and achieve the target. So the first important thing about the life cycle is understanding what's going on around you.

The second one is data understanding, guys. This mainly involves the process where we collect the data and process it in a way that leads to analytics, and then, after the analysis of the data is done, we need to pick up some insights that we can go about using. Extracting these meaningful insights from the data is a very vital step in the life cycle of data analytics, guys.

The third part is data preparation. Data
preparation is, let's say, converting the data from an unstructured form to a structured form, and this involves constructing a data set as well. This data set will be fed into a model, which a machine learning algorithm trains in order to understand what's going on and to perform predictions, and the results will then be given to the data analyst to be visualized using Tableau or any other business intelligence tool, and so much more. So data preparation is a very important phase where the data is actually being transformed, and at this stage the cleansing of the data is performed as well, guys. And then comes modeling.
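To make the data preparation phase a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the raw text, the column names, and the `prepare` helper are all hypothetical and are not taken from the session. It shows the two ideas mentioned above, structuring raw text into typed records and cleansing out rows with missing or duplicate entries.

```python
import csv
import io

# Hypothetical raw export: stray spaces, one missing value, one duplicate row.
RAW = """id, bedrooms ,price
1, 3 ,250000
2, ,180000
1, 3 ,250000
3, 4 ,320000
"""

def prepare(raw_text):
    """Turn raw text into a list of clean, typed records (illustrative helper)."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen_ids = set()
    records = []
    for row in reader:
        # Normalise keys and values by stripping stray whitespace.
        clean = {key.strip(): value.strip() for key, value in row.items()}
        # Cleansing: skip rows that have any missing value.
        if any(value == "" for value in clean.values()):
            continue
        # Cleansing: skip rows whose id was already seen.
        if clean["id"] in seen_ids:
            continue
        seen_ids.add(clean["id"])
        # Structuring: convert the text fields to integers.
        records.append({key: int(value) for key, value in clean.items()})
    return records

dataset = prepare(RAW)
print(dataset)
```

Dropping incomplete rows is just one possible cleansing choice; filling in missing values is another. Once the records are in this structured, cleaned form, they are ready for the modeling step that comes next.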
Modeling is very important because it involves the selection of various modeling techniques, applying these techniques, and making sure the parameters are set right, to ensure that the raw data being converted into information is of the optimal quality, let's say within the tolerance that can be used for your business. As soon as the modeling part is done, the techniques are applied, and all the parameters are set, then comes evaluation, guys. Evaluation is a very important phase where the model that has been built is tested very rigorously, based on what you built in the initial stages, and many tests are performed on it. Evaluating something that you have generated and performing various tests on it is extremely important. Most of the time this used to be overlooked, but these days everyone knows the value of evaluating your model. This involves reviewing all of the steps that were carried out to construct this particular model and performing tests on it.
That's exactly what we do, guys. The next life cycle stage I want to tell you about is deployment. Deployment is the last step in the data analytics life cycle, because deploying is when you send your model out into the world, and by the world I mean your team, your peers, or even your clients and customers. You go from just using the data yourself all the way to sharing the results with your clients, your peers, or anyone else. You can perform more tests at this point as well: after deployment you can run post-deployment tests, and if something is wrong you can go back to modeling, evaluation, and deployment again. The thing you need to know here is that deployment is the final phase of the data analytics life cycle, guys.

On that note, we can check out very quickly what the roles of analytics are in the various industries around us, guys. Data analytics has an amazing impact when it comes to the telecom industry. If you've been observing, the prices of calls, messages, and internet packs have been coming down and down these days, and there was a time when they were exorbitantly high. The telecom industry realized that if you keep the prices very high to make more profits, eventually your customers will not come; they will not buy internet packs. Keeping this in mind, they probably decided, "we're having a bad impact right now, so let's drop our prices and see if it works," and it did, I guess. This has helped the telecom industry bring in better business, and for us as users it has made things a bit more economical as well.

Then there is the retail banking industry. Data analytics has a huge impact in the retail banking industry as well, to know what the customer wants,
because, as with the telecom industry, they have a huge number of customers to deal with, right? They need to understand the view and the requirements of each customer, and then find out whether the customers are in line with what's being supplied by the bank or whether they are against something that's being offered by the bank. That is a very important thing to check here as well.

Then, with respect to the e-commerce industry, analytics might be about recommendations, or about the very big sales run by some of the big names in the industry, such as Amazon, Flipkart, Myntra, and so much more. These guys have to perform extremely heavy analytics on the data they see, based on the products they sell, the places and cities in which a product sells, or whether people are unhappy with the price. Performing analytics has just changed the e-commerce industry, guys.

The last, and most important, industry where data analytics has had an impact is the healthcare industry. Analytics has had the most impact here with respect to so many things, mainly with respect to finding out what medication is required by which countries, what amount of medication is working for which population, and so much more, guys. I could probably talk about the roles of analytics in these industries for days together and we would still have a very good discussion of how important analytics has become these days. With insurance as well: which geographical location requires insurance, who the audience is, and so much more. As I've said, we can go on talking about this, but to keep it within the scope of this tutorial we will just quickly go through each of these, guys. So, with respect to the healthcare industry, this is
the formal way of going about analytics. The first use is to analyze disease patterns and the disease outbreaks that occur. We had a disease outbreak a couple of years back, which was Ebola, and we had H1N1 and more, right? Keeping track of all of these outbreaks improves health surveillance and leads to better responses from the emergency sectors as well, guys. Then there is the development of better-targeted preventive techniques, obviously, and the development of vaccines, and making sure these vaccines reach your customers, or let's say in this case the patients; all of these are very important, guys. Next is identifying the consumers at the greatest risk: identifying certain customers or patients who are at the greatest risk of developing adverse health outcomes, and then developing welfare programs to track their health on a daily, weekly, or monthly basis and to perform analytics on all the parameters that you're tracking for those patients. That again is a very important thing. And lastly, it helps reduce readmissions, because you might know what the cause of an adverse effect is. If there are 10 patients with very similar symptoms, you can perform analysis, filter these out, and find that all ten of these patients might have one common symptom associated with them, and that this might be the cause. Mapping that for every patient is again very important, guys.

Coming to the telecom industry, any telecom company pretty much goes about using predictive analysis to gain the insights it needs to make better decisions, faster decisions, and more effective decisions. Again, going back to
the internet pack example, analytics was very key in that. By learning more and more about their customers daily, along with their preferences and needs, these telecom companies can be more successful in this highly competitive industry. It's good for them with respect to business, and it is good for us as customers because it brings down the prices. So it is used for analytical customer relationship management, fraud reduction, bad debt reduction, price optimization, call center optimization, and so much more. Now that you're looking at data analytics in this way, you realize that it has a big say when it comes to any of these business models, right?

Coming to banking, as I've already mentioned, analytics is making banks smarter day by day, guys. It helps manage the plethora of challenges that a bank faces. Everything from basic reporting all the way to descriptive analytics is a must for every single bank, and even advanced or prescriptive analytics and more. Banks have started to realize that this will help generate very good insights, which will result in extremely good business impact. So we need to check out how data analytics is helping in the banking industry as well, right? It is used to acquire and retain customers, to detect fraud, which is extremely important with respect to banking, to improve risk control, to find new sources of growth for the bank, and to optimize products and generate portfolio models as well.

After the banking sector, we come to the e-commerce industry. This is a market that has been exploding for the last couple of years, I could even say the last decade, right? eBay came up,
Flipkart came up, Amazon is again taking over everything, guys. There are so many e-commerce portals today, and making sure we perform very good analytics in the e-commerce industry is very vital. So how is data analytics used? It is used to improve user experience, enhance customer engagement, customize offers and promotions, maintain effective supply chain management, optimize pricing models, minimize the risk of fraud, provide very good advertisements that help customers pick up products, build good recommendation systems where they pick up another product after the first one, and so much more. If you've just bought an iPhone, you will be recommended a couple of cases that other people bought as soon as they bought their iPhone. You might be that person: you might like the case and pick it up. At the end of it you have a case to protect your phone, and the business has made more money out of it, right? So the role of analytics in the e-commerce industry is extremely vital, and the people in the e-commerce industry have known this for a while, guys.

Coming to analytics in the insurance industry, here as well it is basically used to enhance customer engagement, acquire new customers, retain the existing customers and make sure they don't leave, prevent and reduce fraud, and prioritize the claims that need to be handled. You have medical insurance, you have health insurance, you have life insurance, and so much more that you need to take, and all of these directly impact the user, right? So making sure you take the feedback from the user, work on it, and then create some analytics out of it is again very vital, guys.

On that note, we can quickly come to the case study which I was talking about. This case study is a very famous one: it's basically the house prices case study, and here all we'll
be doing is predicting house prices, guys. We'll have a certain set of data which we will use to predict the prices of the houses. So how can we predict the price of a house? There are so many things you need to know, right? You'd be looking at the locality in which the house is present, the amenities, the number of bedrooms, the living space, the number of floors, the number of cars that can fit in the garage, the size of the garage, the quality of construction, whether it has a swimming pool or not, whether it has a spa or not. If we had to list everything for this particular use case it would be extremely tough, because each one of us has our own judgment of how to value a house; a house is something that's very personal to us. Then there are the materials used to build the house, the style in which the house is built, whether the house has an elevator, how convenient the house is for disabled people, and so much more, guys.

So basically we'll be performing exploratory analysis on this, guys. Exploratory data analysis is used to find hidden trends in your data by performing analysis on it. The trends will come out as numbers, but since we already know about visualization, we're going to use these numbers to visualize the data, and we'll be doing it step by step. We'll be finding correlations in the data as well. Correlation basically checks how one variable is linked to another: how a change in one variable goes along with a change in the other. How two variables hang together can be known using correlation, guys.

A couple of steps are generally followed when performing exploratory analysis. First we'll be visualizing our data, finding the missing values, and looking for correlations, guys. After this is done, we'll be cleaning the data to check whether any issues have been fixed and whether the data we have is being used fully or not. Then we'll go about building a model and visualizing its results: it will give us diagnostics such as residual diagnostics, ROC curves, charts, graphs, tables, and so much more, guys.
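The first exploratory steps mentioned above, finding missing values and looking for correlations, can be sketched in a few lines of plain Python. This is a hedged illustration only: the four rows, the column names, and both helper functions are made up for the example and are not the session's data set. In a notebook you would normally let pandas do this with `DataFrame.isnull().sum()` and `DataFrame.corr()`.

```python
import math

# Hypothetical mini data set; None marks a missing value.
houses = [
    {"living_area": 1000, "bedrooms": 2, "alley": None, "price": 200000},
    {"living_area": 1500, "bedrooms": 3, "alley": None, "price": 300000},
    {"living_area": 2000, "bedrooms": 3, "alley": "Paved", "price": 400000},
    {"living_area": 2500, "bedrooms": 4, "alley": None, "price": 500000},
]

def missing_counts(rows):
    """Count missing (None) values per column."""
    return {col: sum(1 for row in rows if row[col] is None) for col in rows[0]}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

missing = missing_counts(houses)
prices = [row["price"] for row in houses]
corr_area = pearson([row["living_area"] for row in houses], prices)
corr_beds = pearson([row["bedrooms"] for row in houses], prices)

print(missing)
print(round(corr_area, 3), round(corr_beds, 3))
```

A coefficient close to 1 means the two columns rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is no clear linear link. A column that is mostly missing, like the made-up `alley` column here, is a candidate for dropping during cleaning.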
I do not want to overwhelm you with the use case, so to keep it very simple we will just perform exploratory data analysis at this stage to find correlations in the data. As we progress with our data set, you will understand how beautiful data analytics is, guys. So let me quickly jump into Google Colab, which is basically a Jupyter notebook hosted on the Google cloud, and here we can go about performing our data analytics on the use case that I just walked you through. We need a couple of files to run our use case. Actually we need just one file, which is our training data set file. Let me quickly add the file to our Google Colab and then we can go about performing our analytics, guys. It will just take a second to upload the file. We actually need just one file from here, but it doesn't harm to upload the others and keep them in your runtime. Just take note of the message, though: your files are recycled as soon as the runtime changes.

The first step for our use case is to load all of the necessary files and the libraries that we require, guys. We'll be using pandas to handle all of our data, and Seaborn and Matplotlib to perform plotting operations and visualizations on the data. The style that we'll be using is called bmh. BMH is nothing but "Bayesian Methods for Hackers", and it is a plot style that gives us nicer-looking graphs, which helps us perform analysis better, guys.

The second thing we'll be doing is loading the necessary files. The important file we need is the training data set, which is this one. Here you have the ID of the houses, you have the subclass in which the house is
present, you have the zone in which the house is present, what the lot frontage is, what the lot area of your house is, what the street is, what the alley is, again lot shape, lot contour, what year the house was built in, what year it was remodeled in, what the roof is like, you know, so many conditions out here: what the foundation is made of, how the quality of the basement is, the condition of the basement, the exposure of the basement, the finishing type of the basement. Guys, just look at how expansive this data set is. This is a data set which is extremely popular among us analysts; we pretty much like to work on it because it comes with everything, it is so well organized, and you can perform so much on this single data set.
On that note, let's quickly find out the information on all the variables we have present: what heating the house has, the quality and condition of the heating, whether there is centralized air conditioning, how the condition of the electricals is, what the square footage of the first floor is, the square footage of the second floor, how many square feet of low-quality finish we have, what the living space area is, how many full bathrooms we have, how many half bathrooms we have, our kitchen quality. Guys, this will go on, right? So all this data is what we need to check out. If you see here, these are the counts of values that are present, which will help us map something. But Alley doesn't have many values; you can check it out here itself, right? For Alley it's NaN, which is basically "not a number", so there are not many alley details which we can make an analysis out of, and we will not require Alley. Coming down, there's not much of fireplace quality as well, we do not have pool quality control at all, miscellaneous features are very few, and even fencing is very low. So the average is somewhere around 400, right? So, to make sure that we are cleaning up our data, a very important part
of it is how we go about doing it. I will just remove all of the columns which have less than 30 percent of their values present, so less than 30 percent of 1,460 rows, and then what we keep has enough data to give us some accurate results, right? So, as already checked: the ID is removed, not every house is mapped to an alley, pool quality control is not there, fence is not there, and miscellaneous features are very few. So we have dropped all of these columns and will not be using these to perform our analytics.
Now, to describe the distribution: I hope you guys know the concept of a normal distribution and all the details that we can get out of it when we perform the math, guys. Basically, we can count the total number of data points present, what the mean sale price is, and what the standard deviation is. Let's say the mean of the normal distribution is right at the center, somewhere here, guys. So the mean is somewhere around 18,000, right? Oh, I'm sorry, this is 180,000; if you just trace down from the center point, this is where 180,000 sits, guys. Standard deviation is basically the deviation from the mean, so there are house values which deviate from this mean as well. Then there are the 25 percent, 50 percent and 75 percent marks (the percentiles), and what the maximum sale price of the house is as well. So all of these can be found out from this particular graph, guys.
And you can already observe, even if you're not exposed to a normal distribution, that this starts out as a very steep curve but then it trails off with a lot of data out here as well; it goes on until 800,000. So probably from 500,000, or even 400,000, all the way to 800,000, we call these outliers. These are called outliers because they are
very far away from our normal distribution, and they might not be useful for us with respect to our mean. They will have a big impact when we are performing analytics with respect to the mean or standard deviation or anything for that matter, so you will have to remove them and not consider them in order to perform a very accurate analysis, guys.
On that note, we can go on to finding the types of data in the data set, and the type of data that we will consider, because in this particular case, since you are playing with numbers, it has to be the numeric data types, right? You can check out that we have integer numbers, floating-point numbers and so much more. So let's go on to print what it looks like after we have dropped the values we are not using: we're not using ID, we're not using Alley, and so on, right? So these are all the numerical values that we'll be considering. Year built is a numerical value, 2003 is a year; overall condition and overall quality can each be rated on a particular scale, right? Square footage is a particular number as well, so the second floor of this particular house has 854 square feet, and so much more.
As soon as we have checked that all these numbers are present, we can start performing an analysis, guys. Before that, we need to plot all of these just to check what they look like on graphs, because seeing numbers is one thing; seeing graphs, on the other hand, is something else. So we'll be developing histograms and checking these out. Let me just scroll down a little. With respect to first-floor square footage, the mean is somewhere around here, right? It sits around, let's say, 500 or a thousand square feet. And look at the second-floor square footage, look at the bedrooms above ground, the basement finish quality, and the garages, so the number of cars that you
can park in the garage, and look at the value here: two. So the majority of the houses here have two spaces to park your cars. Then the year that the garage was built in, the above-ground living area, how many half bathrooms, so you can see one half bathroom. At this point you have certain values at zero as well. Sure, we can consider values of zero if it's very important, but since we're talking about sale price, what is present is more important than what is absent, right? We need to have something descriptive for our analytics methods to work. So in that case we'll have something called the golden features list, and this variable will basically contain all of the features that we'll be associating with why our sale price is as high as it is, guys. So basically we're creating a variable where we're finding out the correlation, and then we can already check out the top 10 values which are strongly correlated.
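A golden features list like the one described above could be built along these lines. This is only a sketch under stated assumptions: the data frame is made up, the column names merely imitate a housing data set, the 0.5 threshold for "strongly correlated" is an assumption, and `corr_ignoring_zeros` is a hypothetical helper showing one way to look only at houses where a feature is actually present.

```python
import pandas as pd

# Made-up numeric columns standing in for the housing data; every value is invented.
df_num = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9, 10],
    "GarageCars": [1, 1, 2, 2, 3, 3],
    "SecondFlrSF": [0, 0, 700, 850, 0, 1200],  # 0 means "no second floor"
    "MoSold": [6, 1, 12, 3, 9, 5],
    "SalePrice": [120000, 150000, 185000, 230000, 310000, 420000],
})

TARGET = "SalePrice"
THRESHOLD = 0.5  # assumed cut-off for "strongly correlated"

# Pearson correlation of every feature with the sale price.
corr_with_price = df_num.corr()[TARGET].drop(TARGET)

# Keep only the strong correlations and sort them in descending order.
strong = corr_with_price[corr_with_price.abs() > THRESHOLD]
golden_features_list = strong.sort_values(ascending=False)


def corr_ignoring_zeros(frame, feature, target=TARGET):
    """Correlation using only rows where the feature is present (non-zero)."""
    present = frame[frame[feature] != 0]
    return present[feature].corr(present[target])


print(golden_features_list)
print(corr_ignoring_zeros(df_num, "SecondFlrSF"))
```

In this toy table the weakly related `MoSold` column drops out of the list, and the second-floor feature correlates more strongly with price once the zero rows are left out.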
Let me give you a quick idea of the impact. Basically, we have sorted this in descending order, as you can already check out. So the first thing here is overall quality. To put this into literal terms, the overall quality of the house, whatever rating was given in our data set, matters the most in how the house is being priced. Next is the living area. The number of cars you can park in the garage has a correlation of almost 0.64 with the price, garage area has 0.62, the basement square footage has 0.61, and the year it was remodeled and changes were made has about 0.50, guys.
Looking at a couple of linear relationships, you can find that a lot of values are zeros, as I just walked you through a couple of seconds ago. Look at that: for the ground floor living area we have a very small number of zeros with respect to the sale price. Check out the basement surface area, and the total basement surface area, even here as well. Look at all these tiny dots which are sticking to zero with respect to sale price, right? These have no impact on our sale price, because they're not giving us any valid linear relationship. Look at this one again: with zero, the price goes all the way up to about $600,000, right? So all these are not adding any meaning to our data. If x is equal to zero, this might indicate that the house does not have that feature at all. A lot of houses do not have pools here, so pool area is zero. In that particular case we need to remove all of these zeros so that we can go on to finding more correlated values that we can actually work with, guys. Here are all of the correlated values that we found; as soon as we run this command, we're basically sorting it again in descending order to find
out what helps most, and you can see that the overall quality of the house has a correlation of almost 0.8 with the price, so it matters the most when you're buying. People also prefer living area; the second-floor surface area has a correlation of about 0.67 with the price, and so much more. And check out what the golden features list looks like with all of the strongly correlated values that were found: year remodeled, year built, total surface area, number of full bathrooms, the first-floor surface area, the garage area, total basement square footage, the number of cars you can park in the garage, the square footage of the second floor, the living area and the overall quality. In this particular order, from the least to the highest, this is exactly what we're trying to find out with exploratory data analysis, guys.
Just looking at the data set, you could never figure out why the list price of a house was so much. Once we break it down into simple terms like this, we can find that the overall quality of the house, with its correlation of about 0.8, is what the buyer is checking to decide whether to consider the house or not. If the quality is very low, he will not pick the house; if the quality is higher, then sure, he will pick the house. So overall quality is a very important part of why the price is set like that, and that is why this house prices case study is important, guys. This has been a very nice data set to work with, and a lot of analytics can be done using this particular data set as well, guys.
So, a quick info, guys: in case you are interested in doing an end-to-end course in data analytics, please do check out Intellipaat's data analytics training course. It will help you to master the concepts of data analytics thoroughly. Details for the same are available in the description box below.
Please do check it out. Hey guys, hope you all liked the video. If you have any questions, clarifications or concerns, please do put them down in the comment section below; we are happy to assist you. So thank you so much.