Data Science & Machine Learning for Non-Programmers | Data Science for Beginners | Intellipaat


Hi, good evening everyone, and good morning to my US folks and people from other geographies who have logged in to this live session on YouTube. I'd like to welcome you all to this open discussion and forum on Data Science, specifically designed for people from a non-programming background. If you're all good to go, I think it's the right time for me to take you through this conversation. For the next couple of minutes, I'll quickly walk you through the agenda for this session, the key topics and takeaways we'll cover today. The first part is data analytics: what it is all about, what data analysis means to us, and why it has become so popular in the 21st century that it has been called the sexiest job around. Next, we'll talk about Data Science and Machine Learning, because these keywords are very much in demand these days, so we must have the right clarity about what each of them stands for and what it conveys. The third key point, and the reason we are here, is to look at Data Science and Machine Learning from a non-programming perspective, and at how they can help us shape our careers and build a better future in this space. Then we'll spend some time on the differences between the various keywords used in the Data Science and analytics industry: Data Science, Data Analysis, Business Analysis, Data Mining, Machine Learning, Deep Learning, and Artificial Intelligence.
How are these things similar to or different from each other? Many times these keywords are used interchangeably in the industry, knowingly or unknowingly.

So we really need clarity on all these terminologies. The fifth part is to go through the tools that are relevant for those of us from a non-programming background; we'll capture a few insights and an overview of those tools and how they can help you build a better career. The last part is a case study I have picked for general purposes, one that is very popular and frequently used in the industry. These are the six takeaways we'll cover in this session, one after another. So here we go with the first and very important aspect: analytics as a whole. When I talk about analytics, we need clarity on what data analysis is all about, because we live in a world where everything is surrounded by data. In the current scenario, with almost everything interconnected through the Internet, more or less every move a human being makes generates data. So what is data analysis? In a simple sense, as you can see on this slide, data analysis is any attempt to make sense of data. I hope this makes sense to everyone. The point is very simple: any activity, any attempt through which you try to make sense out of data. The data could be structured or unstructured, in whatever form it comes to us. Whatever you do to drill down into the data in order to get some sense, some information, some interpretation out of it comes under the term data analysis.
These are the supporting keywords you can see on the slide, because we live in a world surrounded by data.

If we talk about the last couple of years and the future of data, I'd say there is huge scope. In my personal experience, data today is generated by online sources: the Internet, social media and e-commerce websites. But in the next five years, data science is going to become even more informative, more resourceful and more significant for every human being on this planet. Let me explain that in detail. Until now, data has been generated only by online sources, but in the next couple of years I see data also being generated by offline sources, through sensors. Sensor data is going to be the next big thing. If you walk in and out of a retail mall, the sensors placed at the entry gate will also generate data, which will then be used for analysis. So whatever we discuss and cover, be it data analysis or data science, it is all about what these things are for. Let me explain what data analytics is. If I had to define it, I would simply say: 'any attempt to make sense out of the data.' Now, why do we put so much effort into making sense out of data? This is important; I could have ended my conversation just by giving you that one line. The question is why we take so much effort to get information out of the data. The answer is one single statement: to gain a competitive edge. That's what data is all about.
I hope this single definition is necessary and sufficient for us to understand what data analytics is and why it is so important, whatever you are doing and at whatever scale.

So this is what we are going to discuss and cover. Let me add a few more inputs to strengthen our understanding. There are several techniques used for this. Now, the question is: why data analysis? I think this was already discussed a moment ago, when I said it's to gain a competitive edge; that's the bottom line of every statement. What you can see on my screen is: improving business requirements, performing real-time market analysis, generating reports and studies, and gathering hidden insights. These four points show why data analysis is so important, but at the end of the day, all four simply highlight one common agenda: gaining a competitive edge, whether by reducing costs and expenses or by increasing revenue. That's what every organization ultimately intends to do. I hope that makes sense to everyone. So that is what data analysis is all about. Let's take this forward to understand a few other things that are relevant for us to know. We keep talking about data analytics, data science and data mining again and again. The question that always comes to mind is what data actually is, and whether it makes any sense for us to understand these definitions. As beginners or freshers in the industry, I'd like you to have the right clarity and understanding of all these keywords. So I'd like to start with this quote: data is 'facts and statistics collected together for reference or for further analysis.'
If I had to describe, based on my personal experience, what data science and data analysis are, I would say it's 'the process of beating the data, torturing the data until it speaks for itself.' I can place that as a second quote on top of the first.

Let me reiterate it: if you want to take this definition of data science and data analytics on a lighter note, you can put it as 'torturing the data until it speaks for itself', that is, until it gives up its own information, just like a culprit who finally talks in jail. Let's now take this forward to understand what data science is. These two keywords, data science and data analysis, are used again and again, interchangeably or synonymously. So I believe it's the right time for us to understand the similarities and the differences between these two terms. Let me put it very clearly before I walk you through the slide. What is data science? Each of us will have a different interpretation, because data science as a whole is very subjective. If you take my definition, I would put it in one necessary and sufficient phrase: 'the science of dealing with data is data science.' To make it easy and comfortable to understand, I can extend that. When I say science, science for all of us is nothing but a collection of methodologies, techniques, tools or processes in place. That's the first part of the definition. Now I'm extending it with the second part: to deal with the data. What can you think of in terms of dealing with data? I can think of sorting the data, filtering the data, summarizing the data, merging and appending, and before that, sampling, mining and transforming.
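The "dealing with the data" operations just listed (sorting, filtering, summarizing, merging, appending, sampling) are the same everyday steps people do with Excel filters and pivot tables or with SQL. A minimal sketch in plain Python shows each one on a tiny table; the store names, regions, manager names and revenue figures are made up purely for illustration.

```python
import random
from statistics import mean

# Hypothetical sales records (made-up data for illustration only)
sales = [
    {"store": "S1", "region": "North", "revenue": 120},
    {"store": "S2", "region": "South", "revenue": 80},
    {"store": "S3", "region": "North", "revenue": 200},
]
# Hypothetical lookup table: store -> manager
managers = {"S1": "Asha", "S2": "Ravi", "S3": "Meera"}

# Sorting: highest revenue first
by_revenue = sorted(sales, key=lambda r: r["revenue"], reverse=True)

# Filtering: keep only North-region stores
north = [r for r in sales if r["region"] == "North"]

# Summarizing: total and average revenue per region (like a pivot table)
def summarize(rows):
    groups = {}
    for r in rows:
        groups.setdefault(r["region"], []).append(r["revenue"])
    return {reg: {"total": sum(v), "average": mean(v)} for reg, v in groups.items()}

summary = summarize(sales)

# Merging: attach the manager name from the lookup table (like VLOOKUP or a SQL JOIN)
merged = [{**r, "manager": managers.get(r["store"])} for r in sales]

# Appending: stack new rows under the existing ones (like SQL UNION ALL)
appended = sales + [{"store": "S4", "region": "South", "revenue": 60}]

# Sampling: pick 2 random rows; the fixed seed makes the sample repeatable
random.seed(0)
sample = random.sample(appended, k=2)

print([r["store"] for r in by_revenue])  # ['S3', 'S1', 'S2']
print(summary)
```

None of these steps needs more than the logic you would already apply in a spreadsheet; the code simply writes that logic down.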

These are the things you can think of. Whatever you generally do with Microsoft Excel or SQL comes under this tagline, along with visualizing, or maybe modeling at the end of the day. So this is how you can understand data science: 'the collection of all the methodologies, techniques, tools and processes in place with the help of which you do all these things with data', that means anything with data. That's what comes under data science as a keyword. Let me strengthen your understanding with a final summary: it's an umbrella term. When I say umbrella term, I mean that whatever you are doing with data counts as data science. The reason I keep highlighting this is simple: many people here may have the perception that data science is all about high-end advanced analytics like machine learning, deep learning or artificial intelligence, or about dealing with voluminous, unstructured big data. But my dear friends, this is a complete myth.
When I say it's a complete myth, the word umbrella explains everything. Whatever you are doing, from simply creating a report or an MIS dashboard at the lower end to high-end machine learning, artificial intelligence or deep learning, everything comes under the umbrella of data science. This is what we have to understand clearly and in the right way, and I hope this makes sense to everyone. So, to break this myth: data science is not only about dealing with voluminous data and applying high-end advanced analytics using core statistical software and modeling techniques, be it SAS, R or Python. Data science can be as simple as a single additional formula in Microsoft Excel. Coming back to the conversation: data science is the study of data in a structured manner; it's about putting things into the right framework. It involves developing methods of recording, storing and analyzing data, which is later used to extract useful information for further analysis. Let's take this forward to understand a few more keywords and take a deeper dive into data science. Data analysis can be qualitative or quantitative, and quantitative data analysis helps us steer strategic business decisions. When I say steer, I mean deriving business decisions and strengthening them, so that you can optimize your resources, sales force, marketing strategies, the available bandwidth, and so on. So I would like you to
understand that with the help of data science you can fine-tune a lot of areas in your business, be it allocating resources or allocating the workforce engaged. This aspect of data science is about uncovering findings from the data, that is, hidden patterns, hidden trends or ideas. That's generally what we use data science for, and it is further used to understand complex business behaviors, trends and inferences. It's about surfacing hidden insights that enable companies to make smarter business decisions. If you sum it up, one thing stays the same: at the end of the day, data science is used to drive our business in the right direction, optimizing budget allocations, funding, resources and the sales force, that is, the manpower engaged. To give you the right clarity without going into too much detail, say an organization has four crore rupees and allocates one crore each to four channels: print media, digital media, social media and other activities. If you blindly put one crore into each of these four channels, it might turn out that print media does not give you much return; out of one crore it may even run at a loss, returning only 50 lakh. On the other side, one crore put into digital media or social-media campaigning and marketing might give you a much bigger return. That is what happens without analysis. With the help of analysis, you could have reduced the funding for print media and moved that money to the channels giving better returns. These are the areas where you can think of applying
analysis and data science. I would say data analytics and data science is a fun job these days, because you actually get to see how the inferences are generated. Let me share one of my own examples. When I was in the USA, I recently did a project for a leading retail chain (I'd rather not name it). I did a detailed analysis of their data to advise one of the largest retail chains on where a particular product should be kept across a very large store: what the position of that product should be, and at what height and at what angle it should be kept. Say a store has 10,000 square feet of space, which is a really big area, and a range of close to 3,000 products, meaning 3,000 different varieties. We produced detailed insights and analysis telling them where in the entire store each product should be kept: on which shelf, at which position, whether it should be at the end of the store or right at the beginning, at what height and at what angle. We were essentially creating a 360-degree program for it. That's the extent to which you can take data analysis. Taking this forward, let's understand data products a bit more. There are a lot of examples; most companies are actually driving their business because of the evolution of data science itself. If I kept talking about the application areas of data science and data analysis, this one hour would be far too little for that
part. Look around you: Netflix, Spotify, all the mobile apps. One thing is very common these days: when you download any application, whether or not it is focused on data, say a news application, it will ask for access to your location, your folders, your photos and media, whatever you have on your mobile. Why is that? Because they want every minute piece of information about you as an individual. Without going too far back, three or four years ago there was a rumor in the market that Flipkart was planning to shut down its URL-based website and go 100% on the mobile app. If you look for the reason, you will find thousands of them, and most of you would be correct, but there was one particularly strong reason they were planning to become 100% operational on the mobile application: web URL-based services cannot be as personalized, whereas when a person uses their smartphone to purchase something, that piece of information, that data, is more reliable. If you follow the daily news: just the day before yesterday I was watching the news that the USA has cancelled the licence of a Chinese company because of data-privacy issues. So, as has been commonly said, guys, if there is ever going to be a fourth world war, it will surely happen because of data. I just wanted you to understand the importance
of data, and the perils of data. Not going too far back, a couple of weeks ago there was a popular app that showed you how old you would look after a period of time. I don't remember the name of the application, but it was an AI- and ML-based application, and there was a huge controversy and debate over how appropriate it was for its users, because eventually you were sharing very relevant data and personal information with the company. If you scroll through that topic, you will find thousands of controversial statements, blogs and articles about that particular app. So data is of course important; your personal information is very valuable, I would say priceless, for organizations in identifying their business strategies. Let's take this forward. Organizations drill down into the data and use it to define which data products they should be bringing next. This all happens because of insights and analysis. Now let's move on to one of the most common and widely used terms these days: machine learning. As you can see on my screen, it is crucial for us to understand machine learning in a simple sense. Machine learning is a term widely used in almost all fields, ranging from the simple optimization of advertising, to finding the quickest path in navigation systems, all the way to Mars. Whatever you are talking about, machine learning has become part and parcel of the conversation. If you ask me what machine learning is, I would simply say: just the
way a human being learns from experience, you are creating algorithms that can learn on their own. That is what machine learning is all about. Let me put it quickly: machine learning is about creating programs, algorithms or processes that can learn and evolve on their own. When I say evolve on their own, take the analogy: just the way we human beings learn from experience and evolve, that is the analogy for machine learning. You are creating dynamic, generic programs that can evolve on their own without any human intervention. That's the important part to remember: without human intervention. Machine learning is an application of artificial intelligence; I'll come to the differences between statistical modeling, data analysis, predictive modeling and machine learning, and how they differ from artificial intelligence. Machine learning is part and parcel of artificial intelligence, and it provides a system with the ability to learn automatically. Take the example of YouTube or Netflix: if you watch a particular genre of videos on YouTube, you get recommendations in that genre. Say person X consistently watches a lot of sports-related videos on YouTube; YouTube will keep recommending sports videos to him or her. But maybe after a while X's taste changes and he starts looking for music videos, so the recommendation engine starts recommending music videos as well. The point is that this is the same single algorithm; no one manually fine-tuned it to turn from sports to music
channels. Sundar Pichai has not assigned one particular data scientist to take care of your likes and dislikes, someone who noticed that last week you were interested in sports and is sitting behind the scenes feeding you those videos. That's not how it works. All I want you to understand is that the algorithm is the same, and it evolved on its own without any human intervention; there is no dedicated data scientist sitting in Google's headquarters taking care of your likes and someone else's likes. This is a standard, general program that has the ability to learn automatically, and I hope this example is good enough for you to understand the capabilities that machine learning brings to our business and to our decision-making. It learns and improves from experience without being explicitly programmed: Google has not done any explicit programming for your recommendation engine to switch from sports to music recommendations. It's the same algorithm. This technology focuses on developing computer programs that can access data and use it to learn for themselves. Google's driverless cars and robo-taxis are all implementations of machine learning, which learns and improves with the data it experiences. That is typically what machine learning is all about. Before I jump into the next part, it is crucial for me to highlight the core idea of data science and machine learning: machine learning is a part of data science, and data science is an umbrella term. To help you understand all of this, you can picture it as a Venn diagram: you have statistical models, the general modeling criteria, and overlapping with them, machine learning. Machine learning is the automation of the individual, fine-tuned
programmed model we just discussed. If you want to achieve more accuracy with your machine learning model, you go with deep learning. Deep learning is nothing but having multiple layers of machine learning algorithms. Collectively, ML and DL combined with computer science software and hardware equipment are called artificial intelligence. So I can say that artificial intelligence is, at the end of the day, an amalgamated form of ML (machine learning) and DL (deep learning). But whatever you are doing, all of it comes under the umbrella of data science, and that's what we are focusing on. So who is a data scientist? From my side: a mathematician, a person with some understanding of mathematics, plus statistics. More importantly, they should be a problem solver; the problem-solving ability and the thought process are what matter most. Plus a storyteller, plus a visualizer who can visualize the data, and most importantly, someone with a detailed understanding of the domain. I'm highlighting this for a simple reason: there is one thing left, and it's what we are focused on here, guys: programmer. This is what is commonly said, but it's a myth. Data science is not only about programming, and a person from a non-programming background can do it too. That's what I wanted to clear up with this statement. Programming alone is not what it takes to be a data scientist. A person should have at least a 10th-standard understanding of mathematics, a bit of statistics, a basic understanding of the platforms, problem-solving abilities, storytelling and visualization, and an understanding of the business domain. This is
what I wanted you to capture; this is what I wanted to put in a single frame. There is also a belief that only a person from a programming background can be a good data scientist. No, that's a complete myth, and I would put that message even more strongly. The business domain is very important, and that's why many doctors, pharmacists, finance professionals and chartered accountants, even lawyers among my friends, are doing wonders in their respective fields as data scientists. Why? Because you need not be only a hardcore programmer, only a statistician or only a mathematician. What is required is an overall business understanding and the ability to put it into a framework. That means you only need a basic understanding of Microsoft Excel and SQL; that's all that's required to become a data scientist. If you talk about statistical platforms, SAS, R and Python are the ones primarily used today, but people are doing wonders even though they come from a non-programming background. That's the next area of discussion, to give you clarity on market requirements and the overall scope in the industry. Let me take this forward. Programmers prefer to understand the data set and work with it as efficiently as they can. Programmers make things easy for users further down the chain: they can write programs implementing extensive logic for various purposes, like hardcore data visualization or storyboarding. But what is there for those of us from a non-programming background? That is the question mark. Before I come to that, let me take you back a bit. If
you remember, when I described data science, I captured two ends: reporting and MIS dashboards on one side, all the way to high-end machine learning on the other. That means data science or data analysis has two parts: one is descriptive and the other is predictive. These are the two sides of the coin. Descriptive analytics uses historical data to get meaningful insights into what has already happened, and for that you don't even need a programming background; you only need simple class six, or at most class ten, mathematics, which primarily means aggregation and summarization: finding the total, the mean or average, the minimum value, the maximum value, and so on. That's generally what we do on the descriptive side. To make it easy to understand: even if you look at the balance sheet of a very big corporate MNC, they are using basic totals and averages, even though at a very large scale, to find quarter-on-quarter or year-on-year results. At the end of the day, if we look into the detail, they are just summarizing the data and finding totals, averages, maximum values and so on. That only requires simple class ten mathematics to survive in the industry. There are a lot of examples: the average salary of an employee in a company, the attrition ratio of your organization, the pass percentage of the students in your college, the placement ratio of the students in your college. I don't think we need a programming background to get these insights. All of this can be done just by using simple, basic class ten
mathematics. For that, we have a few basic tools to talk about: Excel, to a certain extent SQL, and on the higher side Tableau for visualization. So let me reiterate: you need not be a champion in programming, and it is not required for you to come from a hardcore programming background with an in-depth understanding of C, C++ or Java. If that were the case, my dear friends, all the people working in the IT industry, at companies like TCS, Wipro and Infosys, who are hardcore champions in Java, .NET, Perl and PHP, would be doing wonders in the data science space. But that's not the fact. The fact is that to be a better data scientist you need problem-solving abilities, business understanding, and a basic idea of how to survive in the industry; that's what makes the difference. This matters for people from a non-programming background who are looking for easy analysis and visualization of data without in-depth knowledge of coding: if we talk about proportions, even in the current situation 60 to 70% of the work and opportunities in the industry still lie in areas where you really don't need a programming background. These users are looking for tools and analytics with an easy user interface that give them the insights they need. Now let's identify the differences between the core keywords, data science and machine learning. I already discussed a couple of minutes ago that machine learning is part and parcel of data science, whereas data science is an umbrella term. Machine learning is a specific way of creating algorithms that do not need human intervention. It is generally done on large volumes of data, where you are developing proofs of concept, optimizing requirements, tuning algorithms, refining the proofs of concept further, and innovating on the existing data. Data science, on the other hand, is the overall term, where the focus is on career growth and on enhancing whatever skill sets you need to understand and analyze data: operations research, experimental design and data economics. So if you are not able to grasp all of this, or maybe
not able to relate all these terms, I would still like you to stick with the earlier point that machine learning comes under the umbrella of data science. Within that there is another context, deep learning, which is what we call multiple layers of machine learning. Then we have AI, which is the combination of machine learning and deep learning with computer science concepts or hardware, like robotics and driverless cars. All of these are part and parcel of AI, and all of them come under data science, so data science is the umbrella term. I hope this is clear to everyone. With whatever time remains, let me take this forward with a comparison between the data scientist and the business analyst. People working in the BPO industry are often designated as business analysts, and people coming from IIM or IIT backgrounds who join leading MNCs in different analytics sectors are also given the designation of business analyst. So what does that really mean? Traditionally, business analysts focused more on database design, project management, data optimization and report design. That means this person has a detailed understanding of the business and uses it to lay out the approach for how the problem will be solved, and based on that we then apply data science methodologies. So a business analyst is a person who tries to understand the problem statement of a specific domain, be it the financial industry, the legal industry, the healthcare industry or the retail industry, and then tries to identify the solution using data science skill sets. I hope that is what you need to understand
in the right sense: a business analyst is a person who looks at the overall problem statement from a wider perspective and then tries to find the solution using data science concepts, whereas a data scientist, as I discussed, is more about leveraging the skill sets and finding the solutions that help the business analyst. Taking this forward, I am looking to spend the last 10 minutes on a case study, one which you will probably like. For data analysis, specifically for those coming from a non-technical background, Microsoft Excel is one thing everyone should be equipped with, no matter what. It is considered basic hygiene in the industry. People will not ask for Excel skills specifically in the job description, but if you don't know Microsoft Excel and you are working as a data analyst or data science professional, that is very serious; it is almost a crime in the industry. Excel is famous and widely used for statistics and data modeling, it is easy to learn, and it integrates easily with datasets. I would say a large share of the work in the industry still happens in Microsoft Excel, even if you include the Big Four, by which I mean big organizations like Deloitte, KPMG, EY and PwC. Then we have another tool called Tableau. These are all tools you can think of under the umbrella of data analytics. Tableau connects to almost any data source and creates data visualizations and maps, and it has a free version, Tableau Public. These tools are primarily used for data handling and data visualization, specifically Tableau. We have other tools along with
Tableau that are very popular these days, like Power BI and QlikView, which are used for working with and manipulating data. At the top of the list is SAS, one of the most widely and frequently used statistical software packages to handle, manipulate and analyze data. It is so popular because it has all the capabilities a statistical tool should have, be it processing the data, transforming the data, visualizing the data, or doing statistics, machine learning and other things. All these capabilities are available in SAS, and that is why it is one of the favorite tools among professionals, even though it carries a heavy licensing cost. One important thing is that SAS is very, very fast. I can share my personal experience: I have worked on one single SAS dataset, which is more like a table, of 40 GB. That is the speed and scale SAS gives you. Then we have RapidMiner, again a very frequently used tool, which is an integrated data science platform where you can do statistical modeling. It can integrate with other software, platforms and data sources, like Excel, Oracle, SQL databases or Teradata, to get the data and do statistical or machine learning analysis. You can also feed real-life data into it, and that is the beauty of RapidMiner. To get you comfortable with the overall conversation, I would like to quickly walk you through one very basic and very interesting case study, which is based on segmentation and clustering using RFM. Let me make it very clear what RFM stands for. I
would first of all give the context in the next five minutes, and then I will walk you through the steps you could implement in whatever tool you like: it could be done in simple Excel, or you could write the solution on the R platform, or maybe in Python later on. That is not a challenge. Before I jump into that part, my dear friends, the entire requirement in the industry is all about the skill sets required in data science. How you put them in place hardly matters for a professional, whether you do it with SAS, Excel, R or Python. These are just different platforms and tools, each with its own advantages and disadvantages, and who knows, tomorrow you may have something better than these tools. But the skill sets required in data science will always remain the same, that is, how you analyze the data; it hardly matters which platform you use to analyze it. My intention is to help you understand the importance, the granularity and the significance of data science concepts, and that is the reason I am using the RFM case study, so as not to tie it to a particular platform. So let us now try to understand the business problem statement and its solution in terms of RFM, which is based on segmentation and clustering. I will talk about segmentation and clustering in a while, but first let me tell you that RFM stands for recency, frequency and monetary. That is the full form. Before I walk you through the problem statement in a lucid manner, let us build an intuitive understanding of what clustering is, then why clustering, and then how. Clustering and segmentation are used interchangeably.
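The descriptive side described earlier in this part (totals, averages, minimum and maximum values, and ratios such as attrition) needs nothing beyond aggregation. As a minimal sketch in Python, one of the platforms mentioned in the session, using only the standard library; the employee records and their field layout are hypothetical, made up purely for illustration:

```python
from statistics import mean

# Hypothetical employee records: (name, monthly_salary, has_left_company).
# Illustrative data only, not taken from the session.
employees = [
    ("Asha", 52000, False),
    ("Ravi", 61000, True),
    ("Meera", 48000, False),
    ("John", 75000, False),
]


def describe(values):
    """Return the basic aggregates used in descriptive reporting."""
    return {
        "total": sum(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


salaries = [salary for _, salary, _ in employees]
salary_summary = describe(salaries)

# Attrition ratio = employees who left / total employees.
attrition_ratio = sum(1 for _, _, left in employees if left) / len(employees)

print(salary_summary)
print(f"Attrition ratio: {attrition_ratio:.0%}")
```

In Excel the same figures come from SUM, AVERAGE, MIN, MAX and COUNTIF, which is the point being made: this is arithmetic, not programming.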
So, without going into that part, everyone here, being a professional in their respective area, must be aware that every business organization or enterprise wants to reach out to its customers and serve them in a very specific way. Airtel, for example, would like to serve me in the best possible manner, in whatever way is possible with its available resources and bandwidth, based on my requirements. So, taking the example of Airtel, would it be possible for Airtel to create 130 crore individual postpaid plans, one for each of us? No, that is not possible, even if I say that my requirement is 535 GB of data per month, only a hundred minutes of outgoing calls and only 50 SMSes. That is my exact requirement as far as my usage is concerned. As I told you, every organization wants to serve its customers in a very specific and the best possible manner, but would it be possible for Airtel to create one unique postpaid plan for me with all these specifications? Surely not. So what they generally do is group the data in such a way that people with similar likings, properties, preferences, choices and characteristics are allocated to one group. Instead of creating 130 crore postpaid plans, they can roll out 10 or 20 different postpaid plans. That is what clustering is: you club or group similar clients and customers into one group so that you can reach out to them in a specific manner, rather than reaching each of them individually, which is not possible because of costing, bandwidth and resource issues. That is what clustering is all about. Having said that, let us try to understand our problem statement. Now everyone is clear on
what clustering is: dividing the data into different clusters such that each cluster is homogeneous internally and heterogeneous with respect to the others. Why clustering? Because every business unit wants to reach out to its customers in a perfectly customized, dedicated fashion, but that is not possible because of resource constraints. So they cluster the overall data into a few groups, so that instead of serving every customer individually they can target the groups created, be it three, four or five of them, collectively. This is why clustering is important. Now, coming to the example we are talking about: recency means how recently a person has purchased an item. This is generally done on transactional data and is very popular in the retail and e-commerce industries, because it is easy, simple, and at the same time very significant, and that is the reason I picked it for our conversation. If you have the transactional data, say this data here, can we group or summarize it and identify when each customer last transacted with us? This simply requires a pivot table in Excel, or a GROUP BY summary in SQL. It is something you can retrieve directly after some basic processing of the data, or if you get the data at the client level, you can get it straight away. Having said that, I have captured the numbers: this is the last time this customer ID transacted with us, and this is the frequency, meaning that on four occasions he has done a
transaction with us, and this is the total monetary value of the purchases he has made. Collectively, these three parameters or attributes make a lot of sense for identifying how valuable each of these customers is, so that I can focus on the relevant areas. Let me help you understand this scenario. I hope everyone in this conversation understands percentiles, deciles and quartiles. A percentile is the same percentile you get in your competitive examinations; if you have appeared for one, you may have scored, say, the 98th percentile. Percentiles divide the data into 100 equal parts, deciles divide the data into ten equal parts, and then we have quartiles, which are very important and robust from a statistical perspective and divide the data into four equal parts. I am using quartiles for this analysis. It can be done in Microsoft Excel or in SAS, R or Python; I have done it deliberately in Excel for this conversation so that you can follow the flow and understand the numbers, because my idea is not to take you through programs but through the problem statement and the solution. I would like to take five more minutes to conclude, because this is very easy, simple and straightforward. In this Excel file I have captured the various quartiles using the simple quartile function in Microsoft Excel. This is the quartile of recency, and if you look into it, you can see the numbers. The reason I used Excel for this particular problem statement is that more or less everyone understands Excel. So the quartiling is done first on recency, then on frequency, and then on the monetary value. I
have marked all these values: if a number in recency is between 0 and 26, it is the first quartile; 27 to 93 is the second quartile; 94 to 225 is the third quartile; and 226 and above is the fourth quartile. Using this basic Excel formula, each recency value has its quartile marked next to it, and the same is done for frequency and monetary. This is done in Excel with a simple if-then-else formula, and as far as programming is concerned, you can write the same logic on whatever platform you have, be it SAS, R or Python. All you have to understand is how the solution has to be drafted. As far as SAS, R or Python are concerned, a basic, preliminary understanding of data handling is sufficient to write the code. That is the reason I have been telling you from the very first minute that programming skills are not essentially required. All you need to have in place is what needs to be done and how it has to be done, meaning the approach and the algorithm should be clear. As far as syntax is concerned, you should not worry about it; even Google Baba is enough to tell you the syntax. All right, coming back to the conversation, what I want you to understand is that I have marked all the quartiles as Q1, Q2, Q3 and Q4, where Q1 of recency means this particular customer is in the group that has transacted very recently. Having said that, just focus on this: can I say that a person falling under Q1, the first quartile, is a very active customer? Considering them as active customers, can I channel cross-sell or up-sell promotional activities to them? Because if a person has transacted very recently, he is most likely an active, loyal customer, compared to a customer who transacted with us maybe six months
ago; who can guarantee whether he is still in India, or whether he has shifted to some other place? So Q1 means a very recent transaction, and I can consider those customers active. Next, can I say that if a person falls under Q2 or Q3 of recency, they are risky customers, because it has been a while since they transacted with us? Can I start a retention campaign for them? Of course, yes. Moreover, if a person falls into the fourth quartile of recency (you can see that this is recency, right?), can I call this a churned customer? That means that after a particular time period, maybe 12 months or 360 days, as per your business definition, they can be considered churned, so you can trigger a reactivation campaign. So this is one dimension of understanding this problem statement, where you can identify which customers are active, which are risky and which have churned. A very simple one, isn't it? All you are doing is splitting the data into various clusters, here called quartiles, using basic statistics; you must have covered quartiles and percentiles in your school days. Let me extend this for one more minute, focusing on the existing customers. Just focus on this green-colored area: a person who has transacted very recently, falling under Q1, the first quartile of recency, who has a very high monetary value, falling under Q4 of monetary, and who also falls in Q4 of frequency, meaning very frequent. This person is very active, being in the first quartile of recency, very frequent, being in the fourth quartile of frequency, and very high valued. So shouldn't I offer them a premium product? If you are doing this analysis for a credit card
company, don't you think this category of customer should be offered a premium card? Compare that with the calls I have been getting from credit card companies saying, "Hey, I am giving you the silver card," whereas many of you might be getting calls for a premium card. Why? Because you might be doing a lot of transactions, very frequently, with a high ticket size. This is how you can understand the problem statement and simply identify that this is the premium category of customer. So out of that (just a second, guys), I can clearly define the status of each customer by putting in a new column with certain if-then-else conditions: whether the customer is active, risky or churned, just by following some basic rules. Even if a customer is active, I can decide whether I should offer him a premium product, or, in the context of credit cards, whether I should offer a platinum card, a gold card or simply a silver card, based on where he falls in the other quartiles. This is how you can now reach out to individual customers and target them accordingly. In this simple analysis, all I have done is basic quartiling and then a basic if-then-else formula in Excel, and the same could have been done in SAS, R or Python. Now you are in a position to decide which customers are valuable, to whom you should offer a platinum card, a gold card or a silver card, and which customers are risky. Don't you think you are saving a lot of business if you start your retention campaign for all the risky customers? Don't you think you are getting back the customers who are on the verge of churning? That is the intention and the significance of doing this analysis. This is just one
example, my dear friends. In industry you have thousands of such problem statements, and the solutions can be built in Excel, SAS, R, Python or SQL; these are the platforms we have. Of course, programming is not essentially required. People from non-programming backgrounds are also doing wonders in the industry: I have a lot of friends who are doctors using all these capabilities, as well as lawyers, pharmacists, finance and marketing experts, and people with MBA backgrounds from the IIMs. So it is a complete myth that only people from a programming background can do wonders in the data science space, and this is one proof. I have tried my level best to put it in simple terms so that you can understand its intensity. With that said, I am ending my conversation. I hope everyone enjoyed it, and if you want to purchase a course, specifically with a 30% discount, you can connect with the Intellipaat team after the conversation. So with that, I am ending my call. Good evening to my Indian folks, and to all my US and other folks, a very good morning. Thank you so much for your time, have a nice journey in your career, and happy learning.

All right, welcome back, guys, and good evening. Before we go ahead and wind up the session, I would like to thank our instructor, and I would like to quickly show you our training program, the Data Science Certification Training. If you go to our website and browse through it, you will see a Browse section with a category called Data Science, and under that section the first program is the Data Science Certification Training course. This is the training that we generally
recommend for anyone, irrespective of whether you are from a programming or a non-programming background, who wants to master their skill sets in the field of data science. This is what you should go for to get started, as it covers all aspects of data analysis, data exploration, data manipulation and data planning in detail, and to perform these analyses we also teach you R in this training. Just to quickly wind up: this is a 40-hour training with a live instructor, and we have two kinds of schedules coming up. We have a weekend class every Saturday and Sunday at 8 p.m. IST; for anyone joining us from the US or Canada, that is 10:30 a.m. Eastern Time. Each session is three hours long, every Saturday and Sunday, and it goes on for seven to eight weekends with the instructor. If you would rather do it on weekdays, we have an upcoming weekday class starting this coming Monday. It runs Monday to Thursday from 9:30 p.m. EST, which for our learners in India is Tuesday to Friday from 7:00 a.m. IST, and each of these sessions is two hours long. The complete training cost, as you can see on our website, has already been brought down from 22,800 to 19,830, and due to the Independence Day campaign we are running in India, we have a flat 30% discount for the next two days on this complete training. If you would like more information about the pricing and want to avail the discount, please come in on the live chat and share your details, your name, cell phone number and the best time to talk, and our consultants will reach out to you with further details. This offer is exclusively for the next two days. If you are paying in US dollars, for our US or Canada based learners, the complete training is 348 US dollars, on which you can avail the flat 30 percent discount; put in your details over the live chat and someone will assist you with the other options. If you take the instructor-led training, one additional thing you get is our self-paced training, absolutely free of cost. The self-paced training consists of recorded sessions, which are actual live class recordings that you can go through. On the complete training package you get lifetime access and lifetime support, and it all comes with lifetime free upgrades. So once you sign up with us, you can come back even at a later point in time, whenever there is a version change or an update, and request a live class again and again depending on your requirements, and keep yourself updated. So go ahead and sign up for the program as soon as possible. I'm sure this will be a benchmark in your career and will definitely help you make a real transition into the field of data science, and once this training goes really well, there will be other options you will be excited to go for, be it integrating these skill sets with Tableau, artificial intelligence, SAS or machine learning. So go ahead and come in on the live chat, and someone from our team will definitely help you. This is Jazz Benjamin from the enterprise team. If you need any further information, you can drop in on our website or give us a call at the toll-free number or at our Indian number that you see on our website. Thank you so very much for your time, and have a wonderful day.
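Returning to the RFM case study: the recency, frequency and monetary columns come from grouping transactions per customer, the same job the pivot table in Excel or a GROUP BY in SQL does in the walkthrough. A minimal Python sketch, assuming a hypothetical transaction log of (customer_id, purchase_date, amount) tuples and an analysis date chosen by the analyst; none of these names or values come from the session:

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("C001", date(2024, 1, 5), 1200.0),
    ("C001", date(2024, 3, 20), 800.0),
    ("C002", date(2023, 11, 2), 300.0),
    ("C001", date(2024, 3, 28), 450.0),
    ("C003", date(2024, 2, 14), 2500.0),
]


def rfm_table(rows, as_of):
    """Summarize transactions per customer into recency, frequency, monetary."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Keep the latest purchase date seen for each customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency_days": (as_of - last).days,  # days since last purchase
            "frequency": frequency[customer],     # number of transactions
            "monetary": monetary[customer],       # total amount spent
        }
        for customer, last in last_purchase.items()
    }


table = rfm_table(transactions, as_of=date(2024, 4, 1))
for customer, row in sorted(table.items()):
    print(customer, row)
```

A lower recency value means a more recent buyer, so when these columns are later split into quartiles, Q1 of recency is the most recent group while Q4 of frequency and monetary is the most frequent and highest-spending group.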
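The quartile tagging from the RFM case study (Q1 up to the first cut point, Q4 above the third) can be sketched in Python with the standard library's `statistics.quantiles`. This is an assumption-laden sketch, not the Excel workbook from the session: `method="inclusive"` interpolates linearly between sorted data points, which as far as I know matches Excel's QUARTILE.INC, and the recency values below are hypothetical, chosen so the cut points land on 26, 93 and 225, the recency boundaries quoted in the walkthrough. The same helper gives deciles (`parts=10`) or percentiles (`parts=100`).

```python
from bisect import bisect_left
from statistics import quantiles


def cut_points(values, parts):
    """Cut points splitting values into `parts` equal groups.

    parts=4 gives quartiles, 10 deciles, 100 percentiles. The "inclusive"
    method uses linear interpolation between sorted data points.
    """
    return quantiles(values, n=parts, method="inclusive")


def quartile_label(value, cuts):
    """Return 1-4: values up to the first cut are Q1, above the third are Q4."""
    return bisect_left(cuts, value) + 1


# Hypothetical recency values, in days since each customer's last purchase.
recency_days = [3, 10, 26, 40, 93, 120, 225, 300, 410]
recency_cuts = cut_points(recency_days, 4)
recency_quartiles = {d: quartile_label(d, recency_cuts) for d in recency_days}

print(recency_cuts)
print(recency_quartiles)
```

The same `quartile_label` call works for frequency and monetary; there Q4 is simply the top quarter, which is what the premium rule in the walkthrough relies on.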
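The if-then-else rules from the RFM case study translate directly into code once each customer has a quartile label for recency, frequency and monetary. A minimal sketch: the status rules (Q1 recency active, Q2 or Q3 risky, Q4 churned) and the premium rule (Q1 recency, Q4 frequency, Q4 monetary) follow the walkthrough, while the gold threshold is a hypothetical assumption added only to show a middle tier, since the session does not define it. The function names and customer IDs are illustrative.

```python
def customer_status(recency_q: int) -> str:
    """Map a recency quartile (1 = bought most recently) to a customer status."""
    if recency_q == 1:
        return "active"   # candidate for cross-sell / up-sell offers
    if recency_q in (2, 3):
        return "risky"    # candidate for a retention campaign
    return "churned"      # Q4: candidate for a reactivation (win-back) campaign


def card_offer(recency_q: int, frequency_q: int, monetary_q: int) -> str:
    """Hypothetical card tiering; only the platinum rule comes from the talk."""
    if recency_q == 1 and frequency_q == 4 and monetary_q == 4:
        return "platinum"
    # Assumed threshold for illustration; the session does not define gold.
    if recency_q <= 2 and frequency_q >= 3 and monetary_q >= 3:
        return "gold"
    return "silver"


# Hypothetical (recency, frequency, monetary) quartile labels per customer.
customers = {"C001": (1, 4, 4), "C002": (4, 1, 2), "C003": (2, 3, 3)}
for cid, (r, f, m) in customers.items():
    print(cid, customer_status(r), card_offer(r, f, m))
```

Keeping status and offer as separate functions mirrors the two new columns described in the walkthrough, so either rule set can be changed to match a business's own definition of churn or premium without touching the other.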