Monday, September 21, 2015

Anallyz Self Service BI and Analytics - Intro Analytics

Of the several definitions that can be used for Analytics, the following should suffice in most cases: "ANALYTICS means a set of techniques that gives STRUCTURE to a large amount of INFORMATION for actionable INSIGHTS." Structure comes from Charts, Graphs, Aggregations, Groupings, Inferences, Combinations, Trends, etc. Insights come from showing what happened, why it happened, what can happen, and what should be done about it.
 Ingredients of Analytics
We will focus on STATISTICS and SMART SOFTWARE (Machine Learning), and mention other ingredients wherever appropriate.
Analytics relies heavily on Classical Statistics and Smart Software, sometimes used independently and sometimes in a combined fashion.
 STATISTICS can be classified into Descriptive and Inferential Statistics. Descriptive Statistics presents a summarized view of the data as it is, which makes the data easier to use and interpret. Descriptive Statistics can be further classified into Graphical and Numerical Statistics. Inferential Statistics mainly refers to techniques by which we estimate a measure of the whole data (the population) from a part (a sample) drawn from it. Before we proceed, let us arm ourselves with a basic understanding of some concepts.
 Population: A Population consists of all the items of interest that could possibly be used for Analytics. Examples include all adults eligible to vote in a country, all cars plying the roads in a city, or all gears manufactured at a plant.
 Sample: We could use every item in a Population, but we seldom do because of the huge cost and time involved. Instead, we normally take a part of the Population. That part must be representative, so we refrain from picking people only from the same city or gears manufactured in a single week. We could give every person in a city an equal chance of being picked (Simple Random Sample), pick every nth person from a list (Systematic Sample), sample separately from each age group in a city (Stratified Sample), or randomly select a few cities and sample only within them (Cluster Sample).
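The sampling schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is synthetic, and the city names, age groups, weight distribution and helper function names are all hypothetical, not taken from any real data set or library.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 people with a city, an age group and a weight in lbs.
CITIES = ["A", "B", "C", "D", "E"]
AGE_GROUPS = ["18-35", "36-55", "56+"]
population = [
    {
        "id": i,
        "city": random.choice(CITIES),
        "age_group": random.choice(AGE_GROUPS),
        "weight": random.gauss(170, 25),  # assumed distribution, for illustration only
    }
    for i in range(10_000)
]


def simple_random_sample(items, k):
    """Every item has the same chance of being selected."""
    return random.sample(items, k)


def systematic_sample(items, step):
    """Pick every step-th item, starting from a random offset."""
    start = random.randrange(step)
    return items[start::step]


def stratified_sample(items, key, k_per_stratum):
    """Draw a separate random sample from each group (stratum)."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, min(k_per_stratum, len(members))))
    return sample


def cluster_sample(items, key, n_clusters):
    """Randomly choose whole clusters and keep every item inside them."""
    clusters = sorted({item[key] for item in items})
    chosen = set(random.sample(clusters, n_clusters))
    return [item for item in items if item[key] in chosen]


def mean_weight(items):
    return statistics.mean(item["weight"] for item in items)


srs = simple_random_sample(population, 500)
systematic = systematic_sample(population, 20)
stratified = stratified_sample(population, "age_group", 150)
clustered = cluster_sample(population, "city", 2)

print(f"population mean weight: {mean_weight(population):.1f}")
for name, sample in [
    ("simple random", srs),
    ("systematic", systematic),
    ("stratified", stratified),
    ("cluster", clustered),
]:
    print(f"{name:>14}: n={len(sample):5d}, mean weight={mean_weight(sample):.1f}")
```

Each sample mean is an estimate of the population mean computed from only a part of the data, which is the basic idea behind Inferential Statistics.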

A Variable is a characteristic of the Population or the Sample. For example, vote preferences or weights of people could be variables. As the name suggests, a variable can take several values.
 Data is the measured or observed value of a variable. When the data can take any numeric value within a range, it is called Continuous Data (a form of Quantitative Data). When the data can take only a limited set of labels, it is called Categorical Data (Qualitative Data), which is Nominal when the categories have no natural order and Ordinal when they do. When we measure the weight of a person in absolute value, say in lbs, it is Continuous. But when we put that person in one of three categories (Obese, Normal, Under-weight), it is Categorical.
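As a small sketch of the weight example, the Python below turns Continuous measurements into Categorical labels. The cut-off values, the sample weights and the function name weight_category are hypothetical and chosen only for illustration; they are not a medical classification.

```python
from collections import Counter

# Hypothetical cut-offs in lbs, for illustration only.
UNDERWEIGHT_BELOW = 120
OBESE_FROM = 220


def weight_category(weight_lbs):
    """Map a continuous weight to one of three categories."""
    if weight_lbs < UNDERWEIGHT_BELOW:
        return "Under-weight"
    if weight_lbs >= OBESE_FROM:
        return "Obese"
    return "Normal"


# Continuous data: any value within a range is possible.
weights = [112.4, 154.0, 187.6, 230.1, 119.9, 220.0]

# Categorical data: only three labels are possible.
categories = [weight_category(w) for w in weights]
counts = Counter(categories)

print(categories)
print(counts)
```

Continuous data is usually summarized with measures such as the mean, while Categorical data is usually summarized with counts per category, as the Counter does here.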
Observe the alignment of Analytics Objectives and Techniques below. We are going to learn Descriptive and Inferential Statistics as a foundation for more advanced concepts.