Analysis
Data analytics is the art of turning data into relevant knowledge and insights, that is, comparing or aggregating raw information to understand what the data tells us. The analysis almost always involves searching for patterns and deviations from them, in order to better understand relationships, behaviors or connections within the topic being researched. It tends to be one of the most interesting stages for anyone doing research with data.
Data analysis can be divided into 4 types:
Descriptive analysis: consists of describing the main characteristics of a dataset by listing and summarizing values, sometimes from just one variable. Typical operations here are the mean, median, mode, minimum, maximum, percentage and frequency. If, for example, the variable is a grade from 0 to 5, a descriptive analysis could show the count and percentage of each grade, along with the overall mean and standard deviation.
Exploratory analysis: examines relationships between variables, such as correlations, using techniques such as regression and analysis of variance. A good example is the correlation between a country's GDP and its average life expectancy. This approach focuses on discovering relationships or facts that were previously unknown, and it often relies on charts and data visualization tools for the exploration.
Predictive analysis: at this stage, knowledge extracted from historical data in the previous stages is used to try to predict future events. Imagine a database of places frequented by a particular person. If she has a more or less regular routine (such as going out every morning to work in an office), then this data could be used to make fairly reliable predictions about where she will be next Monday at 9 am, for example.
Prescriptive analysis: at this level, building on the previous analyses, the objective is to generate actions or suggestions, automatically or semi-automatically. This is the case, for example, of systems that grant credit to users on terms tailored to their payment history.
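The four types can be illustrated with a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for illustration: the sample values, the function names (describe, pearson, predict_place, suggest_credit_limit) and the credit thresholds are hypothetical, and the "prediction" is a simple frequency count, not a statistical model.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev


def describe(values):
    """Descriptive analysis: summarize a single variable."""
    counts = Counter(values)
    total = len(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "min": min(values),
        "max": max(values),
        "std_dev": pstdev(values),  # population standard deviation
        "frequency": dict(sorted(counts.items())),
        "percentage": {v: 100 * n / total for v, n in sorted(counts.items())},
    }


def pearson(xs, ys):
    """Exploratory analysis: Pearson correlation between two variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def predict_place(history, weekday, hour):
    """Predictive analysis: most frequent past place for a weekday/hour slot.

    history is a list of (weekday, hour, place) tuples (hypothetical format).
    Returns None when the slot has never been observed.
    """
    places = [p for (d, h, p) in history if d == weekday and h == hour]
    if not places:
        return None
    return Counter(places).most_common(1)[0][0]


def suggest_credit_limit(on_time, total, base_limit=1000):
    """Prescriptive analysis: turn payment history into a suggested action.

    The thresholds below are invented for illustration only.
    """
    if total == 0:
        return 0
    ratio = on_time / total
    if ratio >= 0.95:
        return base_limit * 2
    if ratio >= 0.80:
        return base_limit
    return 0


if __name__ == "__main__":
    grades = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]  # hypothetical grades from 0 to 5
    print(describe(grades))

    visits = [("Mon", 9, "office"), ("Mon", 9, "office"), ("Mon", 9, "cafe")]
    print(predict_place(visits, "Mon", 9))
    print(suggest_credit_limit(on_time=17, total=20))
```

In practice, each step would more likely rely on a dedicated library or statistical package, but the sketch shows how the output of one type of analysis differs from the next: a summary, a relationship, a forecast, and finally a suggested action.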
Several terms currently in vogue are used, often as hype, to describe data analysis, knowledge extraction from data, forecasting, and data-based decision making:
- Big Data
- Data Science
- Machine Learning
- Artificial Intelligence
For any of these phases of data analysis, it is important to use appropriate software (such as SPSS or SAS), hardware (a GPU may be necessary) or an appropriate programming language (such as R or Python) with its associated libraries. For storing and managing the data, tools such as Excel, SPSS or SQL database management systems can be used.