Overview

Learn about the common types of data, chart types typically used for data visualisation, and real-world applications of mathematical concepts such as seasonal adjustment.  

How to Use Data Correctly

Using Statistics Correctly: Can Numbers Lie?

Correlation and Causation

Correlation is a statistical measure of the extent to which the values of two or more variables move in relation to each other. Positively correlated variables tend to move in the same direction, while negatively correlated variables tend to move in opposite directions. However, a correlation does not necessarily mean that a change in one variable causes the change in the other. Causation, on the other hand, means that a change in one variable directly brings about a change in the other.
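The strength and direction of a linear correlation is commonly summarised by the Pearson correlation coefficient r, which runs from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The minimal Python sketch below computes r from scratch. The temperature and sales figures are made up purely for illustration. On Python 3.10 or later, `statistics.correlation` from the standard library computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences (-1 <= r <= 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Positive when x and y tend to sit on the same side of their means at the same time.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Dividing by the spread of each variable scales the result into [-1, 1].
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily figures, invented for this example.
temps = [20, 24, 28, 32, 36]                 # temperature in degrees Celsius
ice_cream_sales = [110, 150, 190, 230, 270]  # rises with temperature
heater_sales = [50, 40, 30, 20, 10]          # falls with temperature

print(pearson_r(temps, ice_cream_sales))  # 1.0  (perfect positive correlation)
print(pearson_r(temps, heater_sales))     # -1.0 (perfect negative correlation)
```

Real data rarely gives exactly 1 or -1, and a value near either extreme still says nothing about which variable, if either, is the cause.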

The figure below illustrates the difference between correlation and causation. Hot sunny weather causes ice-cream to melt and, with prolonged sun exposure, causes sunburn. Melting ice-cream and sunburn are therefore correlated: they tend to occur together in hot sunny weather. If the hot sunny weather were ignored, one might wrongly conclude that melting ice-cream causes sunburn! Here, the weather is a confounding variable, a third factor that drives both outcomes.

Correlation and Causation
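The ice-cream example can be simulated to show how a confounding variable produces correlation without causation. In this sketch, temperature is the only cause: the melt rate and the number of sunburn cases are made-up linear functions of temperature plus random noise, and the coefficients, noise levels and temperature range are arbitrary assumptions. The two outcomes should still come out strongly correlated. "Controlling for" temperature, by correlating only what is left of each outcome after removing a straight-line fit on temperature (a partial correlation), should bring the correlation close to zero.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the run is reproducible
n_days = 1000

# Hypothetical model: temperature drives BOTH outcomes; neither outcome affects the other.
temperature = [rng.uniform(15, 35) for _ in range(n_days)]
melt_rate = [2.0 * t + rng.gauss(0, 3) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson_r(melt_rate, sunburn_cases)
partial_r = pearson_r(residuals(melt_rate, temperature),
                      residuals(sunburn_cases, temperature))

print(f"melt rate vs sunburn:              r = {raw_r:.2f}")
print(f"after controlling for temperature: r = {partial_r:.2f}")
```

Under these assumptions the first r should be roughly 0.9 and the second close to 0; the exact values depend on the random draws. The adjustment only works because the confounder was measured, which is why it matters to think about what else might be driving both variables.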

Misleading Visualisations
Simpson’s Paradox

Beware of Results from Small Sample Sizes or Polls

When testing a hypothesis, it may not always be possible to collect data from the entire population for logistical or financial reasons (e.g., a limited research budget). Instead, researchers can study a smaller group drawn from the population, known as a sample.

Population vs Sample

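Using a sample to estimate something about a population can be sketched in Python with the standard library's `random.sample`, which draws members without replacement so that each member has the same chance of being picked. The population below is hypothetical: 100,000 people, roughly 30% of whom would answer "yes" to some question. Repeating the draw many times for a few sample sizes (10, 100 and 1,000, chosen arbitrarily) shows how far a single sample's estimate tends to stray from the true population value.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population: 1 = answers "yes", 0 = answers "no"; about 30% say yes.
population = [1 if rng.random() < 0.30 else 0 for _ in range(100_000)]
true_share = sum(population) / len(population)

def sample_estimates(sample_size, repeats=500):
    """Estimate the 'yes' share from many independent simple random samples."""
    return [sum(rng.sample(population, sample_size)) / sample_size
            for _ in range(repeats)]

for n in (10, 100, 1000):
    estimates = sample_estimates(n)
    print(f"n = {n:>4}: estimates from {min(estimates):.2f} to {max(estimates):.2f}, "
          f"std dev {statistics.stdev(estimates):.3f} (true share {true_share:.3f})")
```

Every sample size is centred on the true share on average, but the spread of individual estimates should shrink markedly as the sample grows, so a single small sample can easily land far from the population value.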

However, small sample sizes can make the results less reliable. First, a small sample reduces the statistical power of a study, that is, the likelihood that the study detects an effect that truly exists in the population. Second, the sample may not be representative of the population. Online polls are a common example: people who feel strongly about a subject are far more likely to respond, so the results are skewed towards their views even when the majority is neutral.

For these reasons, robust statistical reporting or research typically requires a sufficiently large sample. To avoid a non-representative sample, one approach is simple random sampling, where members are chosen strictly by chance so that every member of the population has the same chance of being selected for the study.
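The online-poll problem and the simple random sampling remedy can be compared with a made-up population in which 70% of people are neutral on a topic. The response rates for the opt-in poll (2% for neutral people, 50% for people with strong views) are purely illustrative assumptions. The opt-in poll gathers far more replies than the 1,000-person simple random sample, yet it should badly understate the neutral share, while the random sample should land close to the true 70%.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population of 100,000: most people are neutral on the topic.
population = (["neutral"] * 70_000
              + ["strongly against"] * 20_000
              + ["strongly for"] * 10_000)

# Assumed response rates for an opt-in online poll: people with strong views
# are far more likely to bother answering.
RESPONSE_RATE = {"neutral": 0.02, "strongly against": 0.5, "strongly for": 0.5}

def neutral_share(responses):
    """Fraction of responses that are 'neutral'."""
    return responses.count("neutral") / len(responses)

# Opt-in poll: each person decides for themselves whether to respond.
opt_in_poll = [view for view in population if rng.random() < RESPONSE_RATE[view]]

# Simple random sample: every person has the same chance of being selected.
random_sample = rng.sample(population, 1000)

print(f"true neutral share:              {neutral_share(population):.2f}")
print(f"opt-in poll ({len(opt_in_poll)} replies):      {neutral_share(opt_in_poll):.2f}")
print(f"simple random sample (1000):     {neutral_share(random_sample):.2f}")
```

The sketch suggests that collecting more replies does not fix a biased selection process; giving every member the same chance of selection is what keeps the estimate representative.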

Concluding Remarks

As statistics is a broad field, the content above serves as a brief and simple introduction to the different types of data, ways to analyse data, and the common pitfalls of using statistics. With this new-found knowledge, enjoy exploring and working with data to gain useful insights.