Big Data

Be Careful Interpreting Correlated Data

March 11, 2013
Related Topics: Strategy and Management

Any time you walk into a doctor’s office, whether it’s in a hospital or a bush clinic in a tent, someone takes your pulse and temperature. That basic triage is fundamental to understanding your health, and the same kind of basic check is fundamental to understanding your data.

On dashboards and scorecards, this fundamental information takes the form of what we would define as descriptive statistics. They are typically activity-based measures and basic trend data. They are a good check on what is happening with your business processes, and they present that data in a format that is easy to read and understand. In a quickly changing data landscape, descriptive statistics keep people informed when there isn’t time for a lengthier analysis. They also help analysts become familiar with the data sets, which makes it easier to detect patterns and begin asking tougher questions of the data.

It is possible to combine data from various dashboard sources throughout the company to gain deeper insight. However, be cautious about making decisions based solely on this data. In almost all cases, the data presented on dashboards is correlated information (meaning the variables move together), and correlation does not necessarily imply causation.

For example, the following are highly correlated variables:

• The days when one brings an umbrella and the days it rains.

• The weight of elementary school children and their math scores on standardized tests.

• The number of firefighters called to a fire and the amount of damage caused by the fire.

• The height of an elementary school student and his or her reading level.

• Childhood obesity rates and safety warnings on playground equipment.

These are all similar to the correlation we shared last week: ice cream sales and deaths due to drowning. Both rise at the same time because the weather is getting warmer, not because one causes the other. A third factor drives both, which is why it is difficult to make decisions or show proof of impact based on correlations alone. Descriptive statistics are a great place to start, but I encourage you to dive deeper into your analysis and make your big data work for you!
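To make the ice cream example concrete, here is a minimal Python sketch using made-up numbers. It assumes a hypothetical daily `temperature` that drives both `ice_cream_sales` and `drownings`, with no direct link between the two; the coefficients and noise levels are illustrative assumptions, not real data. It compares the raw correlation with the correlation that remains once the straight-line effect of temperature is removed from each series.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def residuals(ys, xs):
    """What is left of ys after removing its straight-line dependence on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: 1,000 days of temperatures (assumed range, in Celsius).
temperature = [rng.uniform(10, 35) for _ in range(1000)]

# Both series depend on temperature plus independent noise.
# Neither series appears in the formula for the other.
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

# Correlation as a dashboard would show it.
raw = pearson(ice_cream_sales, drownings)

# Correlation after taking temperature out of both series.
adjusted = pearson(residuals(ice_cream_sales, temperature),
                   residuals(drownings, temperature))

print(f"raw correlation:                  {raw:.2f}")
print(f"after accounting for temperature: {adjusted:.2f}")
```

Under these assumptions the raw correlation comes out strongly positive, while the adjusted figure sits close to zero. The apparent relationship disappears once the shared driver is accounted for, which is the pattern to look for before acting on a correlation seen on a dashboard.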

In future posts we will explore how to move from relying exclusively on dashboards to more sophisticated data mining techniques, and from correlations to causation: understanding the causal link between your human capital investments and business outcomes.

Please leave your thoughts on correlations below in the comment section. Join me next week as we discuss moving past correlations and discovering causation in your big data.
