So, I’m a wannabe data alchemist (more popularly known as a data scientist). An alchemist is a person who tries to transform base metals into gold. The only difference in my case is that my metal is data and my gold is the jaw-dropping insight I want the data to confess!
So, if you are still reading this, you are probably part of this community, and you might know that statistics makes up a major part of our skillset. In this article, I’m going to pen my understanding of measurement and its levels using a story. Of course it has to be a story; data scientists are superb storytellers after all!
Meet my friend, Bob. He is a really creative and imaginative student pursuing a career in statistics. One day, after college he decided to visit a pizza place for lunch. There were a lot of customers in the place and Bob had to wait to place his order.
Being the imaginative, in-his-own-world boy that he is, Bob glanced at the menu and entered his own Stats World. He noticed that the categories of pizza were nominal data. Nominal data are often called categorical data because they simply put the given data into defined categories. For example, pizza can have veg or non-veg toppings. Nominal data don’t indicate order; there is no way we can depict the level of ‘pizza-ness’ just by the toppings used. Another common example is the gender of a person, i.e. male, female or transgender.
Bob then realized that the sizes of the pizza were of an ordinal type. As the name suggests, ordinal data clearly indicate a meaningful order. Bob could see Small, Medium and Large pizza sizes on the menu, clearly indicating the order of hierarchy. However, ordinal variables don’t imply that the differences (intervals) between consecutive values are equal. It means we can’t say whether the difference between a small and a medium pizza is equal to the difference between a medium and a large pizza.
Bob’s daydreaming was interrupted by the attendant, who asked for his order. Bob ordered his favorite Cheese burst chicken barbeque pizza (Ah, I’m drooling!) and then gazed around the pizza place.
At the table in front of him, there was a family enjoying their pizza, a father and mother with their infant in a walker. The infant had such tiny feet, Bob was sure his shoe size was definitely zero! Then he realized shoe size is an interval variable. Eureka! An interval variable has a defined, equal interval between values but lacks a true zero point. Consider shoe sizes: we can say that the difference between shoe sizes 8 and 7 is equal to the difference between sizes 3 and 2. But it doesn’t mean that size 6 is 2 times size 3. And a shoe size of zero doesn’t mean the absence of a shoe. Instead, it indicates an actual shoe size, i.e. it’s an arbitrary zero point.
Bob’s cheesy hot pizza had arrived by now and his thoughts were now focused just on the pizza. Bob devoured the pizza and after he was done, the pizza place was really calm. Surprisingly all the customers were gone including the family in front of him. There were just the attendants and him.
Bob soon left the place. His mind went back to the Stats World and he concluded that the number of customers in a pizza place is on a ratio scale. A ratio scale is the interval scale’s big brother. It has definite intervals and also has a true zero point. That means at the time Bob left the pizza place there were zero customers, i.e. a true absence of customers. It also means ratios make sense: 20 customers is literally twice as many as 10 customers, a comparison we couldn’t make with shoe sizes. End of Story!
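If you’d like to poke at Bob’s four levels yourself, here is a minimal Python sketch of what each level lets you do. The orders, toppings and customer counts are made up purely for illustration (they are not from any real pizza place), and the variable names are my own.

```python
from collections import Counter
from statistics import median_low

# Nominal: categories with no order, so counting (and the mode) is all we can do.
toppings = ["veg", "non-veg", "veg", "veg"]  # made-up orders
most_common_topping = Counter(toppings).most_common(1)[0][0]

# Ordinal: categories with an order, so we can rank them and take a median,
# but we still don't know whether the gaps between ranks are equal.
size_order = ["Small", "Medium", "Large"]
orders = ["Large", "Small", "Medium", "Small", "Large"]  # made-up orders
ranks = sorted(size_order.index(size) for size in orders)
median_size = size_order[median_low(ranks)]

# Interval: differences are meaningful, ratios are not (the zero is arbitrary).
same_gap = (8 - 7) == (3 - 2)

# Ratio: there is a true zero, so ratios are meaningful as well.
customers_busy, customers_quiet = 20, 10  # made-up counts
times_as_many = customers_busy / customers_quiet

print(most_common_topping, median_size, same_gap, times_as_many)
```

Notice that nothing in the nominal or ordinal parts ever adds or divides the categories; the arithmetic only shows up once we reach the interval and ratio levels.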
I hope our boy Bob has helped you understand levels of measurement in his own way. I tried to give you the simplest explanation of levels of measurement using a story. But why learn about levels of measurement?
Knowing the level of measurement helps us interpret the data from that variable. For example, for a color variable, you could encode red as 0, blue as 1 and green as 2. Encoding categorical data as numbers is preferred because computers tend to handle numbers more easily than text. It also reduces the chance of errors during data entry. For example, the person entering the data might write ‘red’ as ‘Red’. ‘R’ and ‘r’ are different characters to the computer, so the two entries would be treated as different values, which can affect our analysis. Also, if we know that the data is nominal, we would never average it. Why?
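A quick aside before the answer: here is a small sketch of that encoding step in Python. The raw entries are hypothetical, and encode_color is a helper name I made up for this example; it simply tidies up case and stray spaces before looking up the code.

```python
# Hypothetical hand-typed entries with inconsistent capitalization and spacing.
raw_colors = ["red", "Red", "blue", "GREEN", "green "]

# The encoding described above: red -> 0, blue -> 1, green -> 2.
color_codes = {"red": 0, "blue": 1, "green": 2}


def encode_color(entry):
    """Normalize case and surrounding whitespace, then return the numeric code."""
    return color_codes[entry.strip().lower()]


# To the computer, 'Red' and 'red' are different strings until we normalize them.
print("Red" == "red")

encoded = [encode_color(color) for color in raw_colors]
print(encoded)
```

Without the normalization, ‘red’ and ‘Red’ would end up as two separate categories, which is exactly the kind of data entry slip the encoding is meant to guard against.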
To understand this better, consider a similar encoding for music genres. Suppose we asked six people to pick a genre and got the hypothetical data shown in the picture below.
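It says that the average response in our survey is 4. But the codes are just labels, so an average of them doesn’t describe what people actually picked. It is clearly misleading. Hence, it is necessary to know the level of measurement of the variable at hand before analyzing it. That’s all folks!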
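For the curious, the same trap can be reproduced in a few lines of Python. This sketch reuses the color encoding from earlier (red as 0, blue as 1, green as 2) with six responses I made up; they are not the data from the picture, so the numbers will differ from the ones above.

```python
from collections import Counter
from statistics import mean

color_codes = {"red": 0, "blue": 1, "green": 2}
code_to_color = {code: color for color, code in color_codes.items()}

# Six made-up survey responses, for illustration only.
responses = ["red", "green", "green", "blue", "green", "red"]
encoded = [color_codes[response] for response in responses]

# The mean of the codes is a number that matches no color at all.
average_code = mean(encoded)
print(average_code, average_code in code_to_color)

# The mode is the summary that makes sense for nominal data.
mode_code = Counter(encoded).most_common(1)[0][0]
print(code_to_color[mode_code])
```

The average lands somewhere between blue and green, which tells us nothing, while the mode gives a straight answer: the color picked most often.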
Phew! I finally wrote my first blog on Medium. Do leave a response, peeps!