When to use mode over mean?
* Let's take an example (purely for illustration) to answer this question. A
data annotation company has hired 2 data annotators.
* Annotator "A" is very good at understanding English and can therefore annotate complex sentences.
* Annotator "B" is moderately good at understanding English and can therefore annotate relatively less
complicated sentences.
* Annotators "A" and "B" are managed by a manager "M".
* Manager "M" has 2 datasets to be annotated and wants to understand which annotator is
suitable for which dataset.
* "M" wants to give the dataset with longer sentences to Annotator "A", since A is very
good at understanding complex sentences.
* "M" wants to give the dataset with shorter sentences to Annotator "B", since B is
moderately good at understanding English.
* The sentence lengths in each dataset, and the statistics of each dataset, are as follows:
dataset_1 = [ 10, 20, 20, 20, 50, 120 ]
mean of dataset_1 = 40
mode of dataset_1 = 20
dataset_2 = [ 10, 30, 30, 30, 47, 90 ]
mean of dataset_2 = 39.5
mode of dataset_2 = 30
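* Looking at the data, the typical sentence in dataset_1 (most common length 20) is shorter than the
typical sentence in dataset_2 (most common length 30), even though dataset_1 contains the single longest sentence (120).
* If the manager looks only at the mean, dataset_2 appears to have the shorter sentences (39.5 vs. 40), so he
might end up giving the dataset with the longer typical sentences to the moderately good annotator.
* If he looks at the mode instead, he gives dataset_1, the one with the shorter typical sentences, to the moderately good annotator.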
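The comparison in this example can be checked with a minimal Python sketch using the standard-library `statistics` module. The helper name `summarize` and the variables `pick_by_mean` and `pick_by_mode` are made up for illustration; only the two datasets come from the example.

```python
from statistics import mean, mode

# Sentence lengths from the example above
datasets = {
    "dataset_1": [10, 20, 20, 20, 50, 120],
    "dataset_2": [10, 30, 30, 30, 47, 90],
}

def summarize(lengths):
    """Return the mean and the mode (most common value) of a list of lengths."""
    return {"mean": mean(lengths), "mode": mode(lengths)}

stats = {name: summarize(lengths) for name, lengths in datasets.items()}

# Which dataset looks "shorter" depends on the statistic the manager uses.
pick_by_mean = min(stats, key=lambda name: stats[name]["mean"])
pick_by_mode = min(stats, key=lambda name: stats[name]["mode"])

for name, s in stats.items():
    print(name, s)
print("Shorter by mean:", pick_by_mean)
print("Shorter by mode:", pick_by_mode)
```

Ranking by mean selects dataset_2 as the "shorter" dataset, while ranking by mode selects dataset_1, because the single sentence of length 120 pulls the mean of dataset_1 upward.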
Note: The mean is easily distorted by outliers, so it is better to check the mode in the above case.
Note: When dealing with skewed data, it is usually better to use the median over the mean. Similarly, it is good to use the interquartile range (IQR) over the standard deviation (SD) for skewed data. Why? An outlier pulls the mean toward itself, and because the SD is computed from squared distances to the mean, the same outlier inflates the SD as well.
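As a rough illustration of this note, the sketch below compares how much each statistic moves when a single outlier is appended. The sample `base` and the outlier value 300 are hypothetical numbers chosen for the demonstration, not data from the example above. The IQR is computed with `statistics.quantiles` (Python 3.8+) using its default method; other quartile conventions give slightly different values.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical sentence lengths (illustrative only), then the same data plus one outlier
base = [10, 12, 14, 15, 15, 16, 18, 20, 22, 25]
with_outlier = base + [300]

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def describe(values):
    """Collect centre and spread measures, both classical and robust."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "iqr": iqr(values),
    }

before = describe(base)
after = describe(with_outlier)

# Absolute shift in each statistic caused by the single outlier
shift = {key: abs(after[key] - before[key]) for key in before}

for key in before:
    print(f"{key:>6}: before={before[key]:.2f} after={after[key]:.2f} shift={shift[key]:.2f}")
```

In this sample the mean and SD shift far more than the median and IQR, which is why the latter pair is usually preferred for describing skewed data.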