Options to analyse non-normal distribution statistically

3 years ago in Statistics By Ashwin Goel

Hello experts, I am still unsure how to proceed with data evaluation in my research framework. I am interested in analysing a histogram in which the data are not normally distributed. The scores range from 0 to 100, and I have performed more than 150 tests. Most of the results (about 114) fall in the 30-60 range. All the remaining tests now appear as outliers.

So, I have a few questions that I need answers to:

How can I get an accurate (or near accurate) average from this type of data distribution?
Should I calculate the mean by including the outliers or not?
How can I examine the cause-and-effect relationship between these outliers and the calculated mean?

I want to analyse the data statistically, but I am not well-versed in the various approaches, so I do not know which method is appropriate in this case. It would be really helpful if any of you could help me decide which statistical tests to use and which metrics (such as mean, standard deviation, significance value, or p-value) need to be evaluated.

If you can provide references, that would be great too.

Looking forward to the answers.

All Answers (2 Answers In All)

By David Answered 3 years ago

Hi dear, why don't you try robust statistics to get the calculations right? In my view, they would let you handle your data systematically. You can use robust counterparts of both the mean and the standard deviation, which should give you a suitable answer.
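As a rough illustration of what "robust counterparts" could mean here, the sketch below compares the ordinary mean and standard deviation with the median, a trimmed mean, and the median absolute deviation (MAD). The scores are made up for illustration and are not the asker's data; the function names and the 10% trimming proportion are assumptions, and the answer itself does not say which robust estimators it has in mind.

```python
import statistics

# Hypothetical scores on a 0-100 scale (illustrative only).
scores = [5, 12, 35, 38, 40, 42, 45, 47, 50, 52, 55, 58, 88, 97]


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of values."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each end
    kept = data[k:len(data) - k] if k else data
    return statistics.fmean(kept)


def mad(values, scale=1.4826):
    """Median absolute deviation around the median.

    With scale=1.4826 the result is comparable to a standard deviation
    when the data are roughly normal; use scale=1.0 for the raw MAD.
    """
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    return scale * statistics.median(deviations)


print("mean          :", round(statistics.fmean(scores), 2))
print("std deviation :", round(statistics.stdev(scores), 2))
print("median        :", statistics.median(scores))
print("trimmed mean  :", round(trimmed_mean(scores), 2))
print("scaled MAD    :", round(mad(scores), 2))
```

The robust estimates move much less than the mean and standard deviation when a few extreme scores are added or removed, which is the property the answer is pointing at.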

By Mehar Mehta Answered 3 years ago

Hello Ashwin, before answering, I would like to ask whether you can share more about your research question and study design, as that would help in answering your question.

Also, you said that you have about 150 test cases. Are these by any chance a sample? If they are a sample, then I would suggest you check whether you chose the right selection method, or consider collecting a different sample that could give you a steadier distribution of data. Otherwise, go with the research hypothesis that you have selected; based on that, you can decide whether you need another sample or not. However, if these test cases are the only cases you are interested in, then there is no need for statistical testing, and you can do without the significance value, p-value, or similar measures.

You also need to check why those test cases are outliers: is there a mistake, or is there a reason why they are so few? So, please check that too.

Now, to answer your questions one by one:

First, as far as I understand, the median should be the appropriate measure for you, as most of your test cases range from 30 to 60, so the data are concentrated in the middle of the score range. You can get more information about the median and interquartile range from the following links:
https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_summarizingdata/BS704_SummarizingData5.html#headingtaglink_4
https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_summarizingdata/BS704_SummarizingData7.html#headingtaglink_1

Second, whether to calculate the mean with or without outliers depends on your purpose. That is why I asked about your research question.

Third, do you have to find the correlation between the outliers and the mean or not? Calculate the mean and test its accuracy. (This depends on the second answer again.) If you can just compare your answer with the research hypothesis, then I don't think that you need to compare it with the mean as well.

I hope I have answered your questions in a way that you understood.
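A minimal sketch of the median and interquartile range suggested above, again on made-up scores rather than the asker's data. The 1.5 x IQR fences used to flag points are Tukey's common convention and are an assumption here; the asker's own cut-offs (below 30 and above 60) are a domain choice and would flag different points. The function name `iqr_summary` is illustrative.

```python
import statistics

# Hypothetical scores on a 0-100 scale (illustrative only).
scores = [5, 12, 35, 38, 40, 42, 45, 47, 50, 52, 55, 58, 88, 97]


def iqr_summary(values, k=1.5):
    """Return quartiles, IQR, fences, and the values outside the fences."""
    # "inclusive" treats the data as the whole population of interest,
    # which matches the case where these tests are the only cases studied.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = sorted(v for v in values if v < lower or v > upper)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower,
        "upper_fence": upper,
        "flagged": flagged,
    }


summary = iqr_summary(scores)
for name, value in summary.items():
    print(f"{name:12s}: {value}")
```

Reporting the median together with the IQR describes the centre and spread without dropping any test case, so the flagged points stay in the dataset and can be discussed separately.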

Replied 3 years ago

By Ashwin Goel

First of all, thank you so much for your response. To answer your questions: yes, the test cases considered in this research are the only cases I am interested in. In practical terms, I can justify that the outliers (cases below 30 and above 60) are important and must be included in the research. The issue I am facing is that my supervisor wants me to explain this better. The non-normal distribution is giving me an average that does not look representative, which is why I am confused about which statistical approach to use. The research hypothesis is to show whether or not the gap between these test cases is significant. I hope I have answered your questions now. Also, I will try the median approach and see if it helps. Thank you again.