Experiment 1: APPLICATION OF STATISTICAL CONCEPTS IN THE DETERMINATION OF WEIGHT VARIATION IN SAMPLES

LEE, Hyun Sik
Chem 26.1 WFV/WFQR1
Nov. 23, 2012

A skillful researcher aims to end a study with a precise and accurate result. Precision refers to the closeness of repeated measurements of a quantity to one another, while accuracy refers to the closeness of the measured values to the true value. Statistics is the tool used to evaluate the precision and accuracy of results and to detect errors in them.
To build familiarity with these concepts, the experiment has the researchers weigh ten (10) Philippine 25-centavo coins on an analytical balance using the “weighing by difference” method. The data obtained are then divided into two groups and analyzed statistically by performing Dixon’s Q-test and solving for the mean, standard deviation, relative standard deviation, range, relative range, and confidence limits, with the Q-test and the confidence limits taken at the 95% confidence level.
Finally, the results from the two data sets are compared in order to determine the reliability and use of each statistical function.

RESULTS AND DISCUSSION

This simple experiment involved only the weighing of ten 25-centavo coins that were in circulation at the time of the experiment. In order to practice calculating and validating the accuracy and precision of results, the coins were chosen randomly and without any restrictions. This gives a random set of data, which is useful because statistical analysis works best on multiple random samples.
Following the directions in the Analytical Chemistry Laboratory Manual, the coins were placed on a watch glass and handled with forceps so that the readings remain stable. Each coin was weighed according to the “weighing by difference” method. This method is used when a series of samples of similar size are weighed in succession, and it is recommended when the sample should be protected from unnecessary exposure to the atmosphere, as in the case of hygroscopic materials. It also minimizes the effect of systematic error, a constant error added to the true weight of the object by a problem with the weighing equipment, because a constant offset cancels when two readings are subtracted.
The technique requires a container holding the sample (in this experiment, a watch glass with the coins) and a tared balance (in this case, an analytical balance). The procedure is simple: place the watch glass and the coins inside the analytical balance, press ON TARE to re-zero the display, take the watch glass out, remove a coin, then put the watch glass back in with the remaining coins. The balance then gives a negative reading; subtracting it from the tared 0.0000 g gives the weight of the coin that was removed. The procedure is repeated until the weights of all the coins are measured and recorded.
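The arithmetic of the procedure can be sketched in a few lines of Python. This is only an illustration: the function name `removed_weight` is hypothetical, the readings are illustrative values chosen to match the first three weights in Table 1, and the sketch assumes the balance is re-tared before each coin is removed.

```python
TARED = 0.0000  # display after pressing ON TARE, in grams


def removed_weight(reading_after_removal, tared=TARED):
    """Weight of the removed coin: tared reading minus the (negative) new reading."""
    return tared - reading_after_removal


# Illustrative negative readings shown after each coin is taken off the watch glass
# (assumed values, chosen to match samples 1 to 3 of Table 1).
readings = [-3.6072, -3.7549, -3.6002]

weights = [removed_weight(r) for r in readings]
for number, weight in enumerate(weights, start=1):
    print(f"coin {number}: {weight:.4f} g")
```

Because each weight comes from the difference between two readings of the same balance, a constant offset in the display does not appear in the result.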
The weights of the coins are presented in Table 1, as these raw data are vital in presenting the results of this experiment.

Table 1. Weights of 25-centavo coins measured using the “weighing by difference” method
Sample No. | Weight, g | Data Set
1 | 3.6072 | 1 and 2
2 | 3.7549 | 1 and 2
3 | 3.6002 | 1 and 2
4 | 3.5881 | 1 and 2
5 | 3.5944 | 1 and 2
6 | 3.5574 | 1 and 2
7 | 3.5669 | 2
8 | 3.5919 | 2
9 | 3.5759 | 2
10 | 3.6485 | 2

Note that the data are classified into two groups: Data Set 1, which includes samples 1 to 6, and Data Set 2, which includes samples 1 to 10.
Since each data set contains no more than 10 samples, Dixon’s Q-test was performed at the 95% confidence level to look for outliers in each data set. Both the use of the Q-test on such a small number of samples and the choice of the 95% confidence level follow the Laboratory Manual.

Significance of Q-test

Dixon’s Q-test aims to identify and reject outliers: values that are unusually high or low, differ considerably from the majority, and may therefore be omitted from the calculations on the body of data.
The Q-test should be performed because a value that is extreme compared to the rest can distort the computed statistics and thus affect the conclusion. The test examines whether one (and only one) observation from a small set of replicate observations (typically 3 to 10) can be "legitimately" rejected. The outlier is classified objectively by calculating an experimental value, $Q_{exp}$, for the suspected outlier and comparing it with the tabulated value, $Q_{tab}$. $Q_{exp}$ is determined by equation (1).

$$Q_{exp} = \frac{|X_q - X_n|}{R} \qquad (1)$$
where $X_q$ is the suspected value, $X_n$ is the value closest to $X_q$, and $R$ is the range, given by the highest data value minus the lowest data value.

$$R = X_{highest} - X_{lowest} \qquad (2)$$

If the obtained $Q_{exp}$ is greater than $Q_{tab}$, the outlier can be rejected. The sample calculation for Data Set 1 is given below:

$$Q_{exp} = \frac{|X_q - X_n|}{R} = \frac{3.7549 - 3.6072}{3.7549 - 3.5574} = \frac{0.1477}{0.1975} = 0.74785$$

Since $Q_{tab}$ is 0.625 for 6 samples at the 95% confidence level, $Q_{exp} > Q_{tab}$. Thus, the suspected value 3.7549 is rejected from the calculations for Data Set 1.
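The test can also be written as a short Python sketch, shown here for the six values of Data Set 1. The function name `dixon_q` and the dictionary `Q_TAB_95` are hypothetical names introduced for illustration; the tabulated values are the two quoted in this report, not a complete Q table.

```python
def dixon_q(values, suspect):
    """Return Q_exp = |X_q - X_n| / R for a suspect that is the lowest or highest value."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if suspect == ordered[-1]:
        nearest = ordered[-2]   # neighbour of the highest value
    elif suspect == ordered[0]:
        nearest = ordered[1]    # neighbour of the lowest value
    else:
        raise ValueError("the suspect must be the lowest or highest value")
    return abs(suspect - nearest) / data_range


# Q_tab at the 95% confidence level, as quoted in the report (n = 6 and n = 10 only).
Q_TAB_95 = {6: 0.625, 10: 0.466}

data_set_1 = [3.6072, 3.7549, 3.6002, 3.5881, 3.5944, 3.5574]

q_high = dixon_q(data_set_1, 3.7549)
q_low = dixon_q(data_set_1, 3.5574)
q_tab = Q_TAB_95[len(data_set_1)]

print(f"highest: Q_exp = {q_high:.5f}, reject = {q_high > q_tab}")
print(f"lowest:  Q_exp = {q_low:.5f}, reject = {q_low > q_tab}")
```

The sketch tests one suspect at a time, in keeping with the restriction that the Q-test may reject only a single observation from a small data set.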
The same process was applied to the lowest value of Data Set 1 and to the extreme values of Data Set 2, as shown in Table 2. (Refer to the Appendix for full calculations.) The lowest value of each data set gives $Q_{exp} < Q_{tab}$ and is accepted. The highest value of Data Set 2 gives $Q_{exp} = 0.53873$, which is greater than $Q_{tab} = 0.466$, so by the criterion above it also qualifies for rejection; it was nonetheless kept, and all ten values are used in the calculations for Data Set 2.

Table 2. Results of Dixon’s Q-test
Data Set | Suspect Value | Qtab | Qexp | Conclusion
1 | 3.7549 | 0.625 | 0.74785 | Rejected
1 | 3.5574 | 0.625 | 0.15544 | Accepted
2 | 3.7549 | 0.466 | 0.53873 | Retained (Qexp > Qtab)
2 | 3.5574 | 0.466 | 0.048101 | Accepted

The statistical values were then computed for the two data sets and compared in order to relate the significance of each statistical function.
The values to be calculated are the following: mean, standard deviation, relative standard deviation (in ppt), range, relative range (in ppt), and confidence limits (at the 95% confidence level).

Significance of the mean and standard deviation

The mean is used to locate the center of the distribution of a set of values. Comparing the individual values with the mean indicates how close they are to one another, and comparing the mean with the theoretical value indicates how close the data are to the true value. Thus the mean, coupled with other statistical measures, helps assess both accuracy and precision.
In the experiment, the mean was calculated using equation (3). The sample calculation uses Data Set 1, which had 5 samples after the outlier was rejected via the Q-test.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} \qquad (3)$$

$$\bar{X} = \frac{3.6072 + 3.6002 + 3.5881 + 3.5944 + 3.5574}{5} = 3.5895$$

The mean is represented by $\bar{X}$, the data values by $X_i$, and the number of samples by $n$. The values are all close to one another and to the mean, which indicates that the retained data are precise. The standard deviation, on the other hand, is a direct measure of the precision of the values.
It shows how much the values spread out from the mean. A smaller standard deviation shows that the values lie relatively close to the mean, and a larger one shows that the values are more spread out. The standard deviation does not by itself determine the validity of the experimental values; instead, it is used to calculate further statistical measures that validate the data. Equation (4) was used to calculate the standard deviation, where $s$ represents the standard deviation and the other symbols are the same as for the mean. The data set used is the same as for the mean.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad (4)$$

$$s = \sqrt{\frac{(3.6072 - 3.5895)^2 + (3.6002 - 3.5895)^2 + (3.5881 - 3.5895)^2 + (3.5944 - 3.5895)^2 + (3.5574 - 3.5895)^2}{5 - 1}} = 0.019262$$

The mean and standard deviation by themselves are relatively poor indicators of the accuracy and precision of the data. They are combined into derived measures that give a clearer view of the data. One such measure of precision is the relative standard deviation.

$$RSD = \frac{s}{\bar{X}} \times 1000\ \text{ppt} \qquad (5)$$

$$RSD = \frac{0.019262}{3.5895} \times 1000 = 5.3664\ \text{ppt}$$

The relative standard deviation is a useful way of comparing the precision of one data set with that of another, since the ratio expresses the spread relative to the size of the measured values.
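Equations (3) to (5) can be checked with a minimal Python sketch based on the standard `statistics` module, whose `stdev` function uses the same n − 1 denominator as equation (4). The variable names are illustrative; the data are the five values of Data Set 1 that remain after the Q-test.

```python
import statistics

# Data Set 1 after 3.7549 was rejected by the Q-test (weights in grams, from Table 1)
data_set_1 = [3.6072, 3.6002, 3.5881, 3.5944, 3.5574]

mean = statistics.mean(data_set_1)   # equation (3)
s = statistics.stdev(data_set_1)     # equation (4), sample SD with n - 1
rsd_ppt = s / mean * 1000            # equation (5), parts per thousand

print(f"mean = {mean:.4f} g")
print(f"s    = {s:.6f} g")
print(f"RSD  = {rsd_ppt:.4f} ppt")
```

Carrying the unrounded mean and standard deviation through to the final step, as the sketch does, avoids small differences in the last digit of the relative standard deviation.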
The relative standard deviation is revisited in the comparison of Data Sets 1 and 2. The range is found with equation (2) to be 0.0498 g, taking note that the highest value was rejected via the Q-test.

$$R = 3.6072 - 3.5574 = 0.0498$$

The relative range, like the relative standard deviation, is a way of comparing sets of data, and it is likewise discussed in the comparison of Data Sets 1 and 2.

$$RR = \frac{R}{\bar{X}} \times 1000\ \text{ppt} \qquad (6)$$

$$RR = \frac{0.0498}{3.5895} \times 1000 = 13.874\ \text{ppt}$$

Significance of the confidence interval

The confidence interval gives the range within which a given estimate may be deemed reliable.
It is the interval in which the population mean is expected to lie at the stated confidence level. The boundaries of the interval are called confidence limits and are calculated by equation (7), where $t$ is the tabulated Student's t value for $n - 1$ degrees of freedom at the chosen confidence level (2.78 for $n = 5$ at 95%).

$$\text{Confidence limit} = \bar{X} \pm \frac{ts}{\sqrt{n}} \qquad (7)$$

$$= 3.5895 \pm \frac{(2.78)(0.019262)}{\sqrt{5}} = 3.5895 \pm 0.023948$$

The confidence limits indicate the range of values that can be expected for the mean if the same experiment were performed again. Here, there is 95% confidence that the actual mean lies between 3.5656 and 3.6134.

Difference between Data Set 1 and Data Set 2
The statistical values computed from the two data sets are arranged below in Table 3.

Table 3. Reported values for Data Sets 1 and 2
Data Set | Mean, g | Standard Deviation, g | Relative SD, ppt | Range, g | Relative Range, ppt | Confidence Limits, g
1 | 3.5895 | 0.019262 | 5.3664 | 0.0498 | 13.874 | 3.5895 ± 0.023948
2 | 3.6085 | 0.057153 | 15.838 | 0.1975 | 54.731 | 3.6085 ± 0.040846

The two data sets differ in all the components, but the important ones are the relative standard deviation and the relative range. The standard deviation and the relative range, along with the confidence limits, went up from Data Set 1 to Data Set 2.
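The entries of Table 3 can be reproduced with the sketch below, written under a few assumptions: the function name `summarize` and the dictionary keys are hypothetical, and the t values (2.78 for 4 degrees of freedom, 2.26 for 9 degrees of freedom) are the tabulated 95% values used in this report, passed in by hand rather than looked up.

```python
import math
import statistics


def summarize(values, t_value):
    """Range, relative range, and confidence-limit half-width, per equations (2), (6), (7)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)                    # sample SD, n - 1 denominator
    data_range = max(values) - min(values)
    return {
        "mean": mean,
        "sd": s,
        "rsd_ppt": s / mean * 1000,
        "range": data_range,
        "rr_ppt": data_range / mean * 1000,
        "cl_half_width": t_value * s / math.sqrt(len(values)),
    }


# Data Set 1 without the rejected 3.7549; Data Set 2 with all ten values (Table 1).
data_set_1 = [3.6072, 3.6002, 3.5881, 3.5944, 3.5574]
data_set_2 = [3.6072, 3.7549, 3.6002, 3.5881, 3.5944,
              3.5574, 3.5669, 3.5919, 3.5759, 3.6485]

summary_1 = summarize(data_set_1, t_value=2.78)  # t for 4 degrees of freedom, 95%
summary_2 = summarize(data_set_2, t_value=2.26)  # t for 9 degrees of freedom, 95%

for label, summary in (("1", summary_1), ("2", summary_2)):
    print(f"Data Set {label}: mean = {summary['mean']:.4f} g, "
          f"range = {summary['range']:.4f} g, "
          f"RR = {summary['rr_ppt']:.3f} ppt, "
          f"CL = ±{summary['cl_half_width']:.6f} g")
```

Keeping the t value as an explicit argument makes it clear that it depends on the number of samples in each data set.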
The larger relative standard deviation and relative range of Data Set 2 show that the data became less precise as more values were added, which is normal since one cannot expect perfect results from every trial. The relative values allow the precision of the two data sets to be compared with each other: the lower the number, the more precise the data. However, the relative values increased along with the number of samples, so although Data Set 1 is more precise, it is not necessarily more accurate, since its sample is quite limited. The statistical values were computed and analyzed so that the researchers will be able to carry out such calculations without difficulty when more demanding research arises.
These values are significant in determining the accuracy of the experiment. For example, the actual weight of a 25-centavo coin is 3.6 g for the brass-plated steel coins minted from 2004. It can be deduced that the majority of the coins used are indeed of that type, and that the mean moved closer to the true value as more samples were used (3.5895 g for Data Set 1 against 3.6085 g for Data Set 2).

REFERENCES

Silberberg, M. S. (2010). Principles of general chemistry (2nd ed.). New York, NY: McGraw-Hill.
Jeffery, G. H., Bassett, J., Mendham, J., & Denney, R. C. (1989). Vogel’s textbook of quantitative chemical analysis (5th ed.). Great Britain: Bath Press, Avon.
http://www.bsp.gov.ph/bspnotes/banknotes_coin.asp. Accessed Nov. 21, 2012.

Appendix

Working Calculations

Q-test

Data Set 1 (Highest)
$$Q_{exp} = \frac{|3.7531 - 3.6921|}{0.1920} = 0.3177$$
0.3177 < 0.625 (accepted)

Data Set 1 (Lowest)
$$Q_{exp} = \frac{|3.5611 - 3.6104|}{0.1920} = 0.2568$$
0.2568 < 0.625 (accepted)

Data Set 2 (Highest)
$$Q_{exp} = \frac{|3.7531 - 3.6921|}{0.1938} = 0.3148$$
0.3148 < 0.466 (accepted)

Data Set 2 (Lowest)
$$Q_{exp} = \frac{|3.5593 - 3.5611|}{0.1938} = 0.009288$$
0.009288 < 0.466 (accepted)
Mean

Data Set 1
$$\bar{X} = \frac{3.6427 + 3.5611 + 3.6206 + 3.6104 + 3.6921 + 3.7531}{6} = 3.6467$$

Data Set 2
$$\bar{X} = \frac{3.6427 + 3.5611 + 3.6206 + 3.6104 + 3.6921 + 3.7531 + 3.5732 + 3.5593 + 3.6095 + 3.5687}{10} = 3.6191$$

Standard Deviation

Data Set 1
$$s = \sqrt{\frac{(3.6427 - 3.6467)^2 + (3.5611 - 3.6467)^2 + (3.6206 - 3.6467)^2 + (3.6104 - 3.6467)^2 + (3.6921 - 3.6467)^2 + (3.7531 - 3.6467)^2}{5}} = 0.06742$$

Data Set 2
$$s = \sqrt{\frac{(3.6427 - 3.6191)^2 + (3.5611 - 3.6191)^2 + (3.6206 - 3.6191)^2 + (3.6104 - 3.6191)^2 + (3.6921 - 3.6191)^2 + (3.7531 - 3.6191)^2 + (3.5732 - 3.6191)^2 + (3.5593 - 3.6191)^2 + (3.6095 - 3.6191)^2 + (3.5687 - 3.6191)^2}{9}} = 0.06289$$

Relative Standard Deviation

Data Set 1
$$RSD = \frac{0.06742}{3.6467} \times 1000 = 18.49\ \text{ppt}$$

Data Set 2
$$RSD = \frac{0.06289}{3.6191} \times 1000 = 17.38\ \text{ppt}$$

Range

Data Set 1
$$R = 3.7531 - 3.5611 = 0.1920$$

Data Set 2
$$R = 3.7531 - 3.5593 = 0.1938$$

Relative Range

Data Set 1
$$RR = \frac{0.1920}{3.6467} \times 1000 = 52.65\ \text{ppt}$$

Data Set 2
$$RR = \frac{0.1938}{3.6191} \times 1000 = 53.55\ \text{ppt}$$

Confidence Limit

Data Set 1
$$\text{Confidence limit} = 3.6467 \pm \frac{(2.57)(0.06742)}{\sqrt{6}} = 3.6467 \pm 0.07074$$
3.6467 - 0.07074 = 3.5760
3.6467 + 0.07074 = 3.7174

Data Set 2
$$\text{Confidence limit} = 3.6191 \pm \frac{(2.26)(0.06289)}{\sqrt{10}} = 3.6191 \pm 0.04495$$
3.6191 - 0.04495 = 3.5742
3.6191 + 0.04495 = 3.6641