What is Statistics?



Statistics is a word with two meanings. Most people are aware of the mundane definition of statistics as a collection of data, such as baseball statistics or the statistics the government collects during a census. Far fewer are aware of the broader and more important definition of statistics as an academic discipline (some would say a branch of mathematics), and so to much of society, doing statistics means merely collecting and presenting data for informational or persuasive purposes.

In this larger sense, statistics is a discipline concerned with analyzing data and making decisions based upon data. It can also be used to spot trends or isolate causes. Statistics rests upon a solid edifice of mathematical theorems proven through unassailable laws of logic. In theory, statistics works every time. We shall discuss the inherent problems with statistics in due course because, as many people know, statistics can be misleading.

Statistical analysis and decisions are based upon the notion of probability, the study of how chance affects certain events or outcomes. One of the simplest probability problems is flipping a coin. The probability of getting "heads" is one-half, or fifty percent, because a flip has two equally likely outcomes and heads is one of them. This is an example of a theoretical probability, which is found by counting all the possible outcomes of some hypothetical experiment along with the outcomes that make up the event of interest. Some thought was given to these types of problems in the sixteenth century, but modern probability theory can be said to date from a correspondence between Pierre de Fermat and Blaise Pascal in 1654.
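
The counting idea can be sketched in a few lines of Python. This is a minimal illustration assuming a fair coin, so that every sequence of heads and tails is equally likely; the function name theoretical_probability is hypothetical.

```python
from fractions import Fraction
from itertools import product


def theoretical_probability(n_flips, event):
    """Favorable outcomes divided by all outcomes for n fair coin flips."""
    # Every sequence of "H" and "T" of length n_flips, each equally likely.
    outcomes = list(product("HT", repeat=n_flips))
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# One flip: two outcomes, one of them heads.
print(theoretical_probability(1, lambda o: o[0] == "H"))  # 1/2
# Three flips: HHT, HTH, THH are 3 of the 8 possible sequences.
print(theoretical_probability(3, lambda o: o.count("H") == 2))  # 3/8
```

Listing every outcome only works for small experiments, but it makes plain that a theoretical probability is nothing more than careful counting.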

The next major breakthrough came around 1809, when Carl Friedrich Gauss derived the normal distribution, known popularly as the bell curve. The bell curve is a signature of randomness: most of the data cluster symmetrically about a center, and less and less data occur as you move toward an extreme on either side. The curve can be used to measure the probability of data falling within or outside a specific range. For example, the clinical definition of genius based upon IQ scores is a score in the top one percent. On the bell curve, an IQ of 100 is average, at the center, while the cutoff for the top one percent lies at an IQ of about 135.
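
Here is a short Python sketch of how the IQ cutoff can be read off the bell curve. It assumes IQ scores are normally distributed with mean 100 and standard deviation 15, a convention used by several common IQ tests; other tests use a slightly different spread, which moves the cutoff a little.

```python
from statistics import NormalDist

# Assumption: IQ ~ normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# The score that cuts off the top one percent is the 99th percentile.
cutoff = iq.inv_cdf(0.99)
print(round(cutoff, 1))  # 134.9

# Fraction of people expected to score 135 or higher: just under one percent.
print(round(1 - iq.cdf(135), 4))  # 0.0098
```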

Statistics as we know and use it today began to develop around 1893 and continues to be refined. It is a very large subject, applied mostly in practical ways to data for which there is no theoretical probability. For example, pharmaceutical firms must test new drugs before they can be put on the market. Groups of people are given a new drug and compared with similar groups who do not take it. Can we predict the outcome? We can't, so we measure the outcome and use statistical methods, which are based on probability, to determine whether the results were significant.
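
As one illustration of "determining significance," the sketch below applies a two-proportion z-test, a common large-sample method for comparing the share of patients who improve in two groups. It is not necessarily the method any particular drug study would use, and the trial counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under "no real difference," both groups share one pooled proportion.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Chance of a difference at least this large if the drug had no effect.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical trial: 60 of 200 improve on the drug, 40 of 200 on placebo.
z, p = two_proportion_z_test(60, 200, 40, 200)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```

A small p-value (here about 0.02) says a gap this large would rarely arise by chance alone; by the common 5% convention the result would be called significant.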

Two forms of statistics are studied at the elementary, or introductory, level. The first is the measurement of a single type of quantity, known as a single variable. Examples might include measuring the number of defective items produced by a production line, measuring the ages of SSCC graduates, or measuring the blood pressures of people in certain health risk groups. Results are compared to an established standard, and probability is used to measure the significance, if any, of the difference between the experimental result and that standard. Alternatively, the results are used to produce a range estimate for the larger group from which the data samples come. You see this type of statistics in opinion polls with their margins of error.
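
A poll's margin of error can be sketched as follows. This uses the simple normal-approximation interval for a proportion, one standard textbook method (polling firms often use refinements); the function name and the poll numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) estimate and margin of error for a proportion."""
    p_hat = successes / n
    # z-value leaving (1 - confidence) / 2 in each tail; about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Hypothetical poll: 540 of 1,000 respondents favor a candidate.
p_hat, margin = proportion_confidence_interval(540, 1000)
print(f"{p_hat:.1%} plus or minus {margin:.1%}")  # 54.0% plus or minus 3.1%
```

Because the margin shrinks with the square root of the sample size, quadrupling the number of respondents only cuts the margin of error in half.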

The other common form of statistics is concerned with how one quantity is affected by another, or whether there is a difference among similar quantities. Analysis of Variance (ANOVA) was developed in the 1920s by R. A. Fisher for agricultural experiments, as a way of determining whether certain crop varieties are significantly different from others, but ANOVA is now used widely for all kinds of data. Comparing two variables measured with numbers, for example crime rates and educational levels among states, is the purview of linear regression. One can measure how strongly, if at all, one variable is related to the other, or one can produce the line that best fits the data and use it to make predictions or spot trends.
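
The "line that best fits the data" is usually the least-squares line. Below is a minimal Python sketch that computes its slope and intercept along with the correlation coefficient r, which measures how strongly the two variables are related. The function name and the data points are hypothetical.

```python
from math import sqrt


def least_squares(xs, ys):
    """Slope, intercept, and correlation r for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # between -1 and 1; near 0 means little linear relation
    return slope, intercept, r


# Hypothetical paired measurements for five units (e.g. five states).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = least_squares(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
print("prediction at x = 6:", round(slope * 6 + intercept, 2))
```

A value of r close to 1 shows a strong linear relationship, but it does not by itself show that one variable causes the other.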

Statistics has been used much more extensively in the business world since the end of World War II. W. Edwards Deming, then a little-known statistician, was sent to Japan to help the Japanese rebuild their country. He taught them the value of statistics in monitoring quality and established a set of management principles that came to be known as Total Quality Management (TQM). (Several others did similar work for the Japanese, but Deming is now the most famous.) Eventually Japanese manufacturers became very competitive and renowned for their quality products, and Deming's ideas, which always included a role for statistics, have spread throughout this country and others.

Now for an explanation of the stock phrase "lying with statistics." The most famous quote about statistics is commonly attributed to British Prime Minister Benjamin Disraeli: "There are three kinds of lies: lies, damned lies, and statistics." The quip refers to the habit some politicians have of distorting information by presenting the data selectively. One of the most famous statistical distortions in modern times was Procter & Gamble's use, for years, of the marketing line "Four out of five dentists recommend Crest." The obvious joke is, "Which five dentists did they talk to?" One year two students in the author's class went to the heart of P & G country in Cincinnati and, by calling fifty dentists, demonstrated that this claim was significantly exaggerated, at least in Cincinnati at the time.

It is true that statistical information can be manipulated in ways that distort the picture. Mistakes in how statistical data are collected can also be subtle yet lead to misleading conclusions. A famous recent example is the election night confusion in the presidential election of 2000; another is the media's declaration that Thomas E. Dewey had defeated Harry S Truman in 1948. Ever hear of President Dewey?

Here is an important point: statistics does not lie. Aside from people purposely distorting information through the way they present data, statistics works every time, as billed, provided the assumptions underlying the statistical theorems are met. When it fails, it is usually because the sample of data collected is not sufficiently random but is biased in some manner.
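
The effect of a biased sample can be seen in a small simulation. Everything here is hypothetical: a population of 100,000 voters in which the 40% who are easy for a pollster to reach (say, telephone owners) favor a candidate at a lower rate than everyone else. A simple random sample lands near the true share; a sample drawn only from the reachable group misses badly, no matter how carefully its numbers are crunched.

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: 40% "reachable" voters support at 35%, the rest at 60%.
population = []
for i in range(100_000):
    reachable = i < 40_000
    support_rate = 0.35 if reachable else 0.60
    population.append((reachable, random.random() < support_rate))

# True share of supporters in the whole population (close to 0.50).
true_share = sum(s for _, s in population) / len(population)

# Simple random sample: every voter has the same chance of being chosen.
random_sample = random.sample(population, 1000)
random_estimate = sum(s for _, s in random_sample) / len(random_sample)

# Biased sample: drawn only from the reachable group.
reachable_group = [voter for voter in population if voter[0]]
biased_sample = random.sample(reachable_group, 1000)
biased_estimate = sum(s for _, s in biased_sample) / len(biased_sample)

print(f"true share:    {true_share:.3f}")
print(f"random sample: {random_estimate:.3f}")
print(f"biased sample: {biased_estimate:.3f}")
```

Both samples contain 1,000 people; only the way they were chosen differs.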

Southern State Community College offers two different statistics courses:

Math 160 Statistical Concepts. This course is designed for people who will need to read and understand published statistics. This group includes bachelor's degree nursing students and pharmacy, psychology, and sociology majors. The broader range of topics in this course also makes it an excellent course for math majors. Math 160 is currently offered at Central Campus during the winter and summer quarters.

Math 281 Introductory Statistics. This course is a traditional introduction to statistics and is a little more computationally based than Math 160. It is the better choice for business, accounting, and engineering majors, as it gives a better perspective on how statistics are produced, and these students are likely to end up working with co-workers who use statistics on a daily basis. Math 281 is currently offered at three or four campuses during the spring quarter.

Which statistics course you choose depends on the needs of your academic program. Please do not assume that one will substitute for the other. Feel free to discuss any questions about SSCC's statistics courses with a member of the Math Department.

The author, Jon Davidson, can be contacted by e-mail.