Accessing Variables in a Data Frame
So far, we have been using the dplyr package to analyze our data. This package makes it easy to refer to variables in a dataset by the variable name alone. But R is a collection of many functions written by many different people. A lot of the older functions do not use the same paradigm as dplyr for specifying dataset variable names. To use these older functions, we’ll need another way to tell R which variable in which dataset we want to use.
One common way to access variables in a data.frame is with the $ operator. To use this operator, you specify the dataset name, followed by the $ operator, followed by the specific variable name in that dataset that you want to access. For example, to access the spine variable in the crabs dataset, you would use:
crabs$spine
This way of accessing variables is not necessary for dplyr functions, like count() and summarize(), but is necessary for older R functions, like those introduced below. The homework guides will let you know when you need to use the $ operator style of accessing variables for new functions.
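As a minimal sketch of the $ operator in action, the example below builds a small made-up data frame and hands one of its columns to mean(), a base R function. The name toy_crabs and its values are invented for illustration; they are not the course's crabs dataset.

```r
# Hypothetical toy data, invented for this example (not the real crabs dataset)
toy_crabs <- data.frame(
  spine  = c(3, 3, 1, 2, 3),
  weight = c(3050, 1550, 2300, 2100, 2600)
)

# The $ operator pulls a single column out of the data frame as a plain vector
toy_crabs$weight

# Base R functions such as mean() accept that vector directly
mean_weight <- mean(toy_crabs$weight)
mean_weight
```

The same dataset$variable pattern works for any column in any data frame.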
Checking Normality
The Q–Q plot allows us to assess to what extent a sample can be well approximated by a Normal distribution. The qqnorm() function asks R to create a Q–Q plot for a given variable. Simply provide the dataset and variable name to qqnorm(), using the $ operator style. For example, to create a Q–Q plot for the heart rates (pulse) in the nhanes dataset, you can run:
qqnorm(nhanes$pulse)
Since you’ll be looking to see whether the points in this graph fall in a straight line, it can be helpful to ask R to add a straight line to the graph for comparison. You can do this with the qqline() function, using the same variable you gave to qqnorm(), like so:
qqline(nhanes$pulse)
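If you want to try the two calls together without the nhanes dataset loaded, here is a small sketch that uses simulated values instead. The vector toy_pulse and its mean and standard deviation are assumptions made up for this example.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical stand-in for nhanes$pulse: 200 draws from a Normal distribution
toy_pulse <- rnorm(200, mean = 72, sd = 10)

qqnorm(toy_pulse)  # draws the Q-Q plot
qqline(toy_pulse)  # adds the reference line to the plot that qqnorm() just drew
```

Note the order: qqline() adds to an existing plot, so it has to be run after qqnorm().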
Test of a Mean
When conducting a statistical test for a mean, you could compute the test statistic and p-value manually, but R also provides a convenient function that will conduct this test for you. The function is called t.test(). This function will actually run several different types of tests that use the t distribution, but here, we’ll only look at the usage for the one-group test we have covered in class so far.
To run a test of a mean with t.test(), you supply the variable with your sample data (using the $ operator style) and the parameter for the null hypothesis as the named argument mu. For example, to test whether the mean weight of our horseshoe crab population is 2500 grams, we could execute:
t.test(crabs$weight, mu = 2500)
This returns the following output:
One Sample t-test
data: crabs$weight
t = -1.4317, df = 172, p-value = 0.154
alternative hypothesis: true mean is not equal to 2500
95 percent confidence interval:
2350.597 2523.784
sample estimates:
mean of x
2437.191
The function tells us that the test statistic is -1.43, that the sampling distribution for this test statistic has 172 degrees of freedom, and that the p-value for this test is 0.154. By default, t.test() conducts a two-sided test. To conduct a one-sided test, you can specify which side ("less" or "greater") in the alternative named argument, like this:
t.test(crabs$weight, mu = 2500, alternative = "less")
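To see how the alternative argument changes the result, the sketch below runs all three versions of the test on a simulated sample and compares their p-values. The vector toy_weight is made up for illustration and is not the crabs data.

```r
set.seed(42)  # make the simulated sample reproducible

# Hypothetical sample of 50 weights, invented for this example
toy_weight <- rnorm(50, mean = 2400, sd = 500)

two_sided <- t.test(toy_weight, mu = 2500)
less      <- t.test(toy_weight, mu = 2500, alternative = "less")
greater   <- t.test(toy_weight, mu = 2500, alternative = "greater")

# The test statistic is the same in all three cases
c(two_sided$statistic, less$statistic, greater$statistic)

# The two one-sided p-values add up to 1 ...
less$p.value + greater$p.value

# ... and twice the smaller one equals the two-sided p-value
2 * min(less$p.value, greater$p.value)
two_sided$p.value
```

Only the p-value and the reported confidence interval depend on alternative; the test statistic and degrees of freedom do not.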
Finally, notice that t.test() also gave us a 95 percent confidence interval for the population mean: (2350.6, 2523.8). If we want a different confidence interval, we can specify the confidence level using the conf.level named argument. For example, for a 90 percent confidence interval:
t.test(crabs$weight, mu = 2500, conf.level = .90)
The output from this version tells us that the 90 percent confidence interval is (2364.6, 2509.7).
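For readers curious about what t.test() is doing, the following sketch computes the test statistic, two-sided p-value, and 90 percent confidence interval by hand and compares them with the function's results. It uses a simulated sample (toy_weight and mu0 are names invented for this example), so the numbers will not match the crab output above.

```r
set.seed(7)  # make the simulated sample reproducible

# Hypothetical sample of 40 weights, invented for this example
toy_weight <- rnorm(40, mean = 2450, sd = 550)
mu0 <- 2500  # mean under the null hypothesis

n      <- length(toy_weight)
se     <- sd(toy_weight) / sqrt(n)        # standard error of the mean
t_stat <- (mean(toy_weight) - mu0) / se   # test statistic
df     <- n - 1                           # degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

# 90 percent confidence interval: mean +/- critical value * standard error
t_crit <- qt(0.95, df)
ci_90  <- mean(toy_weight) + c(-1, 1) * t_crit * se

# Compare the manual results with t.test()
result <- t.test(toy_weight, mu = mu0, conf.level = 0.90)
c(manual = t_stat, t.test = unname(result$statistic))
c(manual = p_two_sided, t.test = result$p.value)
rbind(manual = ci_90, t.test = result$conf.int)
```

The pieces of a t.test() result can be pulled out with the $ operator as well, for example result$p.value or result$conf.int.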
Scientific Notation
As you have been working with pnorm(), you may have run across some funny looking results. For example, if you try to run the following:
pnorm(-4, 0, 1)
R will reply with 3.167124e-05. The “e” in a number is R’s way of writing scientific notation. This number is actually 3.167124 × 10⁻⁵ = 0.00003167124.
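If you would rather see such a number written out in full, one option is the format() function with scientific = FALSE. This is a small sketch; the variable names p and p_fixed are arbitrary.

```r
p <- pnorm(-4, 0, 1)
p  # printed in scientific notation

# format() returns the number as text in fixed (non-scientific) notation
p_fixed <- format(p, scientific = FALSE)
p_fixed

# The "e-05" form and the power-of-ten form describe the same value
all.equal(3.167124e-05, 3.167124 * 10^-5)
```

Keep in mind that format() returns a character string, so use it only for display, not for further calculations.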