Suppose the training data are oversampled in the event group so that the numbers of events and non-events are roughly equal. A logistic regression is run, and the predicted probabilities are output to a data set NEW under the variable name PE. The decision rule under consideration is: "Classify an observation as an event if its predicted probability is greater than 0.5." The data set NEW also contains a variable TG that indicates whether an event occurred (1 = event, 0 = no event).
The following SAS program was used.
What does this program calculate?
A. Depth
B. Sensitivity
C. Specificity
D. Positive predictive value
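For reference, all four candidate statistics come from the 2x2 table produced by the 0.5 cutoff. Depth is the proportion of observations classified as events, sensitivity is the proportion of true events classified as events, specificity is the proportion of non-events classified as non-events, and positive predictive value is the proportion of predicted events that are true events. The sketch below is Python rather than SAS, uses made-up scores, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(pe, tg, cutoff=0.5):
    """Cutoff-based metrics from predicted probabilities (PE) and
    true targets (TG: 1 = event, 0 = no event). Hypothetical helper."""
    tp = fp = tn = fn = 0
    for p, t in zip(pe, tg):
        predicted_event = p > cutoff  # decision rule: event if probability > cutoff
        if predicted_event and t == 1:
            tp += 1
        elif predicted_event and t == 0:
            fp += 1
        elif not predicted_event and t == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    return {
        "depth": (tp + fp) / n,         # share of all cases classified as events
        "sensitivity": tp / (tp + fn),  # true events correctly flagged
        "specificity": tn / (tn + fp),  # non-events correctly left unflagged
        "ppv": tp / (tp + fp),          # flagged cases that are true events
    }

# Made-up scores and targets, not taken from the question's data.
pe = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
tg = [1, 1, 1, 0, 0, 0]
m = classification_metrics(pe, tg)
print(m)
```

Assuming events and non-events are each sampled at random within their own group, sensitivity and specificity are unaffected by oversampling because they condition on the true class; depth and positive predictive value depend on the event proportion, so on oversampled data they do not reflect the population.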
Refer to the exhibit:
The plots represent two models, A and B, being fit to the same two data sets, training and validation.
Model A is 90.5% accurate at distinguishing blue from red on the training data and 75.5% accurate at doing the same on the validation data. Model B is 83% accurate at distinguishing blue from red on the training data and 78.3% accurate at doing the same on the validation data.
Which of the two models should be selected and why?
A. Model A. It is more complex with a higher accuracy than model B on training data.
B. Model A. It performs better on the boundary for the training data.
C. Model B. It is more complex with a higher accuracy than model A on validation data.
D. Model B. It is simpler with a higher accuracy than model A on validation data.
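The comparison can be made explicit using only the accuracies quoted above: under honest assessment the validation figure is the one that matters, and a large drop from training to validation is a sign of overfitting. A minimal Python sketch:

```python
# Accuracies (%) quoted in the question.
models = {
    "A": {"train": 90.5, "valid": 75.5},
    "B": {"train": 83.0, "valid": 78.3},
}

for name, acc in models.items():
    gap = acc["train"] - acc["valid"]  # a large gap suggests overfitting
    print(f"Model {name}: validation {acc['valid']}%, train-valid gap {gap:.1f} points")

# Choose the model with the best validation accuracy.
champion = max(models, key=lambda m: models[m]["valid"])
print("Champion:", champion)
```

Model A loses 15 points between training and validation, while model B loses under 5.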
To perform honest assessment of a predictive model, what is an acceptable division between training, validation, and testing data?
A. Training: 50% Validation: 0% Testing: 50%
B. Training: 100% Validation: 0% Testing: 0%
C. Training: 0% Validation: 100% Testing: 0%
D. Training: 50% Validation: 50% Testing: 0%
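A partition like this is normally made by random assignment, so that each subset reflects the same population and the model is assessed on data it was not fit to. The Python sketch below is illustrative only: the function name `partition`, the seed, and the fractions are assumptions, and in SAS the split would typically be done in a DATA step or with a sampling procedure instead.

```python
import random

def partition(rows, train_frac=0.5, valid_frac=0.5, seed=12345):
    """Randomly split rows into training, validation, and test sets.
    Whatever remains after training and validation goes to test."""
    if train_frac + valid_frac > 1:
        raise ValueError("training and validation fractions exceed 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 50 50 0
```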
Including redundant input variables in a regression model can:
A. Stabilize parameter estimates and increase the risk of overfitting.
B. Destabilize parameter estimates and increase the risk of overfitting.
C. Stabilize parameter estimates and decrease the risk of overfitting.
D. Destabilize parameter estimates and decrease the risk of overfitting.
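Redundant inputs are strongly correlated with other inputs, which inflates the variance of the coefficient estimates. A standard measure is the variance inflation factor, VIF = 1/(1 - R²), where R² comes from regressing one input on the others; with only two inputs this reduces to 1/(1 - r²). PROC REG reports it through the VIF option on the MODEL statement. The Python sketch below uses hypothetical data in which the second input is nearly a copy of the first; a VIF above 10 is a commonly quoted warning sign.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_inputs(x1, x2):
    """With exactly two inputs, each input's VIF is 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical inputs: x2 is nearly a copy of x1 (a redundant input).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
vif = vif_two_inputs(x1, x2)
print(round(vif, 1))
```

Each redundant input also adds a parameter to estimate, which is where the extra risk of overfitting comes from.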
The selection criterion used in the forward selection method in the REG procedure is:
A. Adjusted R-Square
B. SLE
C. Mallows' Cp
D. AIC
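Forward selection in PROC REG adds, at each step, the candidate whose entry is most significant, and stops when no remaining candidate meets the entry significance level (SLE, set with the SLENTRY= option). The Python sketch below illustrates that loop under stated assumptions: `entry_pvalue` is a hypothetical callback standing in for the partial F test PROC REG performs, and the p-values in the usage example are invented.

```python
from typing import Callable, Sequence

def forward_select(candidates: Sequence[str],
                   entry_pvalue: Callable[[list, str], float],
                   sle: float) -> list:
    """Greedy forward selection driven by an entry significance level (SLE).

    entry_pvalue(selected, var) is a hypothetical callback returning the
    p-value for adding `var` to a model that already contains `selected`.
    """
    selected: list = []
    remaining = list(candidates)
    while remaining:
        # Score every remaining variable given the current model.
        pvals = {v: entry_pvalue(selected, v) for v in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > sle:  # no candidate meets the entry criterion: stop
            break
        selected.append(best)  # in forward selection, entered variables stay
        remaining.remove(best)
    return selected

# Invented p-values for illustration only (not from any real fit).
fake_pvalues = {"RunTime": 0.0001, "Age": 0.03, "Weight": 0.40}

def entry_pvalue(selected, var):
    return fake_pvalues[var]

chosen = forward_select(["Age", "Weight", "RunTime"], entry_pvalue, sle=0.05)
print(chosen)  # ['RunTime', 'Age']
```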
Refer to the exhibit:
SAS output from the RSQUARE selection method, within the REG procedure, is shown. The top two models for each subset size are given.
Based on the AIC statistic, which model is the champion model?
A. Age Weight RunTime RunPulse MaxPulse
B. Age Weight RunTime RunPulse RestPulse MaxPulse
C. RestPulse
D. RunTime
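For least-squares models, the AIC reported by PROC REG is, as we understand its documentation, n·ln(SSE/n) + 2p, where p counts the parameters including the intercept; smaller values are better. The Python sketch below applies that formula to hypothetical SSE values (not the exhibit's) to show how the 2p penalty can favor a smaller model even when a larger one has a slightly lower SSE.

```python
import math

def aic(sse: float, n: int, p: int) -> float:
    """AIC for a least-squares fit: n*ln(SSE/n) + 2p, with p including the intercept."""
    return n * math.log(sse / n) + 2 * p

# Hypothetical (SSE, number of parameters) pairs; NOT the exhibit's values.
n = 30
fits = {
    "one input": (420.0, 2),
    "four inputs": (180.0, 5),
    "six inputs": (175.0, 7),
}
scores = {name: aic(sse, n, p) for name, (sse, p) in fits.items()}
champion = min(scores, key=scores.get)  # smallest AIC wins
for name, score in scores.items():
    print(f"{name}: AIC = {score:.2f}")
print("Champion:", champion)
```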
A marketing analyst assessed the effect of web page design (A, B, or C) on customers' intent to purchase an expensive product. The focus group was divided randomly into three subgroups, each of which was asked to view one of the web pages and then rate their intent to purchase on a scale from 0 to 100. The analyst also asked the customers to give their income, which was coded as I (lowest), II (medium), or III (highest). After analyzing the data, the analyst claimed that there was a significant interaction and that the web page design mainly influenced high-income customers.
Which graph supports the analyst's conclusion?
A. Option A
B. Option B
C. Option C
D. Option D
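In an interaction plot of mean intent to purchase against web page design, with one line per income level, the analyst's claim corresponds to roughly flat lines for incomes I and II and a line with large differences between designs for income III; lines that are not parallel indicate interaction. The Python sketch below quantifies this with hypothetical cell means by taking, within each income level, the largest minus the smallest design mean. It is an informal descriptive check, not a significance test.

```python
# Hypothetical mean intent-to-purchase scores (0-100) by income and web page.
cell_means = {
    "I": {"A": 40, "B": 41, "C": 39},
    "II": {"A": 45, "B": 47, "C": 44},
    "III": {"A": 50, "B": 75, "C": 60},
}

def design_spread(means_by_design):
    """Largest minus smallest mean across designs within one income level."""
    values = means_by_design.values()
    return max(values) - min(values)

spreads = {income: design_spread(m) for income, m in cell_means.items()}
print(spreads)  # {'I': 2, 'II': 3, 'III': 25}
```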
Refer to the exhibit.
Which conclusion is justified concerning Sales, comparing stores A, B, and C?
A. Store B is significantly different from store A.
B. Store C is significantly different from store A.
C. Store B is significantly different from store C.
D. There is no significant difference between stores.
Refer to the exhibit.
These graphs were created using the GLM procedure with the plots(only)=diagnostics option.
Which plot do you use to identify influential observations?
A. Cook's D by Observation
B. Residual by Quantile
C. Residual by Predicted
D. Fit - Mean and Residual Plot
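Cook's D combines an observation's residual with its leverage: D_i = e_i² / (p·MSE) · h_ii / (1 - h_ii)², where p is the number of parameters and h_ii is the leverage. A commonly used cutoff is 4/n. The Python sketch below computes it for a simple linear regression on hypothetical data; the function name `cooks_distance` is an assumption.

```python
def cooks_distance(x, y):
    """Cook's D for each observation of a simple linear regression y = b0 + b1*x."""
    n = len(x)
    p = 2  # parameters: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # hat diagonal
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]

# Hypothetical data: the last point has high leverage and breaks the trend.
x = [1, 2, 3, 4, 5, 6, 7, 15]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 10.0]
d = cooks_distance(x, y)
cutoff = 4 / len(x)
flagged = [i for i, di in enumerate(d) if di > cutoff]
print(flagged)  # [7]
```

In this made-up data the last point's raw residual is not the largest, yet its Cook's D is far above the cutoff because of its leverage, which is why a residual plot alone would not single it out.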
Which statistic is based on the maximum vertical distance between the primary event EDF and the secondary event EDF?
A. KS
B. SBC
C. Max EDF
D. Brier Score
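The Kolmogorov-Smirnov (KS) statistic compares the empirical distribution function (EDF) of the model scores for events with the EDF for non-events and takes the largest vertical gap; in SAS it is available, for example, from PROC NPAR1WAY with the EDF option. The Python sketch below uses made-up scores, and the function name `ks_statistic` is hypothetical.

```python
def ks_statistic(event_scores, nonevent_scores):
    """Maximum vertical distance between the empirical distribution
    functions (EDFs) of the scores for events and for non-events."""
    def edf(sample, t):
        # Proportion of the sample at or below threshold t.
        return sum(s <= t for s in sample) / len(sample)

    # Both EDFs are step functions, so the largest gap occurs at an observed score.
    thresholds = sorted(set(event_scores) | set(nonevent_scores))
    return max(abs(edf(event_scores, t) - edf(nonevent_scores, t)) for t in thresholds)

events = [0.9, 0.8, 0.7, 0.4]
nonevents = [0.6, 0.3, 0.2, 0.1]
ks = ks_statistic(events, nonevents)
print(ks)  # 0.75
```

A larger KS value means the model separates events from non-events more cleanly.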