
04 Nov 2022, 9:05 PM

Practical Assignment 1a: Kvamme’s Gain Statistic: ARCL0103: Spatial Statistics, Network Analysis and Human History

This assessment is relatively short (less than 1000 words, excluding captions, code and bibliography) and asks you to calculate Kvamme’s gain statistic for the predictive model that you constructed in practical 4.

1 Background

You will recall that in practical 4 you used multivariate logistic regression to create a predictive model for a synthetic data set. Visual inspection suggested that the model predictions were quite accurate, but your task for this assessment is to explore the validity of that model further. One way of assessing the validity of a model is to use a testing sample that was withheld from the model-building process. The basic idea is to establish how many of the observed sites in the testing sample fall within the area where sites are predicted to be found. For example, if 16 out of 25 observed sites fall in the area where sites are predicted, then the model can be said to predict site locations correctly 64% of the time. In reality, however, matters are not quite so simple, for two main reasons:

Prediction is probabilistic. Very few, if any, models predict site occurrence with absolute certainty of presence or absence. Consequently, it usually only makes sense to talk about the model correctly predicting site presence at some specified probability, p, between 0.0 and 1.0. Models tend to be more accurate at low probabilities and less accurate at high probabilities.

Non-sites matter. It is often possible to specify a probability of site occurrence so low that all observed sites do actually fall within the area where sites are predicted; in other words, the model is 100% accurate for sites. However, the corollary is usually that a large number of non-sites also fall in the area where sites are predicted, so the model is very inaccurate at predicting the absence of archaeological sites. This would clearly be very undesirable if the purpose of the model were, for example, to identify a route for a new road that minimised damage to archaeological sites.
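The two points above can be made concrete with a short R sketch. It uses hypothetical predicted probabilities at a handful of sites and non-sites (invented for illustration, not assignment data) and computes, for a range of thresholds p, the percentage of sites at or above p (correctly predicted sites) and the percentage of non-sites below p (correctly predicted non-sites), then plots the two curves in the style of Figure 1.

```r
# HYPOTHETICAL predicted probabilities at observed sites and at non-sites
# (illustrative values only, not assignment data)
site_prob    <- c(0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40)
nonsite_prob <- c(0.05, 0.10, 0.20, 0.30, 0.45, 0.50, 0.65, 0.75)

thresholds <- seq(0, 1, by = 0.1)

# A site is correctly predicted when its probability is at or above p
pct_sites <- sapply(thresholds, function(p) 100 * mean(site_prob >= p))
# A non-site is correctly predicted when its probability is below p
pct_nonsites <- sapply(thresholds, function(p) 100 * mean(nonsite_prob < p))

print(data.frame(threshold = thresholds, sites = pct_sites,
                 nonsites = pct_nonsites))

# Cumulative percent correct curves for sites (solid) and non-sites (dashed)
plot(thresholds, pct_sites, type = "l", ylim = c(0, 100),
     xlab = "Probability of site occurrence", ylab = "% correct")
lines(thresholds, pct_nonsites, lty = 2)
legend("right", legend = c("Sites", "Non-sites"), lty = c(1, 2))
```

At p = 0 every site is correctly predicted but no non-site is, which is exactly the "100% accurate but worthless" situation described under "Non-sites matter".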

Clearly, then, it is important to consider the accuracy of a model with reference to the problem at hand. One method that facilitates this is the production of cumulative percent correct prediction curves for both sites and non-sites (Kvamme 1988; Conolly and Lake 2006). Figure 1 shows such a graph, in which the percentage of sites falling in areas where they are predicted decreases as the probability of site occurrence increases, while the percentage of correctly predicted non-sites increases as the probability of site occurrence increases. In this case, if it were important to avoid damaging sites, one might choose to avoid areas with even a relatively low probability of site occurrence. However, a further complication arises at this point: the relevant area is likely to be so large (since these are cumulative probabilities, the area in question includes all locations with a low probability or greater) as to render the prediction virtually worthless. There are at least two solutions to this dilemma. One is to pay attention to the trade-off between correctly predicting site and non-site locations; another is to examine the predictive gain offered by the model. Kvamme (1988: 329) defines the gain, G, as

$$G = 1 - \frac{S}{O} \qquad (1)$$

where S is the % of the total area where sites are predicted and O is the % of observed sites within the area where they are predicted. G, which is calculated for a specified probability of site occurrence, ranges from 1 (high predictive utility) through 0 (no predictive utility) to -1 (the model predicts the reverse of what it is supposed to).
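Equation (1) translates directly into a small R function. The name kvamme_gain is hypothetical (it is not from any package); it is vectorised, so S and O can be vectors holding one value per probability threshold. Both arguments are percentages, and O must be greater than zero because the gain is undefined when no observed sites fall in the predicted area.

```r
# Kvamme's gain, G = 1 - S / O
#   S: % of the total area where sites are predicted
#   O: % of observed sites within the area where they are predicted
kvamme_gain <- function(S, O) {
  stopifnot(all(S >= 0 & S <= 100), all(O > 0 & O <= 100))
  1 - S / O
}

kvamme_gain(S = 30, O = 60)   # 0.5: useful predictive gain
kvamme_gain(S = 50, O = 50)   # 0: no predictive utility
kvamme_gain(S = 90, O = 60)   # -0.5: worse than no model
```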

Kvamme developed this measure for archaeological site prediction; other approaches to evaluating binary classification models are used in other subject areas and occasionally in archaeology (e.g. receiver operating characteristic curves, F1 scores or Matthews correlation coefficients). The most important property of Kvamme’s measure is that it can distinguish a correct but relatively worthless model from an ostensibly less correct but more useful one. For example, a model that correctly predicts 80% of sites but predicts site occurrence over 70% of the landscape is probably not very useful, which is reflected in its low gain of 0.13. On the other hand, a model that correctly predicts 70% of sites and predicts site occurrence over a mere 5% of the landscape would provide a better basis for many decisions, which is reflected in its gain of 0.93. Further suggestions for testing predictive models can be found in Kvamme (1988).
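Substituting the two examples into equation (1) shows where the quoted gains come from (S is the % of the landscape where sites are predicted, O the % of sites correctly predicted):

$$G = 1 - \frac{70}{80} = 1 - 0.875 = 0.125 \approx 0.13$$

$$G = 1 - \frac{5}{70} \approx 1 - 0.071 = 0.929 \approx 0.93$$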

2 Tasks

The tasks you must complete for this assignment are as follows:

1. Calculate Kvamme’s gain statistic for the model you created in week 4, for at least four relative probability thresholds. This will require that you:

• Use raster map algebra in R (see the end of practical 4 for an example or two) to calculate four new binary maps from relprob, each showing where sites are predicted at or above a given probability threshold. This can be achieved using logical map algebra similar to that used to create the dummy variables.

• Use the extract() and hist() functions from the R ‘terra’ package to obtain tabular data that will enable you to calculate the % of the total area where sites are predicted at or above the relevant probability threshold. If you are struggling with this or the previous step, I am happy to offer some hints.

2. Describe how you calculated the gain statistic, including details of any R commands and statistical operations, and report the results. You should provide appropriate maps and other graphical plots to illustrate the results.

3. Briefly explain how Kvamme’s gain statistic may help evaluate the utility of a predictive model, and comment on your results in the light of this discussion.
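The whole of task 1 can be sketched end to end in R. This is a minimal illustration under stated assumptions, not a model answer: it assumes the terra package (which provides the rast(), vect() and extract() commands used in this brief), and because the real relprob.asc and locations data are not available here it builds a HYPOTHETICAL random raster and random points in their place. The four thresholds are an arbitrary choice, and values() is used instead of hist() as one alternative way to count predicted cells.

```r
library(terra)

set.seed(42)

# HYPOTHETICAL stand-in for the saved model: a 50 x 50 grid of random
# probabilities. For the real assignment use: relprob <- rast("relprob.asc")
relprob <- rast(nrows = 50, ncols = 50, xmin = 0, xmax = 50,
                ymin = 0, ymax = 50, vals = runif(2500))

# HYPOTHETICAL stand-in for the locations data: 40 random points, the first
# 20 coded as sites (value = 1) and the rest as non-sites (value = 0)
xy <- cbind(runif(40, 0, 50), runif(40, 0, 50))
locations <- vect(xy, type = "points", crs = crs(relprob),
                  atts = data.frame(value = rep(c(1, 0), each = 20)))
is_site <- locations$value == 1

thresholds <- c(0.3, 0.5, 0.7, 0.9)   # illustrative thresholds

gain_at <- function(p) {
  pred <- relprob >= p                               # binary map: 1 where prob >= p
  cell_vals <- values(pred, mat = FALSE)             # every cell value as a vector
  S <- 100 * mean(cell_vals, na.rm = TRUE)           # % of area where sites predicted
  at_pts <- as.integer(extract(pred, locations)[, 2])  # column 1 is the point ID
  O <- 100 * mean(at_pts[is_site])                   # % of sites in predicted area
  G <- if (O > 0) 1 - S / O else NA                  # gain undefined when O = 0
  c(threshold = p, S = S, O = O, G = G)
}

results <- t(sapply(thresholds, gain_at))
print(round(results, 3))
```

With random data the gains will hover around zero, as they should for a model with no predictive utility; with your own relprob surface the O column should fall more slowly than the S column if the model is any good.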

To emphasise: you are not supposed to build a new model (except as extra bonus work if you have the spare words and inclination), but rather to work with the one you created in practical 4 and saved as ‘relprob.asc’. So, when you do the practical assignment, load the locations data and the relprob data into a new R session with the vect() and rast() commands respectively, and then consider how to create a new map from relprob that shows only those site probabilities of 0.9 or above, for example with the following map algebra in R:

prob09 <- relprob >= 0.9

Note also that there are several ways to get the number of cells at or above 0.9 and the number below it from this map, for example by looking at the ‘counts’ component produced by the following:

hist(prob09, plot=FALSE)
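One way to turn those counts into S, the % of the total area where sites are predicted, is sketched below. To stay self-contained it runs base R’s hist() on a HYPOTHETICAL vector of 0/1 cell values rather than on prob09, and it assumes that hist(prob09, plot=FALSE) returns a standard ‘histogram’ object with counts and mids components. Note that for very large rasters, hist() on a raster may work from a sample of cells, so check its documentation if exact counts matter.

```r
# HYPOTHETICAL stand-in for the cell values of prob09:
# 0 = site not predicted, 1 = site predicted.
# With terra you would instead use: h <- hist(prob09, plot = FALSE)
cell_vals <- c(rep(0, 700), rep(1, 300))
h <- hist(cell_vals, plot = FALSE)

# With 0/1 data the 1s all land in bins whose midpoint is above 0.5,
# so summing those bins gives the number of predicted cells
predicted_cells <- sum(h$counts[h$mids > 0.5])
total_cells <- sum(h$counts)

S <- 100 * predicted_cells / total_cells
S   # 30
```

Selecting bins by midpoint rather than by position avoids depending on how many break points hist() happens to choose.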

To see which sites and non-sites fall on cells with a probability of 0.9 or above (and which do not), you can use extract() as follows:

extract(prob09, locations)

The resulting values in the ‘slope’ column are 1 where a site has been predicted by the prob09 surface and 0 where it has not. However, remember that your ‘locations’ data include both sites and non-sites, so you might get a better picture by binding together the observed site/non-site column from locations and your extracted predictions, as follows:

ObsandPredicted09 <- cbind(locations$value, extract(prob09, locations)$slope)
colnames(ObsandPredicted09) <- c("Observed","Predicted")

ObsandPredicted09

A combination of this kind of output and the hist() output from your four binary maps should be enough to complete the practical.
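A minimal sketch of how a two-column observed/predicted matrix of this kind can be summarised follows. The ten rows are HYPOTHETICAL values standing in for ObsandPredicted09; table() cross-tabulates observations against predictions, and the two percentages give O for equation (1) and the corresponding figure for non-sites.

```r
# HYPOTHETICAL stand-in for ObsandPredicted09:
# Observed: 1 = site, 0 = non-site; Predicted: 1 = predicted, 0 = not predicted
ObsandPredicted09 <- cbind(Observed  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
                           Predicted = c(1, 1, 1, 0, 0, 1, 0, 0, 0, 0))

# Cross-tabulate observations against predictions
tab <- table(Observed  = ObsandPredicted09[, "Observed"],
             Predicted = ObsandPredicted09[, "Predicted"])
print(tab)

sites <- ObsandPredicted09[, "Observed"] == 1

# O: % of observed sites that fall where sites are predicted
O <- 100 * mean(ObsandPredicted09[sites, "Predicted"] == 1)

# % of non-sites correctly falling outside the predicted area
nonsite_correct <- 100 * mean(ObsandPredicted09[!sites, "Predicted"] == 0)

c(O = O, nonsites_correct = nonsite_correct)   # O = 60, nonsites_correct = 80
```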

3 Allocation of marks

Marks will be allocated as follows:

30% for correct completion of task 1;

35% for your answer to task 2;

35% for your answer to task 3.

4 References

Conolly, J. and Lake, M. 2006. Geographical Information Systems in Archaeology. Cambridge: Cambridge University Press.

Kvamme, K.L. 1988. Development and testing of quantitative models. In W.J. Judge and L. Sebastian (eds.) Quantifying the Present and Predicting the Past: Theory, Method, and Application of Archaeological Predictive Modeling: 325-428. Denver: U.S. Department of the Interior, Bureau of Land Management.

Figure 1: Cumulative percent correct predictions for model sites and non-sites for all probabilities of occurrence. Reproduced from Kvamme 1988: fig. 8.11B.
