Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

layerCor() $mean could be a single column #1323

Closed
AMBarbosa opened this issue Oct 24, 2023 · 7 comments
Closed

layerCor() $mean could be a single column #1323

AMBarbosa opened this issue Oct 24, 2023 · 7 comments

Comments

@AMBarbosa
Copy link
Contributor

layerCor() with the default arguments returns a square matrix of correlations and a square matrix of means. Is there a reason for the latter to be a square matrix, rather than just a one-column matrix or a named vector? The columns are confusing this way, and the rows have redundant info. Cheers!

@rhijmans
Copy link
Member

Thanks. I do not know what the reason for this was, if there was any. So I have made it the same as for the other values of "fun"

@ailich
Copy link
Contributor

ailich commented Oct 25, 2023

@rhijmans @AMBarbosa, the reason for returning a matrix is related to the paired means ,which is the mean of that layer counting only the cells that are not NA in either raster (See issue #1056 ). I'm not sure if that's super useful, but that is what was previously being returned. The new version of terra instead attempts to return the global mean for each layer but provides an incorrect result in the presence of NA values.

Old Version

The old version returns a matrix of paired means with NA's along the diagonal. The row indicates the target layer whose mean is taken and the column indicates which layer it is being paired with. The mean is taken of that target layer, only considering values where complete pairwise observations are present for calculating the correlation between two layers.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.47

r<- rast(volcano, 
         extent= ext(2667400, 2667400 + ncol(volcano)*10, 
                     6478700, 6478700 + nrow(volcano)*10), 
         crs = "EPSG:27200")
names(r)<- "elev"

slp<- terrain(r, v="slope", unit="degrees")
asp<- terrain(r, v="aspect", unit="degrees")


set.seed(5)
values(r)[sample(1:ncell(r), size = 100)]<- NA
values(slp)[sample(1:ncell(slp), size = 1000)]<- NA
values(asp)[sample(1:ncell(asp), size = 500)]<- NA
plot(c(r,slp,asp))

layerCor(c(r,slp,asp), fun="pearson", na.rm=TRUE)$mean
#>             elev    slope    aspect
#> elev         NaN 131.6323 131.64390
#> slope   14.96869      NaN  14.84961
#> aspect 178.02437 178.8241       NaN

# Overall mean of elevation
global(r, mean, na.rm=TRUE)
#>          mean
#> elev 130.1602

#Mean of elevation when removing values where slope and elevation are NA
#Row 1, Column 2 of mean matrix
mean(values(r)[!(is.na(values(slp))|is.na(values(r)))])
#> [1] 131.6323


#Mean of elevation when removing values where aspect and elevation are NA
#Row 1, Column 3 of mean matrix
mean(values(r)[!(is.na(values(asp))|is.na(values(r)))])
#> [1] 131.6439

Created on 2023-10-25 with reprex v2.0.2

New Version

With NA's present, the means returned will NOT match the global means for each layer

library(terra)
#> Warning: package 'terra' was built under R version 4.2.3
#> terra 1.7.57

r<- rast(volcano, 
         extent= ext(2667400, 2667400 + ncol(volcano)*10, 
                     6478700, 6478700 + nrow(volcano)*10), 
         crs = "EPSG:27200")
names(r)<- "elev"

slp<- terrain(r, v="slope", unit="degrees")
asp<- terrain(r, v="aspect", unit="degrees")


set.seed(5)
values(r)[sample(1:ncell(r), size = 100)]<- NA
values(slp)[sample(1:ncell(slp), size = 1000)]<- NA
values(asp)[sample(1:ncell(asp), size = 500)]<- NA
plot(c(r,slp,asp))

layerCor(c(r,slp,asp), fun="pearson", na.rm=TRUE)$mean
#>      elev     slope    aspect 
#> 131.63808  14.90915 178.42426

# Overall mean of elevation
global(r, mean, na.rm=TRUE)
#>          mean
#> elev 130.1602

#Mean of elevation when removing values where slope and elevation are NA
#Row 1, Column 2 of mean matrix
mean(values(r)[!(is.na(values(slp))|is.na(values(r)))])
#> [1] 131.6323


#Mean of elevation when removing values where aspect and elevation are NA
#Row 1, Column 3 of mean matrix
mean(values(r)[!(is.na(values(asp))|is.na(values(r)))])
#> [1] 131.6439

Created on 2023-10-25 with reprex v2.0.2

New Version (No NA's)

Without NA's the means returned will match the global means for each layer

library(terra)
#> Warning: package 'terra' was built under R version 4.2.3
#> terra 1.7.57

r<- rast(volcano, 
         extent= ext(2667400, 2667400 + ncol(volcano)*10, 
                     6478700, 6478700 + nrow(volcano)*10), 
         crs = "EPSG:27200")
names(r)<- "elev"

slp<- terrain(r, v="slope", unit="degrees")
asp<- terrain(r, v="aspect", unit="degrees")

slp[is.na(values(slp))]<- global(slp, mean, na.rm=TRUE) #Fill NA's
asp[is.na(values(asp))]<- global(asp, mean, na.rm=TRUE) #Fill NA's


layerCor(c(r,slp,asp), fun="pearson", na.rm=TRUE)$mean
#>      elev     slope    aspect 
#> 130.18787  14.89747 178.66084

# Overall mean
global(c(r, slp, asp), mean, na.rm=TRUE)
#>             mean
#> elev   130.18787
#> slope   14.89747
#> aspect 178.66084

Created on 2023-10-25 with reprex v2.0.2

@ailich
Copy link
Contributor

ailich commented Oct 25, 2023

@rhijmans, following up on a comment in #1056 where using fun="weighted.cov" returns a vector of weighted means rather than a matrix of paired weighted means. I mentioned this was inconsistent with whar fun="pearson" returns, but I just realized that this also provides an incorrect result in the presence of NA values regardless of whether a newer or older version of terra is used. I suspect this issue is related to something like using the number of cells in the raster as the number of observations in the calculation rather than the number of cells that are not NA.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.47

r<- rast(volcano, 
         extent= ext(2667400, 2667400 + ncol(volcano)*10, 
                     6478700, 6478700 + nrow(volcano)*10), 
         crs = "EPSG:27200")
names(r)<- "elev"

slp<- terrain(r, v="slope", unit="degrees")
asp<- terrain(r, v="aspect", unit="degrees")
slp[is.na(values(slp))]<- global(slp, mean, na.rm=TRUE) #Fill NA's
asp[is.na(values(asp))]<- global(asp, mean, na.rm=TRUE) #Fill NA's

set.seed(5)
w<- rast(r)
values(w)<- sample(0:100, replace = TRUE, size = ncell(w)) #Weights layer

# Without NA's the value returned by layerCor returns the global weighted mean
layerCor(c(r, slp, asp), "weighted.cov", w=w, na.rm=TRUE)$weighted_mean
#>      elev     slope    aspect 
#> 130.16713  14.92675 179.11686

weighted.mean(values(r), w = values(w), na.rm=TRUE)
#> [1] 130.1671
weighted.mean(values(slp), w = values(w), na.rm=TRUE)
#> [1] 14.92675
weighted.mean(values(asp), w = values(w), na.rm=TRUE)
#> [1] 179.1169

#Introduce NA's
set.seed(5)
values(r)[sample(1:ncell(r), size = 100)]<- NA
values(slp)[sample(1:ncell(slp), size = 1000)]<- NA
values(asp)[sample(1:ncell(asp), size = 500)]<- NA

# With NA's the value returned by layerCor does NOT match the global weighted mean
layerCor(c(r, slp, asp), "weighted.cov", w=w, na.rm=TRUE)$weighted_mean
#>      elev     slope    aspect 
#> 127.99890  12.16293 161.46173

weighted.mean(values(r), w = values(w), na.rm=TRUE)
#> [1] 130.187
weighted.mean(values(slp), w = values(w), na.rm=TRUE)
#> [1] 14.99564
weighted.mean(values(asp), w = values(w), na.rm=TRUE)
#> [1] 178.3456

Created on 2023-10-25 with reprex v2.0.2

@ailich
Copy link
Contributor

ailich commented Oct 26, 2023

Actually, this is not related to counting the number of non-NA cells. This line here is just calculating the row-wise mean of the pairwise means since an apply statment was added in commit 5ac826e. This is only equal to the global mean if all layers have NA's for the exact same cells (i.e. all values in a row are equal).

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.47

r<- rast(volcano, 
         extent= ext(2667400, 2667400 + ncol(volcano)*10, 
                     6478700, 6478700 + nrow(volcano)*10), 
         crs = "EPSG:27200")
names(r)<- "elev"

slp<- terrain(r, v="slope", unit="degrees")
asp<- terrain(r, v="aspect", unit="degrees")
slp[is.na(values(slp))]<- global(slp, mean, na.rm=TRUE) #Fill NA's
asp[is.na(values(asp))]<- global(asp, mean, na.rm=TRUE) #Fill NA's


set.seed(5)
idx<- sample(1:ncell(slp), size = 1000)
values(r)[idx]<- NA
values(slp)[idx]<- NA
values(asp)[idx]<- NA

global(c(r,slp,asp), fun=mean, na.rm=TRUE)
#>             mean
#> elev   130.08776
#> slope   14.94358
#> aspect 179.72616
rowMeans(layerCor(c(r,slp,asp), fun="pearson", na.rm=TRUE)$mean, na.rm = TRUE)
#>      elev     slope    aspect 
#> 130.08776  14.94358 179.72616

Created on 2023-10-26 with reprex v2.0.2

@ailich
Copy link
Contributor

ailich commented Oct 26, 2023

However, with weighted.cov it does appear to be related to counting the number of Non-NA cells.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.47

m<- matrix(data=c(rep(NA, 8), 1), nrow=3, ncol=3)
w<- matrix(data=1/9, nrow=3, ncol=3) #Equal weights per cell

# Layer means should be 1, 2, and 3
r<- c(rast(m), rast(m*2), rast(m*3))
names(r)<- c("lyr.1", "lyr.2", "lyr.3")


apply(values(r), MARGIN = 2, FUN = weighted.mean, w= w, na.rm=TRUE)
#> lyr.1 lyr.2 lyr.3 
#>     1     2     3

layerCor(r, "weighted.cov", w=rast(w), na.rm=TRUE)$weighted_mean #layerCor provides identical result to dividing by 9 even though there is only 1 non-NA cell
#>     lyr.1     lyr.2     lyr.3 
#> 0.1111111 0.2222222 0.3333333

apply(values(r), MARGIN = 2, FUN = weighted.mean, w= w, na.rm=TRUE)/9
#>     lyr.1     lyr.2     lyr.3 
#> 0.1111111 0.2222222 0.3333333

Created on 2023-10-26 with reprex v2.0.2

rhijmans added a commit that referenced this issue Oct 27, 2023
rhijmans added a commit that referenced this issue Oct 27, 2023
rhijmans added a commit that referenced this issue Oct 27, 2023
@rhijmans
Copy link
Member

Thanks a lot for that Alex. I think it now works as it should, and we are keeping the matrix with means as the mean for a layer can change depending on the NAs in the other layers.

@ailich
Copy link
Contributor

ailich commented Oct 27, 2023

Awesome. Seems to be working great now! I put in a minor pull request to clarify the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants