-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LayerCor fails with na.rm=TRUE if more than 2 layers #1056
Comments
Thanks for reporting. Fixed now
Note that there now are two mean values for each layer, except for the first or last layer. This is because the means depend on the cells that are NA in a pair of layers. |
Thanks. I'm having trouble installing the new version though. Based on the github badges/R-Universe it seems like there's a build failure. Also, I am a bit confused about the two lines in mean. Wouldn't the mean for each layer just be the equivalent of |
Should be OK to install now. As we are working with pairs, the cells removed by na.rm, and hence mean, depends on the pair:
|
But I was mixing things up. This function computes values for s all pairs of layers and thus it should also have a matrix of the means (or not return the means at all). |
Hmmn, yeah I'm not sure if the means are needed. Also if it is pairs shouldn't it return a square matrix of means where the dimensions are the number of variables? E.g. in the example below, the row is the mean of the variable and the column is the corresponding variable used to remove NA's for the "paired" mean. # Example data
d <- cbind(1:20, 20:1, 1:20)
d[3:6, 1] <- NA
d[8:11, 2] <- NA
d[12:15, 3] <- NA
## normal mean
colMeans(d, na.rm=TRUE)
#> [1] 12.00 10.25 9.75
#[1] 12.00 10.25 9.75
## paired mean (if na.rm=TRUE)
matrix(c(mean(d[!is.na(d[,1]),1], na.rm=TRUE), # mean of lyr1 paired with itself
mean(d[!is.na(d[,2]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 2
mean(d[!is.na(d[,3]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 3
mean(d[!is.na(d[,1]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 1
mean(d[!is.na(d[,2]),2], na.rm=TRUE), # mean of lyr2 paired with itself
mean(d[!is.na(d[,3]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 3
mean(d[!is.na(d[,1]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 1
mean(d[!is.na(d[,2]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 2
mean(d[!is.na(d[,3]),3], na.rm=TRUE) # mean of lyr3 paired with itself
), nrow=ncol(d), byrow=TRUE)
#> [,1] [,2] [,3]
#> [1,] 12.000000 12.833333 11.50000
#> [2,] 8.166667 10.250000 11.16667
#> [3,] 11.500000 9.833333 9.75000 Created on 2023-03-09 with reprex v2.0.2 |
Yes, thank you, you are right. That I what I was trying to say, too hastily. I now get
I do not remember why the mean was added, but it needs to be computed anyway, so I will just leave it in.
|
@rhijmans, the correlation values seem correct but there still seems to be something a bit wonky with the matrix of means. In the example below, layer 1 should have a mean of about 10, layer 2 of about 20, and layer 3 of about 30. The means matrix correctly has the three diagonals cells set to NA, but of the remaining six, four correspond to layer 1 (mean of 10), two correspond to layer 2 (mean of 20), and none correspond to layer 3 (mean of 30). Additionally, the layer 2 means don't appear in the same row or column. I've included some code below showing the comparison as well as two ways I believe the mean matrix should look depending on whether you want it to be by row or by column. library(terra)
#> Warning: package 'terra' was built under R version 4.2.2
#> terra 1.7.21
set.seed(5)
#index to insert NA's
idx<- sample(1:10, size=6, replace = FALSE)
# Create layers
var1<- rast(matrix(data = rnorm(n = 100, mean = 10), nrow=10, ncol=10))
var1[idx[1],idx[2]]<- NA
var2<- rast(matrix(data = rnorm(n = 100, mean = 20), nrow=10, ncol=10))
var2[idx[3],idx[4]]<- NA
var3<- rast(matrix(data = rnorm(n = 100, mean=30), nrow=10, ncol=10))
var3[idx[5],idx[6]]<- NA
# Stack layers
env_var <- c(var1, var2, var3)
names(env_var)<- c("lyr1", "lyr2", "lyr3")
#Calculate Correlation and means
lc<- layerCor(env_var, "pearson", na.rm=TRUE)
lc
#> $pearson
#> [,1] [,2] [,3]
#> [1,] 1.00000000 -0.04857323 0.15825102
#> [2,] -0.04857323 1.00000000 -0.04310552
#> [3,] 0.15825102 -0.04310552 1.00000000
#>
#> $mean
#> [,1] [,2] [,3]
#> [1,] NaN 10.06361 10.05192
#> [2,] 10.06361 NaN 19.82057
#> [3,] 10.05192 19.82057 NaN
# Create data frame
df<- data.frame(lyr1=values(var1, mat=FALSE), lyr2=values(var2, mat=FALSE), lyr3=values(var3, mat=FALSE))
rc<- cor(df, method="pearson", use = "pairwise.complete.obs")
rc-lc$pearson # Results are essentially equivalent to R's correlation function
#> lyr1 lyr2 lyr3
#> lyr1 0.000000e+00 -1.321859e-14 -2.567391e-14
#> lyr2 -1.321859e-14 0.000000e+00 -3.365364e-15
#> lyr3 -2.567391e-14 -3.365364e-15 0.000000e+00
# Paired Mean Mat
## By Row
matrix(c(NA, # mean of lyr1 paired with itself
mean(df[!is.na(df[,2]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 2
mean(df[!is.na(df[,3]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 3
mean(df[!is.na(df[,1]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 1
NA, # mean of lyr2 paired with itself
mean(df[!is.na(df[,3]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 3
mean(df[!is.na(df[,1]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 1
mean(df[!is.na(df[,2]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 2
NA # mean of lyr3 paired with itself
), nrow=ncol(df), byrow=TRUE)
#> [,1] [,2] [,3]
#> [1,] NA 10.06361 10.05192
#> [2,] 19.79897 NA 19.82057
#> [3,] 30.03971 30.06410 NA
## By Column
matrix(c(NA, # mean of lyr1 paired with itself
mean(df[!is.na(df[,2]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 2
mean(df[!is.na(df[,3]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 3
mean(df[!is.na(df[,1]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 1
NA, # mean of lyr2 paired with itself
mean(df[!is.na(df[,3]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 3
mean(df[!is.na(df[,1]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 1
mean(df[!is.na(df[,2]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 2
NA # mean of lyr3 paired with itself
), nrow=ncol(df), byrow=FALSE)
#> [,1] [,2] [,3]
#> [1,] NA 19.79897 30.03971
#> [2,] 10.06361 NA 30.06410
#> [3,] 10.05192 19.82057 NA Created on 2023-03-14 with reprex v2.0.2 |
I'm noticing that yours is symmetric across the diagonal and appears to be the upper triangle of my "by row" matrix. I don't think it should be symmetric though. For example, the mean of layer 1 using complete pair observations from layers 1 and 3 (row 1, col 3) should not be equivalent to the mean of layer 3 using the complete pair observations of layers 3 and 1 (row 3, col 1). |
Thank your tenaciousness, and pardon my sloppy hastiness. I believe this is fixed now as I get
Which is the same as your "by row" result. |
Thanks! Seems to be working great. The only potential inconsistency I'm seeing is that weighted means are reported as a vector whereas unweighted means are reported as a matrix of means using complete pairwise observations. library(terra)
#> Warning: package 'terra' was built under R version 4.2.2
#> terra 1.7.21
set.seed(5)
#index to insert NA's
idx<- sample(1:10, size=6, replace = FALSE)
# Create layers
var1<- rast(matrix(data = rnorm(n = 100, mean = 10), nrow=10, ncol=10))
var1[idx[1],idx[2]]<- NA
var2<- rast(matrix(data = rnorm(n = 100, mean = 20), nrow=10, ncol=10))
var2[idx[3],idx[4]]<- NA
var3<- rast(matrix(data = rnorm(n = 100, mean=30), nrow=10, ncol=10))
var3[idx[5],idx[6]]<- NA
w<- rast(matrix(data = sample(0:100, replace = TRUE, size = 100), nrow=10, ncol=10))
# Stack layers
env_var <- c(var1, var2, var3)
names(env_var)<- c("lyr1", "lyr2", "lyr3")
#Calculate Correlation and means
lc<- layerCor(env_var, "weighted.cov", w=w, na.rm=TRUE)
lc
#> $weighted_covariance
#> lyr1 lyr2 lyr3
#> lyr1 1.0630874 -0.07739530 0.20238065
#> lyr2 -0.0773953 1.20684523 -0.02240855
#> lyr3 0.2023806 -0.02240855 1.06605579
#>
#> $weighted_mean
#> lyr1 lyr2 lyr3
#> 9.995927 19.773162 29.730472 Created on 2023-03-15 with reprex v2.0.2 |
LayerCor
will throw an error if there are more than 2 layers andna.rm=TRUE
. This appears to be due to a change to address #1034. If I install a version prior to commit d0cf214 then the code will run but following that commit it breaks. The current CRAN and development versions will throw an error saying that the names of "mean" are not the proper length.Created on 2023-03-08 by the reprex package (v2.0.1)
The text was updated successfully, but these errors were encountered: