Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LayerCor fails with na.rm=TRUE if more than 2 layers #1056

Closed
ailich opened this issue Mar 8, 2023 · 10 comments
Closed

LayerCor fails with na.rm=TRUE if more than 2 layers #1056

ailich opened this issue Mar 8, 2023 · 10 comments

Comments

@ailich
Copy link
Contributor

ailich commented Mar 8, 2023

LayerCor will throw an error if there are more than 2 layers and na.rm=TRUE. This appears to be due to a change to address #1034. If I install a version prior to commit d0cf214 then the code will run but following that commit it breaks. The current CRAN and development versions will throw an error saying that the names of "mean" are not the proper length.

library(terra)
#> Warning: package 'terra' was built under R version 4.1.3
#> terra 1.7.19

set.seed(5)
var1<- rast(matrix(data = rnorm(n = 100), nrow=10, ncol=10))
var1[1,1]<- NA
var2<- rast(matrix(data = rnorm(n = 100), nrow=10, ncol=10))
var2[1,1]<- NA
var3<- rast(matrix(data = rnorm(n = 100), nrow=10, ncol=10))
var3[1,1]<- NA

env_var <- c(var1, var2, var3)
names(env_var)<- c("lyr1", "lyr2", "lyr3")
layerCor(env_var, "pearson")
#> $pearson
#>      lyr1 lyr2 lyr3
#> lyr1  NaN  NaN  NaN
#> lyr2  NaN  NaN  NaN
#> lyr3  NaN  NaN  NaN
#> 
#> $mean
#> lyr1 lyr2 lyr3 
#>   NA   NA   NA
layerCor(env_var, "pearson", na.rm=TRUE)
#> Error in names(means) <- names(x): 'names' attribute [3] must be the same length as the vector [2]
layerCor(env_var[[1:2]], "pearson", na.rm=TRUE)
#> $pearson
#>           lyr1      lyr2
#> lyr1 1.0000000 0.1046015
#> lyr2 0.1046015 1.0000000
#> 
#> $mean
#>       lyr1       lyr2 
#> 0.03683265 0.03683265

Created on 2023-03-08 by the reprex package (v2.0.1)

@rhijmans
Copy link
Member

rhijmans commented Mar 9, 2023

Thanks for reporting. Fixed now

layerCor(env_var, "pearson", na.rm=TRUE)
#$pearson
#           lyr1       lyr2       lyr3
#lyr1 1.00000000 0.10460149 0.05804355
#lyr2 0.10460149 1.00000000 0.06179808
#lyr3 0.05804355 0.06179808 1.00000000

#$mean
#           lyr1       lyr2          lyr3
#[1,]         NA 0.03683265 -0.0002860035
#[2,] 0.04044805 0.03683265            NA

Note that there now are two mean values for each layer, except for the first or last layer. This is because the means depend on the cells that are NA in a pair of layers.

@ailich
Copy link
Contributor Author

ailich commented Mar 9, 2023

Thanks. I'm having trouble installing the new version though. Based on the github badges/R-Universe it seems like there's a build failure. Also, I am a bit confused about the two lines in mean. Wouldn't the mean for each layer just be the equivalent of global(x, fun="mean", na.rm=TRUE) or global(x, fun="mean", na.rm=FALSE) with the value of na.rm taken from the input of layerCor?

@rhijmans
Copy link
Member

rhijmans commented Mar 9, 2023

Should be OK to install now.

As we are working with pairs, the cells removed by na.rm, and hence mean, depends on the pair:

# Example data 
d <- cbind(1:20, 20:1, 1:20)
d[3:6, 1] <- NA
d[8:11, 2] <- NA
d[12:15, 3] <- NA

## normal mean
colMeans(d, na.rm=TRUE)
#[1] 12.00 10.25  9.75

## paired mean (if na.rm=TRUE)
matrix(c(mean(d[!is.na(d[,2]),1], na.rm=TRUE), NA,
	mean(d[!is.na(d[,1]),2], na.rm=TRUE),
	mean(d[!is.na(d[,3]),2], na.rm=TRUE),
	mean(d[!is.na(d[,2]),3], na.rm=TRUE), NA), nrow=2)

#         [,1]      [,2]     [,3]
#[1,] 12.83333  8.166667 9.833333
#[2,]       NA 11.166667       NA

@rhijmans
Copy link
Member

rhijmans commented Mar 9, 2023

But I was mixing things up. This function computes values for s all pairs of layers and thus it should also have a matrix of the means (or not return the means at all).

@ailich
Copy link
Contributor Author

ailich commented Mar 10, 2023

Hmmn, yeah I'm not sure if the means are needed. Also if it is pairs shouldn't it return a square matrix of means where the dimensions are the number of variables? E.g. in the example below, the row is the mean of the variable and the column is the corresponding variable used to remove NA's for the "paired" mean.

# Example data 
d <- cbind(1:20, 20:1, 1:20)
d[3:6, 1] <- NA
d[8:11, 2] <- NA
d[12:15, 3] <- NA

## normal mean
colMeans(d, na.rm=TRUE)
#> [1] 12.00 10.25  9.75
#[1] 12.00 10.25  9.75

## paired mean (if na.rm=TRUE)
matrix(c(mean(d[!is.na(d[,1]),1], na.rm=TRUE), # mean of lyr1 paired with itself
         mean(d[!is.na(d[,2]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 2
         mean(d[!is.na(d[,3]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 3
         
         mean(d[!is.na(d[,1]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 1
         mean(d[!is.na(d[,2]),2], na.rm=TRUE), # mean of lyr2 paired with itself
         mean(d[!is.na(d[,3]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 3
         
         mean(d[!is.na(d[,1]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 1
         mean(d[!is.na(d[,2]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 2
         mean(d[!is.na(d[,3]),3], na.rm=TRUE) # mean of lyr3 paired with itself
         ), nrow=ncol(d), byrow=TRUE)
#>           [,1]      [,2]     [,3]
#> [1,] 12.000000 12.833333 11.50000
#> [2,]  8.166667 10.250000 11.16667
#> [3,] 11.500000  9.833333  9.75000

Created on 2023-03-09 with reprex v2.0.2

@rhijmans
Copy link
Member

rhijmans commented Mar 10, 2023

Yes, thank you, you are right. That I what I was trying to say, too hastily.

I now get

$pearson
           [,1]       [,2]       [,3]
[1,] 1.00000000 0.10460149 0.05804355
[2,] 0.10460149 1.00000000 0.06179808
[3,] 0.05804355 0.06179808 1.00000000

$mean
           [,1]       [,2]       [,3]
[1,]        NaN 0.04044805 0.04044805
[2,] 0.04044805        NaN 0.03683265
[3,] 0.04044805 0.03683265        NaN

I do not remember why the mean was added, but it needs to be computed anyway, so I will just leave it in.

layerCor(x, "pearson") is now done in C++ and it probably is considerably faster than before.

@ailich
Copy link
Contributor Author

ailich commented Mar 14, 2023

@rhijmans, the correlation values seem correct but there still seems to be something a bit wonky with the matrix of means. In the example below, layer 1 should have a mean of about 10, layer 2 of about 20, and layer 3 of about 30. The means matrix correctly has the three diagonals cells set to NA, but of the remaining six, four correspond to layer 1 (mean of 10), two correspond to layer 2 (mean of 20), and none correspond to layer 3 (mean of 30). Additionally, the layer 2 means don't appear in the same row or column. I've included some code below showing the comparison as well as two ways I believe the mean matrix should look depending on whether you want it to be by row or by column.

library(terra)
#> Warning: package 'terra' was built under R version 4.2.2
#> terra 1.7.21

set.seed(5)

#index to insert NA's
idx<- sample(1:10, size=6, replace = FALSE)

# Create layers
var1<- rast(matrix(data = rnorm(n = 100, mean = 10), nrow=10, ncol=10))
var1[idx[1],idx[2]]<- NA
var2<- rast(matrix(data = rnorm(n = 100, mean = 20), nrow=10, ncol=10))
var2[idx[3],idx[4]]<- NA
var3<- rast(matrix(data = rnorm(n = 100, mean=30), nrow=10, ncol=10))
var3[idx[5],idx[6]]<- NA

# Stack layers
env_var <- c(var1, var2, var3)
names(env_var)<- c("lyr1", "lyr2", "lyr3")

#Calculate Correlation and means
lc<- layerCor(env_var, "pearson", na.rm=TRUE)
lc
#> $pearson
#>             [,1]        [,2]        [,3]
#> [1,]  1.00000000 -0.04857323  0.15825102
#> [2,] -0.04857323  1.00000000 -0.04310552
#> [3,]  0.15825102 -0.04310552  1.00000000
#> 
#> $mean
#>          [,1]     [,2]     [,3]
#> [1,]      NaN 10.06361 10.05192
#> [2,] 10.06361      NaN 19.82057
#> [3,] 10.05192 19.82057      NaN

# Create data frame
df<- data.frame(lyr1=values(var1, mat=FALSE), lyr2=values(var2, mat=FALSE), lyr3=values(var3, mat=FALSE))
rc<- cor(df, method="pearson", use = "pairwise.complete.obs")

rc-lc$pearson # Results are essentially equivalent to R's correlation function
#>               lyr1          lyr2          lyr3
#> lyr1  0.000000e+00 -1.321859e-14 -2.567391e-14
#> lyr2 -1.321859e-14  0.000000e+00 -3.365364e-15
#> lyr3 -2.567391e-14 -3.365364e-15  0.000000e+00

# Paired Mean Mat

## By Row
matrix(c(NA, # mean of lyr1 paired with itself
         mean(df[!is.na(df[,2]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 2
         mean(df[!is.na(df[,3]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 3
         
         mean(df[!is.na(df[,1]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 1
         NA, # mean of lyr2 paired with itself
         mean(df[!is.na(df[,3]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 3
         
         mean(df[!is.na(df[,1]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 1
         mean(df[!is.na(df[,2]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 2
         NA # mean of lyr3 paired with itself
), nrow=ncol(df), byrow=TRUE)
#>          [,1]     [,2]     [,3]
#> [1,]       NA 10.06361 10.05192
#> [2,] 19.79897       NA 19.82057
#> [3,] 30.03971 30.06410       NA

## By Column
matrix(c(NA, # mean of lyr1 paired with itself
         mean(df[!is.na(df[,2]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 2
         mean(df[!is.na(df[,3]),1], na.rm=TRUE), # mean of lyr1 paired with lyr 3
         
         mean(df[!is.na(df[,1]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 1
         NA, # mean of lyr2 paired with itself
         mean(df[!is.na(df[,3]),2], na.rm=TRUE), # mean of lyr2 paired with lyr 3
         
         mean(df[!is.na(df[,1]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 1
         mean(df[!is.na(df[,2]),3], na.rm=TRUE), # mean of lyr3 paired with lyr 2
         NA # mean of lyr3 paired with itself
), nrow=ncol(df), byrow=FALSE)
#>          [,1]     [,2]     [,3]
#> [1,]       NA 19.79897 30.03971
#> [2,] 10.06361       NA 30.06410
#> [3,] 10.05192 19.82057       NA

Created on 2023-03-14 with reprex v2.0.2

@ailich
Copy link
Contributor Author

ailich commented Mar 14, 2023

I'm noticing that yours is symmetric across the diagonal and appears to be the upper triangle of my "by row" matrix. I don't think it should be symmetric though. For example, the mean of layer 1 using complete pair observations from layers 1 and 3 (row 1, col 3) should not be equivalent to the mean of layer 3 using the complete pair observations of layers 3 and 1 (row 3, col 1).

rhijmans added a commit that referenced this issue Mar 15, 2023
@rhijmans
Copy link
Member

Thank your tenaciousness, and pardon my sloppy hastiness. I believe this is fixed now as I get

> lc
$pearson
            [,1]        [,2]        [,3]
[1,]  1.00000000 -0.04857323  0.15825102
[2,] -0.04857323  1.00000000 -0.04310552
[3,]  0.15825102 -0.04310552  1.00000000

$mean
         [,1]     [,2]     [,3]
[1,]      NaN 10.06361 10.05192
[2,] 19.79897      NaN 19.82057
[3,] 30.03971 30.06410      NaN

Which is the same as your "by row" result.

@ailich
Copy link
Contributor Author

ailich commented Mar 15, 2023

Thanks! Seems to be working great. The only potential inconsistency I'm seeing is that weighted means are reported as a vector whereas unweighted means are reported as a matrix of means using complete pairwise observations.

library(terra)
#> Warning: package 'terra' was built under R version 4.2.2
#> terra 1.7.21

set.seed(5)

#index to insert NA's
idx<- sample(1:10, size=6, replace = FALSE)

# Create layers
var1<- rast(matrix(data = rnorm(n = 100, mean = 10), nrow=10, ncol=10))
var1[idx[1],idx[2]]<- NA
var2<- rast(matrix(data = rnorm(n = 100, mean = 20), nrow=10, ncol=10))
var2[idx[3],idx[4]]<- NA
var3<- rast(matrix(data = rnorm(n = 100, mean=30), nrow=10, ncol=10))
var3[idx[5],idx[6]]<- NA

w<- rast(matrix(data = sample(0:100, replace = TRUE, size = 100), nrow=10, ncol=10))

# Stack layers
env_var <- c(var1, var2, var3)
names(env_var)<- c("lyr1", "lyr2", "lyr3")

#Calculate Correlation and means
lc<- layerCor(env_var, "weighted.cov", w=w, na.rm=TRUE)
lc
#> $weighted_covariance
#>            lyr1        lyr2        lyr3
#> lyr1  1.0630874 -0.07739530  0.20238065
#> lyr2 -0.0773953  1.20684523 -0.02240855
#> lyr3  0.2023806 -0.02240855  1.06605579
#> 
#> $weighted_mean
#>      lyr1      lyr2      lyr3 
#>  9.995927 19.773162 29.730472

Created on 2023-03-15 with reprex v2.0.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants