-
Notifications
You must be signed in to change notification settings - Fork 93
Expand file tree
/
Copy pathMercer_Kernel.Rmd
More file actions
169 lines (134 loc) · 4.8 KB
/
Mercer_Kernel.Rmd
File metadata and controls
169 lines (134 loc) · 4.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
---
title: "Mercer_Kernel"
output: github_document
---
Notes in support of [https://win-vector.com/2020/12/25/abs-and-relu-are-not-mercer-kernels/](https://win-vector.com/2020/12/25/abs-and-relu-are-not-mercer-kernels/).
John Mount
December 2020
Here are some new notes on what is and what is not a Mercer Kernel style function.
* `abs(x dot y)` and `relu(x dot y)` are not in general Mercer Kernels [https://github.com/WinVector/Examples/blob/main/Mercer_kernel/Mercer_Kernel.md](https://github.com/WinVector/Examples/blob/main/Mercer_kernel/Mercer_Kernel.md).
* How to check if a 3 by 3 matrix is positive (semi) definite [https://github.com/WinVector/Examples/blob/main/Mercer_kernel/Dets.ipynb](https://github.com/WinVector/Examples/blob/main/Mercer_kernel/Dets.ipynb).
* Why dot product is a Mercer Kernel [https://github.com/WinVector/Examples/blob/main/Mercer_kernel/dot.md](https://github.com/WinVector/Examples/blob/main/Mercer_kernel/dot.md).
I've written before on Kernel methods (in the context of support vector machines).
* [https://win-vector.com/2011/10/07/kernel-methods-and-support-vector-machines-de-mystified/](https://win-vector.com/2011/10/07/kernel-methods-and-support-vector-machines-de-mystified/)
* [https://win-vector.com/2015/02/14/how-sure-are-you-that-large-margin-implies-low-vc-dimension/](https://win-vector.com/2015/02/14/how-sure-are-you-that-large-margin-implies-low-vc-dimension/)
Here we will show an example of non-kernel functions.
Both `abs(x dot y)` and `relu(x dot y)` (`=max(0, x dot y)`) are not positive semi-definite Mercer kernels when `x, y` are 2-dimensional vectors or larger. This can be subtle as for scalars `abs(x * y)` = `abs(x) * abs(y)` is exactly the kind of decomposition used to establish kernel properties.
By Mercer's theorem anything that generates only positive semi-definite Gram matrices can be realized as the inner product of vectors mapped into a, possibly infinite dimensional, vector space.
```{r}
n <- 4 # dimension of Gram matrix
d <- 2 # dimension of vectors
# abs(x.y) and relu(x.y) are not Mercer Kernels
#
# Gram matrix
# True for 3x3 case by Sylvester's criterion
# https://math.stackexchange.com/questions/1355088/is-the-absolute-value-of-a-p-d-s-matrix-p-d-s
# https://math.stackexchange.com/questions/1355088/is-the-absolute-value-of-a-p-d-s-matrix-p-d-s#comment2758302_1355088
# and true for scalar arguments as abs(x*y) = abs(x)*abs(y)
dot_f <- function(x, y) {
sum(x*y)
}
relu_f <- function(x, y) {
max(0, sum(x*y))
}
abs_f <- function(x, y) {
abs(sum(x*y))
}
# build the Gram matrix for a given kernel
Gram_matrix <- function(n, k_f, x) {
idxs = expand.grid(
i = seq_len(n),
j = seq_len(n))
matrix(vapply(
seq_len(nrow(idxs)),
function(i) k_f(x[idxs[i, 1], ], x[idxs[i, 2], ]),
numeric(1)),
nrow = n,
ncol = n)
}
```
```{r, eval=FALSE}
fn <- function(n, d) {
# x <- matrix(rnorm(n = n*d), nrow = n, ncol = d)
x <- matrix(
sample(c(-1, 0, 1),
size = n*d, replace = TRUE),
nrow = n,
ncol = d)
mat_relu <- Gram_matrix(n = n, k_f = relu_f, x = x)
eigen_relu = min(eigen(mat_relu)$values)
mat_abs <- Gram_matrix(n = n, k_f = abs_f, x = x)
eigen_abs = min(eigen(mat_abs)$values)
list(v = max(eigen_relu, eigen_abs),
x = x,
eigen_relu = eigen_relu,
eigen_abs = eigen_abs)
}
example <- lapply(
1:10000,
function(i) fn(n = n, d = d))
idx_relu = which.min(vapply(example, function(v) v$eigen_relu, numeric(1)))
print(example[[idx_relu]])
idx_abs = which.min(vapply(example, function(v) v$eigen_abs, numeric(1)))
print(example[[idx_abs]])
idx <- which.min(vapply(example, function(v) v$v, numeric(1)))
example <- example[[idx]]
print(example)
x <- example$x
t(x)
```
```{r}
x <- matrix(c(-1, -1, -1, 0, -1, 1, 0, -1), nrow = n, ncol = d)
t(x)
```
The dot-product is a Kernel, in fact the prototypical one. In fact `g^t G(a) g`, (`G(a)` being the Gram matrix with
`G(a)[i, j] = a(i) . a(j)`) is equal to `||[a_1, ... a_n] g||^2`.
```{r}
mat_dot <- Gram_matrix(n = n, k_f = dot_f, x = x)
mat_dot
```
```{r}
eigen_dot <- eigen(mat_dot)
if(min(eigen_dot$values)<0) {
stop("expected eigen_dot to have only non-negative eigenvalues")
}
eigen_dot
```
```{r}
mat_abs <- Gram_matrix(n = n, k_f = abs_f, x = x)
mat_abs
```
```{r}
eigen_abs <- eigen(mat_abs)
eigen_abs
```
```{r}
example_abs <- eigen_abs$vectors[, n]
example_abs
```
```{r}
v_abs <- as.numeric(t(example_abs) %*% mat_abs %*% example_abs)
if(v_abs>=0) {
stop("expected v_abs negative")
}
v_abs
```
```{r}
mat_relu <- Gram_matrix(n = n, k_f = relu_f, x = x)
mat_relu
```
```{r}
eigen_relu <- eigen(mat_relu)
eigen_relu
```
```{r}
example_relu <- eigen_relu$vectors[, n]
example_relu
```
```{r}
v_relu <- as.numeric(t(example_relu) %*% mat_relu %*% example_relu)
if(v_relu>=0) {
stop("expected v_relu negative")
}
v_relu
```