Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodic kernel #61

Closed
andreaskoher opened this issue Apr 9, 2021 · 12 comments
Closed

Periodic kernel #61

andreaskoher opened this issue Apr 9, 2021 · 12 comments
Assignees

Comments

@andreaskoher
Copy link
Contributor

Hi @willtebbutt,

From the ToDo list #59 I have tried to go ahead with the periodic kernel as proposed by Solin & Särkkä (2014). The periodic kernel is approximated by a Taylor series and I imagine there are multiple ways to implement the idea. One is to implement the approximated periodic kernel directly as a SimpleKernel but, here I chose to build it up from a sum of elementary oscillators (I am absolutely not sure which approach is more efficient though). Also, I assumed that the kernel takes only a one dimensional input as done in the paper, but probably this can be extended. Finally, I also had to modify lgssm_components(k::KernelSum ...) in order to allow for more than two kernel components.

struct OscillatorKernel{T,I} <: KernelFunctions.SimpleKernel
    l
end
struct ApproxPeriodicKernel{T} <: KernelFunctions.Kernel
    kernels::KernelFunctions.KernelSum
end

ApproxPeriodicKernel( k::PeriodicKernel{T}; n=3 ) where T =
    ApproxPeriodicKernel{T}( sum( OscillatorKernel{T,i}( 1/(4*only( k.r )^2) ) for i in 1:n ) )
#the parametrisation in the paper is slightly different from KernelFunctions: l -> 2r
ApproxPeriodicKernel(; r=1.) = ApproxPeriodicKernel(PeriodicKernel(r=[r]))

function TemporalGPs.to_sde(k::OscillatorKernel{T,I}, s::SArrayStorage{T}) where {T<:Real,I}
    F = SMatrix{2, 2, T}(0, π*I, -π*I, 0)
    q = convert(T, 2*besseli(I, k.l)/exp(k.l))
    H = SVector{2, T}(1, 0)
    return F, q, H
end
function TemporalGPs.to_sde(k::OscillatorKernel{T,0}, s::SArrayStorage{T}) where {T<:Real}
    F = SMatrix{2, 2, T}(0, 0, 0, 0)
    q = convert(T, besseli(0,k.l)/exp(k.l))
    H = SVector{2, T}(1, 0)
    return F, q, H
end

function TemporalGPs.stationary_distribution(k::OscillatorKernel{T,I}, s::SArrayStorage{T}) where {T<:Real,I}
    q = 2*besseli(I,k.l)/exp(k.l)
    m = SVector{2, T}(0, 0)
    P = SMatrix{2, 2, T}( q, 0, 0, q )
    return TemporalGPs.Gaussian(m, P)
end
function TemporalGPs.stationary_distribution(k::OscillatorKernel{T,0}, s::SArrayStorage{T}) where {T<:Real}
    q = besseli(0,k.l)/exp(k.l)
    m = SVector{2, T}(0, 0)
    P = SMatrix{2, 2, T}( q, 0, 0, q )
    return TemporalGPs.Gaussian(m, P)
end

function TemporalGPs.lgssm_components(k::KernelSum, ts::AbstractVector, storage_type::TemporalGPs.StorageType)
    n = length(k.kernels)
    As_l, as_l, Qs_l, emission_proj_l, x0_l = TemporalGPs.lgssm_components(k.kernels[1], ts, storage_type)
    for i in 2:n
        As_r, as_r, Qs_r, emission_proj_r, x0_r = TemporalGPs.lgssm_components(k.kernels[i], ts, storage_type)
        As_l = map(TemporalGPs.blk_diag, As_l, As_r)
        as_l = map(vcat, as_l, as_r)
        Qs_l = map(TemporalGPs.blk_diag, Qs_l, Qs_r)
        emission_proj_l = TemporalGPs._sum_emission_projections(emission_proj_l, emission_proj_r)
        x0_l = TemporalGPs.Gaussian(vcat(x0_l.m, x0_r.m), TemporalGPs.blk_diag(x0_l.P, x0_r.P))
    end
    return As_l, as_l, Qs_l, emission_proj_l, x0_l
end

TemporalGPs.lgssm_components( k::ApproxPeriodicKernel{T}, t::AbstractVector, storage_type::TemporalGPs.StorageType{T},
) where {T<:Real} = TemporalGPs.lgssm_components( k.kernels, t, storage_type )

a simple usage example:

Random.seed!(1234)
f = GP( 10*transform(PeriodicKernel(r=[.5]), 1/50) )
x = RegularSpacing(0.0, 1., 365)
fx = f(x, .0001)
y = rand(fx)

f_naive_approx = GP( 10*transform(ApproxPeriodicKernel(r = .5), 1/50) )
f_approx = to_sde(f_naive_approx, SArrayStorage(Float64))
f_post_approx = posterior(f_approx(x, .0001), y)
y_post_approx = rand(f_post_approx(x, .0001))

plot(y)
plot!(y_post)

comparison

However, I ran into an issue with Zygote again, when calculating the gradient:

function objective(θ)
    f_naive = GP( 10*transform(ApproxPeriodicKernel(r = .5), 1/50) )
    f = to_sde(f_naive, SArrayStorage(Float64))
    return -logpdf(f(x, θ), y)
end
Zygote.gradient(objective, .1)

Need an adjoint for constructor SVector{2, Float64}. Gradient is of type Vector{Float64}

@andreaskoher
Copy link
Contributor Author

andreaskoher commented Apr 12, 2021

Alternatively to the above, the approach below is somewhat simpler as it directly constructs a SimpleKernel following Arno's implementation in GPStuff.

struct ApproxPeriodicKernel{J,T} <: KernelFunctions.SimpleKernel
    l::T # length scale as defined in Arno's text book 
end

# assuming length( k.r ) == 1
ApproxPeriodicKernel(k::PeriodicKernel{T}; degree_of_approx = 3) where {T} = ApproxPeriodicKernel{degree_of_approx,T}( 2*only(k.r) )
ApproxPeriodicKernel(; r=1/2, degree_of_approx = 3 ) = ApproxPeriodicKernel(PeriodicKernel(; r=[r]); degree_of_approx = degree_of_approx)
# k = ApproxPeriodicKernel()

function to_sde(k::ApproxPeriodicKernel{J,T}, s::SArrayStorage{T}) where {J,T<:Real}
    D  = 2*(J+1)
    w0 = 2π
    F  = kron( Diagonal(0:J), [0 -w0; w0 0])
    H  = kron( ones(1, J+1),[1 0])
    Fs = SMatrix{D,D,T}(F)
    q  = T(0)
    Hs = SVector{D,T}(H)#SMatrix{1,D,T}(H)
    return Fs, q, Hs
end

function stationary_distribution(k::ApproxPeriodicKernel{J,T}, s::SArrayStorage{T}) where {J,T<:Real}
    D = 2*(J+1)
    q = [ j>0 ? 2*besseli(j, k.l^-2)/exp(k.l^-2) : besseli(0, k.l^-2)/exp(k.l^-2) for j in 0:J]
    P = kron(Diagonal(q), Diagonal([1, 1]))
    m = SVector{D, T}( zeros(T,D) )
    Ps = SMatrix{D,D,T}(P)
    return Gaussian(m, Ps)
end

@willtebbutt
Copy link
Member

Sorry for the slow response.

I think this second approach is probably better for now -- I'm not sure how useful the oscillator kernel is likely to be outside of the context of approximating this stuff.

@willtebbutt
Copy link
Member

Regarding the Zygote + StaticArrays problems -- this is something that needs fixing.

I'll have a hack at a general solution this evening.

@willtebbutt
Copy link
Member

Actually, looks like it'll have to be tomorrow unfortunately.

@andreaskoher
Copy link
Contributor Author

no worries. I completely agree with the second approach

@willtebbutt
Copy link
Member

@andreaskoher how's it going with this? Is there anything blocking it that I could assist with?

@andreaskoher
Copy link
Contributor Author

thanks for the reminder :)
With the second definition of ApproxPeriodicKernel, I ran into the following problem with Zygote as mentioned in #65:

const x = RegularSpacing(0., 0.01, 1000)
const y = rand( to_sde( GP( ApproxPeriodicKernel() ) )(x) )
function objective(θ)
    k = θ * ApproxPeriodicKernel()
    f = to_sde( GP(k), SArrayStorage(Float64))
    -logpdf(f(x), y)
end
Zygote.gradient(objective, 1.)

no method matching iterate(::Nothing)

It references the list comprehension in stationary_distribution(k::ApproxPeriodicKernel{J,T}, s::SArrayStorage{T}):

q = [ j>0 ? 2*besseli(j, k.l^-2)/exp(k.l^-2) : besseli(0, k.l^-2)/exp(k.l^-2) for j in 0:J]

I can fix the problem by iteratively applying vcat in order to construct q as follows:

function TemporalGPs.stationary_distribution(k::ApproxPeriodicKernel{J,T}, s::SArrayStorage{T}) where {J,T<:Real}
    D = 2*(J+1)
    q = [besseli(0, k.l^-2)/exp(k.l^-2)]
    for j in 1:J
        q  = vcat(q, [2*besseli(j, 1/k.l^-2)/exp(k.l^-2)])
    end
    # q = [ j>0 ? 2*besseli(j, k.l^-2)/exp(k.l^-2) : besseli(0, k.l^-2)/exp(k.l^-2) for j in 0:J]
    P = kron(Diagonal(q), Diagonal([1, 1]))
    m = SVector{D, T}( zeros(T,D) )
    Ps = SMatrix{D,D,T}(P)
    return TemporalGPs.Gaussian(m, Ps)
end

It looks somewhat inefficient to me but, I am also no expert on good coding style. In any case, with this fix, I can correctly infer the parameters:

# Declare model parameters using `ParameterHandling.jl` types.
Random.seed!(1234)

params = (
    s = positive(10.),
    l = positive(1/20),
    v  = positive(.01),
)

function build_gp(ps)
    k = ps.s* ApproxPeriodicKernel()  ScaleTransform(ps.l)
    return to_sde( GP(k), SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
x = RegularSpacing(0.0, 0.01, 10_000)
y = rand(build_gp(value(params))(x, value(params.v)))

# Construct mapping between structured and Vector representation of parameters.
flat_initial_params, unflatten = flatten(params)

# Specify an objective function for Optim to minimise in terms of x and y.
function objective(flat_params)
    params = value(unflatten(flat_params))
    f = build_gp(params)
    return -logpdf(f(x, params.v), y)
end

# Optimise using Optim.
training_results = Optim.optimize(
    objective,
    θ -> only(Zygote.gradient(objective, θ)),
    flat_initial_params+rand(4), # Add some noise to make learning non-trivial
    Optim.LBFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
final_params = value(unflatten(training_results.minimizer))

# compare data to inference
f = build_gp(value(final_params))
fx = posterior(f(x, final_params.v),y)

plot(x,y)
plot!(x, fx(x, final_params.v))

data_vs_posterior

The problem remains that I can still not infer the length scale r of the periodic kernel. Maybe the problem is related to the Zygote issue with besseli that you opened:

gradient(l->besseli(0, l^-2), .1)

not implemented

As a better workaround than using vcat to construct the vector q as above, we could define a function

getq(l,J) = [ j>0 ? 2*besseli(j, k.l^-2)/exp(k.l^-2) : besseli(0, k.l^-2)/exp(k.l^-2) for j in 0:J]

and implement the analytically derived adjoints as in Arnos implementation in GPStuff. What do you think?

@willtebbutt
Copy link
Member

willtebbutt commented Apr 27, 2021

The problem remains that I can still not infer the length scale r of the periodic kernel. Maybe the problem is related to the Zygote issue with besseli that you opened:

Ahh okay -- this sounds like the kind of problem that I'd expect. We should be getting a fix for this soon.

and implement the analytically derived adjoints as in Arnos implementation in GPStuff. What do you think?

I think this would be an excellent idea for now! Are you familiar with doing this using ChainRules? Should just require an @scalar_rule if I'm not mistaken.

edit: also, sorry for the slow response time!

@andreaskoher
Copy link
Contributor Author

no rush :)

I added the following analytical derivatives:

using ChainRulesCore
q0(l) = besseli(0, l^-2)/exp(l^-2)
δq0_δl(l) = l^-3/exp(l^-2)*(
         2*besseli(0,l^-2) -
         2*besseli(1,l^-2)
)
@scalar_rule(q0(x), δq0_δl(x))

qj(l,j) = 2*exp(-l^-2)*besseli(j,1/l^2)
δqj_δl(l,j) = 
        4besseli(j, 1/l^2) / exp( l^-2 ) / l^3 - 
        2(
            besseli(j-1, 1/l^2) + 
            besseli(j+1, 1/l^2)
        ) / exp( l^-2 ) / l^3
          
@scalar_rule(qj(x,j), (δqj_δl(x,j), DoesNotExist()))

and checked the gradients against ForwardDiff and FiniteDifferences. At this level everything seems fine, however, the following gradient with respect to the length scale r still fails:

const x = RegularSpacing(0., 0.01, 1000)
const y = rand( to_sde( GP( ApproxPeriodicKernel() ) )(x) )
function objective(θ)
    k = ApproxPeriodicKernel( r=θ )
    f = to_sde( GP(k), SArrayStorage(Float64))
    -logpdf(f(x), y)
end
Zygote.gradient(objective, 1.)

no method matching iterate(::Nothing)

As before, it references the same list comprehension in

function TemporalGPs.stationary_distribution(k::ApproxPeriodicKernel{J,T}, s::SArrayStorage{T}) where {J,T<:Real}
    D = 2*(J+1)
    q = [ j>0 ? qj(k.l, j) : q0(k.l) for j in 0:J]
    P = kron(Diagonal(q), Diagonal([1, 1]))
    m = SVector{D, T}( zeros(T,D) )
    Ps = SMatrix{D,D,T}(P)
    return TemporalGPs.Gaussian(m, Ps)
end

I will try and dig into the issue these days

@anhi
Copy link

anhi commented Jun 15, 2022

Hey there,

I was wondering if there is any progress on this issue (and #64). I tried using the above approach with current versions of all involved libraries (changing DoesNotExist() to NoTangent()), but am running into

ERROR: MethodError: no method matching size(::TemporalGPs.LGSSM{TemporalGPs.GaussMarkovModel{TemporalGPs.Forward, FillArrays.Fill{StaticArrays.SMatrix{8, 8, Float64, 64}, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{FillArrays.Zeros{Float64, 1, Tuple{Base.OneTo{Int64}}}, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{StaticArrays.SMatrix{8, 8, Float64, 64}, 1, Tuple{Base.OneTo{Int64}}}, TemporalGPs.Gaussian{StaticArrays.SVector{8, Float64}, StaticArrays.SMatrix{8, 8, Float64, 64}}}, StructArrays.StructVector{TemporalGPs.ScalarOutputLGC{LinearAlgebra.Adjoint{Float64, StaticArrays.SVector{8, Float64}}, Float64, Float64}, NamedTuple{(:A, :a, :Q), Tuple{FillArrays.Fill{LinearAlgebra.Adjoint{Float64, StaticArrays.SVector{8, Float64}}, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float64}}}, Int64}})```

@willtebbutt
Copy link
Member

Hiya. Unfortunately I've not had time to work on this in the recent past, and I'm not entirely sure when I'm going to have the time in the near future.

@theogf
Copy link
Member

theogf commented Mar 15, 2023

I am planning to work on this issue once #90 is finished

@theogf theogf self-assigned this Mar 15, 2023
@theogf theogf closed this as completed May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants