JuliaReinforcementLearning/ReinforcementLearningTrajectories.jl

ReinforcementLearningTrajectories

Design

The diagram below shows how the main concepts provided by this package relate to each other:

┌───────────────────────────────────┐
│ Trajectory                        │
│ ┌───────────────────────────────┐ │
│ │ EpisodesBuffer wrapping a     │ │
│ │ AbstractTraces                │ │
│ │             ┌───────────────┐ │ │
│ │ :trace_A => │ AbstractTrace │ │ │
│ │             └───────────────┘ │ │
│ │                               │ │
│ │             ┌───────────────┐ │ │
│ │ :trace_B => │ AbstractTrace │ │ │
│ │             └───────────────┘ │ │
│ │  ...             ...          │ │
│ └───────────────────────────────┘ │
│          ┌───────────┐            │
│          │  Sampler  │            │
│          └───────────┘            │
│         ┌────────────┐            │
│         │ Controller │            │
│         └────────────┘            │
└───────────────────────────────────┘

Trajectory

A Trajectory has three parts:

  • A container that stores the data (usually an AbstractTraces).
  • A sampler that determines how a batch is sampled from the container.
  • A controller that decides when a new batch is sampled from the container.

Typical usage:

julia> t = Trajectory(
               container = Traces(a=Int[], b=Bool[]), 
               sampler = BatchSampler(3), 
               controller = InsertSampleRatioController(1.0, 3, 0, 0)
           );

julia> push!(t, (a=1,));

julia> for i in 1:5
           push!(t, (a=i, b=iseven(i)))
       end

julia> for batch in t
           println(batch)
       end
(a = [1, 3, 1], b = Bool[1, 1, 1])
(a = [4, 1, 4], b = Bool[0, 0, 0])
(a = [1, 4, 1], b = Bool[1, 0, 0])
(a = [1, 1, 4], b = Bool[1, 0, 0])
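
Why the loop above stops after four batches can be illustrated with a toy model of an insert/sample ratio gate. This is only a sketch under assumptions, not the package's implementation: the names ToyRatioGate, record_insert!, try_sample! and count_batches are hypothetical, and the gating rule (sampling is allowed once the threshold is reached and while the number of samples does not exceed ratio times the inserts beyond the threshold) is an assumed reading of the four arguments passed to InsertSampleRatioController(1.0, 3, 0, 0). It is consistent with the six push! calls and four printed batches in the example.

```julia
# Toy model of an insert/sample ratio gate.
# All names here are hypothetical; this is not the package's code.
mutable struct ToyRatioGate
    ratio::Float64   # assumed: samples allowed per insert beyond the threshold
    threshold::Int   # assumed: inserts required before any sampling
    n_inserted::Int  # running count of inserts
    n_sampled::Int   # running count of batches handed out
end

# Record `n` inserts.
record_insert!(g::ToyRatioGate, n::Int=1) = (g.n_inserted += n; g)

# Return true (and count the sample) if a new batch may be drawn now.
function try_sample!(g::ToyRatioGate)
    allowed = g.n_inserted >= g.threshold &&
              g.n_sampled <= (g.n_inserted - g.threshold) * g.ratio
    allowed && (g.n_sampled += 1)
    return allowed
end

# Draw batches until the gate closes and report how many were allowed.
function count_batches(g::ToyRatioGate)
    n = 0
    while try_sample!(g)
        n += 1
    end
    return n
end

gate = ToyRatioGate(1.0, 3, 0, 0)
foreach(_ -> record_insert!(gate), 1:6)  # six inserts, as in the example above
n_batches = count_batches(gate)
println(n_batches)
```

Under this assumed rule, iteration over the trajectory ends as soon as the gate refuses a sample, and further push! calls open it again.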

Traces

  • Traces
  • MultiplexTraces
  • CircularSARTTraces
  • NormalizedTraces

Samplers

  • BatchSampler
  • MetaSampler
  • MultiBatchSampler
  • EpisodesSampler

Controllers

  • InsertSampleRatioController
  • AsyncInsertSampleRatioController

Please refer to the tests for common usage. (TODO: generate docs and add links to the data structures listed above.)

Acknowledgement

This async version is mainly inspired by deepmind/reverb.

About

A generalized experience replay buffer for reinforcement learning
