Containers
Monty
simulations operate on three main data types defined by the package, which function as containers for different information. These container types are useful because they abstract away some details when running simulations and because they only allow certain operations to be done on them, which helps enforce correctness. They are described in the sections below in detail. Doing very specific simulations or modifying the package will require some understanding of these container types and their internals.
SamplePlan
The SamplePlan
defines the ideal outcome for a sampling strategy.
As the docstring indicates, a sample plan contains fields for the location, sampling round, time, control status, and coordinates (points) of each point in the sampling plan. These fields are all flat vectors (1-dimensional) and a SamplePlan
functions is much like a table with these vectors in its columns. Because of this, it can easily be converted to a GeoTable
. Also, note that the fields of a sample plan are ReadOnly
. They can't be modified once constructed.
To explain these vectors more fully:
- The
location
field is an integer index that indicates the location status of a sample. This is usually only relevant for paired sample plans when locations are supposed to be revisited. For example, a 2 x 2 grid of points has 4 locations, and each of these locations will be repeated in each sampling round. If there were two sampling rounds, thelocation
vector might look like:[1, 2, 3, 4, 1, 2, 3, 4]
. This means that each location is visited twice, one time in each sampling round. - The
round
vector is also an integer index. It indicates the ordering/timing of the sample. For example, if there were two rounds of sampling on a 2 x 2 grid of points, theround
vector would look like[1, 1, 1, 1, 2, 2, 2, 2]
. - The
time
vector indicates the timing of each sample. Samples in the sameround
will have identical times and there is a one-to-one relationship betweenround
andtime
. - The
control
vector contains booleans (true
/false
) indicating whether samples are controls. Controls do not get feedstock applied to them aftertime = 0
. - The
points
vector contains the two-dimensional coordinates of each point in the plan. These arePoint
types from theMeshes.jl
.
A SamplePlan
can be created by defining all of these fields directly, but this is not generally an efficient way to do it. For example, here is a very simple plan for two cells of a grid, one of which is a control cell
using Monty
using Meshes
# locations are repeated across two rounds of sampling
location = UInt16[1, 2, 1, 2]
# there are two sampling rounds for two locations
round = UInt16[1, 1, 2, 2]
# the times are arbitrary, but there are only two of them for the two sampling rounds
time = [0.0, 0.0, 1.0, 1.0]
# one cell is control, the other is treatment, across both sampling rounds
control = [true, false, true, false]
# the two cell's coordinates are also repeated
points = [Point(1.0, 1.0), Point(1.0, 2.0), Point(1.0, 1.0), Point(1.0, 2.0)]
SamplePlan(location, round, time, control, points)
SamplePlan{Float64)
├─ total samples = 4
├─ sampling rounds = 2
├─ sampling times = [0.0, 1.0]
└─ bounding box
├─ min = Point(1.0, 1.0)
└─ max = Point(1.0, 2.0)
It's usually easier to use the sample planning functions, pairedsampleplan
and randomsampleplan
, to generate a valid sample plan. For example, using a 2 x 2 grid
grid = CartesianGrid(2, 2)
plan = pairedsampleplan(grid .|> centroid, [0.0, 1.0])
SamplePlan{Float64)
├─ total samples = 8
├─ sampling rounds = 2
├─ sampling times = [0.0, 1.0]
└─ bounding box
├─ min = Point(0.5, 0.5)
└─ max = Point(1.5, 1.5)
CoreSet
The CoreSet
is the second container needed for simulations and it's very simple. It's just a wrapper around a two-dimensional array (a matrix) of points. Its definition is just this:
struct CoreSet{𝒯}
points::Matrix{Point{2,𝒯}}
end
The points
field contains the coordinates of each core when a sample plan is executed. Each column of the matrix contains the core locations that will be composited into an individual sample. There should be exactly as many columns of the points
matrix as there are samples in the SamplePlan
for a simulation.
Generally, it should be easiest to allocate a CoreSet
by calling the constructor on an existing SamplePlan
. For example, using the one from above and specifying 10 cores per sample:
coreset = CoreSet(plan, 10)
CoreSet{Float64}
├─ total samples = 8
├─ cores per sample = 10
├─ total cores = 80
└─ bounding box
├─ min = Point(0.0, 0.0)
└─ max = Point(0.0, 0.0)
Initially, the core set is just empty. This is what the internal matrix looks like:
coreset.points
10×8 Matrix{Meshes.Point2}:
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) … (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) … (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
(0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0) (0.0, 0.0)
Simulation
The Simulation
container is the biggest one and it's worth looking at the docstring to see all it's internal fields.
The nsamp
and ncore
fields are just integers indicating the total number of samples and cores in the plan. The a
field is a scalar indicating the cross-sectional area of the soil cores. Most of the other fields are matrices (two-dimensional arrays) of numbers or named tuples of matrices. If the fields apply to multiple analytes, they contain named tuples of matrices where the keys/names indicated the analyte names. The cores
field is a matrix of soil Sample
types and the composites
field is a flat vector for the results of compositing/combining the cores
. The measurements
field is also a vectors, which stores the results of applying random measurement error to the composites
.
Allocating a simulation container is usually easiest with a pre-existing CoreSet
. For example, using the one from above:
sim = Simulation((:Ca, :Mg, :Fe, :Cr), coreset)
Simulation{Float64} with 4 analytes: Ca, Mg, Fe, Cr
├─ total samples = 8
├─ cores per sample = 10
├─ total cores = 80
└─ simulated parameters μ ± σ (rsd) ∈ [max, min]
├─ core cross-sectional area (a)
│ 0.00125
├─ applied feedstock (Q)
│ NaN ± NaN (NaN %) ∈ [NaN, NaN]
├─ feedstock fraction (γ)
│ NaN ± NaN (NaN %) ∈ [NaN, NaN]
├─ core depth (d)
│ NaN ± NaN (NaN %) ∈ [NaN, NaN]
├─ feedstock bulk density (ρf)
│ NaN ± NaN (NaN %) ∈ [NaN, NaN]
├─ feedstock concentrations (cf)
│ Ca: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Mg: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Fe: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Cr: NaN ± NaN (NaN %) ∈ [NaN, NaN]
├─ soil bulk density (ρs)
│ NaN ± NaN (NaN %) ∈ [NaN, NaN]
├─ background soil concentrations (cs)
│ Ca: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Mg: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Fe: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Cr: NaN ± NaN (NaN %) ∈ [NaN, NaN]
├─ feedstock leached fractions (𝓁)
│ Ca: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Mg: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Fe: NaN ± NaN (NaN %) ∈ [NaN, NaN]
│ Cr: NaN ± NaN (NaN %) ∈ [NaN, NaN]
└─ feedstock mass loss fraction (ℒ)
NaN ± NaN (NaN %) ∈ [NaN, NaN]
The printed summary for a Simulation
is helpful, but intentionally obscures the structure of internal fields. Any of the fields can be accessed directly, however. Here is what the cs
field, which is a named tuple of matrices for soil concentrations, looks like for our :Ca
analyte (calcium). There are 10 rows for the 10 cores in each composite sample and 8 columns for 8 total composite samples:
sim.cs[:Ca]
10×8 Matrix{Float64}:
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
This is what the 𝓁
field (loss fractions) looks like for the :Fe
analyte, also just a matrix full of NaN
s:
sim.𝓁[:Fe]
10×8 Matrix{Float64}:
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
and so on...
All of these fields are initially empty or NaN
. Performing a simulation requries filling in all of these fields, γ
through ℒ
. They represent the mixing and weathering parameters for the mixing model. There are helper functions to fill some of these fields in, like spreading!
, soilconcentration!
, leaching!
, mixing functions like triangularmixing!
, and others. Once all the fields are filled, we call the analyze!
function to automatically fill in cores
, composites
, and measurements
What matters is really how the mixing fields are filled in and this is up to you. They can be filled with a single value everywhere, by drawing from a distribution, by drawing from a GaussianSimulator
or GaussianCosimulator
, or really by any method that aligns with some kind of simulation goal.
The simulation tutorial page goes through one example in detail.