Containers

Monty simulations operate on three main data types defined by the package, which function as containers for different information. These container types are useful because they abstract away some details when running simulations and because they only allow certain operations to be done on them, which helps enforce correctness. They are described in the sections below in detail. Doing very specific simulations or modifying the package will require some understanding of these container types and their internals.

SamplePlan

The SamplePlan defines the ideal outcome for a sampling strategy.

As the docstring indicates, a sample plan contains fields for the location, sampling round, time, control status, and coordinates (points) of each point in the sampling plan. These fields are all flat vectors (1-dimensional) and a SamplePlan functions is much like a table with these vectors in its columns. Because of this, it can easily be converted to a GeoTable. Also, note that the fields of a sample plan are ReadOnly. They can't be modified once constructed.

To explain these vectors more fully:

  • The location field is an integer index that indicates the location status of a sample. This is usually only relevant for paired sample plans when locations are supposed to be revisited. For example, a 2 x 2 grid of points has 4 locations, and each of these locations will be repeated in each sampling round. If there were two sampling rounds, the location vector might look like: [1, 2, 3, 4, 1, 2, 3, 4]. This means that each location is visited twice, one time in each sampling round.
  • The round vector is also an integer index. It indicates the ordering/timing of the sample. For example, if there were two rounds of sampling on a 2 x 2 grid of points, the round vector would look like [1, 1, 1, 1, 2, 2, 2, 2].
  • The time vector indicates the timing of each sample. Samples in the same round will have identical times and there is a one-to-one relationship between round and time.
  • The control vector contains booleans (true/false) indicating whether samples are controls. Controls do not get feedstock applied to them after time = 0.
  • The points vector contains the two-dimensional coordinates of each point in the plan. These are Point types from the Meshes.jl.

A SamplePlan can be created by defining all of these fields directly, but this is not generally an efficient way to do it. For example, here is a very simple plan for two cells of a grid, one of which is a control cell

using Monty
using Meshes

# locations are repeated across two rounds of sampling
location = UInt16[1, 2, 1, 2]
# there are two sampling rounds for two locations
round = UInt16[1, 1, 2, 2]
# the times are arbitrary, but there are only two of them for the two sampling rounds
time = [0.0, 0.0, 1.0, 1.0]
# one cell is control, the other is treatment, across both sampling rounds
control = [true, false, true, false]
# the two cell's coordinates are also repeated
points = [Point(1.0, 1.0), Point(1.0, 2.0), Point(1.0, 1.0), Point(1.0, 2.0)]

SamplePlan(location, round, time, control, points)
SamplePlan{Float64)
├─ total samples = 4
├─ sampling rounds = 2
├─ sampling times = [0.0, 1.0]
└─ bounding box
  ├─ min = Point(1.0, 1.0)
  └─ max = Point(1.0, 2.0)

It's usually easier to use the sample planning functions, pairedsampleplan and randomsampleplan, to generate a valid sample plan. For example, using a 2 x 2 grid

grid = CartesianGrid(2, 2)
plan = pairedsampleplan(grid .|> centroid, [0.0, 1.0])
SamplePlan{Float64)
├─ total samples = 8
├─ sampling rounds = 2
├─ sampling times = [0.0, 1.0]
└─ bounding box
  ├─ min = Point(0.5, 0.5)
  └─ max = Point(1.5, 1.5)

CoreSet

The CoreSet is the second container needed for simulations and it's very simple. It's just a wrapper around a two-dimensional array (a matrix) of points. Its definition is just this:

struct CoreSet{𝒯}
    points::Matrix{Point{2,𝒯}}
end

The points field contains the coordinates of each core when a sample plan is executed. Each column of the matrix contains the core locations that will be composited into an individual sample. There should be exactly as many columns of the points matrix as there are samples in the SamplePlan for a simulation.

Generally, it should be easiest to allocate a CoreSet by calling the constructor on an existing SamplePlan. For example, using the one from above and specifying 10 cores per sample:

coreset = CoreSet(plan, 10)
CoreSet{Float64}
├─ total samples = 8
├─ cores per sample = 10
├─ total cores = 80
└─ bounding box
  ├─ min = Point(0.0, 0.0)
  └─ max = Point(0.0, 0.0)

Initially, the core set is just empty. This is what the internal matrix looks like:

coreset.points
10×8 Matrix{Meshes.Point2}:
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)  …  (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)  …  (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)
 (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)     (0.0, 0.0)  (0.0, 0.0)  (0.0, 0.0)

Simulation

The Simulation container is the biggest one and it's worth looking at the docstring to see all it's internal fields.

The nsamp and ncore fields are just integers indicating the total number of samples and cores in the plan. The a field is a scalar indicating the cross-sectional area of the soil cores. Most of the other fields are matrices (two-dimensional arrays) of numbers or named tuples of matrices. If the fields apply to multiple analytes, they contain named tuples of matrices where the keys/names indicated the analyte names. The cores field is a matrix of soil Sample types and the composites field is a flat vector for the results of compositing/combining the cores. The measurements field is also a vectors, which stores the results of applying random measurement error to the composites.

Allocating a simulation container is usually easiest with a pre-existing CoreSet. For example, using the one from above:

sim = Simulation((:Ca, :Mg, :Fe, :Cr), coreset)
Simulation{Float64} with 4 analytes: Ca, Mg, Fe, Cr
├─ total samples = 8
├─ cores per sample = 10
├─ total cores = 80
└─ simulated parameters μ ± σ (rsd) ∈ [max, min]
  ├─ core cross-sectional area (a)
  │    0.00125
  ├─ applied feedstock (Q)
  │    NaN ± NaN (NaN %) ∈ [NaN, NaN]
  ├─ feedstock fraction (γ)
  │    NaN ± NaN (NaN %) ∈ [NaN, NaN]
  ├─ core depth (d)
  │    NaN ± NaN (NaN %) ∈ [NaN, NaN]
  ├─ feedstock bulk density (ρf)
  │    NaN ± NaN (NaN %) ∈ [NaN, NaN]
  ├─ feedstock concentrations (cf)
  │    Ca: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Mg: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Fe: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Cr: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  ├─ soil bulk density (ρs)
  │    NaN ± NaN (NaN %) ∈ [NaN, NaN]
  ├─ background soil concentrations (cs)
  │    Ca: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Mg: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Fe: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Cr: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  ├─ feedstock leached fractions (𝓁)
  │    Ca: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Mg: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Fe: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  │    Cr: NaN ± NaN (NaN %) ∈ [NaN, NaN]
  └─ feedstock mass loss fraction (ℒ)
       NaN ± NaN (NaN %) ∈ [NaN, NaN]

The printed summary for a Simulation is helpful, but intentionally obscures the structure of internal fields. Any of the fields can be accessed directly, however. Here is what the cs field, which is a named tuple of matrices for soil concentrations, looks like for our :Ca analyte (calcium). There are 10 rows for the 10 cores in each composite sample and 8 columns for 8 total composite samples:

sim.cs[:Ca]
10×8 Matrix{Float64}:
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

This is what the 𝓁 field (loss fractions) looks like for the :Fe analyte, also just a matrix full of NaNs:

sim.𝓁[:Fe]
10×8 Matrix{Float64}:
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

and so on...

All of these fields are initially empty or NaN. Performing a simulation requries filling in all of these fields, γ through . They represent the mixing and weathering parameters for the mixing model. There are helper functions to fill some of these fields in, like spreading!, soilconcentration!, leaching!, mixing functions like triangularmixing!, and others. Once all the fields are filled, we call the analyze! function to automatically fill in cores, composites, and measurements

What matters is really how the mixing fields are filled in and this is up to you. They can be filled with a single value everywhere, by drawing from a distribution, by drawing from a GaussianSimulator or GaussianCosimulator, or really by any method that aligns with some kind of simulation goal.

The simulation tutorial page goes through one example in detail.