Quick start
First, lets just run through some examples to see where we are going by simulating a simple example population which we observe as a survey. Let’s say we are in a giant sphere surrounded by fire flies that fill the volume homogeneously. Furthermore, the light they emit follows a Pareto distribution (power law) in luminosity. Of course, this population can be anything; active galactic nuclei (AGN), gamma-ray bursts (GRBs), etc. The framework provided in popsynth is intended to be generic.
[1]:
%matplotlib inline
import matplotlib.pyplot as plt
from jupyterthemes import jtplot
jtplot.style(context="notebook", fscale=1, grid=False)
purple = "#B833FF"
yellow = "#F6EF5B"
import popsynth
popsynth.update_logging_level("INFO")
import networkx as nx
import numpy as np
import warnings
warnings.simplefilter("ignore")
A spherically homogenous population of fire flies with a pareto luminosity function
popsynth comes with several types of populations included, though you can easily construct your own. To access the built in population synthesizers, one simply instantiates the population from the popsynth.populations module. Here, we will simulate a survey that has a homogenous spherical spatial distribution and a pareto distributed luminosity.
[3]:
homogeneous_pareto_synth = popsynth.populations.ParetoHomogeneousSphericalPopulation(
Lambda=5, Lmin=1, alpha=2.0 # the density normalization # lower bound on the LF
) # index of the LF
print(homogeneous_pareto_synth)
Luminosity Function
pareto
\frac{\alpha L_{\rm min}^{\alpha}}{L^{\alpha+1}}
Lmin: 1
alpha: 2.0
Spatial Function
cons_sphere
\Lambda
Lambda: 5
r_max: 5
[4]:
homogeneous_pareto_synth.display()
Luminosity Function
parameter | value | |
---|---|---|
0 | Lmin | 1.0 |
1 | alpha | 2.0 |
Spatial Function
parameter | value | |
---|---|---|
0 | Lambda | 5 |
1 | r_max | 5 |
If you have networkx and graviz, you can plot a graph of the connections.
[5]:
# we can also display a graph of the object
options = {"node_color": purple, "node_size": 3000, "width": 0.5}
pos = nx.drawing.nx_agraph.graphviz_layout(homogeneous_pareto_synth.graph, prog="dot")
nx.draw(homogeneous_pareto_synth.graph, with_labels=True, pos=pos, **options)
Creating a survey
We can now sample from this population with the draw_survey function, but first we need specify how the flux is selected by adding a flux selection function. Here, we will use a hard selection function in this example, but you can make your own. The selection function will mark objects with observed fluxes below the selection boundary as “hidden”, but we will still have access to them in our population.
[6]:
flux_selector = popsynth.HardFluxSelection()
flux_selector.boundary = 1e-2
homogeneous_pareto_synth.set_flux_selection(flux_selector)
And by observed fluxes, we mean those where the latent flux is obscured by observational error, here we sample the observational error from a log normal distribution with \(\sigma=1\). In the future, popsynth
will have more options.
[7]:
population = homogeneous_pareto_synth.draw_survey(flux_sigma=0.1)
INFO | The volume integral is 2617.9938779914946
INFO | Expecting 2567 total objects
INFO | applying selection to fluxes
INFO | Detected 573 distances
INFO | Detected 573 objects out to a distance of 4.99
We now have created a population survey. How did we get here?
Once the spatial and luminosity functions are specified, we can integrate out to a given distance and compute the number of expected objects.
A Poisson draw with this mean is made to determine the number of total objects in the survey.
Next all quantities are sampled (distance, luminosity)
If needed, the luminosity is converted to a flux with a given observational error
The selection function (in this case a hard cutoff) is applied
A population object is created
We could have specified a soft cutoff (an inverse logit) with logarithmic with as well:
[8]:
homogeneous_pareto_synth.clean()
flux_selector = popsynth.SoftFluxSelection()
flux_selector.boundary = 1e-2
flux_selector.strength = 20
homogeneous_pareto_synth.set_flux_selection(flux_selector)
population = homogeneous_pareto_synth.draw_survey(flux_sigma=0.1)
WARNING | removing all registered Auxiliary Samplers
WARNING | removing flux selector
WARNING | removing distance selector
WARNING | removing spatial selector
INFO | The volume integral is 2617.9938779914946
INFO | Expecting 2567 total objects
INFO | applying selection to fluxes
INFO | Detected 609 distances
INFO | Detected 609 objects out to a distance of 4.99
More detail on the process behind the simulation can be found deeper in the documentation
The Population Object
The population object stores all the information about the sampled survey. This includes information on the latent parameters, measured parameters, and distances for both the selected and non-selected objects.
We can have a look at the flux-distance distribution from the survey. Here, yellow dots are the latent flux value, i.e., without observational noise, and purple dots are the measured values for theselected* objects. Arrows point from the latent to measured values.
[9]:
fig = population.display_fluxes(obs_color=purple, true_color=yellow)
For fun, we can display the fluxes on in a simulated universe in 3D
[10]:
fig = population.display_obs_fluxes_sphere(background_color="black")
The population object stores a lot of information. For example, an array of selection booleans:
[11]:
population.selection
[11]:
array([False, False, False, ..., False, False, False])
Every variable that is simulated is stored as a SimulatedVariable
object. This object acts as a normal numpy array with a few special properties. The main view of the array are the observed values. These are values that can be obscured from the latent values by measurement error.
Let’s examine the observed flux.
[12]:
# the observed values
population.fluxes
# the latent values
population.fluxes.latent
[12]:
array([0.00407874, 0.00411367, 0.00478737, ..., 0.00253392, 0.00445139,
0.00333201])
Any math operation on the a simulated variable is applied to the latent values as well. Thus, it is easy to manipulate both in one operation. When a variable is not marked as observed, this implies that the latent (true) values and the observed values will be the same. Thus, the values stored will be the same.
To access the selected or non-selected values of a variable:
[13]:
selected_fluxes = population.fluxes.selected
non_selected_fluxes = population.fluxes.non_selected
This returns another SimulatedVaraible
object which allows access to the latent and observed values as well.
We can retrieve selected and non-selected distances:
[14]:
distances = population.distances.selected
[15]:
hidden_distances = population.distances.non_selected
[16]:
fig, ax = plt.subplots()
bins = np.linspace(0, 6, 20)
ax.hist(hidden_distances, bins=bins, fc=yellow, ec="k",lw=1)
ax.hist(distances, bins=bins, fc=purple, ec="k",lw=1)
ax.set_xlabel("z")
[16]:
Text(0.5, 0, 'z')
Saving the population
We can record the results of a population synth to an HDF5 file that maintains all the information from the run. The true values of the population parameters are always stored in the truth dictionary:
[17]:
population.truth
[17]:
{'cons_sphere': {'Lambda': 5, 'r_max': 5}, 'pareto': {'Lmin': 1, 'alpha': 2.0}}
[18]:
population.writeto("saved_pop.h5")
[19]:
reloaded_population = popsynth.Population.from_file("saved_pop.h5")
[20]:
reloaded_population.truth
[20]:
{'cons_sphere': {'Lambda': 5, 'r_max': 5}, 'pareto': {'Lmin': 1, 'alpha': 2.0}}
Creating populations from YAML files
It is sometimes easier to quickly write down population in a YAML file without having to create all the objects in python. Let’s a take a look at the format:
# the seed
seed: 1234
# specifiy the luminosity distribution
# and it's parmeters
luminosity distribution:
ParetoDistribution:
Lmin: 1e51
alpha: 2
# specifiy the flux selection function
# and it's parmeters
flux selection:
HardFluxSelection:
boundary: 1e-6
# specifiy the spatial distribution
# and it's parmeters
spatial distribution:
ZPowerCosmoDistribution:
Lambda: .5
delta: -2
r_max: 5
# specify the distance selection function
# and it's parmeters
distance selection:
BernoulliSelection:
probability: 0.5
# a spatial selection if needed
spatial selection:
# None
# all the auxiliary functions
# these must be known to the
# registry at run time if
# the are custom!
auxiliary samplers:
stellar_mass
type: NormalAuxSampler
observed: False
mu: 0
sigma: 1
selection:
secondary:
init variables:
demo:
type: DemoSampler
observed: False
selection:
UpperBound:
boundary: 20
demo2:
type: DemoSampler2
observed: True
selection:
secondary: [demo, stellar_mass] # other samplers that this sampler depends on
We can load this yaml file into a population synth. We use a saved file to demonstrate:
[21]:
my_file = popsynth.utils.package_data.get_path_of_data_file("pop.yml")
ps = popsynth.PopulationSynth.from_file(my_file)
print(ps)
INFO | registering derived luminosity sampler: demo2
Luminosity Function
demo2
observed: True
demo
stellar_mass
Spatial Function
zpow_cosmo
\Lambda (z+1)^{\delta}
Lambda: 0.5
delta: -2.0
r_max: 5.0
demo
observed: False
parents: ['demo2']
stellar_mass
observed: False
mu: 0.0
sigma: 1.0
parents: ['demo2']
[22]:
ps.display()
Luminosity Function
parameter | value |
---|
Spatial Function
parameter | value | |
---|---|---|
0 | Lambda | 0.5 |
1 | delta | -2.0 |
2 | r_max | 5.0 |
demo
parameter | value |
---|
stellar_mass
parameter | value | |
---|---|---|
0 | mu | 0.0 |
1 | sigma | 1.0 |
[23]:
options = {"node_color": purple, "node_size": 3000, "width": 0.5}
pos = nx.drawing.nx_agraph.graphviz_layout(ps.graph, prog="dot")
nx.draw(ps.graph, with_labels=True, pos=pos, **options)
We can see that our population was created correctly for us.
Now, this means we can easily pass populations around to our collaborators for testing
[24]:
pop = ps.draw_survey(flux_sigma=0.5)
INFO | The volume integral is 3.5702831713758014
INFO | Expecting 5 total objects
INFO | Sampling: demo2
INFO | demo2 is sampling its secondary quantities
INFO | Sampling: demo
INFO | Sampling: stellar_mass
INFO | Getting luminosity from derived sampler
INFO | applying selection to fluxes
INFO | Applying selection from demo which selected 5 of 5 objects
INFO | Before auxiliary selection there were 5 objects selected
WARNING | NO HIDDEN OBJECTS
INFO | Detected 3 distances
INFO | Detected 5 objects out to a distance of 3.37
Now, since we can read the population synth from a file, we can also write one we have created with classes to a file:
[25]:
ps.to_dict()
[25]:
{'seed': 1234,
'spatial distribution': {'ZPowerCosmoDistribution': {'Lambda': 0.5,
'delta': -2.0,
'r_max': 5.0},
'is_rate': True},
'luminosity distribution': {'ParetoDistribution': {'Lmin': 1e+51,
'alpha': 2.0}},
'flux selection': {'HardFluxSelection': {'boundary': 1e-06}},
'distance selection': {'BernoulliSelection': {'probability': 0.5}},
'auxiliary samplers': {}}
[26]:
ps.write_to("/tmp/my_pop_synth.yml")
but our population synth is also serialized to our population!
[27]:
pop.pop_synth
[27]:
{'seed': 1234,
'spatial distribution': {'ZPowerCosmoDistribution': {'Lambda': 0.5,
'delta': -2.0,
'r_max': 5.0},
'is_rate': True},
'luminosity distribution': {'ParetoDistribution': {'Lmin': 1e+51,
'alpha': 2.0}},
'flux selection': {'HardFluxSelection': {'boundary': 1e-06}},
'distance selection': {'BernoulliSelection': {'probability': 0.5}},
'auxiliary samplers': {}}
Therefore we always know exactly how we simulated our data.