Alexandra K. Diem

Personal Website.

Managing simulation runs using pandas

At work I run a lot of simulations of the same code, each with slightly different parameters. Simulation data can take up quite a bit of space, so I often want to store it somewhere other than my laptop. I try my best to use sensible folder names, but six or so months later I often have to look into several folders to figure out which one contains the data I am looking for. The Python library pandas provides a much more elegant solution to this problem: I only need to store metadata in a text file, which I can easily keep on my laptop and which points me to the folder in question.

The solution is based on storing simulation parameters in a .cfg file. These files look something like this:

[Simulation]
id = simulation0
solver = direct
debug = 0

[Parameter]
N = 2
TOL = 1e-7
rho = 1000 * kg/m**3
K = 1e-7 * m**2/Pa/s
phi = 0.1
beta = 1
qi = 0
qo = 0
tf = 0.5 * s
dt = 0.1 * s
theta = 0.5

Using pandas, we can write scripts that automatically filter this type of metadata for certain parameter values, so that we can quickly figure out which name we gave to our data folder. This means that it no longer matters what we call our data folders, and we can automate the naming process, for example by using random numbers (or simply counting from 0).
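The automated naming could be sketched roughly as follows. The helper name new_run_config, its arguments and the simulationN naming scheme are assumptions made for illustration, not part of the workflow described in this post.

```python
import glob
import os
from configparser import ConfigParser


def new_run_config(data_dir, simulation, parameter):
    """Write a .cfg file for a new run and return its path.

    Hypothetical helper: the run is named by counting the .cfg files
    that already exist in data_dir, starting from 0.
    """
    os.makedirs(data_dir, exist_ok=True)
    run_number = len(glob.glob(os.path.join(data_dir, "*.cfg")))
    run_id = "simulation{}".format(run_number)

    config = ConfigParser()
    config.optionxform = str  # keep keys such as TOL and K case-sensitive
    config["Simulation"] = dict(simulation, id=run_id)
    config["Parameter"] = parameter

    path = os.path.join(data_dir, run_id + ".cfg")
    with open(path, "w") as f:
        config.write(f)
    return path
```

Counting existing files only works as long as no .cfg file is ever deleted from the folder; a random or time-based identifier would avoid that limitation.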

We need to import the following libraries into Python:

import pandas
import glob
from configparser import ConfigParser

glob.glob returns a list of all paths matching a pattern,

files = glob.glob("./data/*.cfg")

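such that the output looks similar to this:

['./data/simulation0.cfg', './data/simulation1.cfg', './data/simulation2.cfg', './data/simulation3.cfg']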

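We initialise a ConfigParser to read the .cfg files and tell it which section(s) we are interested in. Setting optionxform to str keeps parameter names such as TOL and K case-sensitive; by default ConfigParser converts them to lower case: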

config = ConfigParser()
config.optionxform = str
sections = ['Simulation', 'Parameter']
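To illustrate what config.optionxform = str does, here is a small sketch that parses the same text with and without it. The variable names default and preserving are only used for this example.

```python
from configparser import ConfigParser

text = "[Parameter]\nTOL = 1e-7\nK = 1e-7 * m**2/Pa/s\n"

# Default behaviour: option names are converted to lower case
default = ConfigParser()
default.read_string(text)
print(default.options("Parameter"))     # ['tol', 'k']

# With optionxform = str the original spelling is kept
preserving = ConfigParser()
preserving.optionxform = str
preserving.read_string(text)
print(preserving.options("Parameter"))  # ['TOL', 'K']
```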

Next, we create a dictionary of dictionaries holding all simulation parameters from the files. The file name, stripped of its directory path, serves as the key for each parameter dictionary d in data.

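data = {}
for file in files:
    config.read(file)  # load this file's parameters into the parser
    d = {}
    for section in sections:
        options = config.items(section)
        for key, value in options:
            d[key] = value
    fname = file.split("/")[-1]  # strip the directory, e.g. simulation0.cfg
    data[fname] = d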
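The loop above can also be wrapped in a function. The following is a minimal sketch, assuming each .cfg file contains the sections Simulation and Parameter; the function name load_configs is hypothetical. It creates a fresh ConfigParser for every file, so that values from one file cannot leak into the next.

```python
import glob
import os
from configparser import ConfigParser


def load_configs(pattern, sections=("Simulation", "Parameter")):
    """Return {file name: {option: value}} for all files matching pattern."""
    data = {}
    for path in sorted(glob.glob(pattern)):
        config = ConfigParser()
        config.optionxform = str  # preserve the case of option names
        config.read(path)
        d = {}
        for section in sections:
            if config.has_section(section):  # skip sections a file lacks
                d.update(config.items(section))
        data[os.path.basename(path)] = d
    return data
```

Calling load_configs("./data/*.cfg") would then give a dictionary that can be passed straight to pandas.DataFrame.from_dict.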

Create a pandas table from the dictionary data:

tab = pandas.DataFrame.from_dict(data, orient="index")

Now we can look at the values of a parameter for each file in our table, for example TOL, beta and qi:

tab.TOL
simulation0.cfg    1e-7
simulation1.cfg    1e-7
simulation2.cfg    1e-7
simulation3.cfg    1e-7
Name: TOL, dtype: object

tab.beta
simulation0.cfg      1
simulation1.cfg      1
simulation2.cfg    0.1
simulation3.cfg      1
Name: beta, dtype: object

tab.qi
simulation0.cfg      0
simulation1.cfg      0
simulation2.cfg      0
simulation3.cfg    0.1
Name: qi, dtype: object

We can also filter by parameter values:

tab[tab.beta == '0.1']
                 N   TOL             rho                 K  phi  beta  qi  qo       tf       dt  theta
simulation2.cfg  2  1e-7  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.1   0.1   0   0  0.5 * s  0.1 * s    0.5

tab[tab.qi == '0']
                 N   TOL             rho                 K  phi  beta  qi  qo       tf       dt  theta
simulation0.cfg  2  1e-7  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.1     1   0   0  0.5 * s  0.1 * s    0.5
simulation1.cfg  2  1e-7  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.2     1   0   0  0.5 * s  0.1 * s    0.5
simulation2.cfg  2  1e-7  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.1   0.1   0   0  0.5 * s  0.1 * s    0.5
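All values in the table are strings (dtype: object), which is why the filters above compare against '0.1' and '0'. A possible refinement, sketched here under the assumption that unit-free columns should be treated as numbers, converts every column that parses cleanly and leaves columns with units, such as rho, untouched. The small hand-built table only stands in for tab, and the name numeric is chosen for this example.

```python
import pandas

# Stand-in for tab with a subset of the columns shown above.
# The rho value for simulation3 is assumed; it is not shown in the post.
tab = pandas.DataFrame.from_dict({
    "simulation0.cfg": {"TOL": "1e-7", "rho": "1000 * kg/m**3", "beta": "1", "qi": "0"},
    "simulation1.cfg": {"TOL": "1e-7", "rho": "1000 * kg/m**3", "beta": "1", "qi": "0"},
    "simulation2.cfg": {"TOL": "1e-7", "rho": "1000 * kg/m**3", "beta": "0.1", "qi": "0"},
    "simulation3.cfg": {"TOL": "1e-7", "rho": "1000 * kg/m**3", "beta": "1", "qi": "0.1"},
}, orient="index")

numeric = tab.copy()
for column in numeric.columns:
    converted = pandas.to_numeric(numeric[column], errors="coerce")
    if converted.notna().all():  # only replace columns that parse completely
        numeric[column] = converted

# Numeric comparisons and combined conditions are now possible
selected = numeric[(numeric.beta < 1) & (numeric.qi == 0)]
print(list(selected.index))  # ['simulation2.cfg']
```

Columns that carry units would need a units-aware parser before they could be compared numerically, which this sketch does not attempt.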