How to use pygeochemtools

Introduction

pygeochemtools is designed to be used in a number of ways. If you’re familiar with Python, the pygeochemtools package can be used just like any other Python library. However, for those who are not used to using Python scripts, or just want to quickly convert a data file or make a map, pygeochemtools is also designed to be used as a simple command line tool (CLI).

pygeochemtools Python api

The pygeochemtools package provides a relatively simple API to perform either the main automated functions, similar to the output of the CLI, or access to the various functions and methods available.

To get started using the pygeochemtools api, you first need to:

import pygeochemtools as pygt

All the pygeochemtools functions and methods return a pandas DataFrame object, so you can incorporate any of the pygeochemtools functionality into your data workflows.

For the complete api, see the API Documentation.

Using the CLI

Using the CLI is a simple method to quickly perform the main high-level functions and transformations in the pygeochemtools kit and output new, filtered and transformed csv files.

Getting started with the CLI

Once you have pygeochemtools properly installed in a virtual environment, and the virtual environment active, the CLI should just work. In your shell of choice (e.g. bash, sh, cmd, powershell, Anaconda Prompt) type:

pygt --help

This will bring up the main help/entry info for the CLI:

Usage: pygt [OPTIONS] COMMAND [ARGS]...

Run pygeochemtools.

An eclectic set of geochemical data manipulation, QC and plotting tools.

Options:
-v, --verbose  Enable verbose output; 1 = less, 4 = more.  [x>=0]
--help         Show this message and exit.

Commands:
show-config                  Display the user configuration.
get-config-path              Display path to user editable config.yml...
edit-config                  Launch default editor to edit user...
version                      Get the library version.
list-columns                 Display the column headers in the loaded...
list-sample-types            Display the sample types listed in the...
list-elements                Display the list of element labels in dataset
convert-long-to-wide         Convert sarig long form data to wide form.
extract-element              Extract single element dataset(s)
plot-max-downhole            Plot maximum downhole geochemical values map
plot-max-downhole-intervals  Plot maximum downhole geochemical values...
The CLI is built around four main functions:
  • setting user configuration

  • displaying dataset metadata

  • transforming data

  • plotting geochemical maps

To find out extra information, or how to use any of the listed commands, simply type the command and the --help option:

~$ pygt extract-element --help
Usage: pygt extract-element [OPTIONS] PATH

  Extract single element dataset(s)

  Requires path to file and element to extract. You can extract multiple
  elements at once by providing multiple element options.

  Will output a file called 'element'_processed.csv to either the current
  working directory or a directory specified with the -o option.

  By selecting --dh_only, will filter dataset to only include samples with a
  drillhole_id.

  Example:

  extract three element datasets from drillholes only from input datafile

  `$ pygt extract-element /test_input.csv -el Au -el Cu -el Fe --dh-only -t
  sarig`

Options:
  -el, --element TEXT     Select one or more elements
  --dh-only               Filter to only drillholes
  -t, --type [sarig|gen]  Select input file structure
  -o, --out-path TEXT     Optional path to place output file, defaults to PATH
  --help                  Show this message and exit.

User configuration

In order to allow for some generalities and greater user control on the command line, pygeochemtools has a user configuration file to pre-set a number of variables. This includes:

  • Setting column header names (This one is important for using different datasets, in the format ‘variable_name’: ‘equivalent name in dataset’)

  • Map place names and locations to annotate map output (latitude, longitude, label in decimal degrees)

  • Map extents (west, east, south, north in decimal degrees)

  • Map projection (EPSG number for your projection from https://epsg.io)

  • Average crustal abundance values for data normalisations

At runtime, pygeochemtools looks for the user_config.yml file, reads the config and applies the values.

Commands:

show-config                  Display the user configuration.
get-config-path              Display path to user editable user_config.yml file.
edit-config                  Launch default editor to edit user_config.yml file.

Warning

If you want to edit the config, it is recommended to duplicate the existing file and renaming it to ensure you have a backup of the original user_config file.

Dataset metadata

These commands are useful to interrogate the labels in the dataset. This is important when trying to filter data, as the labels need to exactly match those in the data.

Commands

list-columns                 Display the column headers in the loaded dataset
list-sample-types            Display the sample types listed in the loaded dataset
list-elements                Display the list of element labels in dataset

Example usage

~$ pygt list-columns --help

Usage: pygt list-columns [OPTIONS] PATH

Display the column headers in the loaded dataset

Options:
-t, --type [sarig|general]  Select input file structure
--help                      Show this message and exit.
~$ pygt list-columns -t sarig ~/geochem_data/test_input.csv
Dataset structure set to sarig
Data loaded
Index(['SAMPLE_NO', 'SAMPLE_SOURCE_CODE', 'SAMPLE_SOURCE',
    'ROCK_GROUP_CODE', 'ROCK_GROUP', 'LITHO_CODE', 'LITHO_CONF',
    'LITHOLOGY_NAME', 'LITHO_MODIFIER', 'MAP_SYMBOL', 'STRAT_CONF',
    'STRAT_NAME', 'COLLECTED_BY', 'COLLECTORS_NUMBER', 'COLLECTED_DATE',
    'DRILLHOLE_NUMBER', 'DH_NAME', 'DH_DEPTH_FROM', 'DH_DEPTH_TO',
    'SITE_NO', 'EASTING_GDA2020', 'NORTHING_GDA2020', 'ZONE_GDA2020',
    'LONGITUDE_GDA2020', 'LATITUDE_GDA2020', 'LONGITUDE_GDA94',
    'LATITUDE_GDA94', 'SAMPLE_ANALYSIS_NO', 'OTHER_ANALYSIS_ID',
    'ANALYSIS_TYPE_DESC', 'LABORATORY', 'CHEM_CODE', 'VALUE', 'UNIT',
    'CHEM_METHOD_CODE', 'CHEM_METHOD_DESC'],
    dtype='object')
    (

Transforming data

There are two tools available for data filtering and transformations:

Convert SARIG dataset from long to wide

~$ pygt convert-long-to-wide --help
Usage: pygt convert-long-to-wide [OPTIONS] PATH

Convert sarig long form data to wide form.

Requires path to sarig_rs_chem_exp file. You can filter this dataset down to
a manageable size either by providing a list of elements, sample types or
drillhole numbers, or a combination of the three.

Options:
-el, --elements TEXT     Enter a list of elements to filter to, or nothing
-st, --sample-type TEXT  Enter a list of sample types to filter to, or
                        nothing
-dh, --drillholes TEXT   Enter a list of drillhole numbers to filter to, or
                        Nothing
--dh-only                Filter to only drillholes
--add-units              Include chem units
--add-methods            Export method metadata for the filtered samples
-o, --out-path TEXT      Optional path to place output file, defaults to
                        PATH
--help                   Show this message and exit.

Extract single element datasets

~$ pygt extract-element --help
Usage: pygt extract-element [OPTIONS] PATH

Extract single element dataset(s)

Requires path to file and element to extract. You can extract multiple
elements at once by providing multiple element options.

Options:
-el, --element TEXT     Select one or more elements
--dh-only               Filter to only drillholes
-t, --type [sarig|gen]  Select input file structure
-o, --out-path TEXT     Optional path to place output file, defaults to PATH
--help                  Show this message and exit.

Both of these commands will output a new data file to either the current working directory (default) or a specified directory location.

Plotting geochemical maps

Once you have a single element dataset(s) available, pygeochemtools will allow you to generate either a point plot or interpolated gridded map displaying either the maximum down hole chemical values, or the maximum values over a selected interval.

~$ pygt plot-max-downhole --help
Usage: pygt plot-max-downhole [OPTIONS] PATH ELEMENT

Plot maximum downhole geochemical values map

Requires path to extracted single element data file and element to plot.

Options:
-t, --plot-type [point|interpolate]
                                Select map type
-s, --scale TEXT                Select either log-scale (default) or set to
                                False for linear scale
-o, --out-path TEXT             Optional path to place output file, defaults
                                to current working directory
--add-inset                     Optional flag to add inset map with
                                drillhole locations
--help                          Show this message and exit.
~$ pygt plot-max-downhole-intervals --help
Usage: pygt plot-max-downhole-intervals [OPTIONS] PATH ELEMENT INTERVAL

Plot maximum downhole geochemical values map for each interval

Requires path to extracted single element data file, element and interval.
The interval should be in whole meters as an integer.

Options:
-t, --plot-type [point|interpolate]
                                Select map type
-s, --scale TEXT                Select either log-scale (default) or set to
                                False for linear scale
-o, --out-path TEXT             Optional path to place output file, defaults
                                to current working directory
--add-inset                     Optional flag to add inset map with
                                drillhole locations
--help                          Show this message and exit.

Both of these map creation commands will output maps, as jpg files, to either the current working directory, or another specified directory location.

Note

For more detailed instructions and examples, run pygt [COMMAND] --help on the CLI and visit the Example usage page.