pygeochemtools

A CLI-based, eclectic set of geochemical data manipulation, QC and plotting tools.

Pygeochemtools is a Python library and command line interface tool for rapid manipulation, filtering, QC and plotting of geochemical data. It is primarily designed to let people with limited or no coding experience work with very large datasets that programs like Excel struggle with. It natively loads and manipulates the geochemical datasets output by the Geological Survey of South Australia. Later updates are intended to support other datasets with a little configuration.

Why pygeochemtools

The SA Geodata database (available via the SARIG portal data catalogue) contains over 10 GB of geochemical data. That’s a lot of chemistry. Explorers often request extracts of this dataset, but then find all that data a challenge to handle. The extracts are often so large that programs like Excel won’t even open the file, and even when an extract is small enough to open, explorers often find the data format difficult to work with. Generally, people like to use wide data for analysis, where each row in a table represents all the data about a single sample. Database exports, however, are in a long format where each row represents a single data point.
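To make the long versus wide distinction concrete, here is a minimal sketch using pandas directly. It is not the pygeochemtools implementation, and the column names (sample_id, element, value) are hypothetical stand-ins rather than the actual SARIG export schema.

```python
import pandas as pd

# Hypothetical long-format extract: one row per sample/element measurement.
# Column names are illustrative, not the real SARIG export schema.
long_df = pd.DataFrame(
    {
        "sample_id": [1, 1, 2, 2],
        "element": ["Cu", "Au", "Cu", "Au"],
        "value": [120.0, 0.05, 85.0, 0.02],
    }
)

# Pivot so that each row holds all the measurements for a single sample.
# aggfunc="first" keeps the first value if a sample/element pair repeats.
wide_df = long_df.pivot_table(
    index="sample_id", columns="element", values="value", aggfunc="first"
).reset_index()
wide_df.columns.name = None  # drop the leftover "element" axis label

print(wide_df)
```

In the wide table each element becomes its own column, which is the shape most people expect when opening the data in a spreadsheet.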

Pygeochemtools provides an abstraction and CLI to make loading, filtering and restructuring this data easy. It uses Python libraries like dask and pandas under the hood to deal with ‘larger than memory’ datasets, so you can load and filter those large datasets and then output something easier to handle with Excel or other tools.
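The general idea behind handling ‘larger than memory’ data can be sketched with plain pandas by reading a file in chunks and keeping only the rows of interest. This is an illustration of the technique under assumed column names, not how pygeochemtools itself is written; the small in-memory CSV stands in for a multi-gigabyte extract.

```python
import io

import pandas as pd

# Small in-memory stand-in for a very large CSV extract.
# Column names are hypothetical.
csv_data = io.StringIO(
    "sample_id,element,value\n"
    "1,Cu,120\n"
    "1,Au,0.05\n"
    "2,Cu,85\n"
    "2,Zn,300\n"
    "3,Au,0.02\n"
)

wanted = {"Cu", "Au"}
filtered_chunks = []

# Read a few rows at a time: only the current chunk of the raw file,
# plus the rows kept so far, needs to fit in memory.
for chunk in pd.read_csv(csv_data, chunksize=2):
    filtered_chunks.append(chunk[chunk["element"].isin(wanted)])

# Combine the much smaller filtered pieces into one table.
filtered = pd.concat(filtered_chunks, ignore_index=True)
print(filtered)
```

The filtered result is usually small enough to write out and open in Excel or another tool.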

Pygeochemtools is not a geochemical data analysis tool. For that I’d suggest tools like pyrolite or checking out the list of other amazing open source geoscience projects compiled by Software Underground.

Functionality

Currently pygeochemtools provides the following functionality:
  • Filter large datasets based on a list of elements, sample type or drillhole numbers (or a combination of all three) and convert from long to wide format.

  • Add detailed geochemical methods columns onto the SARIG geochemical dataset.

  • Extract single element datasets from large geochemical datasets.

  • Plot maximum down hole geochemical data maps.

  • Plot maximum down hole chemistry per interval geochemical data maps.
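As a rough illustration of what the ‘maximum down hole’ items in the list above involve, the sketch below uses pandas to find the highest value of one element in each drillhole, and then within fixed depth intervals. The column names, the 20 m interval size and the reading of ‘per interval’ as fixed depth bins are all assumptions made for this example; it does not show the pygeochemtools API.

```python
import pandas as pd

# Hypothetical single-element dataset: Cu values at several depths
# in two drillholes. Column names are illustrative only.
cu = pd.DataFrame(
    {
        "drillhole_id": [101, 101, 101, 202, 202],
        "depth_from": [0, 10, 20, 0, 10],
        "cu_ppm": [50, 400, 120, 30, 75],
    }
)

# Maximum down hole value: keep the row with the highest Cu in each drillhole.
idx = cu.groupby("drillhole_id")["cu_ppm"].idxmax()
max_per_hole = cu.loc[idx].reset_index(drop=True)

# Maximum per interval: bin depths into 20 m intervals (an assumed size),
# then take the highest Cu value in each drillhole/interval pair.
cu["interval_start"] = (cu["depth_from"] // 20) * 20
max_per_interval = (
    cu.groupby(["drillhole_id", "interval_start"])["cu_ppm"].max().reset_index()
)

print(max_per_hole)
print(max_per_interval)
```

With collar coordinates joined onto these tables, each row could then be drawn as a point on a map, coloured by its maximum value.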

Note

This project is under active development. Suggestions, corrections and contributions are welcome. See the Contributing section on how to contribute.

Future Functionality

Future additions to pygeochemtools will include:
  • The ability to load and transform generic geochemical data in long format, not specifically the SARIG data structure.

  • A data QA/QC function to generate accuracy and precision metrics on commercial lab data.

  • Other suggestions people may have?

To install pygeochemtools, visit the Getting started page. The How to use pygeochemtools and Example usage pages then show how to use it.
