pygeochemtools.geochem
Geochemical data manipulation module
- pygeochemtools.geochem.make_sarig_element_dataset(path: str, element: str, dh_only: bool = True, export: bool = False, out_path: Optional[str] = None) → pandas.DataFrame
Create a ‘clean’ single element drillhole dataset derived from the sarig_rs_chem_exp.csv.
This isolates the selected element from the whole dataset, converts BDL (below detection limit) values to a low, non-zero value, drops rows that contain other symbols such as ‘>’ and ‘-’, converts oxides to elements, and converts all values to ppm. It also adds chem methods to the dataset where possible to allow further EDA.
This data is used to create input data for further processing. This function uses dask to handle very large input datasets.
Important note: the sarig_rs_chem_exp.csv data is in a long format, with each individual analysis as a single row!
This dataset may need additional EDA and cleaning prior to further processing. In that case set export to True to do further processing on the returned dataset.
- Parameters
path (str) – Path to main sarig_rs_chem_exp.csv input file.
element (str) – The element to extract and create a sub-dataset of.
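dh_only (bool) – Whether to filter to drillholes only or return all sample types. Defaults to True.
export (bool) – Whether to export a csv version of the element dataset. Defaults to False.
out_path (str, optional) – Path to place the output file. Defaults to path.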
- Returns
Dataframe of cleaned geochemical data
- Return type
pd.DataFrame
- pygeochemtools.geochem.sarig_long_to_wide(path: str, elements: Optional[List[str]] = None, sample_type: Optional[List[str]] = None, drillholes: Optional[Union[List[int], bool]] = None, include_units: bool = False, export_methods: bool = False, export: bool = False, out_path: Optional[str] = None) → pandas.DataFrame
Convert sarig long form data to wide form.
Takes optional list of elements, sample types or drillhole numbers and filters large dataset based on these inputs. Has the option to include or exclude units with the values. Can also export an additional methods file.
It handles duplicate values based on sample_id and element_id by taking the first duplicate value initially, then catching the second duplicate, performing a second pivot, and appending the duplicates to the final table. It does not handle duplicates of duplicates, in which case it will return only the first value.
- Parameters
path (str) – Path to main sarig_rs_chem_exp.csv input file.
elements (Optional[List[str]]) – List of elements to filter dataset to.
sample_type (Optional[List[str]]) – List of sample types to filter dataset to.
drillholes (Optional[Union[List[int], bool]]) – List of drillhole numbers to filter dataset to.
include_units (bool) – Option to include units in the data export. Defaults to False.
export_methods (bool) – Option to include methods file in the data export. Defaults to False.
export (bool) – Option to export data to a csv file. Defaults to False.
out_path (Optional[str]) – Optional path to output export file location. Defaults to path.
- Returns
Dataframe with filtered datapoints converted to a wide form data structure.
- Return type
pd.DataFrame
pygeochemtools.geochem.aggregation
Functions to calculate the max chem value down hole
- pygeochemtools.geochem.aggregation.max_dh_chem(input_data: Union[str, pandas.DataFrame], drillhole_id: str) → pandas.DataFrame
Function to aggregate the processed elemental geochemical data and return a dataframe containing max value in each drillhole.
Requires long format data.
- Parameters
input_data (Union[str, pd.DataFrame]) – Path to clean and processed single element dataset in csv format or Pandas dataframe of clean and processed single element dataset.
drillhole_id (str) – drillhole identifier in dataset.
- Raises
ValueError – Error raised if input file is not a valid csv file
- Returns
Dataframe containing only the maximum value from each drill hole
- Return type
pd.DataFrame
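The aggregation described above can be sketched with plain pandas. This is an illustration of the idea only, not the library's implementation; the helper name max_per_drillhole and the column names DRILLHOLE_NUMBER and converted_ppm are assumptions made for the example.

```python
import pandas as pd

# Hypothetical long-format single-element data: one row per analysis.
df = pd.DataFrame(
    {
        "DRILLHOLE_NUMBER": [1, 1, 2, 2, 2],
        "converted_ppm": [5.0, 12.0, 3.0, 7.5, 1.0],
    }
)


def max_per_drillhole(data: pd.DataFrame, drillhole_id: str, value: str) -> pd.DataFrame:
    """Keep only the row holding the maximum value for each drillhole."""
    # idxmax returns the index label of the largest value within each group.
    idx = data.groupby(drillhole_id)[value].idxmax()
    return data.loc[idx].reset_index(drop=True)


result = max_per_drillhole(df, "DRILLHOLE_NUMBER", "converted_ppm")
```

Selecting whole rows with idxmax, rather than calling max() on the value column alone, keeps any other columns (depths, methods) attached to the maximum sample.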
- pygeochemtools.geochem.aggregation.max_dh_chem_interval(input_data: Union[str, pandas.DataFrame], interval: int, drillhole_id: str, start_depth_label: str, end_depth_label: str) → pandas.DataFrame
Function to aggregate the processed single element geochemical data and return a dataframe containing the max value in each interval down hole for each drillhole.
Requires long format data.
- Parameters
input_data (Union[str, pd.DataFrame]) – Input single element geochemical data, in long form, as either a path to a csv input file or a pandas dataframe.
interval (int) – The interval, in whole meters, over which to aggregate down hole.
drillhole_id (str) – Column header containing the drill hole identifier.
start_depth_label (str) – Column header containing the start or from depth data.
end_depth_label (str) – Column header containing the finish or to depth data.
- Raises
ValueError – Error if input file is not a valid csv file
- Returns
Dataframe containing the maximum value for each specified interval.
- Return type
pd.DataFrame
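One way to picture the interval aggregation is to bin each sample by depth and take the maximum per bin. The sketch below is a simplified illustration, not the library's implementation: it assigns a sample to an interval using its start depth only (the real function also takes an end depth column), and the helper name and column names are assumptions.

```python
import pandas as pd

# Hypothetical long-format single-element data with from/to depths in meters.
df = pd.DataFrame(
    {
        "DRILLHOLE_NUMBER": [1, 1, 1, 2],
        "DEPTH_FROM": [0.0, 4.0, 12.0, 3.0],
        "DEPTH_TO": [1.0, 5.0, 13.0, 4.0],
        "converted_ppm": [5.0, 12.0, 3.0, 7.5],
    }
)


def max_per_interval(
    data: pd.DataFrame,
    interval: int,
    drillhole_id: str,
    start_depth_label: str,
    value: str,
) -> pd.DataFrame:
    """Return the maximum value in each fixed-length depth interval per drillhole."""
    out = data.copy()
    # Assumption: a sample belongs to the interval that contains its start depth.
    out["interval_start"] = (out[start_depth_label] // interval) * interval
    return out.groupby([drillhole_id, "interval_start"], as_index=False)[value].max()


result = max_per_interval(df, 10, "DRILLHOLE_NUMBER", "DEPTH_FROM", "converted_ppm")
```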
pygeochemtools.geochem.conversions
Functions to perform conversions on geochem data
- pygeochemtools.geochem.conversions.convert_oxides(df: pandas.DataFrame, element: str, value: str) → pandas.DataFrame
Convert selected oxides to elements
- Parameters
df (pd.DataFrame) – Input dataframe
element (str) – Oxide to convert. Can be any of: ‘Fe2O3’, ‘FeO’, ‘U3O8’, ‘CoO’, ‘NiO’
value (str) – Name of column containing geochemical data values.
- Returns
Dataframe with oxides converted in place
- Return type
pd.DataFrame
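An oxide-to-element conversion multiplies the oxide concentration by the mass fraction of the element in the oxide. The sketch below derives those factors from standard atomic masses. It is an illustration only: the factors the library actually uses are not stated here, and the names ATOMIC_MASS, OXIDES and oxide_to_element_factor are assumptions.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
ATOMIC_MASS = {"Fe": 55.845, "O": 15.999, "U": 238.029, "Co": 58.933, "Ni": 58.693}

# Oxide -> (element, atoms of the element, atoms of oxygen) per formula unit.
OXIDES = {
    "Fe2O3": ("Fe", 2, 3),
    "FeO": ("Fe", 1, 1),
    "U3O8": ("U", 3, 8),
    "CoO": ("Co", 1, 1),
    "NiO": ("Ni", 1, 1),
}


def oxide_to_element_factor(oxide: str) -> float:
    """Mass fraction of the element in the oxide."""
    element, n_element, n_oxygen = OXIDES[oxide]
    element_mass = n_element * ATOMIC_MASS[element]
    return element_mass / (element_mass + n_oxygen * ATOMIC_MASS["O"])


# 10 wt% Fe2O3 expressed as wt% Fe (roughly 7.0).
fe_from_fe2o3 = 10.0 * oxide_to_element_factor("Fe2O3")
```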
- pygeochemtools.geochem.conversions.convert_ppm(df: pandas.DataFrame, value: str, units: str, convert_wtperc: bool = True) → pandas.DataFrame
Create a new column called ‘converted_ppm’ containing the values converted to ppm.
- Parameters
df (pd.DataFrame) – Input dataframe
value (str) – Name of column containing geochemical data values.
units (str) – Name of column containing geochemical data units.
convert_wtperc (bool) – Whether to convert wt% to ppm. Defaults to True.
- Returns
Dataframe with new ‘converted_ppm’ column
- Return type
pd.DataFrame
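The unit conversion amounts to a per-row multiplier: 1 wt% is 10,000 ppm and 1 ppb is 0.001 ppm. The following sketch shows the idea with pandas. It is not the library's implementation; the unit strings in TO_PPM, the helper name to_ppm and the column names are assumptions about how the data might be recorded.

```python
import pandas as pd

# Multipliers to ppm. The unit spellings are assumptions for this example.
TO_PPM = {"ppm": 1.0, "ppb": 1e-3, "%": 1e4}

df = pd.DataFrame({"VALUE": [1.5, 250.0, 40.0], "UNIT": ["%", "ppb", "ppm"]})


def to_ppm(data: pd.DataFrame, value: str, units: str) -> pd.DataFrame:
    """Add a 'converted_ppm' column by scaling each value by its unit multiplier."""
    out = data.copy()
    # Units missing from TO_PPM map to NaN, so unrecognised units stay visible.
    out["converted_ppm"] = out[value] * out[units].map(TO_PPM)
    return out


result = to_ppm(df, "VALUE", "UNIT")
```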
pygeochemtools.geochem.create_dataset
Functions to load and filter input geochem data
- class pygeochemtools.geochem.create_dataset.LoadAndFilter
Bases: object
Class to load and filter geochem datasets from csv input.
- load_chem_data(path: str) → None
Not implemented yet. Function to load generic datasets.
- Parameters
path (str) – Path to input csv file.
- load_sarig_data(path: str) → None
Load data from the sarig_rs_chem_exp.csv dataset.
This function uses dask to handle very large input datasets.
Warning
The sarig_rs_chem_exp.csv data is in a long format, with each individual analysis as a single row!
- Parameters
path (str) – Path to main sarig_rs_chem_exp.csv input file.
- sarig_filter(sample_type: Optional[List[str]] = None, elements: Optional[List[str]] = None, drillholes: Optional[Union[List[int], bool]] = None) → pandas.DataFrame
Filter sarig dataset.
Reduce the size of the sarig_rs_chem_exp.csv dataset by filtering samples based on a list of elements, sample types and/or drillhole numbers, or a combination of all three.
- Parameters
sample_type (Optional[List[str]], optional) – List of sample types to include. Defaults to None.
elements (Optional[List[str]], optional) – List of elements to include. Defaults to None.
drillholes (Optional[Union[List[int], bool]], optional) – Either a list of drillhole numbers to filter to, or True to filter dataset to just those samples from drillholes. Defaults to None.
- Raises
MemoryError – If the filtered dataset is still too large to fit in available memory.
- Returns
Dataframe containing only those samples belonging to the listed sample types
- Return type
pd.DataFrame
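The combined filtering on sample types, elements and drillholes can be pictured as a boolean mask built up one condition at a time. This is a minimal pandas sketch of that idea, not the method's actual code (which works on data loaded with dask); the function name filter_long and the column names SAMPLE_SOURCE, CHEM_CODE and DRILLHOLE_NUMBER are assumptions.

```python
import pandas as pd

# Hypothetical long-format data; samples without a drillhole have no number.
df = pd.DataFrame(
    {
        "SAMPLE_SOURCE": ["Drill core", "Rock outcrop", "Drill core"],
        "CHEM_CODE": ["Cu", "Cu", "Au"],
        "DRILLHOLE_NUMBER": [101.0, None, 102.0],
    }
)


def filter_long(data, sample_type=None, elements=None, drillholes=None):
    """Apply whichever of the three optional filters were supplied."""
    mask = pd.Series(True, index=data.index)
    if sample_type is not None:
        mask &= data["SAMPLE_SOURCE"].isin(sample_type)
    if elements is not None:
        mask &= data["CHEM_CODE"].isin(elements)
    if drillholes is True:
        # True means "any sample that comes from a drillhole".
        mask &= data["DRILLHOLE_NUMBER"].notna()
    elif drillholes:
        # Otherwise treat the argument as a list of drillhole numbers.
        mask &= data["DRILLHOLE_NUMBER"].isin(drillholes)
    return data[mask]


cu_in_drillholes = filter_long(df, elements=["Cu"], drillholes=True)
hole_102 = filter_long(df, drillholes=[102])
```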
- sarig_filter_drillhole_element(element: str, dh_only: bool) → pandas.DataFrame
Create a ‘clean’ single element dataset derived from the sarig_rs_chem_exp.csv.
This isolates samples from drillholes (ones that have a drill hole id) and the selected element from the whole dataset and is used to create input data for further processing.
- Parameters
element (str) – The element to extract and create a sub-dataset of.
dh_only (bool) – Whether to filter to drillholes only or return all sample types.
- Returns
Dataframe filtered to the desired element.
- Return type
pd.DataFrame
- pygeochemtools.geochem.create_dataset.add_sarig_chem_method(df: pandas.DataFrame) → pandas.DataFrame
Add normalised chem method columns to dataset.
Function to map normalised chem method types onto the SARIG CHEM_METHOD_CODE column. The chem methods provided in the SARIG dataset relate to individual lab codes. This function maps those codes, where known, to a generic analysis method, digestion and fusion type.
This is useful for further EDA and cleaning of data, as some methods are no longer applicable, or contain too much noise.
- Parameters
df (pd.DataFrame) – Input dataframe
- Returns
Dataframe with ‘CHEM_METHOD_CODE’ mapped to three new columns: ‘DETERMINATION’, ‘DIGESTION’ and ‘FUSION’
- Return type
pd.DataFrame
- pygeochemtools.geochem.create_dataset.clean_dataset(df: pandas.DataFrame, value: str, dash_BDL_indicator: bool = False) → pandas.DataFrame
Remove non-numeric characters.
Clean non-numeric characters from dataframe and flag below detection limit rows (1), and greater than measurable rows (2) in new BDL column.
- Parameters
df (pd.DataFrame) – Input dataframe to clean.
value (str) – Name of column containing geochemical data values.
dash_BDL_indicator (bool) – Indicator if the ‘-’ sign indicates below detection limits or not. Defaults to False.
- Returns
Cleaned dataframe
- Return type
pd.DataFrame
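The flag-then-strip behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it handles only the ‘<’ and ‘>’ prefixes (the ‘-’ case controlled by dash_BDL_indicator is left out), and the helper name flag_and_clean and the VALUE column are hypothetical.

```python
import pandas as pd

# Hypothetical raw values as they might appear in a text column.
df = pd.DataFrame({"VALUE": ["<10", "25", ">5000", "3.2"]})


def flag_and_clean(data: pd.DataFrame, value: str) -> pd.DataFrame:
    """Flag '<' rows as 1 and '>' rows as 2 in a BDL column, then make values numeric."""
    out = data.copy()
    raw = out[value].astype(str).str.strip()
    out["BDL"] = 0
    out.loc[raw.str.startswith("<"), "BDL"] = 1  # below detection limit
    out.loc[raw.str.startswith(">"), "BDL"] = 2  # greater than measurable
    # Strip the leading symbols; anything still non-numeric becomes NaN.
    out[value] = pd.to_numeric(raw.str.lstrip("<>"), errors="coerce")
    return out


result = flag_and_clean(df, "VALUE")
```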
- pygeochemtools.geochem.create_dataset.handle_BDL(df: pandas.DataFrame, units: str) → pandas.DataFrame
Convert below detection limit values to low, non-zero values.
Converts below detection limit values, like “<10”, to low numeric ppm values. All BDL units are converted to a value of 0.001ppm except ppb values which are converted to 0.00001ppm.
Note
Requires clean_dataset() function to be run to create the “BDL” flag column first.
- Parameters
df (pd.DataFrame) – Input dataframe to clean.
units (str) – Name of the units column header in df.
- Returns
DataFrame with BDL values converted to low ppm values in the “converted_ppm” column.
- Return type
pd.DataFrame
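The replacement rule stated above (0.001 ppm for flagged rows, 0.00001 ppm where the original unit was ppb) can be sketched with two masked assignments. This is an illustration, not the library's implementation; the helper name replace_bdl, the UNIT column and the unit string "ppb" are assumptions.

```python
import pandas as pd

# Hypothetical data after cleaning: BDL == 1 marks below-detection-limit rows.
df = pd.DataFrame(
    {
        "converted_ppm": [10.0, 25.0, 0.002],
        "UNIT": ["ppm", "ppm", "ppb"],
        "BDL": [1, 0, 1],
    }
)


def replace_bdl(data: pd.DataFrame, units: str) -> pd.DataFrame:
    """Overwrite flagged values with a low, non-zero ppm placeholder."""
    out = data.copy()
    is_bdl = out["BDL"] == 1
    is_ppb = out[units] == "ppb"
    out.loc[is_bdl & ~is_ppb, "converted_ppm"] = 0.001
    out.loc[is_bdl & is_ppb, "converted_ppm"] = 0.00001
    return out


result = replace_bdl(df, "UNIT")
```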
pygeochemtools.geochem.normalisation
Normalisation functions
- pygeochemtools.geochem.normalisation.normalise_crustal_abundace(df: pandas.DataFrame, element: str, ppm_column_name: str) → pandas.DataFrame
Create a column with ppm element values normalised against average crustal abundance.
Uses the average crustal abundance values of Rudnick and Gao, 2004.
Note
Must be an elemental value and not an oxide. New elements can be added to the config file.
- Parameters
df (pd.DataFrame) – Dataframe containing elemental ppm values to normalise.
element (str) – The element to normalise, used to retrieve the value from the config file.
ppm_column_name (str) – Column header containing the ppm values to normalise.
- Returns
Dataframe with Normalised_crustal_abund_(ppm) column added.
- Return type
pd.DataFrame
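Normalisation here is a division of each ppm value by a single reference abundance for the element. The sketch below shows the arithmetic only. The abundance in CRUSTAL_ABUNDANCE_PPM is a placeholder chosen for easy arithmetic, not the Rudnick and Gao value, and the dictionary and the helper name normalise are assumptions standing in for the library's config file lookup.

```python
import pandas as pd

# Placeholder reference abundance in ppm, for illustration only.
# It is NOT the published Rudnick and Gao value.
CRUSTAL_ABUNDANCE_PPM = {"Cu": 25.0}

df = pd.DataFrame({"converted_ppm": [50.0, 12.5]})


def normalise(data: pd.DataFrame, element: str, ppm_column_name: str) -> pd.DataFrame:
    """Add a column of ppm values divided by the element's reference abundance."""
    out = data.copy()
    reference = CRUSTAL_ABUNDANCE_PPM[element]
    out["Normalised_crustal_abund_(ppm)"] = out[ppm_column_name] / reference
    return out


result = normalise(df, "Cu", "converted_ppm")
```

A normalised value above 1 means the sample is enriched relative to the reference, and a value below 1 means it is depleted.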
pygeochemtools.geochem.transform
Data transformation
- pygeochemtools.geochem.transform.long_to_wide(df: pandas.DataFrame, sample_id: str, element_id: str, value: str, units: str, include_units: bool = False) → pandas.DataFrame
Convert geochemical data tables from long to wide form.
This function takes a dataframe of long form geochemical data, i.e. data with one row per element, and runs a pivot to convert it to a standard wide form data with one row per sample and each element in a separate column.
It handles duplicate values based on sample_id and element_id by taking the first duplicate value initially, then catching the second duplicate, performing a second pivot, and appending the duplicates to the final table. It does not handle duplicates of duplicates, in which case it will return only the first value.
- Parameters
df (pd.DataFrame) – Dataframe containing long form data.
sample_id (str) – Name of column containing sample IDs.
element_id (str) – Name of column containing geochemical element names.
value (str) – Name of column containing geochemical data values.
units (str) – Name of column containing geochemical data units.
include_units (bool, optional) – Whether to include units in the output. Defaults to False.
- Returns
Dataframe converted to wide table format with one sample per row and columns for each element. Contains only sample_id and element/unit values.
- Return type
pd.DataFrame
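The two-pass pivot described above can be sketched in pandas as follows: pivot the first occurrence of each sample and element pair, pivot the first repeat separately, then append the second table to the first. This is an illustration of the approach as described, not the library's code, and the column names SAMPLE_NO, CHEM_CODE and VALUE are assumptions.

```python
import pandas as pd

# Hypothetical long-format data: sample 1 has two Cu analyses.
df = pd.DataFrame(
    {
        "SAMPLE_NO": [1, 1, 1, 2],
        "CHEM_CODE": ["Cu", "Au", "Cu", "Cu"],
        "VALUE": [10.0, 0.5, 12.0, 7.0],
    }
)

keys = ["SAMPLE_NO", "CHEM_CODE"]

# First occurrence of each (sample, element) pair.
first = df.drop_duplicates(subset=keys, keep="first")

# Rows that repeat an earlier pair; keep only the first repeat of each pair,
# so third and later occurrences are dropped.
repeats = df[df.duplicated(subset=keys, keep="first")]
repeats = repeats.drop_duplicates(subset=keys, keep="first")

# Pivot each set to one row per sample and one column per element.
wide_first = first.pivot(index="SAMPLE_NO", columns="CHEM_CODE", values="VALUE")
wide_repeats = repeats.pivot(index="SAMPLE_NO", columns="CHEM_CODE", values="VALUE")

# Append the duplicate rows beneath the main table.
wide = pd.concat([wide_first, wide_repeats]).reset_index()
```

De-duplicating before each pivot matters because DataFrame.pivot raises a ValueError when an index and column pair occurs more than once.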
- pygeochemtools.geochem.transform.sarig_methods_wide(df: pandas.DataFrame, sample_id: str, element_id: str) → pandas.DataFrame
Create a corresponding methods table to match the pivoted wide form data.
Note
This requires the input dataframe to already have had methods mapping applied by running the pygeochemtools.geochem.create_dataset.add_sarig_chem_method function.
- Parameters
df (pd.DataFrame) – Dataframe containing long form data.
sample_id (str) – Name of column containing sample IDs.
element_id (str) – Name of column containing geochemical element names.
- Returns
Dataframe with mapped geochemical methods converted to wide form with one method per sample.
- Return type
pd.DataFrame