Preprocessing

This section contains documentation for the Preprocessing module. NOTE: These methods will be modified in the future to be more flexible and to allow for more complex workflows.

Data Management

stoneforge.preprocessing.data_management.depth_zones(df, dept, ranges)[source]

Given a DataFrame and a depth column, this function creates zones based on the specified depth ranges.

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame containing the depth data.

  • dept (str) – The name of the depth column in the DataFrame.

  • ranges (tuple) – A tuple containing the depth ranges to create zones. The first element is the top depth, the last element is the bottom depth, and the middle elements are the range boundaries.

Returns:

A dictionary where keys are zone indices and values are DataFrames containing the data for each zone.

Return type:

dict

Example

>>> df = pd.DataFrame({'Depth': [100, 200, 300, 400, 500], 'Value': [1, 2, 3, 4, 5]})
>>> dept = 'Depth'
>>> ranges = (150, 250, 350)
>>> zones = depth_zones(df, dept, ranges)
>>> for zone, data in zones.items():
...     print(f"Zone {zone}:")
...     print(data)
>>> # Output:
>>> # Zone 0:
>>> #    Depth  Value
>>> # 0    100      1
>>> # Zone 1:
>>> #    Depth  Value
>>> # 1    200      2
class stoneforge.preprocessing.data_management.project(data_path='.')[source]

Bases: object

Creates a project object to manage well log data.

Example

>>> proj = project(data_path='path/to/well/logs')
>>> proj.import_folder(ext='.las')  # Import all LAS files in the folder
>>> print(proj.well_names_paths)  # Check imported well names and paths
>>> proj.import_several_wells()  # Import all wells data in a given folder into the project
class_counts(class_value, class_dict=False, seed=99)[source]

Counts the occurrences of each class value in a list and returns a dictionary with class names, random colors, and counts, intended for quick plotting.

Parameters:
  • class_value (list) – A list of class values to count.

  • class_dict (dict, optional) – A dictionary containing class codes, names, and colors for substitution. If not provided, random colors will be generated.

  • seed (int, optional) – A seed for random number generation to ensure reproducibility. Default is 99.

Example

>>> class_values = [57, 54, 25, 49]
>>> class_dict = [
...     {"code": 57, "name": "Sand", "patch_property": {"color": "#FF0000"}},
...     {"code": 54, "name": "Shale", "patch_property": {"color": "#00FF00"}},
...     {"code": 25, "name": "Coal", "patch_property": {"color": "#0000FF"}},
...     {"code": 49, "name": "Limestone", "patch_property": {"color": "#FFFF00"}}
... ]
>>> counts = proj.class_counts(class_values, class_dict=class_dict)
convert_into_matrix(reference_mnemonics=False)[source]

Converts the dictionary-based well database into a matrix database with three components: mnemonics, units, and data.

Parameters:

reference_mnemonics (list, optional) – A list of mnemonics to be used as a reference for the well data. If not provided, all mnemonics in the well data will be used.

Example

>>> proj.convert_into_matrix(reference_mnemonics=['RHOB', 'NPHI', 'GR'])
data_replacement(ref)[source]

Replaces mnemonics in the well data with those from a reference dictionary.

Parameters:

ref (dict) – A dictionary where keys are new mnemonics and values are lists of old mnemonics to be replaced.

Return type:

None

Example

>>> ref = {
...     'RHOB': ['RHO', 'RHOZ'],   # New mnemonic 'RHOB' replaces 'RHO' and 'RHOZ'
...     'NPHI': ['PHI', 'PHIN']    # New mnemonic 'NPHI' replaces 'PHI' and 'PHIN'
... }
>>> proj.data_replacement(ref)
import_folder(ext='.las')[source]

Imports all file paths with a given extension from a folder into the project.

Parameters:

ext (str, optional) – The file extension to look for in the folder. Default is '.las'.

Return type:

None

Example

>>> proj.import_folder(ext='.las')  # Import all LAS files in the folder
>>> print(proj.well_names_paths)  # Check imported well names and paths
import_several_wells()[source]

Imports all well log data from the specified folder into the project.

Example

>>> proj.import_several_wells()
import_well(name)[source]

Imports log data for a single well from the specified file path into the project.

Parameters:

name (str) – The name of the well log data file (without extension) to be imported.

Return type:

None

Example

>>> proj.import_well(name='well1')  # Import well log data from 'well1.las'
shape_check(ref)[source]

If a well has fewer mnemonics than the others, this function removes that well from the project.

Parameters:

ref (dict) – A reference dictionary in the same format used by data_replacement; its keys define the set of mnemonics a well is expected to contain.

Return type:

None

Example

>>> ref = {
...     'RHOB': ['RHO', 'RHOZ'],
...     'NPHI': ['PHI', 'PHIN']
... }
>>> proj.shape_check(ref) # Removes wells with fewer mnemonics than the reference dictionary.

Data Processing

stoneforge.preprocessing.data_processing.data_assemble(main_data, data_key)[source]

Transforms a nested dictionary of the form dict[well][data_key] into a single compact data array, mostly used for machine learning purposes.

Parameters:
  • main_data (dict) – A dictionary containing well data, where keys are well names and values are dictionaries with data.

  • data_key (str) – The key for the data in the well data dictionary.

Returns:

A numpy array containing the assembled data from all wells, with each row corresponding to a data point from one of the wells.

Return type:

np.array

Example

>>> main_data = {
...     'Well1': {'data_key': [[1, 2], [3, 4]]},
...     'Well2': {'data_key': [[5, 6], [7, 8]]}
... }
>>> data_key = 'data_key'
>>> mega_data = data_assemble(main_data, data_key)
class stoneforge.preprocessing.data_processing.predict_processing(data, data_key)[source]

Bases: object

Prepares data for machine learning predictions, including handling NaN values and splitting data into training and testing sets.

Example

>>> pp = predict_processing(data, data_key='data')
>>> clean_data = pp.matrix_values() # Returns a dictionary of cleaned data without NaN values.
>>> curves = pp.return_curve(y) # Returns a dictionary of curves with values filled in
>>> train_test_data = pp.train_test_split(X, y) # Splits the data into training and testing sets.
>>> train_data, valid_data = well_train_test_split(well_names, well_database)
>>> mega_data = data_assemble(main_data, data_key='data') # Assembles data from multiple wells into a single matrix.
matrix_values()[source]

Returns a dictionary of cleaned data without NaN values.

Example

>>> clean_data = pp.matrix_values() # Returns a dictionary of cleaned data without NaN
return_curve(y)[source]

Returns a dictionary of curves with values filled in.
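
Example

The sketch below is illustrative only: model and 'Well1' are placeholders not defined in these docs, and the exact structure of y depends on how the data was cleaned upstream.

>>> clean_data = pp.matrix_values()  # dictionary of cleaned data without NaN values
>>> y = model.predict(clean_data['Well1'])  # 'model' and 'Well1' are hypothetical placeholders
>>> curves = pp.return_curve(y)  # curves with the predicted values filled back in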

train_test_split(X, y, test_size=0.3, random_state=99)[source]

Splits the data into training and testing sets.
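
Example

A minimal sketch; X and y are assumed to be feature and label arrays drawn from the assembled well data, and the keyword values shown are the signature defaults.

>>> train_test_data = pp.train_test_split(X, y, test_size=0.3, random_state=99)  # 70/30 split with a fixed seed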

stoneforge.preprocessing.data_processing.well_train_test_split(well_names, well_database)[source]

Splits the well database into training and testing sets based on well names.

Parameters:
  • well_names (list) – A list of well names to be used for validation.

  • well_database (dict) – A dictionary containing well data, where keys are well names and values are the corresponding data.

Returns:

A tuple containing two dictionaries: the first for training wells and the second for validation wells.

Return type:

tuple

Example

>>> well_names = ['Well1', 'Well2']
>>> well_database = {'Well1': data1, 'Well2': data2, 'Well3': data3}
>>> train_data, valid_data = well_train_test_split(well_names, well_database)
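
End-to-end example

The workflow below is an illustrative sketch that chains the functions documented above. The folder path, well names, mnemonic lists, and the well_database variable are placeholders; the attribute under which the project object stores its imported wells is not shown in these docs.

>>> proj = project(data_path='path/to/well/logs')
>>> proj.import_folder(ext='.las')  # collect LAS file paths from the folder
>>> proj.import_several_wells()  # load every discovered well into the project
>>> ref = {'RHOB': ['RHO', 'RHOZ'], 'NPHI': ['PHI', 'PHIN']}
>>> proj.data_replacement(ref)  # unify mnemonic names across wells
>>> proj.shape_check(ref)  # drop wells missing the reference mnemonics
>>> proj.convert_into_matrix(reference_mnemonics=['RHOB', 'NPHI'])
>>> train_data, valid_data = well_train_test_split(['Well1'], well_database)  # well_database is a placeholder
>>> X = data_assemble(train_data, data_key='data')  # compact matrix for machine learning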