Can we use the same geochemical data and labels to build a predictive model that labels samples from future drillholes as class A or class B?
More data has been acquired since the geochemist completed her work. Can we predict labels for these new data points (currently labelled “?”)?
- Understand its structure and content
- Check for errors, missing values, and outliers
- Clean the data and prepare it for modelling
Fields (type):
- Unique_ID (object)
- holeid (object)
- from (int64)
- to (float64)
- As (float64)
- Au (object)
- Pb (float64)
- Fe (float64)
- Mo (float64)
- S (float64)
- Cu (float64)
- Zn (float64)
- Class (object)
Dataset:
- 140 Drillholes
- 4,004 Classified data points
- 767 Unclassified data points
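The counts above can be reproduced with a short pandas sketch. It assumes the table is loaded into a DataFrame with the column names listed under Fields, and that unclassified samples carry the label “?”. The function name is hypothetical and the toy frame is made up for illustration.

```python
import pandas as pd


def summarise_dataset(df: pd.DataFrame) -> dict:
    """Count drillholes and classified vs unclassified samples.

    Assumes unclassified rows carry the label "?" in the Class column.
    """
    unclassified = df["Class"] == "?"
    return {
        "drillholes": int(df["holeid"].nunique()),
        "classified": int((~unclassified).sum()),
        "unclassified": int(unclassified.sum()),
    }


# Toy frame for illustration only (not the real assay data)
demo = pd.DataFrame({
    "holeid": ["DH1", "DH1", "DH2", "DH3"],
    "Class": ["A", "B", "?", "A"],
})
print(summarise_dataset(demo))  # {'drillholes': 3, 'classified': 3, 'unclassified': 1}
```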
Notes:
- Missing collar coordinates
- Missing units
- Some missing values (As is missing 1,503 values)
- Not all intervals present
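Below is a minimal sketch of the two checks behind these notes. It counts missing values per column and flags depth intervals that do not join up within a hole. It assumes from and to are down-hole depths in the same (unstated) unit. The helper names and the tolerance `tol` are hypothetical. Because the test uses the absolute difference, overlapping intervals are flagged as well as gaps.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count NaN values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


def find_interval_gaps(df: pd.DataFrame, tol: float = 1e-6) -> pd.DataFrame:
    """Flag samples whose 'from' does not match the previous sample's 'to' in the same hole."""
    d = df.sort_values(["holeid", "from"])
    prev_to = d.groupby("holeid")["to"].shift()  # NaN for the first sample in each hole
    gap = prev_to.notna() & ((d["from"] - prev_to).abs() > tol)
    return d.loc[gap, ["holeid", "from"]].assign(prev_to=prev_to[gap])


# Toy frame: DH1 jumps from 4.0 to 6, and As has two NaNs
demo = pd.DataFrame({
    "holeid": ["DH1", "DH1", "DH1", "DH2"],
    "from": [0, 2, 6, 0],
    "to": [2.0, 4.0, 8.0, 2.0],
    "As": [1.0, np.nan, 3.0, np.nan],
})
print(missing_summary(demo))
print(find_interval_gaps(demo))
```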
QAQC:
- Missing collar coordinates
- Missing units
- Values below detection (<0.005) in Au replaced with half the detection limit (0.0025)
- Au converted from object to float
- Values of -999 removed
- Some missing values (e.g. in As)
- Not all intervals present
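The QAQC steps above might look roughly like this in pandas. This is a sketch, not the author's code. It assumes below-detection entries are strings starting with "<". It also reads "-999 values removed" as setting those values to NaN rather than dropping whole rows, which keeps the other elements of each sample usable. The function name and the `null_code` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

ELEMENTS = ["As", "Au", "Pb", "Fe", "Mo", "S", "Cu", "Zn"]


def clean_assays(df: pd.DataFrame, null_code: float = -999) -> pd.DataFrame:
    """Apply the QAQC steps: below-detection Au, Au to float, null codes to NaN."""
    out = df.copy()
    au_raw = out["Au"].astype(str).str.strip()
    below_dl = au_raw.str.startswith("<")
    # Numeric value of each entry ("<0.005" cannot be parsed and becomes NaN here)
    au = pd.to_numeric(au_raw, errors="coerce")
    # Half of the detection limit written after "<", e.g. "<0.005" -> 0.0025
    half_dl = pd.to_numeric(au_raw.str.lstrip("<"), errors="coerce") / 2
    out["Au"] = au.where(~below_dl, half_dl)
    # Assumption: null codes become NaN; rows are kept
    cols = [c for c in ELEMENTS if c in out.columns]
    out[cols] = out[cols].mask(out[cols] == null_code)
    return out


# Toy input for illustration only
raw = pd.DataFrame({"Au": ["<0.005", "0.12", "-999"], "Pb": [150.0, -999.0, 210.0]})
clean = clean_assays(raw)
print(clean)
print(clean.dtypes)
```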
- Plotted holeid along the x-axis and depth on the y-axis
- Created two versions: one coloured by Pb, one by Class
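Here is a possible matplotlib version of the two section plots. It assumes depth is taken as the interval midpoint, (from + to) / 2, and is drawn increasing downwards. The function name, marker size and colour choices are illustrative only.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_sections(df: pd.DataFrame):
    """Side-by-side holeid vs depth plots, coloured by Pb and by Class."""
    holes = sorted(df["holeid"].unique())
    x = df["holeid"].map({h: i for i, h in enumerate(holes)})  # hole -> x position
    depth = (df["from"] + df["to"]) / 2  # assumed: interval midpoint
    fig, (ax_pb, ax_cls) = plt.subplots(1, 2, figsize=(12, 5), sharey=True)

    sc = ax_pb.scatter(x, depth, c=df["Pb"], cmap="viridis", s=10)
    fig.colorbar(sc, ax=ax_pb, label="Pb")
    ax_pb.set_title("Coloured by Pb")

    for label, colour in {"A": "tab:blue", "B": "tab:orange", "?": "lightgrey"}.items():
        sel = df["Class"] == label
        ax_cls.scatter(x[sel], depth[sel], color=colour, s=10, label=label)
    ax_cls.set_title("Coloured by Class")
    ax_cls.legend(title="Class")

    for ax in (ax_pb, ax_cls):
        ax.set_xticks(range(len(holes)))
        ax.set_xticklabels(holes, rotation=90)
        ax.set_xlabel("holeid")
    ax_pb.set_ylabel("Depth (interval midpoint)")
    ax_pb.invert_yaxis()  # shared y-axis, so both panels show depth increasing downwards
    return fig


# Toy data for illustration only
demo = pd.DataFrame({
    "holeid": ["DH2", "DH1", "DH1", "DH2"],
    "from": [0, 0, 2, 2],
    "to": [2.0, 2.0, 4.0, 4.0],
    "Pb": [50.0, 200.0, 180.0, 90.0],
    "Class": ["B", "A", "A", "?"],
})
fig = plot_sections(demo)
```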
- Iterated through Pb cut-off values to find the one best aligned with Class
- Pb = 160 gave the highest accuracy: 78.4%
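The cut-off search amounts to a simple sweep. This sketch assumes class A is the high-Pb class. The direction is not stated, so `high_class` is a hypothetical parameter. Each cut-off is scored by plain accuracy on the labelled samples, and the toy values do not come from the dataset.

```python
import numpy as np


def best_pb_cutoff(pb, labels, cutoffs, high_class="A"):
    """Sweep Pb cut-offs, predicting `high_class` when Pb >= cut-off.

    Returns (best_cutoff, accuracy). Rows with missing Pb or label "?"
    should be removed before calling.
    """
    pb = np.asarray(pb, dtype=float)
    is_high = np.asarray(labels) == high_class
    best_cut, best_acc = None, -1.0
    for cut in cutoffs:
        acc = float(np.mean((pb >= cut) == is_high))
        if acc > best_acc:  # strict ">" keeps the lowest cut-off on ties
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


# Toy numbers for illustration only
pb = [50, 120, 170, 300, 90, 200]
labels = ["B", "B", "A", "A", "B", "A"]
print(best_pb_cutoff(pb, labels, cutoffs=range(0, 400, 10)))  # (130, 1.0)
```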
- Used a weighted sum of key elements (Pb, Mo, Au, As), with weights based on correlation with Class
- S, Zn, Cu and Fe were excluded due to low correlation
- Accuracy improved slightly to 78.91%
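One way to build such a score is sketched below. It assumes each element is first standardised to a z-score, then weighted by its Pearson correlation with a 0/1 class indicator. Both the standardisation and the 0/1 coding are assumptions. The resulting score can then be swept with a cut-off, exactly like Pb alone. Missing values must be removed or imputed beforehand, because `np.corrcoef` returns NaN when its inputs contain NaN.

```python
import numpy as np


def correlation_weighted_score(X, y):
    """Weighted sum of standardised element values.

    X: (n_samples, n_elements) assay array, e.g. columns Pb, Mo, Au, As.
    y: 0/1 array (1 = class A; this coding is an assumption).
    Each weight is the Pearson correlation between that element and y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0)  # put elements on a common scale
    weights = np.array([np.corrcoef(z[:, j], y)[0, 1] for j in range(X.shape[1])])
    return z @ weights, weights


# Toy values for two elements (Pb, Mo); illustration only
X = np.array([[100, 2], [120, 3], [200, 8], [220, 9], [90, 1], [210, 7]])
y = np.array([0, 0, 1, 1, 0, 1])
score, weights = correlation_weighted_score(X, y)
print(weights)  # both positive here: high Pb and Mo go with class 1
```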
- Applied XGBoost to model non-linear relationships and interactions
- Validated on a random 20% test split
- Accuracy improved to 84.52%
Figure: Confusion matrix for XGBoost model
- Added the depth interval as an additional feature to capture spatial relationships
- Final model accuracy: 93.51%
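Adding depth amounts to appending the interval columns to the feature table. In this sketch the depth features are `from`, `to` and a derived midpoint (`mid_depth`, a hypothetical addition). The exact depth inputs used for the 93.51% model are not stated. The same helpers also separate the “?” rows that need predictions. All function names are illustrative.

```python
import pandas as pd

KEY_ELEMENTS = ["Pb", "Mo", "Au", "As"]


def build_features(df: pd.DataFrame, with_depth: bool = True) -> pd.DataFrame:
    """Element features, optionally extended with the sample's depth interval."""
    feats = df[KEY_ELEMENTS].copy()
    if with_depth:
        feats["from"] = df["from"]
        feats["to"] = df["to"]
        feats["mid_depth"] = (df["from"] + df["to"]) / 2  # hypothetical derived feature
    return feats


def split_by_label(df: pd.DataFrame):
    """Separate labelled rows (A/B) from the new rows marked "?" that need predictions."""
    is_new = df["Class"] == "?"
    return df[~is_new], df[is_new]


# Toy rows for illustration only
demo = pd.DataFrame({
    "holeid": ["DH1", "DH1", "DH1"],
    "from": [0, 2, 4],
    "to": [2.0, 4.0, 6.0],
    "Pb": [210.0, 80.0, 150.0],
    "Mo": [6.0, 1.0, 3.0],
    "Au": [0.3, 0.0025, 0.1],
    "As": [12.0, 4.0, 7.0],
    "Class": ["A", "B", "?"],
})
labelled, new = split_by_label(demo)
print(build_features(labelled).columns.tolist())
```

With any fitted scikit-learn-style classifier `model` trained on `build_features(labelled)` and `labelled["Class"]`, labels for the new samples would come from `model.predict(build_features(new))`.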
Figure: Predicted classes using XGBoost model with depth intervals
- Use collar locations to plot the drillholes in 3D and better understand spatial relationships
- Validate models by removing entire drillholes instead of random samples
- Compare with other ML methods (e.g. Random Forest, SVM)
- Include geological data such as lithology, alteration, and structure
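Validating on whole held-out drillholes can be sketched with NumPy alone. The function name and the 20% fraction of holes are assumptions. scikit-learn's `GroupShuffleSplit` and `GroupKFold` (with `groups` set to holeid) implement the same idea ready-made.

```python
import numpy as np


def holdout_by_hole(hole_ids, test_frac=0.2, seed=0):
    """Boolean mask selecting every sample from a random subset of drillholes.

    Unlike a random-sample split, no drillhole contributes to both train and
    test, so adjacent intervals from one hole cannot sit on both sides.
    """
    hole_ids = np.asarray(hole_ids)
    holes = np.unique(hole_ids)
    rng = np.random.default_rng(seed)
    n_test = max(1, round(test_frac * len(holes)))
    test_holes = rng.choice(holes, size=n_test, replace=False)
    return np.isin(hole_ids, test_holes)


ids = np.repeat([f"DH{i}" for i in range(10)], 5)  # 10 toy holes, 5 samples each
test_mask = holdout_by_hole(ids)
print(sorted(set(ids[test_mask])))  # two whole holes held out
```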










