Intelligent Geospatial Application with Oracle Spatial AI and Oracle APEX
In this article, I built a simple end-to-end spatial analytics application using Oracle Spatial AI in Oracle Machine Learning for Python (OML4Py) and then prepared the results for presentation in Oracle APEX.
The scenario I implemented is based on grouping geographic areas by household income and comparing two different clustering approaches:
- Traditional non-spatial clustering, where areas are grouped only by similar numeric values.
- Spatial clustering with regionalization, where areas are grouped not only by similar values, but also by geographic neighborhood relationships.
The main idea behind this scenario is very practical. If I only cluster areas by income, I may get groups that are statistically similar but scattered across the map. But if I use spatial clustering, I can generate clusters that are both income-similar and geographically coherent. This makes the results more meaningful for regional analysis, location intelligence, and map-based applications.
Oracle Spatial AI is designed exactly for this kind of problem. It brings location-based predictions and pattern detection into OML4Py on Autonomous Database Serverless . It also provides spatial machine learning algorithms, spatial preprocessing, and a Python API for common Oracle Spatial operations.
I used Oracle Machine Learning notebooks to run the spatial clustering workflow, and then I used Oracle APEX to build an application that can present the clustering output in a more user-friendly way.
❓Why I Chose This Use Case
The dataset used in the notebook is based on block groups and includes fields such as:
GEOIDMEDIAN_INCOMEgeometry
The goal was to compare how clustering behaves when I ignore location versus when I explicitly include location.
This is a good demonstration scenario because Oracle Spatial AI documentation highlights that Spatial AI can be used to:
My clustering example is especially relevant because it shows one of the most important differences between standard machine learning and spatial machine learning:
- standard clustering finds similar values,
- spatial clustering finds similar values in meaningful geographic regions.
👉The work was completed in two main phases:
Phase 1 – Oracle Machine Learning / Spatial AI
In OML4Py, I:
- imported the required Python and Spatial AI libraries,
- loaded the spatial dataset,
- created a
SpatialDataFrame, - prepared the data,
- trained two clustering models,
- compared the output visually,
- generated result tables with semantic labels such as Low Income and Very High Income.
Phase 2 – Oracle APEX
In APEX, I used the exported clustering output to build an application layer that could present the model results. The attached APEX export confirms that the final application is named Demo_1_Final and contains 3 pages, 7 regions, 4 processes, and supporting shared components such as authentication and navigation
So, the complete workflow was:
Spatial data → OML notebook → clustering models → result tables → APEX application
This matches the typical Spatial AI modeling flow described in the documentation, which includes:
- data loading,
- preprocessing and analysis,
- model training,
- prediction,
- post-processing
Before start steps : Import required libraries ( oraclesai )
The first step in the notebook was to import the required Python libraries and Spatial AI packages.
The clustering notebook shows imports such as:
pandasmatplotlibStandardScalerSpatialDataFrameenable_geodataframesSpatialPipelineAgglomerativeClusteringKNNWeightsDefinitionplot_clusters
What each library does in simple terms
- pandas: used to create and manipulate tabular results such as clustering summaries.
- matplotlib: used to create side-by-side visual comparisons of the two clustering models.
- StandardScaler: used to standardize the numeric feature before clustering.
- SpatialDataFrame: a Spatial AI structure used to work with spatial data in a machine learning workflow
- enable_geodataframes(z): enables geospatial rendering inside the notebook environment .
- SpatialPipeline: used to chain preprocessing and modeling steps in a workflow .
- AgglomerativeClustering: the clustering algorithm used in this project.
- KNNWeightsDefinition: defines spatial neighborhood relationships using k-nearest neighbors.
- plot_clusters: used to display cluster results on the map.
The import step is important because Oracle Spatial AI is accessed through Python APIs in OML notebooks, and the product is integrated with OML4Py on Autonomous Database Serverless
This is an important stage in any spatial machine learning workflow, even if the demonstration keeps it simple.
Oracle Spatial AI supports preprocessing and feature engineering through the oraclesai.preprocessing package . The product also supports spatial preprocessing features such as:
- spatial feature engineering,
- spatial lag,
- spatial imputation,
- preprocessing for machine learning workflows .
What preprocessing means here in simple language
Before training a clustering model, I need to make sure the data is in a good usable state. In this project, preprocessing includes:
- selecting the correct columns,
- making sure the geometry is available,
- working in the correct coordinate reference system,
- optionally preparing the dataset for later extensions such as spatial imputation or spatial lag.
In the extended clustering notebook, there is also evidence of a more advanced pipeline that includes:
SpatialImputerSpatialLagTransformer
This is useful when:
- some values are missing,
- or when I want to enrich the data with neighborhood-based features before clustering.
Even though the main clustering comparison shown in the notebook uses a simpler version, the presence of these preprocessing steps shows that the workflow can easily be expanded into a richer spatial machine learning pipeline.
Step 4: Training the Spatial Clustering Model with Regionalization
This model uses a very similar pipeline structure, but the clustering step includes:
spatial_weights_definition=
What this means in simple terms
This tells the model to use k-nearest-neighbor spatial relationships. In this case, each observation considers its 5 nearest neighbors.
So unlike the first model, this version does not only ask:
“Which areas have similar income values?”
It also asks:
“Which areas are similar and geographically close enough to form coherent spatial regions?”
Why this matters
This is the key difference between ordinary clustering and spatial clustering.
In the non-spatial model:
- groups are based only on similarity.
In the spatial model:
- groups are based on similarity plus geographic regionalization.
This makes the result much more meaningful for maps, regional studies, planning, and spatial decision-making.
The clustering notebook calls this:
- Without regionalization (non-spatial)
- With regionalization (KNN weights)
This comparison is the core of the scenario.
Creating Result Tables from the Model Output
After fitting the models, I transformed the labels into result tables using pandas .
The notebook creates:
nonspatial_resultspatial_result
Each result contains:
GEOIDMEDIAN_INCOMEcluster_label
Then, I generated summary tables by grouping on cluster_label and calculating:
countminmaxmeanmedian
Why this step is useful
This makes the clustering results easier to interpret.
Instead of just having labels like 0, 1, 2, etc., I can now inspect the average income profile of each cluster.
This is a practical post-processing step because raw cluster IDs are not meaningful to business users. Summary statistics help transform algorithm output into interpretable categories.
To make the output more readable, I mapped each cluster to a semantic label.
The notebook defines a list of names:
- Low Income
- Medium-Low Income
- Medium Income
- Medium-High Income
- High Income
- Very High Income
Then the summary tables are sorted by mean income, and the semantic names are assigned to the clusters in order
Part II – Work Done in Oracle APEX
After generating the clustering results in Oracle Machine Learning, the next step was to present them in an application layer.
This is where Oracle APEX comes in.
SELECT
b.GEOID,
b.geometry,
c."cluster_label",
s."cluster_meaning",
s."min",
s."max",
s."mean",
s."median",
s."cluster_meaning" || ' | Range: ' || s."min" || ' - ' || s."max" AS cluster_legend
FROM la_block_groups b
JOIN "temp_cluster_labels_spatial_demo1" c
ON b.GEOID = c.GEOID
JOIN "temp_cluster_summary_spatial_demo1" s
ON c."cluster_label" = s."cluster_label"
Comments
Post a Comment