pacman::p_load(sf, tidyverse, readr, readxl, ggplot2, dplyr, tidyr, units)
January 16, 2023
March 6, 2023
Usage of the code chunk below:
p_load() from the pacman package loads the listed packages into the R environment. If a package is not yet installed, p_load() first attempts to install it from CRAN or the pacman repository list.
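The following base-R sketch shows what p_load() automates. `load_or_install()` is a hypothetical helper and is not part of pacman. It installs any missing packages with install.packages(), using whatever repositories are set in `getOption("repos")`, and then attaches every package. Unlike p_load(), it does not consult the pacman repository list.

```r
# Hypothetical helper sketching what pacman::p_load() automates:
# install any missing packages, then attach every package in the list.
load_or_install <- function(pkgs, install = TRUE) {
  # requireNamespace() checks availability without attaching the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (install && length(missing) > 0) {
    install.packages(missing)  # uses the repositories in getOption("repos")
  }
  # require() attaches each package and returns TRUE on success
  attached <- vapply(pkgs, function(p) {
    suppressPackageStartupMessages(require(p, character.only = TRUE))
  }, logical(1))
  invisible(attached)
}

# Usage (installation disabled here so nothing is downloaded):
ok <- load_or_install(c("stats", "utils"), install = FALSE)
```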
Remarks:
sf, sp, rgdal and rgeos (both being retired in 2023), tidyverse, questionr, janitor, psych, ggplot2, gcookbook, tmap, ggpubr, egg, corrplot, gtsummary, regclass, caret, heatmaply, ggdendro, cluster, factoextra, spdep, ClustGeo, GGally, skimr, stringr, funModeling, knitr, caTools, viridis, rgeoda, cowplot, patchwork.
Alternate code chunk -
Aspatial Data
Geospatial Data
Two extra functions are included when importing the data:
transform the boundary data to EPSG:26392 (Minna / Nigeria Mid Belt);
check for parsing errors.
bdy_nga.sf <- st_read(dsn = "data/geospatial",
layer = "geoBoundaries-NGA-ADM2") %>%
select(shapeName) %>%
st_transform(crs = 26392)
Reading layer `geoBoundaries-NGA-ADM2' from data source
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
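Remarks:
Observations: 774 multipolygon features (ADM2 boundaries).
CRS: the source layer uses the geographic (geodetic) CRS WGS 84, not a projected CRS. st_transform() then projects it to EPSG:26392, Minna / Nigeria Mid Belt.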
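You can check directly with sf's st_is_longlat() whether a CRS is geographic (WGS 84, as reported by st_read()) or projected (EPSG:26392, the st_transform() target). Here is a small sketch:

```r
library(sf)

wgs84 <- st_crs(4326)   # WGS 84, the CRS reported by st_read()
minna <- st_crs(26392)  # Minna / Nigeria Mid Belt, the target of st_transform()

# st_is_longlat() is TRUE for geographic (degree-based) CRSs
st_is_longlat(wgs84)  # TRUE
st_is_longlat(minna)  # FALSE: projected, coordinates in metres
```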
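# Select the 23 columns used in this exercise; "#"-prefixed names are renamed below
wp_attribute <- read_csv("data/aspatial/WPdx_NGAv1.2.1.csv",
col_select = c(`row_id`, `#lat_deg`, `#lon_deg`,
`New Georeferenced Column`, `lat_lon_deg`,
`#water_source`, `#water_source_clean`, `#water_source_category`,
`#water_tech_clean`, `#water_tech_category`,
`#status_clean`, `#status`, `#status_id`,
`#clean_adm1`, `#clean_adm2`, `is_urban`,
`water_point_population`, `local_population_1km`,
`crucialness_score`, `pressure_score`, `usage_capacity`,
`staleness_score`, `rehab_priority`)) %>%
rename(lat_deg = `#lat_deg`,
lon_deg = `#lon_deg`,
water_source = `#water_source`,
water_source_clean = `#water_source_clean`,
water_source_category = `#water_source_category`,
water_tech_clean = `#water_tech_clean`,
water_tech_category = `#water_tech_category`,
status_clean = `#status_clean`,
status = `#status`,
status_id = `#status_id`,
clean_adm1 = `#clean_adm1`,
clean_adm2 = `#clean_adm2`)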
Rows: 95008 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (12): #status_id, #water_source_clean, #water_source_category, #water_te...
dbl (10): row_id, #lat_deg, #lon_deg, rehab_priority, water_point_population...
lgl (1): is_urban
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
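The sketch below applies the parsing check and the renaming to a tiny inline CSV with hypothetical data. readr's problems() lists the values that failed to parse. dplyr's rename_with() strips the leading "#" from every column name in one call, which is an alternative to spelling out each pair in rename().

```r
library(readr)
library(dplyr)

# Hypothetical inline CSV mimicking the WPdx layout: some headers start
# with "#", and the third row has a non-numeric latitude.
csv_text <- "row_id,#lat_deg,#lon_deg,is_urban\n1,9.1,7.5,FALSE\n2,6.8,6.9,TRUE\n3,n/a,7.6,FALSE\n"

wp <- read_csv(I(csv_text),
               col_types = cols(row_id = col_integer(),
                                `#lat_deg` = col_double(),
                                `#lon_deg` = col_double(),
                                is_urban = col_logical()))

# List the values that failed to parse (row, col, expected, actual)
parse_issues <- problems(wp)

# Strip the leading "#" from every column name in one step
wp <- rename_with(wp, ~ sub("^#", "", .x))
```

Setting col_types explicitly also silences the column-specification message shown above.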
Name | wp_attribute |
Number of rows | 95008 |
Number of columns | 23 |
_______________________ | |
Column type frequency: | |
character | 12 |
logical | 1 |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
New Georeferenced Column | 0 | 1.00 | 11 | 45 | 0 | 95008 | 0 |
lat_lon_deg | 0 | 1.00 | 8 | 42 | 0 | 95008 | 0 |
water_source | 0 | 1.00 | 3 | 32 | 0 | 16 | 0 |
water_source_clean | 302 | 1.00 | 8 | 22 | 0 | 5 | 0 |
water_source_category | 302 | 1.00 | 4 | 11 | 0 | 3 | 0 |
water_tech_clean | 10055 | 0.89 | 8 | 26 | 0 | 11 | 0 |
water_tech_category | 10055 | 0.89 | 8 | 15 | 0 | 4 | 0 |
status_clean | 10656 | 0.89 | 9 | 32 | 0 | 8 | 0 |
status | 10656 | 0.89 | 14 | 156 | 0 | 834 | 0 |
status_id | 0 | 1.00 | 2 | 7 | 0 | 3 | 0 |
clean_adm1 | 0 | 1.00 | 3 | 25 | 0 | 37 | 0 |
clean_adm2 | 0 | 1.00 | 3 | 19 | 0 | 753 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
is_urban | 0 | 1 | 0.21 | FAL: 75444, TRU: 19564 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
row_id | 0 | 1.00 | 199975.48 | 189726.13 | 10732.00 | 52632.75 | 86952.50 | 323671.50 | 681838.00 | ▇▃▂▂▁ |
lat_deg | 0 | 1.00 | 9.33 | 2.48 | 4.30 | 7.36 | 9.09 | 11.83 | 13.87 | ▃▇▅▅▆ |
lon_deg | 0 | 1.00 | 7.50 | 2.25 | 2.71 | 5.52 | 7.89 | 9.08 | 14.22 | ▃▃▇▃▁ |
water_point_population | 539 | 0.99 | 1246.32 | 4027.41 | 0.00 | 117.00 | 413.00 | 1169.00 | 384595.00 | ▇▁▁▁▁ |
local_population_1km | 539 | 0.99 | 3723.15 | 7417.59 | 0.00 | 597.00 | 1756.00 | 4393.00 | 384595.00 | ▇▁▁▁▁ |
crucialness_score | 6879 | 0.93 | 0.41 | 0.34 | 0.00 | 0.13 | 0.30 | 0.63 | 1.00 | ▇▅▃▁▅ |
pressure_score | 6879 | 0.93 | 3.21 | 9.04 | 0.00 | 0.40 | 1.18 | 3.10 | 776.97 | ▇▁▁▁▁ |
usage_capacity | 0 | 1.00 | 488.63 | 310.95 | 50.00 | 300.00 | 300.00 | 1000.00 | 1000.00 | ▁▇▁▁▃ |
staleness_score | 0 | 1.00 | 44.94 | 6.29 | 23.13 | 41.49 | 42.87 | 44.34 | 99.00 | ▁▇▁▁▁ |
rehab_priority | 53109 | 0.44 | 1545.45 | 5243.53 | 0.00 | 136.50 | 522.00 | 1527.00 | 384595.00 | ▇▁▁▁▁ |
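Remarks:
Observations: 95,008 water points.
Variables: 11 of the 23 variables contain missing values. Seven of them have more than 1% of their values missing (complete_rate below 0.99), and rehab_priority is the least complete at 44%.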
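The missing-value counts in the skim summary can also be computed with base R. Here is a sketch on a hypothetical four-row data frame:

```r
# Toy data frame standing in for wp_attribute (hypothetical values)
df <- data.frame(
  status_clean = c("Functional", NA, "Non-Functional", NA),
  pressure_score = c(0.4, 1.2, NA, 3.1),
  is_urban = c(TRUE, FALSE, FALSE, TRUE)
)

n_missing <- colSums(is.na(df))              # missing values per column
pct_missing <- 100 * n_missing / nrow(df)    # as a percentage of rows
vars_with_na <- names(n_missing)[n_missing > 0]
vars_over_1pct <- names(pct_missing)[pct_missing > 1]
```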
Remarks:
The data frame needs a geometry column before st_sf() can convert it; otherwise the following error is raised:
Error in st_sf(wp_attribute, crs = 26392) : no simple features geometry column present
Coordinate Reference System:
User input: EPSG:26392
PROJCRS["Minna / Nigeria Mid Belt",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
CONVERSION["Nigeria Mid Belt",
METHOD["Transverse Mercator",
PARAMETER["Latitude of natural origin",4,
PARAMETER["Longitude of natural origin",8.5,
PARAMETER["Scale factor at natural origin",0.99975,
PARAMETER["False easting",670553.98,
PARAMETER["False northing",0,
SCOPE["Engineering survey, topographic mapping."],
AREA["Nigeria between 6°30'E and 10°30'E, onshore and offshore shelf."],
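The sketch below builds the geometry column. It assumes that `New Georeferenced Column` stores WKT points in longitude/latitude degrees (WGS 84). The WKT is parsed with st_as_sfc() as EPSG:4326 and then reprojected with st_transform().

Passing crs = 26392 directly to degree coordinates only relabels them. The point geometries printed further down (e.g. POINT (10.47318 10.60104), bounding box from about 2.7 to 14.2) still look like degrees even though they are tagged Minna / Nigeria Mid Belt. The boundary coordinates, by contrast, are in metres.

The test point sits at the natural origin printed above (latitude 4, longitude 8.5). After transformation it should therefore land close to the false easting 670553.98 and false northing 0, offset only by the datum shift between WGS 84 and Minna.

```r
library(sf)

# Hypothetical stand-in for wp_attribute; assumes the
# "New Georeferenced Column" holds WKT points in lon/lat degrees.
wp <- data.frame(row_id = 1:2,
                 wkt = c("POINT (8.5 4)", "POINT (7.5 9)"))

# Parse the WKT as WGS 84 (EPSG:4326), then reproject to EPSG:26392
wp.sf <- st_sf(wp, geometry = st_as_sfc(wp$wkt, crs = 4326))
wp.sf <- st_transform(wp.sf, crs = 26392)
xy <- st_coordinates(wp.sf)   # now eastings/northings in metres

# By contrast, declaring crs = 26392 on degree values only relabels them
relabelled <- st_as_sfc(wp$wkt, crs = 26392)
st_coordinates(relabelled)[1, ]  # still X = 8.5, Y = 4
```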
To use “shapeName” as a unique reference ID, each duplicated shapeName is appended with its state name so that every value becomes unique.
# Append the state name to each of the 12 rows with a duplicated LGA name
bdy_nga.sf$shapeName[c(94,95,304,305,355,356,519,520,546,547,693,694)] <-
c("Bassa Kogi",
"Bassa Plateau",
"Ifelodun Kwara",
"Ifelodun Osun",
"Irepodun Kwara",
"Irepodun Osun",
"Nasarawa Kano",
"Nasarawa Nasarawa",
"Obi Nasarawa",
"Obi Benue",
"Surulere Lagos",
"Surulere Oyo")
[1] "Bassa Kogi" "Bassa Plateau" "Ifelodun Kwara"
[4] "Ifelodun Osun" "Irepodun Kwara" "Irepodun Osun"
[7] "Nasarawa Kano" "Nasarawa Nasarawa" "Obi Nasarawa"
[10] "Obi Benue" "Surulere Lagos" "Surulere Oyo"
Usage of the code chunk below: verify the output of the previous step. An empty result (0 features) indicates that no duplicated shapeName values remain.
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: Minna / Nigeria Mid Belt
[1] shapeName bdy_nga.sf$shapeName geometry
<0 rows> (or 0-length row.names)
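The row indices in the manual renaming above were hard-coded. The sketch below finds them programmatically, appends a state name, and verifies uniqueness with anyDuplicated(). The `state` column is a hypothetical addition, since the boundary layer here keeps only shapeName.

```r
# Hypothetical LGA table; the state column is assumed for illustration
lga <- data.frame(
  shapeName = c("Bassa", "Ikeja", "Bassa", "Surulere", "Surulere"),
  state     = c("Kogi", "Lagos", "Plateau", "Lagos", "Oyo")
)

# Flag every occurrence of a repeated name, not just the second one
dup <- duplicated(lga$shapeName) | duplicated(lga$shapeName, fromLast = TRUE)
which(dup)  # row indices to rename

# Append the state name to each duplicated LGA name
lga$shapeName[dup] <- paste(lga$shapeName[dup], lga$state[dup])

# anyDuplicated() returns 0 when all values are unique
anyDuplicated(lga$shapeName) == 0
```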
Compare different approaches to combining the water-point attributes with the LGA boundaries into a single simple feature object, varying the x, y and left arguments of the join (NULL means left is not specified):
x | y | left |
wp.sf | bdy_nga.sf | TRUE |
wp.sf | bdy_nga.sf | NULL |
bdy_nga.sf | wp.sf | NULL |
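This toy example assumes that the x, y and left columns in the table above refer to sf::st_join(); all geometries and IDs are hypothetical. It shows why the first two approaches give identical results: left = TRUE is the default. It also shows why swapping x and y returns the polygons instead of the points.

```r
library(sf)

# Helper building a square polygon with lower-left corner (x0, y0)
square <- function(x0, y0, s = 10) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s),
                        c(x0, y0 + s), c(x0, y0))))
}

# Two hypothetical "LGA" polygons and three water points (EPSG:26392, metres)
polys <- st_sf(shapeName = c("A", "B"),
               geometry = st_sfc(square(0, 0), square(20, 0), crs = 26392))
pts <- st_sf(wp_id = 1:3,
             geometry = st_sfc(st_point(c(5, 5)), st_point(c(25, 5)),
                               st_point(c(50, 50)), crs = 26392))

j_left  <- st_join(pts, polys)                # left = TRUE (default): keeps all 3 points
j_inner <- st_join(pts, polys, left = FALSE)  # drops the point outside both polygons
j_poly  <- st_join(polys, pts)                # polygon geometries, one row per polygon/point match
```

With x = points, the result keeps point geometries. An unmatched point gets NA attributes under the default left join and is dropped when left = FALSE.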
Geometry set for 95008 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2.707441 ymin: 4.301812 xmax: 14.21828 ymax: 13.86568
Projected CRS: Minna / Nigeria Mid Belt
First 5 geometries:
POINT (10.47318 10.60104)
POINT (6.95009 6.78599)
POINT (7.615451 6.799595)
POINT (7.30539 6.30817)
POINT (10.44625 10.50681)
Geometry set for 95008 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2.707441 ymin: 4.301812 xmax: 14.21828 ymax: 13.86568
Projected CRS: Minna / Nigeria Mid Belt
First 5 geometries:
POINT (10.47318 10.60104)
POINT (6.95009 6.78599)
POINT (7.615451 6.799595)
POINT (7.30539 6.30817)
POINT (10.44625 10.50681)
Geometry set for 774 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 26662.71 ymin: 30523.38 xmax: 1344157 ymax: 1096029
Projected CRS: Minna / Nigeria Mid Belt
First 5 geometries:
MULTIPOLYGON (((548795.5 119641, 548687.4 11968...
MULTIPOLYGON (((541412.3 122192.3, 541544.6 122...
MULTIPOLYGON (((1248985 1048169, 1247285 104795...
MULTIPOLYGON (((510864.9 578541.6, 508736.2 577...
MULTIPOLYGON (((594269 120968.5, 594389.6 12087...
STHDA (Statistical tools for high-throughput data analysis) (n.d.). ggplot2 scatter plots: Quick start guide - R software and data visualization.
Runfola, D. et al. (2020). geoBoundaries: A global database of political administrative boundaries. PLoS ONE 15(4): e0231866.