### Using ggplot and knitr in R Markdown to report activated sludge simulation results

I am currently attending an online specialization course on water sanitation modelling at Unesco-IHE.
For the first assignment, I decided to write the report directly in R using the R Markdown PDF template, rather than switching between Excel and Word for modelling, formatting and writing.
I found R Markdown time-saving and enjoyable: I could run a regression on the fly while writing the report, and produce clean visualizations and tables with ggplot and knitr::kable.
R Markdown stands out for several capabilities:

• Writing formulas such as chemical reaction kinetics.
• Representing chemical processes (endogenous respiration and air mass transfer) in a Petersen matrix with the kable function.
• Embedding R code for data import, visualization, transformation and modelling, using readxl, ggplot and linear regression.
• Embedding JPG images in the report with the syntax `![](image-filepath)`.
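As an illustration of the kable capability, here is a minimal sketch of how a Petersen matrix might be laid out as a data frame and rendered with knitr::kable. The component names (S_O, X_H, X_I), the ASM3-style coefficients and the simplified rate expressions are assumptions for illustration, not the model used in the assignment.

```r
library(knitr)

# Illustrative Petersen matrix (assumed, ASM3-style): rows are processes,
# columns are components, cells are stoichiometric coefficients kept as text.
# f_I is the inert fraction produced by endogenous respiration.
petersen <- data.frame(
  Process = c("Endogenous respiration", "Air mass transfer"),
  S_O     = c("-(1 - f_I)", "1"),
  X_H     = c("-1", ""),
  X_I     = c("f_I", ""),
  Rate    = c("b_H * X_H", "k_La * (S_O_sat - S_O)"),
  stringsAsFactors = FALSE
)

# Inside an R Markdown chunk, kable() renders this as a formatted table;
# "pipe" produces a Markdown table when run at the console
kable(petersen, format = "pipe", caption = "Petersen matrix (illustrative)")
```

In a PDF report, dropping the `format` argument lets knitr pick the LaTeX output automatically.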

You can find the Rmd and HTML sample files at the following URLs:

Enjoy!

# Goal

To enable the Business Intelligence team to visualise legacy database table dependencies when designing a migration process.

# Scope

As this is an exploratory study, R was chosen to read a JSON file of database table dependencies, process it, and transform it into a file in a graph format.

# Procedure

## 1.- Read the JSON file

To read the JSON file, I used the rjson library, which reads JSON-structured data from a text file.

```r
library("rjson")

# Text file containing the table dependencies in JSON format
json_file <- "jsonTV.txt"
json_data <- fromJSON(file = json_file)
```
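The code that follows assumes each JSON element describes one table with the fields name, size and imports. A minimal sketch of that assumed structure, parsed from an inline string with hypothetical table names, shows what `json_data` looks like:

```r
library("rjson")

# Hypothetical input mirroring the assumed structure of jsonTV.txt:
# an array of objects, one per table, listing the tables it imports
json_str <- '[
  {"name": "AMS_ORDERS", "size": 120, "imports": ["CUSTOMERS", "AMS_PRODUCTS"]},
  {"name": "AMS_PRODUCTS", "size": 80, "imports": []}
]'

json_data <- fromJSON(json_str = json_str)

# Each element is a list; a table without dependencies has zero-length imports
str(json_data)
```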

## 2.- Transform JSON structure into a data frame

To transform the JSON structure into a data frame, I developed the function populateDFData.

```r
# Flatten the parsed JSON into one row per (table, import) pair.
# Tables without imports get a single row with the placeholder "<NA>".
populateDFData <- function(json_data, df_data) {
  rowIndex <- 0
  for (i in seq_along(json_data)) {
    imports <- json_data[[i]]$imports
    lengthImport <- length(imports)
    if (lengthImport > 0) {
      for (j in 1:lengthImport) {
        rowIndex <- rowIndex + 1
        df_data[rowIndex, "name"] <- json_data[[i]]$name
        df_data[rowIndex, "size"] <- json_data[[i]]$size
        df_data[rowIndex, "imports"] <- imports[[j]]
      }
    } else {
      rowIndex <- rowIndex + 1
      df_data[rowIndex, "name"] <- json_data[[i]]$name
      df_data[rowIndex, "size"] <- json_data[[i]]$size
      df_data[rowIndex, "imports"] <- "<NA>"
    }
  }
  # Return the populated data frame so the caller can assign it
  df_data
}
```

Before populating it, I declare a data frame whose number of rows is calculated by the function calculateLength, which counts one record for each (name, import) tuple, where name is the table and import is one of its dependencies. A table without dependencies counts as a single record.

```r
# Count the rows needed: one per (name, import) pair,
# or one row for a table with no imports
calculateLength <- function(jsonObject) {
  dfLength <- 0
  for (i in seq_along(jsonObject)) {
    importSize <- length(jsonObject[[i]]$imports)
    incrm <- if (importSize == 0) 1 else importSize
    dfLength <- dfLength + incrm
  }
  dfLength
}

dfLength <- calculateLength(json_data)

# Preallocate character columns (stringsAsFactors = FALSE avoids factors in R < 4.0)
df_data <- data.frame(name = character(dfLength),
                      size = character(dfLength),
                      imports = character(dfLength),
                      stringsAsFactors = FALSE)
```

The data frame is then populated:

```r
df_data <- populateDFData(json_data = json_data, df_data = df_data)
```

## 3.- Export data to file with graph format

As we wanted to visualise the graph with Gephi, the input data needs columns named Source and Target, so a new data frame is declared. In addition, we keep only tables whose name contains the string "AMS" and that depend on at least one other table, which requires two filter conditions. A graph object is then built from the filtered data frame with graph_from_data_frame (graph.data.frame in older igraph releases) and exported to a GraphML file with write_graph (formerly write.graph).

```r
library("igraph")

# Gephi expects edge columns named Source and Target
gephi_df <- data.frame(Source = df_data$imports,
                       Target = df_data$name,
                       Label = df_data$name,
                       stringsAsFactors = FALSE)

# Filter 1: keep tables whose name contains "AMS"
gephi_df.filter1 <- grepl("AMS", gephi_df$Target)
# Filter 2: drop tables without dependencies (placeholder "<NA>")
gephi_df.filter2 <- !grepl("<NA>", gephi_df$Source)
gephi_df.export <- gephi_df[gephi_df.filter1 & gephi_df.filter2, ]

# Build a directed graph (dependency -> table) and export it as GraphML
gephi_df.g <- graph_from_data_frame(gephi_df.export[, c("Source", "Target")])
write_graph(gephi_df.g, file = "tvia-objects-dependencies.graphml", format = "graphml")
```
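The preallocation and row-by-row loop can also be replaced by a more compact flattening. The sketch below is an alternative approach, not the one used in this study: it uses base R only and a hypothetical in-memory list shaped like the fromJSON() output, builds the same (name, import) rows, and applies the two filters, using an exact comparison against "<NA>" instead of grepl().

```r
# Hypothetical parsed input, shaped like rjson::fromJSON() output
json_data <- list(
  list(name = "AMS_ORDERS",   size = 120, imports = c("CUSTOMERS", "AMS_PRODUCTS")),
  list(name = "AMS_PRODUCTS", size = 80,  imports = list()),
  list(name = "CUSTOMERS",    size = 50,  imports = c("REGIONS"))
)

# One small data frame per table: one row per import, or a single "<NA>" row
rows <- lapply(json_data, function(tbl) {
  imports <- unlist(tbl$imports)
  if (length(imports) == 0) imports <- "<NA>"
  data.frame(name = tbl$name, size = as.character(tbl$size),
             imports = imports, stringsAsFactors = FALSE)
})
df_data <- do.call(rbind, rows)

# Edge list for Gephi: keep "AMS" tables that have at least one dependency
edges <- data.frame(Source = df_data$imports, Target = df_data$name,
                    stringsAsFactors = FALSE)
keep <- grepl("AMS", edges$Target) & edges$Source != "<NA>"
edges <- edges[keep, ]
print(edges)
```

The resulting `edges` data frame has the Source and Target columns Gephi expects and could be passed to graph_from_data_frame() in the same way.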

### Indian Water Quality Visualization Using Heatmaps

Recently I downloaded a file of Indian water quality data from Kaggle (https://www.kaggle.com/venkatramakrishnan/india-water-quality-data). I decided to explore data visualization packages to learn how to make neat and tidy heat maps. The file has the following structure:

```r
water.quality <- readRDS("water.quality.rds")
head(water.quality)
##       State.Name     District.Name     Block.Name  Panchayat.Name
##               Village.Name                      Habitation.Name
## 2     PANDAVULAPALEM(022 )     PANDAVULAPALEM(0404410022010400)
## 3         G. KOTHURU(023 )         G. KOTHURU(0404410023010600)
## 4        GAJJANAPUDI(029 )        GAJJANAPUDI(0404410029010600)
## 5         CHINTALURU(028 )         CHINTALURU(0404410028011000)
##   Quality.Parameter     Year
## 1          Salinity 1/4/2009
## 2          Fluoride 1/4/2009
## 3          Salinity 1/4/2009
## 4          Salinity 1/4/2009
## 5          Salinity 1/4/2009
## 6          Fluoride 1/4/2009
```

I decided to organize the work in two R scripts: one for data preparation and one for data visualization. My first attempt used an advanced heat map function (heatmap.2, from the gplots package) that I found in a blog post by Joseph Rickert (http://www.r-bloggers.com/r-for-more-powerful-clustering). Because heatmap.2 clusters the data, it requires a matrix of at least 2 rows by 2 columns. I added five custom fields, one for each chemical found in the samples. As heat maps require a numeric matrix, I prepared the data so that each chemical occurrence was marked as 1 (present) or 0 (absent). I gave up on using the full dataset because the object generated by the scaling step was 116 GB in size, and worked with samples of state names instead. The resulting graphics were cluttered, and I was not able to include the year in the visualization.
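A minimal sketch of the 0/1 preparation step, assuming long-format data with one row per detected quality parameter (the village names are hypothetical). It swaps in base R's stats::heatmap instead of gplots::heatmap.2 to avoid the extra dependency; both cluster rows and columns by default.

```r
# Hypothetical long-format sample: one row per detected parameter
samples <- data.frame(
  Village.Name      = c("V1", "V1", "V2", "V3", "V3"),
  Quality.Parameter = c("Salinity", "Fluoride", "Salinity", "Iron", "Salinity"),
  stringsAsFactors = FALSE
)

# Cross-tabulate villages against parameters, then recode counts as
# 1 (present) / 0 (absent) in a plain numeric matrix
counts <- table(samples$Village.Name, samples$Quality.Parameter)
presence <- matrix(as.numeric(counts > 0), nrow = nrow(counts),
                   dimnames = dimnames(counts))

# stats::heatmap clusters rows and columns (hence the 2 x 2 minimum);
# scale = "none" keeps the 0/1 values unchanged
heatmap(presence, scale = "none", col = c("white", "steelblue"))
```

Keeping `scale = "none"` also avoids computing a scaled copy of the matrix, which matters when the full dataset is large.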

```r
source("water-quality-data-preparation-v1.R")
source("water-quality-exploration-v1.R")
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
##     lowess
```

I then searched my collection of R-bloggers posts and found a set of visualization recipes posted by Dikesh Jariwala (https://www.r-bloggers.com/7-visualizations-you-should-learn-in-r/), including code for plotting heat maps with ggplot. I adapted the code to a discrete variable and obtained a neater, tidier heat map.

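```r
library(devtools)
library(ggplot2)

# Theme settings appended to the ggplot heat map with `+`: rotate the x-axis
# labels and the facet strip labels by 90 degrees and reduce their font size
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 5),
      strip.text.x = element_text(angle = 90, vjust = 0.5, size = 8))
```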
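Since only the theme settings of the adapted code are shown, here is a minimal sketch of a discrete-variable heat map built with geom_tile that uses them. The data frame, its values and the choice of facetting by Year are assumptions for illustration, not the code from the original recipe.

```r
library(ggplot2)

# Hypothetical long-format sample: one row per detected parameter
samples <- data.frame(
  District.Name     = c("D1", "D1", "D2", "D2", "D3", "D3"),
  Quality.Parameter = c("Salinity", "Fluoride", "Salinity", "Iron", "Fluoride", "Salinity"),
  Year              = c("2009", "2009", "2009", "2010", "2010", "2010"),
  stringsAsFactors = FALSE
)

# Each tile marks a district/parameter combination found in a given year;
# the fill is the discrete parameter itself, so no numeric matrix is needed
p <- ggplot(samples, aes(x = District.Name, y = Quality.Parameter,
                         fill = Quality.Parameter)) +
  geom_tile(colour = "white") +
  facet_wrap(~ Year) +
  labs(x = "District", y = "Quality parameter", fill = "Parameter") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 5),
        strip.text.x = element_text(angle = 90, vjust = 0.5, size = 8))

print(p)
```

Facetting by year is one way to add the time dimension that the heatmap.2 attempt could not show.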