Using ggplot and knitr in R Markdown to report the results of an activated sludge simulation

I am currently attending an online specialization course on water sanitation modelling at Unesco-IHE.
For the first assignment, I decided to write the report directly in R using the R Markdown PDF template, rather than jumping between Excel and Word for modelling, formatting and writing.
I found R Markdown time-saving and enjoyable: I could run a regression on the fly while writing the report, and produce neat visualizations and tables with ggplot and knitr::kable.
R Markdown stands out for several capabilities:

  • Writing well-typeset formulas, such as chemical reaction kinetics.
  • Representing the chemical processes (endogenous respiration and air mass transfer) in a Petersen matrix, using the kable function.
  • Embedding R code for data import, visualization, transformation and modelling using readxl, ggplot and linear regression.
  • Embedding jpg images in the report using the syntax ![](image filepath).
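
As an illustration of the second point, a Petersen matrix can be held in a plain data frame and rendered with knitr::kable. The sketch below is not taken from the report: the component names (X_H, S_O), the stoichiometric coefficients and the rate expressions are illustrative assumptions for a two-process model.

```r
library(knitr)

# Hypothetical two-process Petersen matrix; the coefficients and rate
# expressions are illustrative placeholders, not the values from the report.
petersen <- data.frame(
  Process = c("Endogenous respiration", "Aeration"),
  X_H     = c(-1, 0),   # heterotrophic biomass
  S_O     = c(-1, 1),   # dissolved oxygen
  Rate    = c("b_H * X_H", "K_La * (S_O_sat - S_O)"),
  stringsAsFactors = FALSE
)

# kable() returns the table as markdown text, which R Markdown renders
# as a formatted table in the PDF or HTML output.
petersen_table <- kable(petersen, format = "markdown")
print(petersen_table)
```

Inside an R Markdown chunk, calling kable(petersen) on its own is enough; the print call is only there so the sketch also works in a plain R session.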

You can see the Rmd and HTML sample files at the following URLs:

Enjoy!

Database dependencies graph visualization

The journey from JSON file to graph format, and visualization with Gephi

Rafael Ventura

2017-05-06

Goal

To enable the Business Intelligence team to visualise legacy database table dependencies when designing a migration process.

Scope

As this is an exploratory study, R was chosen to read, process and transform a JSON file with database table dependencies into a file with a graph format.

Procedure

1.- Read the JSON file

To read the JSON file, I used the rjson library, which reads a text file containing the data in a JSON structure.

library("rjson")
json_file <- "jsonTV.txt"
json_data <- fromJSON(file=json_file)
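
The structure of jsonTV.txt is not shown here. Judging from the fields accessed in the next step (name, size, imports), each element is assumed to look like the hypothetical records below. Parsing an inline string with the same fromJSON function shows the nested list that the later code walks through.

```r
library(rjson)

# Hypothetical sample mirroring the fields used later (name, size, imports);
# the real contents of jsonTV.txt are not shown in this post.
sample_json <- '[
  {"name": "AMS_ORDERS",    "size": 120, "imports": ["AMS_CUSTOMERS", "REF_COUNTRY"]},
  {"name": "AMS_CUSTOMERS", "size": 80,  "imports": []}
]'

sample_data <- fromJSON(json_str = sample_json)

length(sample_data)                 # number of tables in the sample
sample_data[[1]]$name               # table name
length(sample_data[[1]]$imports)    # number of dependencies of the first table
length(sample_data[[2]]$imports)    # zero when a table imports nothing
```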

2.- Transform JSON structure into a data frame

To transform the JSON structure into a data frame, I developed a function populateDFData.

populateDFData <- function(json_data,df_data) {
  rowIndex <- 0
  for (i in seq_along(json_data)) {
    imports <- json_data[[i]]$imports
    lengthImport <- length(imports)
    if (lengthImport>0) {
      # one row per (table, dependency) pair
      for (j in 1:lengthImport) {
        rowIndex <- rowIndex+1
        df_data[rowIndex,"name"]<-json_data[[i]]$name
        df_data[rowIndex,"size"]<-json_data[[i]]$size
        df_data[rowIndex,"imports"]<-imports[[j]]
      }
    } else {
      # table without dependencies: a single row marked with "<NA>"
      rowIndex <- rowIndex + 1
      df_data[rowIndex,"name"]<-json_data[[i]]$name
      df_data[rowIndex,"size"]<-json_data[[i]]$size
      df_data[rowIndex,"imports"]<-"<NA>"
    }
  }
  # return the filled data frame; without this the function returns NULL
  return(df_data)
}

I declare a data frame whose number of rows is calculated by the function calculateLength, which counts one record for each occurrence of the tuple (name, import), where name is the table and import is one of its dependencies. A table without dependencies still counts as one record.

calculateLength <- function(jsonObject) {
  dfLength <- 0
  for (i in seq_along(jsonObject)) {
    # use the full field name "imports" rather than relying on partial matching
    importSize <- length(jsonObject[[i]]$imports)
    # a table without dependencies still takes one row
    if (importSize == 0) {incrm <- 1} else {incrm<-importSize}
    dfLength <- dfLength + incrm
  }
  return(dfLength)
}
dfLength <- calculateLength(json_data)
# stringsAsFactors = FALSE keeps the columns as character in every R version
df_data <- data.frame(name=rep("",dfLength),
                      size=rep("",dfLength),
                      imports=rep("",dfLength),
                      stringsAsFactors = FALSE)

The data frame is then populated:

df_data <- populateDFData(json_data = json_data, 
                          df_data = df_data)
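
The preallocate-and-fill approach above can also be written without computing the row count in advance. The following is a sketch of an alternative, not the code used in this study: it builds one small data frame per table and binds them together. The helper name tableToRows and the sample list are hypothetical.

```r
# Hypothetical sample with the same shape as json_data
sample_data <- list(
  list(name = "AMS_ORDERS",    size = 120, imports = c("AMS_CUSTOMERS", "REF_COUNTRY")),
  list(name = "AMS_CUSTOMERS", size = 80,  imports = list())
)

# Turn one table entry into one row per dependency,
# or a single row marked "<NA>" when it has none.
tableToRows <- function(entry) {
  imports <- unlist(entry$imports)
  if (length(imports) == 0) imports <- "<NA>"
  data.frame(name    = entry$name,
             size    = as.character(entry$size),
             imports = imports,
             stringsAsFactors = FALSE)
}

# Bind the per-table pieces into a single data frame
df_sample <- do.call(rbind, lapply(sample_data, tableToRows))
print(df_sample)
```

For a file with a few hundred tables either approach is fast enough; the main difference is that this version needs no separate length calculation.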

3.- Export data to file with graph format

As we wanted to visualise the graph with Gephi, the input data needs to contain headers named Source and Target, so a new data frame is declared with those columns. In addition, we keep only tables whose name contains the expression “AMS” and that have one or more dependencies, which requires two filter conditions. A graph object is then built from the filtered data frame with the function graph.data.frame and exported to a graphml file with the function write.graph (current igraph releases name these graph_from_data_frame and write_graph).

library("igraph")
gephi_df <- data.frame(Source=rep("",nrow(df_data)),
                       Target=rep("",nrow(df_data)),
                       Label=rep("",nrow(df_data)))
gephi_df$Source <- df_data$imports
gephi_df$Target <- df_data$name
gephi_df$Label <- df_data$name
gephi_df.filter1 <- grepl("AMS",gephi_df$Target)
gephi_df.filter2 <- !grepl("<NA>",gephi_df$Source)
gephi_df.export <- gephi_df[gephi_df.filter1&gephi_df.filter2,]
gephi_df.g <- graph.data.frame(gephi_df.export[,c("Source", "Target")])
write.graph(gephi_df.g, file="tvia-objects-dependencies.graphml", format = "graphml")
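
To check the export before opening it in Gephi, the graphml file can be read back into R. The sketch below uses a small hypothetical edge list and a temporary file. It calls graph_from_data_frame, write_graph and read_graph, the names used for these functions in current igraph releases, and it assumes igraph was built with graphml read support.

```r
library(igraph)

# Hypothetical edge list in the Source/Target layout described above
edges <- data.frame(Source = c("AMS_CUSTOMERS", "REF_COUNTRY", "AMS_ITEMS"),
                    Target = c("AMS_ORDERS", "AMS_ORDERS", "AMS_INVOICES"),
                    stringsAsFactors = FALSE)

# Build a directed graph: each row is an edge from Source to Target
g <- graph_from_data_frame(edges, directed = TRUE)

# Write to a temporary file and read it back to confirm the round trip
out_file <- tempfile(fileext = ".graphml")
write_graph(g, file = out_file, format = "graphml")
g_check <- read_graph(out_file, format = "graphml")

vcount(g_check)   # number of tables (nodes)
ecount(g_check)   # number of dependencies (edges)
```

If the node and edge counts match what the filtered data frame suggests, the file should be ready to import into Gephi.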