Recommended R functions: Difference between revisions

From Opasnet
mNo edit summary
 
(4 intermediate revisions by 2 users not shown)
Line 2: Line 2:
[[Category:Opasnet]]
[[Category:R tool]]
[[Category:Contains R code]]
{{method|moderator=Jouni|stub=Yes}}
'''Recommended R functions''' describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.
Line 11: Line 12:
==Answer==


===data.frame===
===Recommended structures===


===ggplot2===
* When possible, use data.frames rather than arrays, tables or lists.
* Standard columns for data.frames:
** id: row identifier in the ''res'' table of [[Opasnet Base]]. Usually not needed, so it can be sliced away.
** obs: row identifier in a data table or one uploaded piece of data. Technically, a piece of data that has the same series_id in [[Opasnet Base]].
** iter: identifier of the iteration in a Monte Carlo simulation.
** There are several standard indices in [[Opasnet Base]] such as Year, Sex, Age, Lo (longitude), La (latitude),... If possible, use these names. For a full reference, see [[Special:Opasnet Base Indices|Opasnet Base Indices]].
** Result: the actual result of the object, typically numeric. There are also other columns used for results, namely
*** Result.Text: in Opasnet Base, results that are in text format are stored in this column, and
*** Freq: when the output of tapply() is converted to a data.frame with as.data.frame(as.table()), the summarised result is given in the Freq column.
** Unit: If the unit can be different at different rows, a separate column Unit is needed.
** Description: Description can contain any descriptive information about the row. It is not used in calculations.
* For health impacts, there are standard tables to be used. For details, see [[Health impact assessment]].
* For time tracking of working hours, there are standard tables to be used. For details, see [[:op_fi:Aikakone]].
* For listing several similar tables that should be bound rowwise, use standard tables described in [[Using summary tables]].


===tapply===
===Recommended generic functions===


===merge===
{| class="wikitable sortable" {{prettytable}}
|+ '''Recommended functions and operations.'''
|----
! What to do || Functions to use routinely || Functions to avoid except in special cases || Examples and description
|----
| Manipulate data || data.frame || array ||
|----
| Draw graphs || ggplot, plot || || ggplot requires library(ggplot2)
|----
| Summarise data along a criterion || tapply || ||
|----
| Join two data.frames || merge || IntArray ||
|----
| Add rows to a data.frame || rbind || ||
|----
| Add columns to a data.frame || cbind || ||
|----
| Transform a table from long to wide or vice versa || reshape || ||
|----
| Get data from [[Opasnet Base]] || op_baseGetData || || Requires library(OpasnetBaseUtils) <br>> op_baseGetData("opasnet_base", "Op_en4523")[, -c(1,2)] # Gets the object with identifier Op_en4523 and slices columns 1 and 2 (id, obs) away.
|----
| Get index values from [[Opasnet Base]] || op_baseGetLocs || || Requires library(OpasnetBaseUtils)
|----
| Write data to [[Opasnet Base]] || op_baseWrite || || Requires library(OpasnetBaseUtils). Only works from a THL computer, not [[R-tools]]
|----
| Slicing R objects || data[rows, cols], data$col || || This is not a function but rather a list of practical ways of slicing an object.
|----
| || match  || ||
|----
| ||ifelse  || ||
|----
| ||is.na  || ||
|----
| ||colnames  || ||
|----
| Convert between data types || as.numeric, as.character, as.factor || ||
|----
| Convert between object types || as.data.frame, as.table || ||
|----
|}


===rbind===
===Recommended tailored functions===


===cbind===
'''Code tailoredfunctions:'''
* '''multivarplot''' produces a graph for multiple variables along the same X axis (usually a timeline).


===reshape===
<rcode name="tailoredfunctions">
##### multivarplot produces a graph for multiple variables along the same X axis (usually a timeline). Parameters:
#    a: data.frame that has three columns: DateTime = x axis value, TagName = name of the variable (a factor), and Value = y axis value for the variable.
#    precision: a smoothing parameter (0 = no smoothing); passed to loess.smooth as span.
#    timeline: TRUE if DateTime has POSIXct format, FALSE if real number.

multivarplot <- function(a, precision = 0, timeline = FALSE) {
  tags <- levels(a$TagName)
  cols <- rainbow(length(tags))

  # Widen the left margin so that each variable gets its own Y axis.
  par(mar = c(5, length(tags) * 3.5 + 1.5, 4, 4) + 0.1)

  x <- 0 # Number of variables drawn so far; offsets the axes and picks the colour.

  for (i in tags) {
    if (i != tags[1]) par(new = TRUE) # Overlay later variables on the same plot.

    sel <- a$TagName == i
    pad <- sd(a$Value[sel]) * 0.1 # Pad the Y range by a tenth of the standard deviation.

    plot(
      if (precision == 0) {
        list(x = a$DateTime[sel], y = a$Value[sel])
      } else {
        loess.smooth(a$DateTime[sel], a$Value[sel], degree = 1, span = precision)
      },
      axes = FALSE, xlab = "", ylab = "", type = "l", col = cols[x + 1], main = "",
      xlim = c(min(a$DateTime), max(a$DateTime)),
      ylim = c(min(a$Value[sel]) - pad, max(a$Value[sel]) + pad)
    )

    axis(2, col = cols[x + 1], lwd = 2, line = x * 3.5)
    mtext(i, side = 2, line = x * 3.5 + 2, col = cols[x + 1])

    x <- x + 1
  }

  if (timeline) {
    axis.POSIXct(1, a$DateTime)
  } else {
    axis(1, a$DateTime)
  }

  mtext("Time", side = 1, col = "black", line = 2)
}
</rcode>

===op_baseGetData===

===op_baseGetLocs===

===op_baseWrite===

===slicing R objects ===

This is not a function but rather a list of practical ways of slicing an object.

===match ===

===ifelse ===

===is.na ===

===colnames ===

===as.numeric ===

===as.character ===

===as.data.frame ===

===as.table ===


==Rationale==
Line 60: Line 113:
* [[R-tools]]
* [[A Tutorial on R]]
* [[List of R functions]]

==References==

Latest revision as of 09:33, 26 August 2013


Recommended R functions describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.

Question

What are good practices for writing R code? The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.

Answer

Recommended structures

  • When possible, use data.frames rather than arrays, tables or lists.
  • Standard columns for data.frames:
    • id: row identifier in the res table of Opasnet Base. Usually not needed, so it can be sliced away.
    • obs: row identifier in a data table or one uploaded piece of data. Technically, a piece of data that has the same series_id in Opasnet Base.
    • iter: identifier of the iteration in a Monte Carlo simulation.
    • There are several standard indices in Opasnet Base such as Year, Sex, Age, Lo (longitude), La (latitude),... If possible, use these names. For a full reference, see Opasnet Base Indices.
    • Result: the actual result of the object, typically numeric. There are also other columns used for results, namely
      • Result.Text: in Opasnet Base, results that are in text format are stored in this column, and
      • Freq: when the output of tapply() is converted to a data.frame with as.data.frame(as.table()), the summarised result is given in the Freq column.
    • Unit: If the unit can be different at different rows, a separate column Unit is needed.
    • Description: Description can contain any descriptive information about the row. It is not used in calculations.
  • For health impacts, there are standard tables to be used. For details, see Health impact assessment.
  • For time tracking of working hours, there are standard tables to be used. For details, see op_fi:Aikakone.
  • For listing several similar tables that should be bound rowwise, use standard tables described in Using summary tables.
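
As a rough illustration of the standard columns listed above, the following sketch builds a small data.frame by hand, slices the id and obs columns away, and summarises Result with tapply(). The object names (dat, means, out) and all data values are invented for this example and do not come from Opasnet Base. Note that the Freq column appears only after the tapply() output is converted with as.data.frame(as.table()).

```r
# Hypothetical data: two Monte Carlo iterations for two years and two sexes.
dat <- data.frame(
  id     = 1:8,
  obs    = rep(1:4, each = 2),
  iter   = rep(1:2, times = 4),
  Year   = rep(c(2010, 2011), each = 4),
  Sex    = rep(rep(c("Female", "Male"), each = 2), times = 2),
  Result = c(1.2, 1.4, 2.1, 1.9, 1.3, 1.5, 2.4, 2.2),
  Unit   = "ug/m3"
)

# id and obs are usually not needed; slice columns 1 and 2 away.
dat <- dat[, -c(1, 2)]

# Mean of Result over the iterations, for each combination of Year and Sex.
means <- tapply(dat$Result, dat[c("Year", "Sex")], mean)

# Convert the resulting array back to a long data.frame; the values land in Freq.
out <- as.data.frame(as.table(means))
print(out)
```

Keeping the result in long format (one row per Year and Sex combination) makes it straightforward to merge or rbind with other tables that use the same index names.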

Recommended generic functions

{| class="wikitable sortable"
|+ Recommended functions and operations.
|----
! What to do || Functions to use routinely || Functions to avoid except in special cases || Examples and description
|----
| Manipulate data || data.frame || array ||
|----
| Draw graphs || ggplot, plot || || ggplot requires library(ggplot2)
|----
| Summarise data along a criterion || tapply || ||
|----
| Join two data.frames || merge || IntArray ||
|----
| Add rows to a data.frame || rbind || ||
|----
| Add columns to a data.frame || cbind || ||
|----
| Transform a table from long to wide or vice versa || reshape || ||
|----
| Get data from Opasnet Base || op_baseGetData || || Requires library(OpasnetBaseUtils) <br>> op_baseGetData("opasnet_base", "Op_en4523")[, -c(1,2)] # Gets the object with identifier Op_en4523 and slices columns 1 and 2 (id, obs) away.
|----
| Get index values from Opasnet Base || op_baseGetLocs || || Requires library(OpasnetBaseUtils)
|----
| Write data to Opasnet Base || op_baseWrite || || Requires library(OpasnetBaseUtils). Only works from a THL computer, not R-tools
|----
| Slicing R objects || data[rows, cols], data$col || || This is not a function but rather a list of practical ways of slicing an object.
|----
| || match || ||
|----
| || ifelse || ||
|----
| || is.na || ||
|----
| || colnames || ||
|----
| Convert between data types || as.numeric, as.character, as.factor || ||
|----
| Convert between object types || as.data.frame, as.table || ||
|----
|}
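
The base R functions in the table can be tried out with a short sketch like the one below. It is only an illustration: the tables conc, pop and long, their values, and the choice to replace missing results with 0 are all assumptions made for the example, not recommendations from this page. The Opasnet Base functions are left out because they need library(OpasnetBaseUtils) and a database connection.

```r
# Hypothetical example tables; Result arrives as text, as it might from Result.Text.
conc <- data.frame(Year = c(2010, 2011, 2012), Result = c("1.5", "2.0", NA),
                   stringsAsFactors = FALSE)
pop  <- data.frame(Year = c(2010, 2011, 2012), Population = c(100, 110, 120))

# as.numeric: convert text results to numbers.
conc$Result <- as.numeric(conc$Result)

# is.na and ifelse: replace missing results with 0 (for illustration only).
conc$Result <- ifelse(is.na(conc$Result), 0, conc$Result)

# merge: join two data.frames by their common column (Year).
both <- merge(conc, pop)

# rbind adds a row, cbind adds a column.
both <- rbind(both, data.frame(Year = 2013, Result = 2.5, Population = 130))
both <- cbind(both, Unit = "ug/m3")

# match: position of each lookup value in a vector (NA if it is absent).
pos <- match(c(2012, 2015), both$Year)

# colnames: read or change column names.
colnames(both)[colnames(both) == "Population"] <- "Pop"

# Slicing: rows by a condition, columns by name.
recent <- both[both$Year >= 2012, c("Year", "Result")]

# reshape: long to wide, giving one Result column per Sex.
long <- data.frame(Year   = rep(c(2010, 2011), each = 2),
                   Sex    = rep(c("Female", "Male"), times = 2),
                   Result = c(1.3, 2.0, 1.4, 2.3))
wide <- reshape(long, idvar = "Year", timevar = "Sex", direction = "wide")
print(wide)
```

merge() without a by argument joins on all columns that the two tables share, which is one reason to use the standard index names consistently.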

Recommended tailored functions

Code tailoredfunctions:

  • multivarplot produces a graph for multiple variables along the same X axis (usually a timeline).
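
A possible way to call multivarplot is sketched below. The function body restates the code from this page (with the same parameters) so that the example runs on its own; the data.frame demo and its values are invented. One assumption worth flagging: the function reads levels(a$TagName), so TagName needs to be a factor, and recent R versions no longer create factors from text columns by default.

```r
# multivarplot restated from this page so that the example is self-contained.
multivarplot <- function(a, precision = 0, timeline = FALSE) {
  tags <- levels(a$TagName)
  cols <- rainbow(length(tags))
  par(mar = c(5, length(tags) * 3.5 + 1.5, 4, 4) + 0.1)
  x <- 0
  for (i in tags) {
    if (i != tags[1]) par(new = TRUE)
    sel <- a$TagName == i
    pad <- sd(a$Value[sel]) * 0.1
    plot(
      if (precision == 0) {
        list(x = a$DateTime[sel], y = a$Value[sel])
      } else {
        loess.smooth(a$DateTime[sel], a$Value[sel], degree = 1, span = precision)
      },
      axes = FALSE, xlab = "", ylab = "", type = "l", col = cols[x + 1], main = "",
      xlim = c(min(a$DateTime), max(a$DateTime)),
      ylim = c(min(a$Value[sel]) - pad, max(a$Value[sel]) + pad)
    )
    axis(2, col = cols[x + 1], lwd = 2, line = x * 3.5)
    mtext(i, side = 2, line = x * 3.5 + 2, col = cols[x + 1])
    x <- x + 1
  }
  if (timeline) axis.POSIXct(1, a$DateTime) else axis(1, a$DateTime)
  mtext("Time", side = 1, col = "black", line = 2)
}

# Hypothetical measurements: two variables on the same numeric time axis.
set.seed(1)
demo <- data.frame(
  DateTime = rep(1:20, times = 2),
  TagName  = factor(rep(c("Temperature", "Pressure"), each = 20)),
  Value    = c(20 + rnorm(20), 1000 + 5 * rnorm(20))
)

pdf(NULL)                            # Null device; drop this line to see the plot.
multivarplot(demo)                   # Raw lines, one Y axis per variable.
multivarplot(demo, precision = 0.5)  # Smoothed with loess.smooth(span = 0.5).
invisible(dev.off())
```

Because each variable gets its own Y axis and scale, variables with very different magnitudes (here roughly 20 and 1000) can be compared along the same timeline.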


Rationale

Based on experience and testing.

See also

References

