Recommended R functions

From Opasnet
Revision as of 09:33, 26 August 2013 by Heta


Recommended R functions describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.

Question

What are good practices for writing R code? The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.

Answer

Recommended structures

  • When possible, use data.frames rather than arrays, tables or lists.
  • Standard columns for data.frames:
    • id: row identifier in the res table of Opasnet Base. Usually not needed, so it can be sliced away.
    • obs: row identifier within a data table, that is, within one uploaded piece of data. Technically, one piece of data consists of the rows that share the same series_id in Opasnet Base.
    • iter: identifier of the iteration in a Monte Carlo simulation.
    • There are several standard indices in Opasnet Base, such as Year, Sex, Age, Lo (longitude), La (latitude), ... Use these names where possible. For a full reference, see Opasnet Base Indices.
    • Result: the actual result of the object, typically numeric. There are also other columns used for results, namely
      • Result.Text: in Opasnet Base, results that are in text format are stored in this column, and
      • Freq: when the output of tapply() is converted back to a data.frame with as.data.frame(as.table(...)), the summarised result is given in the Freq column.
    • Unit: If the unit can be different at different rows, a separate column Unit is needed.
    • Description: Description can contain any descriptive information about the row. It is not used in calculations.
  • For health impacts, there are standard tables to be used. For details, see Health impact assessment.
  • For time tracking of working hours, there are standard tables to be used. For details, see op_fi:Aikakone.
  • For listing several similar tables that should be bound rowwise, use standard tables described in Using summary tables.
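
As a minimal sketch of the structure described above, the following builds a small data.frame with the standard columns obs, iter, Year, Sex, Result and Unit, and then summarises it. The object names and all values are invented for illustration. It also shows where the Freq column comes from: it appears when the tapply() output is turned back into a data.frame through as.table().

```r
# Hypothetical example data using the standard column names.
dat <- data.frame(
  obs    = 1:8,
  iter   = rep(1:2, each = 4),
  Year   = rep(c(2010, 2011), times = 4),
  Sex    = rep(c("Female", "Female", "Male", "Male"), times = 2),
  Result = c(1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5),
  Unit   = "ug/m3",
  stringsAsFactors = FALSE
)

# Mean Result for each combination of Year and Sex, averaged over iterations.
means <- tapply(dat$Result, dat[c("Year", "Sex")], mean)

# Back to a long data.frame; the summarised values land in the Freq column.
summary_df <- as.data.frame(as.table(means))
```

The result has one row per Year and Sex combination, so it can be merged or bound with other data.frames in the same way as the original data.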

Recommended generic functions

Recommended functions and operations.
| What to do | Functions to use routinely | Functions to avoid except in special cases | Examples and description |
|---|---|---|---|
| Manipulate data | data.frame | array | |
| Draw graphs | ggplot, plot | | ggplot requires library(ggplot2) |
| Summarise data along a criterion | tapply | | |
| Join two data.frames | merge | IntArray | |
| Add rows to a data.frame | rbind | | |
| Add columns to a data.frame | cbind | | |
| Transform a table from long to wide or vice versa | reshape | | |
| Get data from Opasnet Base | op_baseGetData | | Requires library(OpasnetBaseUtils). Example: op_baseGetData("opasnet_base", "Op_en4523")[, -c(1,2)] gets the object with identifier Op_en4523 and slices columns 1 and 2 (id, obs) away. |
| Get index values from Opasnet Base | op_baseGetLocs | | Requires library(OpasnetBaseUtils) |
| Write data to Opasnet Base | op_baseWrite | | Requires library(OpasnetBaseUtils). Only works from a THL computer, not R-tools. |
| Slicing R objects | data[rows, cols], data$col | | This is not a function but rather a list of practical ways of slicing an object. |
| | match | | |
| | ifelse | | |
| | is.na | | |
| | colnames | | |
| Convert between data types | as.numeric, as.character, as.factor | | |
| Convert between object types | as.data.frame, as.table | | |
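
The base R functions in the table can be illustrated with a short sketch. The tables exposure, population and long and all their values are invented for this example. The Opasnet Base functions are left out because they need library(OpasnetBaseUtils) and a database connection.

```r
# Hypothetical example tables; all names and values are invented.
exposure <- data.frame(
  Year   = c(2010, 2011, 2012),
  Result = c(5.1, NA, 4.7)
)
population <- data.frame(
  Year = c(2010, 2011, 2012),
  Pop  = c(100, 110, 120)
)

# Join two data.frames on their shared column.
joined <- merge(exposure, population, by = "Year")

# Add a row and a column.
more     <- rbind(exposure, data.frame(Year = 2013, Result = 4.9))
withunit <- cbind(exposure, Unit = "ug/m3")

# Slice: keep rows with a non-missing Result and only two columns.
complete <- joined[!is.na(joined$Result), c("Year", "Result")]

# Replace missing values with 0 using ifelse() and is.na().
filled <- ifelse(is.na(exposure$Result), 0, exposure$Result)

# Look up the row positions of given Years with match().
pos <- match(c(2012, 2010), population$Year)

# Long to wide with reshape(): one Result column per Sex.
long <- data.frame(
  Year   = rep(c(2010, 2011), each = 2),
  Sex    = rep(c("Female", "Male"), times = 2),
  Result = c(1, 2, 3, 4)
)
wide <- reshape(long, idvar = "Year", timevar = "Sex", direction = "wide")

# Convert a factor to numbers via character, so that the labels
# (not the internal level codes) are converted.
as_num <- as.numeric(as.character(factor(c("10", "20"))))
```

Note that as.numeric() applied directly to a factor returns the level codes, which is why the conversion goes through as.character() first.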

Recommended tailored functions

Code tailoredfunctions:

  • multivarplot produces a graph for multiple variables along the same X axis (usually a timeline).
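
The multivarplot code itself is not reproduced on this page. As a rough sketch of the idea only, and not the Opasnet implementation, several variables can be stacked in long format and drawn in separate panels that share one X axis with ggplot2. The function name multivar_sketch, the column names and the data are all hypothetical.

```r
# Hypothetical long-format data: two variables observed over the same years.
vars <- data.frame(
  Year     = rep(2010:2013, times = 2),
  Variable = rep(c("Emission", "Concentration"), each = 4),
  Result   = c(10, 9, 8, 7, 2.0, 1.9, 1.7, 1.6)
)

# Sketch only: one panel per variable, all panels sharing the X axis.
multivar_sketch <- function(data) {
  ggplot2::ggplot(data, ggplot2::aes(x = Year, y = Result)) +
    ggplot2::geom_line() +
    ggplot2::facet_grid(Variable ~ ., scales = "free_y")
}

# Build the plot object only if ggplot2 is installed; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- multivar_sketch(vars)
}
```

Free Y scales are used here because the variables may have different units, while the shared X axis keeps the timelines aligned.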

Rationale

Based on experience and testing.

See also

References

