Recommended R functions
Moderator: Jouni
Recommended R functions describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.
Question
What are good practices for writing R code?
Answer
Recommended structures
- When possible, use data.frames rather than arrays, tables or lists.
- Standard columns for data.frames:
  - id: row identifier in the res table of Opasnet Base. Usually not needed, so it can be sliced away.
  - obs: row identifier in a data table or in one uploaded piece of data. Technically, one piece of data consists of the rows that share the same series_id in Opasnet Base.
  - iter: identifier of the iteration in a Monte Carlo simulation.
  - Standard indices: Opasnet Base has several standard indices, such as Year, Sex, Age, Lo (longitude) and La (latitude). If possible, use these names. For a full reference, see Opasnet Base Indices.
  - Result: the actual result of the object, typically numeric. Other columns are also used for results, namely
    - Result.Text: in Opasnet Base, results that are in text format are stored in this column, and
    - Freq: when the output of tapply() is converted with as.data.frame(as.table()), the summarised result is given in the Freq column.
  - Unit: if the unit can differ between rows, a separate Unit column is needed.
  - Description: can contain any descriptive information about the row. It is not used in calculations.
- For health impacts, there are standard tables to be used. For details, see Health impact assessment.
- For time tracking of working hours, there are standard tables to be used. For details, see op_fi:Aikakone.
- For listing several similar tables that should be bound rowwise, use standard tables described in Using summary tables.
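As a minimal sketch of the column conventions above, the following builds a small data.frame with obs, iter, a Year index, Result and Unit, and then shows where a Freq column comes from. All values are invented for illustration and nothing is fetched from Opasnet Base.

```r
# Hypothetical example data: 2 years x 3 Monte Carlo iterations.
dat <- data.frame(
  obs    = 1:6,
  iter   = rep(1:3, times = 2),
  Year   = rep(c(2010, 2011), each = 3),
  Result = c(1, 2, 3, 4, 5, 6),
  Unit   = "kg"
)

# Mean of Result over iterations, by Year. tapply() returns a named array,
# not a data.frame.
means <- tapply(dat$Result, dat["Year"], mean)

# Converting the array back to a data.frame puts the summarised values
# into a column called Freq.
summary_df <- as.data.frame(as.table(means))
print(summary_df)
```

The Freq name comes from as.data.frame() applied to a table, so it can be renamed to Result afterwards with colnames() if the summary is to follow the standard columns.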
Recommended generic functions
What to do | Functions to use routinely | Functions to avoid except in special cases | Examples and description |
---|---|---|---|
Manipulate data | data.frame | array | |
Draw graphs | ggplot, plot | | ggplot requires library(ggplot2). |
Summarise data along a criterion | tapply | | |
Join two data.frames | merge | IntArray | |
Add rows to a data.frame | rbind | | |
Add columns to a data.frame | cbind | | |
Transform a table from long to wide or vice versa | reshape | | |
Get data from Opasnet Base | op_baseGetData | | Requires library(OpasnetBaseUtils). Example: op_baseGetData("opasnet_base", "Op_en4523")[, -c(1,2)] gets the object with identifier Op_en4523 and slices columns 1 and 2 (id, obs) away. |
Get index values from Opasnet Base | op_baseGetLocs | | Requires library(OpasnetBaseUtils). |
Write data to Opasnet Base | op_baseWrite | | Requires library(OpasnetBaseUtils). Only works from a THL computer, not from R-tools. |
Slicing R objects | data[rows, cols], data$col | | This is not a function but rather a list of practical ways of slicing an object. |
match | | | |
ifelse | | | |
is.na | | | |
colnames | | | |
Convert between data types | as.numeric, as.character, as.factor | | |
Convert between object types | as.data.frame, as.table | | |
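The base R functions in the table can be combined as in the sketch below. The two tables (emissions and population) and all their values are hypothetical and only serve as an illustration. The Opasnet Base functions are left out because they need library(OpasnetBaseUtils) and a database connection.

```r
# Hypothetical example tables; all values are invented for illustration.
emissions <- data.frame(
  Year   = c(2010, 2010, 2011, 2011),
  Sex    = c("Male", "Female", "Male", "Female"),
  Result = c(10, 12, NA, 15)
)
population <- data.frame(
  Year       = c(2010, 2011),
  Population = c(100, 110)
)

# Slicing: rows where Result is not missing, and two of the columns.
complete <- emissions[!is.na(emissions$Result), c("Year", "Result")]

# ifelse: replace missing results with 0 in a new column.
emissions$Filled <- ifelse(is.na(emissions$Result), 0, emissions$Result)

# merge: join the two data.frames on their shared column (Year).
joined <- merge(emissions, population)

# match: look up the Population for each row without a full merge.
emissions$Population <-
  population$Population[match(emissions$Year, population$Year)]

# rbind adds rows; cbind adds columns.
more_rows <- rbind(population, data.frame(Year = 2012, Population = 120))
more_cols <- cbind(population, Area = c("A", "A"))

# colnames: rename the third column.
colnames(more_cols)[3] <- "Region"

# reshape: long to wide, one Result column per Sex.
wide <- reshape(emissions[, c("Year", "Sex", "Result")],
                idvar = "Year", timevar = "Sex", direction = "wide")

# Type conversion: factor levels back to numbers go via character.
year_factor <- as.factor(emissions$Year)
year_number <- as.numeric(as.character(year_factor))
```

Converting a factor with as.numeric() directly would return the internal level codes (1, 2, ...) rather than the years, which is why the conversion goes through as.character() first.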
Recommended tailored functions
Code for tailored functions:
- multivarplot produces a graph for multiple variables along the same X axis (usually a timeline).
Rationale
Based on experience and testing.
See also
References