Recommended R functions: Difference between revisions
m (→Answer) |
mNo edit summary |
||
(4 intermediate revisions by 2 users not shown) | |||
Line 2: | Line 2: | ||
[[Category:Opasnet]] | [[Category:Opasnet]] | ||
[[Category:R tool]] | [[Category:R tool]] | ||
[[Category:Contains R code]] | |||
{{method|moderator=Jouni|stub=Yes}} | {{method|moderator=Jouni|stub=Yes}} | ||
'''Recommended R functions''' describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code. | '''Recommended R functions''' describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code. | ||
Line 11: | Line 12: | ||
==Answer== | ==Answer== | ||
=== | ===Recommended structures=== | ||
* When possible, use data.frames rather than arrays, tables or lists. | |||
* Standard columns for data.frames: | |||
** id: row identifier in the ''res'' table of [[Opasnet Base]]. Usually not needed and this can be sliced away. | |||
** obs: row identifier in a data table or one uploaded piece of data. Technically, a piece of data that has the same series_id in [[Opasnet Base]]. | |||
** iter: identifier of the iteration in a Monte Carlo simulation. | |||
** There are several standard indices in [[Opasnet Base]] such as Year, Sex, Age, Lo (longitude), La (latitude),... If possible, use these names. For a full reference, see [[Special:Opasnet Base Indices|Opasnet Base Indices]]. | |||
** Result: the actual result of the object, typically numeric. There are also other columns used for results, namely | |||
*** Result.Text: in Opasnet Base, results that are in text format are stored in this column, and | |||
*** Freq: when the output of tapply() is converted with as.data.frame(as.table()), the summarised result is given in the Freq column. | |||
** Unit: If the unit can be different at different rows, a separate column Unit is needed. | |||
** Description: Description can contain any descriptive information about the row. It is not used in calculations. | |||
* For health impacts, there are standard tables to be used. For details, see [[Health impact assessment]]. | |||
* For time tracking of working hours, there are standard tables to be used. For details, see [[:op_fi:Aikakone]]. | |||
* For listing several similar tables that should be bound rowwise, use standard tables described in [[Using summary tables]]. | |||
=== | ===Recommended generic functions=== | ||
= | {| class="wikitable sortable" {{prettytable}} | ||
|+ '''Recommended functions and operations''' | |||
|---- | |||
! What to do || Functions to use routinely|| Functions to avoid except in special cases|| Examples and description | |||
|---- | |||
| Manipulate data || data.frame || array || | |||
|---- | |||
| Draw graphs || ggplot, plot || || ggplot requires library(ggplot2) | |||
|---- | |||
| Summarise data along a criterion || tapply || || | |||
|---- | |||
| Join two data.frames || merge || IntArray || | |||
|---- | |||
| Add rows to a data.frame || rbind || || | |||
|---- | |||
| Add columns to a data.frame || cbind || || | |||
|---- | |||
| Transform a table from long to wide or vice versa || reshape || || | |||
|---- | |||
| Get data from [[Opasnet Base]] || op_baseGetData || || Requires library(OpasnetBaseUtils) <br>> op_baseGetData("opasnet_base", "Op_en4523")[, -c(1,2)] # Gets the object with identifier Op_en4523 and slices columns 1 and 2 (id, obs) away. | |||
|---- | |||
| Get index values from [[Opasnet Base]] || op_baseGetLocs || || Requires library(OpasnetBaseUtils) | |||
|---- | |||
| Write data to [[Opasnet Base]] || op_baseWrite || || Requires library(OpasnetBaseUtils). Only works from a THL computer, not [[R-tools]] | |||
|---- | |||
| Slicing R objects || data[rows, cols], data$col || || This is not a function but rather a list of practical ways of slicing an object. | |||
|---- | |||
| || match || || | |||
|---- | |||
| ||ifelse || || | |||
|---- | |||
| ||is.na || || | |||
|---- | |||
| ||colnames || || | |||
|---- | |||
| Convert between data types || as.numeric, as.character, as.factor || || | |||
|---- | |||
| Convert between object types || as.data.frame, as.table || || | |||
|---- | |||
|} | |||
=== | ===Recommended tailored functions=== | ||
'''Code tailoredfunctions:''' | |||
* '''multivarplot''' produces a graph for multiple variables along the same X axis (usually a timeline). | |||
<rcode name="tailoredfunctions">
##### multivarplot produces a graph for multiple variables along the same X axis (usually a timeline). Parameters:
# a: data.frame with three columns: DateTime = x axis value, TagName = name of the variable, and Value = y axis value for the variable.
# precision: a smoothing parameter, passed to loess.smooth as span (0 = no smoothing)
# timeline: TRUE if DateTime has POSIXct format, FALSE if real number.

multivarplot <- function(a, precision = 0, timeline = FALSE) {
  a$TagName <- as.factor(a$TagName) # levels() below needs a factor
  tags <- levels(a$TagName)
  cols <- rainbow(length(tags))
  # Widen the left margin to make room for one y axis per variable.
  par(mar = c(5, length(tags) * 3.5 + 1.5, 4, 4) + 0.1)
  x <- 0 # number of y axes drawn so far

  for (i in tags) {
    sel <- a$TagName == i
    # Overlay every variable after the first on the same plot region.
    if (i != tags[1]) par(new = TRUE)
    plot(
      if (precision == 0) {
        list(x = a$DateTime[sel], y = a$Value[sel])
      } else {
        loess.smooth(a$DateTime[sel], a$Value[sel], degree = 1, span = precision)
      },
      axes = FALSE, xlab = "", ylab = "", type = "l", col = cols[x + 1], main = "",
      xlim = c(min(a$DateTime), max(a$DateTime)),
      ylim = c(min(a$Value[sel]) - sd(a$Value[sel]) * 0.1,
               max(a$Value[sel]) + sd(a$Value[sel]) * 0.1)
    )
    # Each variable gets its own colour-coded y axis, shifted further left.
    axis(2, col = cols[x + 1], lwd = 2, line = x * 3.5)
    mtext(i, side = 2, line = x * 3.5 + 2, col = cols[x + 1])
    x <- x + 1
  }

  if (timeline) axis.POSIXct(1, a$DateTime) else axis(1, a$DateTime)
  mtext("Time", side = 1, col = "black", line = 2)
}
</rcode>
==Rationale== | ==Rationale== | ||
Line 60: | Line 113: | ||
* [[R-tools]] | * [[R-tools]] | ||
* [[A Tutorial on R]] | * [[A Tutorial on R]] | ||
* [[List of R functions]] | |||
==References== | ==References== |
Latest revision as of 09:33, 26 August 2013
Moderator:Jouni (see all) |
This page is a stub. You may improve it into a full page. |
Recommended R functions describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.
Question
What are good practices for writing R code? The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.
Answer
Recommended structures
- When possible, use data.frames rather than arrays, tables or lists.
- Standard columns for data.frames:
  - id: row identifier in the res table of Opasnet Base. Usually not needed, so it can be sliced away.
  - obs: row identifier in a data table or in one uploaded piece of data. Technically, a piece of data that has the same series_id in Opasnet Base.
  - iter: identifier of the iteration in a Monte Carlo simulation.
  - There are several standard indices in Opasnet Base such as Year, Sex, Age, Lo (longitude), La (latitude), ... If possible, use these names. For a full reference, see Opasnet Base Indices.
  - Result: the actual result of the object, typically numeric. There are also other columns used for results, namely
    - Result.Text: in Opasnet Base, results that are in text format are stored in this column, and
    - Freq: when the output of tapply() is converted with as.data.frame(as.table()), the summarised result is given in the Freq column.
  - Unit: if the unit can differ between rows, a separate Unit column is needed.
  - Description: any descriptive information about the row. It is not used in calculations.
- For health impacts, there are standard tables to be used. For details, see Health impact assessment.
- For time tracking of working hours, there are standard tables to be used. For details, see op_fi:Aikakone.
- For listing several similar tables that should be bound rowwise, use standard tables described in Using summary tables.
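As a rough illustration of the column conventions above, the following sketch builds a small data.frame by hand. All values, the unit string and the description text are made up for the example; only the column names come from the list.

```r
# Illustrative data only: three Monte Carlo iterations for each
# combination of two standard indices (Year and Sex).
set.seed(1)
dat <- expand.grid(
  iter = 1:3,
  Year = c(2010, 2020),
  Sex = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# obs identifies the row, Result holds the numeric result.
dat$obs <- seq_len(nrow(dat))
dat$Result <- round(rnorm(nrow(dat), mean = 10, sd = 2), 2)

# Unit and Description are optional columns (values are hypothetical).
dat$Unit <- "ug/m3"
dat$Description <- "Made-up example row"

# Put the standard columns in a predictable order.
dat <- dat[, c("obs", "iter", "Year", "Sex", "Result", "Unit", "Description")]
str(dat)
```

Keeping the indices as ordinary columns of a long table like this is what lets the generic functions (merge, tapply, reshape) work without special handling.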
Recommended generic functions
What to do | Functions to use routinely | Functions to avoid except in special cases | Examples and description |
---|---|---|---|
Manipulate data | data.frame | array | |
Draw graphs | ggplot, plot | | ggplot requires library(ggplot2) |
Summarise data along a criterion | tapply | | |
Join two data.frames | merge | IntArray | |
Add rows to a data.frame | rbind | | |
Add columns to a data.frame | cbind | | |
Transform a table from long to wide or vice versa | reshape | | |
Get data from Opasnet Base | op_baseGetData | | Requires library(OpasnetBaseUtils). Example: op_baseGetData("opasnet_base", "Op_en4523")[, -c(1,2)] gets the object with identifier Op_en4523 and slices columns 1 and 2 (id, obs) away. |
Get index values from Opasnet Base | op_baseGetLocs | | Requires library(OpasnetBaseUtils) |
Write data to Opasnet Base | op_baseWrite | | Requires library(OpasnetBaseUtils). Only works from a THL computer, not R-tools |
Slicing R objects | data[rows, cols], data$col | | This is not a function but rather a list of practical ways of slicing an object. |
 | match | | |
 | ifelse | | |
 | is.na | | |
 | colnames | | |
Convert between data types | as.numeric, as.character, as.factor | | |
Convert between object types | as.data.frame, as.table | | |
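The base R functions in the table can be combined as in the sketch below. The data, the lookup table named units and the unit "kg" are invented for the example, and the Opasnet Base functions are left out because they need the OpasnetBaseUtils library and a database connection.

```r
# Made-up long-format data using standard column names.
dat <- data.frame(
  Year = c(2010, 2010, 2020, 2020),
  Sex = c("Female", "Male", "Female", "Male"),
  Result = c(1.5, 2.5, NA, 4.0),
  stringsAsFactors = FALSE
)

# is.na and ifelse: replace a missing Result with 0.
dat$Result <- ifelse(is.na(dat$Result), 0, dat$Result)

# tapply returns an array; as.data.frame(as.table()) converts it to a
# data.frame in which the summarised value sits in the Freq column.
by_sex <- as.data.frame(as.table(tapply(dat$Result, list(Sex = dat$Sex), mean)))

# merge joins two data.frames on their shared column (here Sex).
units <- data.frame(Sex = c("Female", "Male"), Unit = c("kg", "kg"),
                    stringsAsFactors = FALSE)
joined <- merge(dat, units)

# reshape: long to wide, one Result column per level of Sex.
wide <- reshape(dat, idvar = "Year", timevar = "Sex", direction = "wide")

# rbind adds rows, cbind adds columns.
extra <- data.frame(Year = 2030, Sex = "Female", Result = 5,
                    stringsAsFactors = FALSE)
dat2 <- cbind(rbind(dat, extra), Unit = "kg")

# Slicing with data[rows, cols] and data$col; match finds row positions.
male_mean <- by_sex$Freq[match("Male", by_sex$Sex)]
first_two <- dat2[1:2, c("Year", "Result")]

# Converting a factor of numbers: go through as.character first,
# because as.numeric on a factor returns the internal level codes.
f <- as.factor(c("10", "20"))
nums <- as.numeric(as.character(f))
```

The reshape call relies on dat containing only the id column, the time column and one value column; with more columns, v.names would have to be given explicitly.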
Recommended tailored functions
Code tailoredfunctions:
- multivarplot produces a graph for multiple variables along the same X axis (usually a timeline).
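A possible way to call multivarplot is sketched here. The input follows the three-column layout described in the function's comments (DateTime, TagName, Value); the variable names Temperature and Pressure and all values are hypothetical. The call is guarded because the function is only available after the tailoredfunctions code of this page has been run.

```r
# Hypothetical input: two variables measured at the same ten time points.
a <- data.frame(
  DateTime = rep(1:10, times = 2),
  TagName = factor(rep(c("Temperature", "Pressure"), each = 10)),
  Value = c(20 + sin(1:10), 100 + cos(1:10))
)

# DateTime is a plain number here, so timeline = FALSE.
# precision = 0 draws the raw values without smoothing.
if (exists("multivarplot")) {
  multivarplot(a, precision = 0, timeline = FALSE)
}
```

TagName is created as a factor because the function loops over levels(a$TagName); each level gets its own y axis on the left side of the plot.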
Rationale
Based on experience and testing.
See also
References