Recommended R functions
Revision as of 09:41, 29 December 2011
Moderator: Jouni
This page is a stub. You may improve it into a full page.
Recommended R functions describes good practices for writing R code. The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.
Question
What are good practices for writing R code? The code should be short, straightforward to understand, efficient to run, and similar to everyone else's code.
Answer
Recommended structures
- When possible, use data.frames rather than arrays, tables or lists.
- Standard columns for data.frames:
  - id: row identifier in the res table of Opasnet Base. Usually not needed, so it can be sliced away.
  - obs: row identifier within a data table, that is, within one uploaded piece of data. Technically, such a piece consists of the rows that share the same series_id in Opasnet Base.
  - iter: identifier of the iteration in a Monte Carlo simulation.
  - Standard indices: Opasnet Base has several standard indices, such as Year, Sex, Age, Lo (longitude) and La (latitude). If possible, use these names. For a full reference, see Opasnet Base Indices.
  - Result: the actual result of the object, typically numeric. Two other columns are also used for results:
    - Result.Text: in Opasnet Base, results in text format are stored in this column.
    - Freq: when the output of tapply() is converted to a data.frame with as.data.frame(as.table()), the summarised result is placed in the Freq column.
  - Unit: if the unit can differ between rows, a separate Unit column is needed.
  - Description: any descriptive information about the row. It is not used in calculations.
- For health impacts, use the standard tables. For details, see Health impact assessment.
- For time tracking of working hours, use the standard tables. For details, see op_fi:Aikakone.
- For listing several similar tables that should be bound row-wise, use the standard tables described in Using summary tables.
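The column conventions above can be illustrated with a small sketch. It builds a long-format data.frame with some of the standard columns. All values, the unit string and the object names (dat, dat_no_obs) are made up for this example and do not come from Opasnet Base.

```r
# Hypothetical example data: the values below are made up for illustration.
set.seed(1)  # make the simulated values reproducible

# Long format: one row per combination of index values and iteration.
dat <- expand.grid(
  Year = c(2010, 2011),
  Sex  = c("Female", "Male"),
  iter = 1:3,
  stringsAsFactors = FALSE
)
dat$obs    <- seq_len(nrow(dat))                   # row identifier within this piece of data
dat$Result <- rnorm(nrow(dat), mean = 10, sd = 2)  # numeric result
dat$Unit   <- "ug/m3"                              # only needed if units differ between rows
dat$Description <- "Simulated value"               # free text, not used in calculations

# Slice the identifier column away when it is not needed.
dat_no_obs <- dat[, colnames(dat) != "obs"]

str(dat)
```

Keeping the indices (Year, Sex, iter) as ordinary columns, rather than as array dimensions, is what allows such objects to be merged and summarised with the same few functions.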
Recommended functions
What to do | Functions to use routinely | Functions to avoid except in special cases | Examples and description |
---|---|---|---|
Manipulate data | data.frame | array | |
Draw graphs | ggplot, plot | | ggplot requires library(ggplot2). |
Summarise data along a criterion | tapply | | |
Join two data.frames | merge | IntArray | |
Add rows to a data.frame | rbind | | |
Add columns to a data.frame | cbind | | |
Transform a table from long to wide or vice versa | reshape | | |
Get data from Opasnet Base | op_baseGetData | | Requires library(OpasnetBaseUtils). Example: op_baseGetData("opasnet_base", "Op_en4523")[, -c(1,2)] gets the object with identifier Op_en4523 and slices columns 1 and 2 (id, obs) away. |
Get index values from Opasnet Base | op_baseGetLocs | | Requires library(OpasnetBaseUtils). |
Write data to Opasnet Base | op_baseWrite | | Requires library(OpasnetBaseUtils). Only works from a THL computer, not R-tools. |
Slice R objects | data[rows, cols], data$col | | These are not functions but practical ways of slicing an object. |
 | match | | |
 | ifelse | | |
 | is.na | | |
 | colnames | | |
Convert between data types | as.numeric, as.character, as.factor | | |
Convert between object types | as.data.frame, as.table | | |
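As a rough illustration of how the base R functions in the table fit together, the sketch below runs them on two small data.frames. The data, the object names (emis, pop and so on) and the choice to replace missing values with 0 are assumptions made for this example only. The Opasnet Base functions and ggplot are left out because they need extra libraries.

```r
# Hypothetical example data; all names and values are made up for illustration.
emis <- data.frame(
  Year   = c(2010, 2010, 2011, 2011),
  Sex    = c("Female", "Male", "Female", "Male"),
  Result = c(1, 2, NA, 4),
  stringsAsFactors = FALSE
)
pop <- data.frame(Year = c(2010, 2011), Population = c(100, 110))

# is.na + ifelse: replace missing results with 0 (a choice made for this example only).
emis$Result <- ifelse(is.na(emis$Result), 0, emis$Result)

# Slicing: pick rows with a logical condition and columns by name.
females <- emis[emis$Sex == "Female", c("Year", "Result")]

# tapply: sum Result by Year. Converting the output with as.table and
# as.data.frame gives a long table whose value column is called Freq.
by_year <- as.data.frame(as.table(tapply(emis$Result, emis["Year"], sum)))
colnames(by_year)[colnames(by_year) == "Freq"] <- "Result"

# Type conversion: the index came back as a factor, so go through character.
by_year$Year <- as.numeric(as.character(by_year$Year))

# merge: join two data.frames on their shared column (Year).
joined <- merge(emis, pop)

# rbind and cbind: add a row, then add a column.
extra <- data.frame(Year = 2012, Sex = "Female", Result = 5, stringsAsFactors = FALSE)
emis_all <- rbind(emis, extra)
emis_all <- cbind(emis_all, Unit = "kg")

# reshape: long to wide, one Result column per value of Sex.
wide <- reshape(emis, idvar = "Year", timevar = "Sex", direction = "wide")

# match: positions of the given years in pop$Year.
pos <- match(c(2011, 2010), pop$Year)
```

Every intermediate object here except pos is a data.frame, which matches the recommendation to prefer data.frames over arrays and tables.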
Rationale
Based on experience and testing.