Opasnet Base Connection for R: Difference between revisions

Latest revision as of 20:29, 10 April 2015

This page is a knowledge crystal of subtype method. The page identifier is Op_en4614
Moderator:Teemu R (see all)

Upload data {{#opasnet_base_link:Op_en4614}}

----#: . Should we merge this page with OpasnetBaseUtils? --Jouni 20:18, 28 December 2011 (EET) (type: truth; paradigms: science: comment)

Code for R for the purpose of interacting with the Opasnet Base is collected on this page. To use it, copy paste the code you need to the R console; this defines the functions, after which they can be called for in that R session. Or alternatively install the OpasnetBaseUtils package.

Package dependencies

These packages are required for most of the code to work. To install: from the top bar menu Packages>Install. To load: copy-paste.

library(RODBC)

Downloading data

op_baseGetData

+ Show code - Hide code

op_baseGetData <- function(dsn, ident, include = NULL, exclude = NULL, series_id = NULL, iterations = NULL, use.utf8 = TRUE, apply.utf8 = TRUE) {
	if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
	obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
	if (length(series_id) == 0) {series_id <- sqlQuery(db, paste('SELECT series_id FROM actobj WHERE obj_id = ', obj_id, 
		' ORDER BY series_id DESC LIMIT 1', sep = ''))[1,1]}
	sliced <- FALSE
	locations <- NULL
	x <- 1
	basequery <- paste('SELECT loccell.cell_id FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN', 
			' loccell ON cell.id = loccell.cell_id WHERE actobj.obj_id = ', obj_id, ' AND actobj.series_id = ', 
			series_id, ' AND loccell.loc_id IN(', sep = '')
	if (length(include) != 0) {
		sliced <- TRUE
		locations[x] <- paste("IN(", basequery, paste(include, collapse = ","), ")", sep = "")
		x <- x + 1
	}
	if (length(exclude) != 0) {
		sliced <- TRUE
		locations[x] <- paste("NOT IN(", basequery, paste(exclude, collapse = ","), ")", sep = "")
	}
	if (sliced == FALSE) {
		Data <- sqlQuery(db, paste('SELECT cell.id, res.obs, obj.ident AS ind, loc.location AS loc, res.result,', 
			' res.restext FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN res ON cell.id =', 
			' res.cell_id LEFT JOIN loccell ON cell.id = loccell.cell_id LEFT JOIN loc ON loccell.loc_id',
			' = loc.id LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ', obj_id, 
			' AND actobj.series_id = ', series_id, if(length(iterations)==1){paste(" AND obs <= ", iterations, 
			sep = "")}, sep = '')) } else {
		Data <- sqlQuery(db, paste('SELECT cell.id, res.obs, obj.ident AS ind, loc.location AS loc, res.result,', 
			' res.restext FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN res ON cell.id =', 
			' res.cell_id LEFT JOIN loccell ON cell.id = loccell.cell_id LEFT JOIN loc ON loccell.loc_id',
			' = loc.id LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ', obj_id, 
			' AND actobj.series_id = ', series_id, if(length(iterations)==1){paste(" AND obs <= ", iterations, 
			sep = "")}, ' AND (cell.id ', paste(locations, collapse = ') AND cell.id '), 
			'))', sep = ''))
	}
	odbcClose(db)
	Data <- Data[order(Data[,1], Data[,2], Data[,3]),]
	nind <- length(levels(Data[,3]))
	nres <- nrow(Data)/nind
	dataframe <- Data[1:nres*nind, c(1,2)]
	for (i in 1:nind) {
		dataframe[,2 + i] <- factor(Data[1:nres*nind - (nind - i), 4])
		levels(dataframe[,2 + i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,2 + i])))
		colnames(dataframe)[2 + i] <- as.character(Data[i, 3])
		if(apply.utf8) Encoding(levels(dataframe[,2 + i])) <- "UTF-8"
	}
	dataframe[,1:2 + 2 + nind] <- Data[1:nres*nind, 5:6]
	colnames(dataframe)[1:2 + 2 + nind] <- c("Result", "Result.Text")
	if(apply.utf8) {if(is.factor(dataframe[,"Result.Text"])) {Encoding(levels(dataframe[,"Result.Text"])) <- "UTF-8"} else if(is.character(dataframe[,"Result.Text"])) {
		Encoding(dataframe[,"Result.Text"]) <- "UTF-8"}}
	rownames(dataframe) <- 1:nres
	return(dataframe)
}

Usage

variable <- op_baseGetData("opasnet_base", "page identifier", include = vector_of_loc_ids, exclude = vector_of_loc_ids)

Assuming "opasnet_base" is a correctly defined DSN (Data Service Name; in Windows XP: Control Panel\Administrative tools\Data Sources (ODBC)).
Include and exclude are optional.
- Include picks all cells in the locations given.
  - The clearest case is when all the included locations belong to the same index: Any cells in the non-included locations of the index will be left out.
  - In case given locations are in multiple indices: The effect produced will be the same as picking separately for each index and removing duplicates.
- Exclude unpicks any cells which are indexed by the locations given. Slower than include.
- They can be used in unison.
Result will be in a table format with columns: id, obs, ind1, ind2 ... indn, Result.
series_id is an optional parameter, if it is not given the most current upload of the data will be downloaded.

Finding index data

op_baseGetLocs

+ Show code - Hide code

op_baseGetLocs <- function(dsn, ident, series_id = NULL, use.utf8 = TRUE, apply.utf8 = TRUE) {
	if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
	obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
	if (length(series_id) == 0) {series_id <- sqlQuery(db, paste('SELECT series_id FROM actobj WHERE obj_id = ', obj_id, 
		' ORDER BY series_id DESC LIMIT 1', sep = ''))[1,1]}
	Locs <- sqlQuery(db, paste("SELECT DISTINCT obj.ident AS ind, loc.location AS loc, loc.id AS loc_id", 
		" FROM actobj LEFT JOIN actloc ON actobj.id = actloc.actobj_id LEFT JOIN loc ON actloc.loc_id = loc.id", 
		" LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ", obj_id, " AND actobj.series_id = ", 
		series_id, sep = ""))
	odbcClose(db)
	Locs <- Locs[order(Locs$ind, Locs$loc_id),]
	rownames(Locs) <- 1:nrow(Locs)
	if(apply.utf8) {Encoding(levels(Locs$ind)) <- "UTF-8"; Encoding(levels(Locs$loc)) <- "UTF-8"}
	return(Locs)
}

Returns all indexes and locations and their ids in a table of format: ind, loc, loc_id.

Manipulating data

DataframeToArray

+ Show code - Hide code

DataframeToArray <- function(dataframe, rescol = NULL) {
	ColNames <- colnames(dataframe[(colnames(dataframe) == "id") == FALSE])
	if (length(ColNames[(ColNames == "obs")])>0) {if(length(levels(factor(dataframe[,"obs"]))) == 1) {ColNames <- ColNames[
	(ColNames == "obs") == FALSE]} else {dataframe[,"obs"] <- factor(as.character(dataframe[,"obs"]))}}
	if (length(rescol)==0) {
	rescol <- colnames(dataframe) == "Freq"
	if (length(rescol[rescol==TRUE]) >= 1) rescol <- "Freq" else {
		rescol <- colnames(dataframe) == "Result"
		if (length(rescol[rescol==TRUE]) >= 1) rescol <- "Result" else {
			rescol <- colnames(dataframe) == "result"
			if (length(rescol[rescol==TRUE]) >= 1) rescol <- "result" else stop("No result column found")
		}
	}
	} else {ColNames <- ColNames[(ColNames == rescol) == FALSE]}
	ColNames <- ColNames[(ColNames == "Result") == FALSE]
	ColNames <- ColNames[(ColNames == "result") == FALSE]
	ColNames <- ColNames[(ColNames == "Freq") == FALSE]
	nind <- length(ColNames)
	DimNames <- rep(vector("list", 1), nind)
	names(DimNames) <- ColNames
	indlengths <- 0
	for (i in 1:nind) {
		DimNames[[i]] <- levels(factor(dataframe[,ColNames[i]]))
		indlengths[i] <- length(DimNames[[i]])
	}
	array <- rep(NA, prod(indlengths))
	dim(array) <- indlengths
	dimnames(array) <- DimNames
	array[as.matrix(dataframe[,ColNames])] <- dataframe[,rescol]
	return(array)
}

Usage

variable2 <- DataframeToArray(variable1, rescol = NULL)

variable1 must be in similar format as the result when downloading.
Columns named "id" and various versions of "Result" are ignored for dimension creation.
- "obs" column will also be ignored if there's only one.
The column containing the values may be defined in the parameters, otherwise it is assumed to be either "Freq", "Result" or "result" in that order.

Uploading data

op_baseWrite

+ Show code - Hide code

op_baseWrite <- function(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL, 
	rescol = NULL, n.obs.const = FALSE, maxrows = 50000, use.utf8 = TRUE, use.utf8.read = TRUE, latin1.2.utf8.conv.write = TRUE, utf8.2.latin1.conv.read = TRUE) {
	
	# Coerce input into a data frame if it isn't one already; get rid of empty cells
	if (is.array(input)) dataframe <- as.data.frame(as.table(input)) else dataframe <- input
	if (is.null(rescol)) {
	rescol <- colnames(dataframe) == "Freq"
	if (sum(rescol) == 1) rescol <- "Freq" else {
		rescol <- colnames(dataframe) == "Result"
		if (sum(rescol) == 1) rescol <- "Result" else {
			rescol <- colnames(dataframe) == "result"
			if (sum(rescol) == 1) rescol <- "result"
		}
	}}
	dataframe <- dataframe[is.na(dataframe[,rescol]) == FALSE,]
	ColNames <- colnames(dataframe)[!(colnames(dataframe)%in%c(rescol, "id", "obs"))]
	for (i in ColNames) {
		dataframe[,i] <- factor(dataframe[,i])
		levels(dataframe[,i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,i])))
		if(latin1.2.utf8.conv.write) if(sum(Encoding(levels(dataframe[,i]))=="latin1")!=0) levels(dataframe[,i]) <- iconv(levels(dataframe[,i]), "latin1", "UTF-8")
	}
	
	#if(!is.numeric(dataframe[,rescol])) 
	
	# Open database connection
	if(use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
	if(!use.utf8.read) db2 <- odbcConnect(dsn)
	
	# Add page to database (if it doesn't already exist)
	if (is.null(ident)) if (interactive()) ident <- readline(paste("What is the identifier of this object?", 
		"\n", sep = "")) else stop("indentifier of object no given")
	obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep = ''))[1,1]
	if (is.na(obj_id)) {
		
		# Wiki id
		if (substr(ident, 1,5)=="Op_en") {wiki_id <- 1; page <- substr(ident, 6, nchar(ident))} else {
		if (substr(ident, 1,5)=="Op_fi") {wiki_id <- 2; page <- substr(ident, 6, nchar(ident))} else {
		if (substr(ident, 1,6)=="Heande") {wiki_id <- 6; page <- substr(ident, 7, nchar(ident))} else {
		if (substr(ident, 1,4)=="Erac") {wiki_id <- 6; page <- substr(ident, 5, nchar(ident))} else {
		wiki_id <- 0; page <- 0; warning("No wiki id found in ident, writing zero.")}}}}
		page <- as.numeric(page)
		if (is.na(page)) stop("could not convert characters following the wiki ident into a page number")
		
		# Name etc.
		if (is.null(name)) if (interactive()) name <- readline(paste("What is the name of this object?", 
			"\n", sep = "")) else stop("object name not given")
		if (is.null(objtype_id)) if (interactive()) objtype_id <- readline(paste("What type of object is", 
			" this (id)?", paste(paste(sqlQuery(db, "SELECT id FROM objtype")[,1], sqlQuery(db, paste("SELECT objtype",
			" FROM objtype", sep = ""))[,1], sep = " - "), collapse = ", "), "\n", collapse = " ")) else {
			stop("object type not given")}
		sqlQuery(db, paste('INSERT INTO obj (ident, name, objtype_id, page, wiki_id) VALUES ("', paste(ident, 
			name, sep = '","'), '",', paste(objtype_id, page, wiki_id, sep = ','), ')', sep = ''))
		obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep = ''))[1,1]
	}
	
	# Write act and actobj
	if (is.null(who)==TRUE) if (interactive()) {who <- readline(paste("What is the name of the uploader?", "\n", sep = "")) 
		} else stop("uploader name not given")
	series_id <- sqlQuery(db, paste("SELECT series_id FROM actobj WHERE obj_id = ", obj_id, " ORDER BY series_id DESC LIMIT 1", 
		sep = ""))[1,1]
	if (is.na(series_id)==FALSE) {if (is.null(acttype)==TRUE) {if (interactive()) {acttype <- readline(paste("What type of upload", 
		" is this? 4 - new data to replace any existing, 5 - new data to be appended to existing data (must have the same", 
		" indices).", "\n", sep = "")) 
		} else acttype <- 4 
		}} else acttype <- 4
	if (!(acttype%in%c(4,5))) stop ("proper acttype not given")
	
	sqlQuery(db, paste('INSERT INTO act (acttype_id, who, comments) VALUES (', acttype, ',"', who, '","R upload")', sep = ''))
	act_id <- sqlQuery(db, paste('SELECT id FROM act WHERE who = "', who,'" AND comments = "R upload" ORDER BY id DESC LIMIT 1', 
		sep = ''))[1,1]
	if (acttype == 4) series_id <- act_id
	if (is.null(unit)) if (interactive()) unit <- readline(paste("What is the unit of this object?", 
		"\n", sep = "")) else stop("unit not given")
	sqlQuery(db, paste('INSERT INTO actobj (act_id, obj_id, series_id, unit) VALUES (', paste(act_id, obj_id, series_id, sep = ','), 
		',"', unit, '")', sep = ''))
	actobj_id <- sqlQuery(db, paste('SELECT id FROM actobj WHERE act_id = ', act_id, sep = ''))[1,1]
	
	#Write indexes
	for (i in ColNames) {
		sqlQuery(db, paste('INSERT IGNORE INTO obj (ident, name, objtype_id) VALUES ("', gsub(' ', '_', i), '","', 
			i, '", 6)', sep = ''))
	}
	IndIds <- sqlQuery((if(use.utf8.read) db else db2), paste('SELECT id, ident FROM obj WHERE ident IN("', paste(gsub(" ", "_", ColNames), 
		collapse = '","'), '")', sep = ''))
	if(utf8.2.latin1.conv.read) levels(IndIds$ident) <- iconv(levels(IndIds$ident), "UTF-8", "latin1")
	IndIdMap <- IndIds$id
	names(IndIdMap) <- tolower(IndIds$ident)
	ColIds <- as.character(IndIdMap[tolower(gsub(" ", "_", ColNames))])
	colnames(dataframe)[colnames(dataframe)%in%ColNames] <- ColIds
	
	#Write locations
	for (i in ColIds) {
		for (j in levels(dataframe[, i])) {
			sqlQuery(db, paste('INSERT IGNORE INTO loc (obj_id_i, location) VALUES (', i, ',"', j, '")', 
				sep = ''))
		}
	}
	LocIds <- sqlQuery((if(use.utf8.read) db else db2), paste('SELECT id, obj_id_i, location FROM loc WHERE obj_id_i IN("', paste(ColIds, collapse = '","'), 
		'")', sep = ''))
	if(utf8.2.latin1.conv.read) levels(LocIds$location) <- iconv(levels(LocIds$location), "UTF-8", "latin1")
	
	for (i in ColIds) {
		LocIdMap <- LocIds[LocIds$obj_id_i == i, 1]
		names(LocIdMap) <- gsub(" *$", "",gsub("^ *", "", tolower(LocIds[LocIds$obj_id_i == i, 3])))
		levels(dataframe[, i]) <- LocIdMap[tolower(levels(dataframe[, i]))]
		if (sum(is.na(levels(dataframe[, i]))) != 0) stop("Faulty location matching. Usually caused by special characters.")
		
		#Writing actloc
		sqlQuery(db, paste("INSERT INTO actloc (actobj_id, loc_id) VALUES (", paste(actobj_id, levels(dataframe[, i]), 
		sep = ",", collapse = "),("), ")", sep = ""))
	}
	
	#Writing cell
	n <- tapply(dataframe[,rescol], dataframe[,ColIds], length)
	ncell <- sum(!is.na(n))
	if (is.numeric(dataframe[,rescol])) means <- tapply(dataframe[,rescol], dataframe[,ColIds], mean) else means <- rep(0, ncell)
	if (is.numeric(dataframe[,rescol])) {
		sds <- tapply(dataframe[,rescol], dataframe[,ColIds], sd); sds[] <- ifelse(n == 1, 0, sds)} else sds <- rep(0, ncell)
	
	cellQuery <- paste(actobj_id, means[!is.na(means)], sds[!is.na(sds)], n[!is.na(n)], sep = ",")
	i <- 1
	while (length(cellQuery) >= (i + maxrows - 1)) {
		sqlQuery(db, paste('INSERT INTO cell (actobj_id, mean, sd, n) VALUES (', paste(cellQuery[i:(i + maxrows - 1)], 
			collapse = '),('), ')', sep = ''))
		i <- i + maxrows
	}
	if (length(cellQuery) %% maxrows != 0) {
		sqlQuery(db, paste('INSERT INTO cell (actobj_id, mean, sd, n) VALUES (', 
			paste(cellQuery[i:length(cellQuery)], collapse = '),('), ')', sep = ''))
	}
	
	#Writing res
	cell_id <- sqlQuery(db, paste("SELECT id FROM cell WHERE actobj_id = ", actobj_id, " ORDER BY ID", sep = ""))[,1]
	if (length(cell_id) != ncell) stop("number of written cells differs from given data")
	if (is.numeric(dataframe[,rescol])) ids <- means else ids <- n
	ids[!is.na(ids)] <- cell_id
	dataframe[, ncol(dataframe) + 1] <- ids[as.matrix(dataframe[,ColIds])]
	colnames(dataframe)[ncol(dataframe)] <- "cell_id"
	
	resQuery <- paste(dataframe[,"cell_id"], ',', if(sum(colnames(dataframe) == "obs") == 0) 1 else dataframe[,"obs"], ',', 
		if (!is.numeric(dataframe[,rescol])) '"', dataframe[,rescol], if (!is.numeric(dataframe[,rescol])) '"', sep = "")
	i <- 1
	while (length(resQuery) >= (i + maxrows - 1)) {
		sqlQuery(db, paste('INSERT INTO res (cell_id, obs, ', ifelse(is.numeric(dataframe[,rescol]), "result", "restext"), ') VALUES (', 
			paste(resQuery[i:(i + maxrows - 1)], collapse = '),('), ')', sep = ''))
		i <- i + maxrows
	}
	if (length(resQuery) %% maxrows != 0) {
		sqlQuery(db, paste('INSERT INTO res (cell_id, obs, ', ifelse(is.numeric(dataframe[,rescol]), "result", "restext"), ') VALUES (', 
			paste(resQuery[i:length(resQuery)], collapse = '),('), ')', sep = ''))
	}
	
	#Writing loccell
	ids <- as.data.frame(as.table(ids))
	ids <- ids[!is.na(ids$Freq),]
	loccellQuery <- paste(ids$Freq, unlist(ids[,-ncol(ids)]), sep = ",")
	i <- 1
	while (length(loccellQuery) >= (i + maxrows - 1)) {
		sqlQuery(db, paste('INSERT INTO loccell (cell_id, loc_id) VALUES (', paste(loccellQuery[i:(i + maxrows - 1)], collapse = '),('), ')', 
			sep = ''))
		i <- i + maxrows
	}
	if (length(loccellQuery) %% maxrows != 0) {
		sqlQuery(db, paste('INSERT INTO loccell (cell_id, loc_id) VALUES (', paste(loccellQuery[i:length(loccellQuery)], collapse = '),('), ')', 
			sep = ''))
	}
	
	#Close database connection
	odbcClose(db)
	cat("Successful\n")
	return(character())
}

Usage

op_baseWrite(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL, rescol = NULL)

dsn and input must be defined, the rest of the object and act parameters if not defined are prompted for by the function as needed.
- For uploading the DSN defined must have writers permissions.
rescol defines the column from which the values are chosen from, both numerical and textual data are allowed, if left undefined the function will check column matches for "Freq", "Result" and "result" in that order.

Restrictions

Input may only be given in either array or data.frame form.
- Indexes used may not exceed the character limit of 20.
  - Indexes should preferably match an earlier entry: Special:OpasnetBaseIndices.
  - Indexes are treated as identifiers for indexes in the database, spaces in the indexes are converted to _. This ensures maximum compatibility and ease in operations in which data is downloaded and uploaded again. Names and more specific details can be edited into the indexes separately.

Regarding special characters and character encoding

Using special characters like ä and ö when our database is encoded in latin1 while wiki is in UTF-8 is a bit complicated. New parameters -- for forcing the odbc connection to use UTF-8 and in-R conversion of one encoding to the other prior to writing or after reading -- have been created. The defaults have been made to work with opasnet_base. For use with heande_base, forcing UTF-8 on the odbc connection when reading should be disabled, meaning that use.utf8 should be set to FALSE when using op_baseGetData and use.utf8.read should be set to FALSE when using op_baseWrite.

Pages related to Opasnet Base	Opasnet Base · Uploading to Opasnet Base · Data structures in Opasnet · Opasnet Base UI · Modelling in Opasnet · Special:Opasnet Base Import · Opasnet Base Connection for R (needs updating) · Converting KOPRA data into Opasnet Base · Poll · Working with sensitive data · Saved R objects
Pages related to the 2008-2011 version of Opasnet Base	Opasnet base connection for Analytica · Opasnet base structure · Related Analytica file (old version File:Transferring to result database.ANA) · Analytica Web Player · Removed pages and other links · Standard run · OpasnetBaseUtils

Opasnet Base Connection for R: Difference between revisions

Latest revision as of 20:29, 10 April 2015

Contents

Package dependencies

Downloading data

Usage

Finding index data

Manipulating data

Usage

Uploading data

Usage

Restrictions

Regarding special characters and character encoding

See also

Navigation menu

@@ Line 4: / Line 4: @@
 [[Category:SQL tool]]
 [[Category:Open assessment]]
+[[Heande:Opasnet Base Connection for R]]
+[[Category:Code under inspection]]
 {{tool|moderator=Teemu R}}
-Code for R for the purpose of interacting with the [[Opasnet Base]] is collected on this page. To use it, copy paste the code you need to the R console; this defines the functions, after which they can be called for in that R session.
+{{comment|# |Should we merge this page with [[OpasnetBaseUtils]]?|--[[User:Jouni|Jouni]] 20:18, 28 December 2011 (EET)}}
+Code for [[R]] for the purpose of interacting with the [[Opasnet Base]] is collected on this page. To use it, copy paste the code you need to the R console; this defines the functions, after which they can be called for in that R session. Or alternatively install the [[OpasnetBaseUtils]] package.
 ==Package dependencies==
@@ Line 11: / Line 15: @@
 These packages are required for most of the code to work. To install: from the top bar menu Packages>Install. To load: copy-paste.
- library(utils)
   library(RODBC)
 ==Downloading data==
-===Functions===
+'''op_baseGetData
- op_baseGetData <- function(dsn, ident, include = NULL, exclude = NULL, series_id = NULL) {
+<rcode name="op_baseGetData">
- 	db <- odbcConnect(dsn)
+op_baseGetData <- function(dsn, ident, include = NULL, exclude = NULL, series_id = NULL, iterations = NULL, use.utf8 = TRUE, apply.utf8 = TRUE) {
- 	<nowiki>obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]</nowiki>
+	if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
- 	if (length(series_id) == 0) {series_id <- sqlQuery(db, paste('SELECT series_id FROM actobj WHERE obj_id = ', obj_id,
+	obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
- 		<nowiki>' ORDER BY series_id DESC LIMIT 1', sep = ''))[1,1]}</nowiki>
+	if (length(series_id) == 0) {series_id <- sqlQuery(db, paste('SELECT series_id FROM actobj WHERE obj_id = ', obj_id,
- 	sliced <- FALSE
+		' ORDER BY series_id DESC LIMIT 1', sep = ''))[1,1]}
- 	locations <- NULL
+	sliced <- FALSE
- 	x <- 1
+	locations <- NULL
- 	basequery <- paste('SELECT loccell.cell_id FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN',
+	x <- 1
- 			' loccell ON cell.id = loccell.cell_id WHERE actobj.obj_id = ', obj_id, ' AND actobj.series_id = ',
+	basequery <- paste('SELECT loccell.cell_id FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN',
- 			<nowiki>series_id, ' AND loccell.loc_id IN(', sep = '')</nowiki>
+			' loccell ON cell.id = loccell.cell_id WHERE actobj.obj_id = ', obj_id, ' AND actobj.series_id = ',
- 	if (length(include) != 0) {
+			series_id, ' AND loccell.loc_id IN(', sep = '')
- 		sliced <- TRUE
+	if (length(include) != 0) {
- 		locations[x] <- paste("IN(", basequery, paste(include, collapse = ","), ")", sep = "")
+		sliced <- TRUE
- 		x <- x + 1
+		locations[x] <- paste("IN(", basequery, paste(include, collapse = ","), ")", sep = "")
- 	}
+		x <- x + 1
- 	if (length(exclude) != 0) {
+	}
- 		sliced <- TRUE
+	if (length(exclude) != 0) {
- 		locations[x] <- paste("NOT IN(", basequery, paste(exclude, collapse = ","), ")", sep = "")
+		sliced <- TRUE
- 	}
+		locations[x] <- paste("NOT IN(", basequery, paste(exclude, collapse = ","), ")", sep = "")
- 	if (sliced == FALSE) {
+	}
- 		Data <- sqlQuery(db, paste('SELECT cell.id, res.obs, obj.ident AS ind, loc.location AS loc, res.result',
+	if (sliced == FALSE) {
- 			' FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN res ON cell.id =',
+		Data <- sqlQuery(db, paste('SELECT cell.id, res.obs, obj.ident AS ind, loc.location AS loc, res.result,',
- 			' res.cell_id LEFT JOIN loccell ON cell.id = loccell.cell_id LEFT JOIN loc ON loccell.loc_id',
+			' res.restext FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN res ON cell.id =',
- 			' = loc.id LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ', obj_id,
+			' res.cell_id LEFT JOIN loccell ON cell.id = loccell.cell_id LEFT JOIN loc ON loccell.loc_id',
- 			<nowiki>' AND actobj.series_id = ', series_id, sep = '')) } else {</nowiki>
+			' = loc.id LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ', obj_id,
- 		Data <- sqlQuery(db, paste('SELECT cell.id, res.obs, obj.ident AS ind, loc.location AS loc, res.result',
+			' AND actobj.series_id = ', series_id, if(length(iterations)==1){paste(" AND obs <= ", iterations,
- 			' FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN res ON cell.id =',
+			sep = "")}, sep = '')) } else {
- 			' res.cell_id LEFT JOIN loccell ON cell.id = loccell.cell_id LEFT JOIN loc ON loccell.loc_id',
+		Data <- sqlQuery(db, paste('SELECT cell.id, res.obs, obj.ident AS ind, loc.location AS loc, res.result,',
- 			' = loc.id LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ', obj_id,
+			' res.restext FROM actobj LEFT JOIN cell ON actobj.id = cell.actobj_id LEFT JOIN res ON cell.id =',
- 			' AND actobj.series_id = ', series_id, ' AND (cell.id ', paste(locations, collapse = ') AND cell.id '),
+			' res.cell_id LEFT JOIN loccell ON cell.id = loccell.cell_id LEFT JOIN loc ON loccell.loc_id',
- 			<nowiki>'))', sep = ''))</nowiki>
+			' = loc.id LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ', obj_id,
- 	}
+			' AND actobj.series_id = ', series_id, if(length(iterations)==1){paste(" AND obs <= ", iterations,
- 	odbcClose(db)
+			sep = "")}, ' AND (cell.id ', paste(locations, collapse = ') AND cell.id '),
- 	Data <- Data[order(Data[,1], Data[,2], Data[,3]),]
+			'))', sep = ''))
- 	nind <- length(levels(Data[,3]))
+	}
- 	nres <- nrow(Data)/nind
+	odbcClose(db)
- 	dataframe <- Data[1:nres*nind, c(1,2)]
+	Data <- Data[order(Data[,1], Data[,2], Data[,3]),]
- 	for (i in nind:1) {
+	nind <- length(levels(Data[,3]))
- 		dataframe[,2 + nind - i + 1] <- Data[1:nres*nind - i + 1, 4]
+	nres <- nrow(Data)/nind
- 		colnames(dataframe)[2 + nind - i + 1] <- as.character(Data[nind - i + 1, 3])
+	dataframe <- Data[1:nres*nind, c(1,2)]
- 	}
+	for (i in 1:nind) {
- 	dataframe[,3 + nind] <- Data[1:nres*nind, 5]
+		dataframe[,2 + i] <- factor(Data[1:nres*nind - (nind - i), 4])
- 	colnames(dataframe)[3 + nind] <- "Result"
+		levels(dataframe[,2 + i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,2 + i])))
- 	rownames(dataframe) <- 1:nres
+		colnames(dataframe)[2 + i] <- as.character(Data[i, 3])
- 	dataframe
+		if(apply.utf8) Encoding(levels(dataframe[,2 + i])) <- "UTF-8"
- }
+	}
+	dataframe[,1:2 + 2 + nind] <- Data[1:nres*nind, 5:6]
+	colnames(dataframe)[1:2 + 2 + nind] <- c("Result", "Result.Text")
+	if(apply.utf8) {if(is.factor(dataframe[,"Result.Text"])) {Encoding(levels(dataframe[,"Result.Text"])) <- "UTF-8"} else if(is.character(dataframe[,"Result.Text"])) {
+		Encoding(dataframe[,"Result.Text"]) <- "UTF-8"}}
+	rownames(dataframe) <- 1:nres
+	return(dataframe)
+}
+</rcode>
-====Usage====
+===Usage===
   variable <- op_baseGetData("opasnet_base", "page identifier", include = vector_of_loc_ids, exclude = vector_of_loc_ids)
@@ Line 82: / Line 93: @@
 ===Finding index data===
-====Function====
+'''op_baseGetLocs
- op_baseGetLocs <- function(dsn, ident, series_id = NULL) {
+<rcode name="op_baseGetLocs">
- 	db <- odbcConnect(dsn)
+op_baseGetLocs <- function(dsn, ident, series_id = NULL, use.utf8 = TRUE, apply.utf8 = TRUE) {
- 	<nowiki>obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]</nowiki>
+	if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
- 	if (length(series_id) == 0) {series_id <- sqlQuery(db, paste('SELECT series_id FROM actobj WHERE obj_id = ', obj_id,
+	obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
- 		<nowiki>' ORDER BY series_id DESC LIMIT 1', sep = ''))[1,1]}</nowiki>
+	if (length(series_id) == 0) {series_id <- sqlQuery(db, paste('SELECT series_id FROM actobj WHERE obj_id = ', obj_id,
- 	Locs <- sqlQuery(db, paste("SELECT obj.name AS ind, loc.location AS loc, loc.id AS loc_id",
+		' ORDER BY series_id DESC LIMIT 1', sep = ''))[1,1]}
- 		" FROM actobj LEFT JOIN actloc ON actobj.id = actloc.actobj_id LEFT JOIN loc ON actloc.loc_id = loc.id",
+	Locs <- sqlQuery(db, paste("SELECT DISTINCT obj.ident AS ind, loc.location AS loc, loc.id AS loc_id",
- 		" LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ", obj_id, " AND actobj.series_id = ",
+		" FROM actobj LEFT JOIN actloc ON actobj.id = actloc.actobj_id LEFT JOIN loc ON actloc.loc_id = loc.id",
- 		series_id, sep = ""))
+		" LEFT JOIN obj ON loc.obj_id_i = obj.id WHERE actobj.obj_id = ", obj_id, " AND actobj.series_id = ",
- 	odbcClose(db)
+		series_id, sep = ""))
- 	Locs <- Locs[order(Locs[,1]),]
+	odbcClose(db)
- 	rownames(Locs) <- 1:nrow(Locs)
+	Locs <- Locs[order(Locs$ind, Locs$loc_id),]
- 	Locs
+	rownames(Locs) <- 1:nrow(Locs)
- }
+	if(apply.utf8) {Encoding(levels(Locs$ind)) <- "UTF-8"; Encoding(levels(Locs$loc)) <- "UTF-8"}
+	return(Locs)
+}
+</rcode>
 *Returns all indexes and locations and their ids in a table of format: ind, loc, loc_id.
@@ Line 103: / Line 117: @@
 ==Manipulating data==
-===Functions===
+'''DataframeToArray
- DataframeToArray <- function(dataframe) {
+<rcode name="DataframeToArray">
- 	DimNames <- rep(vector("list", 1), ncol(dataframe) - 2)
+DataframeToArray <- function(dataframe, rescol = NULL) {
- 	names(DimNames) <- c(colnames(dataframe)[2:(ncol(dataframe) - 1)])
+	ColNames <- colnames(dataframe[(colnames(dataframe) == "id") == FALSE])
- 	nind <- length(names(DimNames))
+	if (length(ColNames[(ColNames == "obs")])>0) {if(length(levels(factor(dataframe[,"obs"]))) == 1) {ColNames <- ColNames[
- 	indlengths <- 0
+	(ColNames == "obs") == FALSE]} else {dataframe[,"obs"] <- factor(as.character(dataframe[,"obs"]))}}
- 	for (i in 1:nind) {
+	if (length(rescol)==0) {
- 		<nowiki>DimNames[[i]] <- levels(factor(dataframe[,1 + i]))</nowiki>
+	rescol <- colnames(dataframe) == "Freq"
- 		<nowiki>indlengths[i] <- length(DimNames[[i]])</nowiki>
+	if (length(rescol[rescol==TRUE]) >= 1) rescol <- "Freq" else {
- 	}
+		rescol <- colnames(dataframe) == "Result"
- 	array <- rep(NA, prod(indlengths))
+		if (length(rescol[rescol==TRUE]) >= 1) rescol <- "Result" else {
- 	dim(array) <- indlengths
+			rescol <- colnames(dataframe) == "result"
- 	dimnames(array) <- DimNames
+			if (length(rescol[rescol==TRUE]) >= 1) rescol <- "result" else stop("No result column found")
- 	array[as.matrix(dataframe[,c(1:nind + 1)])] <- dataframe[,ncol(dataframe)]
+		}
- 	array
+	}
- }
+	} else {ColNames <- ColNames[(ColNames == rescol) == FALSE]}
+	ColNames <- ColNames[(ColNames == "Result") == FALSE]
+	ColNames <- ColNames[(ColNames == "result") == FALSE]
+	ColNames <- ColNames[(ColNames == "Freq") == FALSE]
+	nind <- length(ColNames)
+	DimNames <- rep(vector("list", 1), nind)
+	names(DimNames) <- ColNames
+	indlengths <- 0
+	for (i in 1:nind) {
+		DimNames[[i]] <- levels(factor(dataframe[,ColNames[i]]))
+		indlengths[i] <- length(DimNames[[i]])
+	}
+	array <- rep(NA, prod(indlengths))
+	dim(array) <- indlengths
+	dimnames(array) <- DimNames
+	array[as.matrix(dataframe[,ColNames])] <- dataframe[,rescol]
+	return(array)
+}
+</rcode>
-{{attack|# |This does not work for probabilistic data. ''as.matrix(dataframe[,c(1:nind + 1)])'' gives values like "&nbsp;&nbsp;&nbsp;1", "&nbsp;&nbsp;&nbsp;2" for obs, but the array entries for obs are "1", "2" etc. They do not match.|--[[User:Jouni|Jouni]] 07:00, 4 November 2010 (UTC)}}
+===Usage===
-====Usage====
+  variable2 <- DataframeToArray(variable1, rescol = NULL)
-  variable2 <- DataframeToArray(variable1)
 *variable1 must be in similar format as the result when downloading.
-**first column (id) is ignored
+*Columns named "id" and various versions of "Result" are ignored for dimension creation.
-*variable1 <- function(variable1) is allowed in R
+**"obs" column will also be ignored if there's only one.
+*The column containing the values may be defined in the parameters, otherwise it is assumed to be either "Freq", "Result" or "result" in that order.
-===Other useful stuff===
- as.data.frame(as.table(array))
- dataframe[is.na(dataframe[,ncol(dataframe)])==FALSE,]
-*First line returns an array from a data frame.
-*Second line returns rows without empty values.
- dataframe[grep("location", dataframe$index, ignore.case = TRUE),]
-*Returns all rows of a data frame where column "index" value is "location".
- dataframe[order(dataframe[,"col1"],dataframe[,"col2"], ... ,dataframe[,"coln"]),]
-*Returns dataframe ordered by col1, col2, ... , coln.
- read.csv("table.csv", sep = ",")
-*Returns a data.frame from a .csv file, sep is the separator used in the file.
 ==Uploading data==
-*Varying numbers of iterations between cells not supported.
+'''op_baseWrite
-===Functions===
- op_baseWrite <- function(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL, sort = TRUE) {
+<rcode name="op_baseWrite">
- 	#Open database connection
+op_baseWrite <- function(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL,
- 	db <- odbcConnect(dsn)
+	rescol = NULL, n.obs.const = FALSE, maxrows = 50000, use.utf8 = TRUE, use.utf8.read = TRUE, latin1.2.utf8.conv.write = TRUE, utf8.2.latin1.conv.read = TRUE) {
- 	#Coerce input into a data frame if it isn't one already; get rid of empty cells
+	# Coerce input into a data frame if it isn't one already; get rid of empty cells
- 	if (is.array(input)) dataframe <- as.data.frame(as.table(input)) else dataframe <- input
+	if (is.array(input)) dataframe <- as.data.frame(as.table(input)) else dataframe <- input
- 	rescol <- colnames(dataframe) == "Freq"
+	if (is.null(rescol)) {
- 	if (length(rescol[rescol==TRUE]) >= 1) rescol <- "Freq" else {
+	rescol <- colnames(dataframe) == "Freq"
- 		rescol <- colnames(dataframe) == "Result"
+	if (sum(rescol) == 1) rescol <- "Freq" else {
- 		if (length(rescol[rescol==TRUE]) >= 1) rescol <- "Result" else {
+		rescol <- colnames(dataframe) == "Result"
- 			rescol <- colnames(dataframe) == "result"
+		if (sum(rescol) == 1) rescol <- "Result" else {
- 			if (length(rescol[rescol==TRUE]) >= 1) rescol <- "result"
+			rescol <- colnames(dataframe) == "result"
- 		}
+			if (sum(rescol) == 1) rescol <- "result"
- 	}
+		}
- 	values <- dataframe[,rescol]
+	}}
- 	cond <- (is.na(values) == FALSE)
+	dataframe <- dataframe[is.na(dataframe[,rescol]) == FALSE,]
- 	dataframe <- dataframe[cond,]
+	ColNames <- colnames(dataframe)[!(colnames(dataframe)%in%c(rescol, "id", "obs"))]
+	for (i in ColNames) {
- 	#Add page to database (if it doesn't already exist)
+		dataframe[,i] <- factor(dataframe[,i])
- 	if (is.null(ident)==TRUE) if (interactive()) ident <- readline(paste("What is the identifier of this object?",
+		levels(dataframe[,i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,i])))
- 		"\n", sep = "")) else stop("indentifier of object no given")
+		if(latin1.2.utf8.conv.write) if(sum(Encoding(levels(dataframe[,i]))=="latin1")!=0) levels(dataframe[,i]) <- iconv(levels(dataframe[,i]), "latin1", "UTF-8")
- 	wiki_name <- substring(ident, 1, 5)
+	}
- 	if (wiki_name=="Op_en") {wiki_id <- 1; page <- substring(ident, 6, nchar(ident))} else if (wiki_name=="Op_fi") {
- 		wiki_id <- 2; page <- substring(ident, 6, nchar(ident))} else {wiki_id <- 0; page <- 0}
+	#if(!is.numeric(dataframe[,rescol]))
- 	<nowiki>obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep = ''))[1,1]</nowiki>
- 	if (is.na(obj_id)==TRUE) {
+	# Open database connection
- 		if (is.null(name)==TRUE) if (interactive()) name <- readline(paste("What is the name of this object?",
+	if(use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
- 			"\n", sep = "")) else stop("object name not given")
+	if(!use.utf8.read) db2 <- odbcConnect(dsn)
- 		if (is.null(unit)==TRUE) if (interactive()) unit <- readline(paste("What is the unit of this object?",
- 			"\n", sep = "")) else stop("unit not given")
+	# Add page to database (if it doesn't already exist)
- 		if (is.null(objtype_id)==TRUE) if (interactive()) objtype_id <- readline(paste("What type of object is",
+	if (is.null(ident)) if (interactive()) ident <- readline(paste("What is the identifier of this object?",
- 			" this (id)?", paste(paste(sqlQuery(db, "SELECT id FROM objtype")[,1], sqlQuery(db, paste("SELECT objtype",
+		"\n", sep = "")) else stop("indentifier of object no given")
- 			" FROM objtype", sep = ""))[,1], sep = " - "), collapse = ", "), "\n", collapse = " ")) else {
+	obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep = ''))[1,1]
- 			stop("object type not given")}
+	if (is.na(obj_id)) {
- 		sqlQuery(db, paste('INSERT INTO obj (ident, name, unit, objtype_id, page, wiki_id) VALUES ("', paste(ident,
- 			<nowiki>name, unit, sep = '","'), '",', paste(objtype_id, page, wiki_id, sep = ','), ')', sep = ''))</nowiki>
+		# Wiki id
- 		<nowiki>obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep = ''))[1,1]</nowiki>
+		if (substr(ident, 1,5)=="Op_en") {wiki_id <- 1; page <- substr(ident, 6, nchar(ident))} else {
- 	}
+		if (substr(ident, 1,5)=="Op_fi") {wiki_id <- 2; page <- substr(ident, 6, nchar(ident))} else {
+		if (substr(ident, 1,6)=="Heande") {wiki_id <- 6; page <- substr(ident, 7, nchar(ident))} else {
- 	#Write act and actobj
+		if (substr(ident, 1,4)=="Erac") {wiki_id <- 6; page <- substr(ident, 5, nchar(ident))} else {
- 	if (is.null(who)==TRUE) if (interactive()) {who <- readline(paste("What is the name of the uploader?", "\n", sep = ""))
+		wiki_id <- 0; page <- 0; warning("No wiki id found in ident, writing zero.")}}}}
- 		} else stop("uploader name not given")
+		page <- as.numeric(page)
- 	series_id <- sqlQuery(db, paste("SELECT series_id FROM actobj WHERE obj_id = ", obj_id, " ORDER BY series_id DESC LIMIT 1",
+		if (is.na(page)) stop("could not convert characters following the wiki ident into a page number")
- 		sep = ""))[1,1]
- 	if (is.na(series_id)==FALSE) {if (is.null(acttype)==TRUE) {if (interactive()) {acttype <- readline(paste("What type of upload",
+		# Name etc.
- 		" is this? 4 - new data to replace any existing, 5 - new data to be appended to existing data (must have the same",
+		if (is.null(name)) if (interactive()) name <- readline(paste("What is the name of this object?",
- 		" indices).", "\n", sep = ""))
+			"\n", sep = "")) else stop("object name not given")
- 		} else acttype <- 4
+		if (is.null(objtype_id)) if (interactive()) objtype_id <- readline(paste("What type of object is",
- 		}} else acttype <- 4
+			" this (id)?", paste(paste(sqlQuery(db, "SELECT id FROM objtype")[,1], sqlQuery(db, paste("SELECT objtype",
- 	if (acttype != 4 & acttype != 5) stop ("proper acttype not given")
+			" FROM objtype", sep = ""))[,1], sep = " - "), collapse = ", "), "\n", collapse = " ")) else {
- 	<nowiki>sqlQuery(db, paste('INSERT INTO act (acttype_id, who, comments) VALUES (', acttype, ',"', who, '","R upload")', sep = ''))</nowiki>
+			stop("object type not given")}
- 	act_id <- sqlQuery(db, "SELECT id FROM act ORDER BY id DESC LIMIT 1")[1,1]
+		sqlQuery(db, paste('INSERT INTO obj (ident, name, objtype_id, page, wiki_id) VALUES ("', paste(ident,
- 	if (acttype == 4) series_id <- act_id
+			name, sep = '","'), '",', paste(objtype_id, page, wiki_id, sep = ','), ')', sep = ''))
- 	sqlQuery(db, paste('INSERT INTO actobj (act_id, obj_id, series_id) VALUES (', paste(act_id, obj_id, series_id, sep = ','),
+		obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep = ''))[1,1]
- 		<nowiki>')', sep = ''))</nowiki>
+	}
- 	<nowiki>actobj_id <- sqlQuery(db, paste('SELECT id FROM actobj WHERE act_id = ', act_id, sep = ''))[1,1]</nowiki>
+	# Write act and actobj
- 	#Write indexes
+	if (is.null(who)==TRUE) if (interactive()) {who <- readline(paste("What is the name of the uploader?", "\n", sep = ""))
- 	ColNames <- colnames(dataframe[(colnames(dataframe) == "id") == FALSE])
+		} else stop("uploader name not given")
- 	ColNames <- ColNames[(ColNames == "obs") == FALSE]
+	series_id <- sqlQuery(db, paste("SELECT series_id FROM actobj WHERE obj_id = ", obj_id, " ORDER BY series_id DESC LIMIT 1",
- 	ColNames <- ColNames[(ColNames == "Result") == FALSE]
+		sep = ""))[1,1]
- 	ColNames <- ColNames[(ColNames == "result") == FALSE]
+	if (is.na(series_id)==FALSE) {if (is.null(acttype)==TRUE) {if (interactive()) {acttype <- readline(paste("What type of upload",
- 	ColNames <- ColNames[(ColNames == "Freq") == FALSE]
+		" is this? 4 - new data to replace any existing, 5 - new data to be appended to existing data (must have the same",
- 	nind <- length(ColNames)
+		" indices).", "\n", sep = ""))
- 	DimNames <- rep(vector("list", 1), nind)
+		} else acttype <- 4
- 	names(DimNames) <- ColNames
+		}} else acttype <- 4
- 	indlengths <- 0
+	if (!(acttype%in%c(4,5))) stop ("proper acttype not given")
- 	for (i in 1:nind) {
- 		<nowiki>DimNames[[i]] <- levels(factor(dataframe[, ColNames[i]]))</nowiki>
+	sqlQuery(db, paste('INSERT INTO act (acttype_id, who, comments) VALUES (', acttype, ',"', who, '","R upload")', sep = ''))
- 		<nowiki>indlengths[i] <- length(DimNames[[i]])</nowiki>
+	act_id <- sqlQuery(db, paste('SELECT id FROM act WHERE who = "', who,'" AND comments = "R upload" ORDER BY id DESC LIMIT 1',
- 	}
+		sep = ''))[1,1]
- 	for (i in 1:length(ColNames)) {
+	if (acttype == 4) series_id <- act_id
- 		<nowiki>sqlQuery(db, paste('INSERT IGNORE INTO obj (ident, name, objtype_id) VALUES ("', gsub(' ', '_', ColNames[i]), '","', </nowiki>
+	if (is.null(unit)) if (interactive()) unit <- readline(paste("What is the unit of this object?",
- 			<nowiki>ColNames[i], '", 6)', sep = ''))</nowiki>
+		"\n", sep = "")) else stop("unit not given")
- 	}
+	sqlQuery(db, paste('INSERT INTO actobj (act_id, obj_id, series_id, unit) VALUES (', paste(act_id, obj_id, series_id, sep = ','),
- 	IndIds <- sqlQuery(db, paste('SELECT id, ident FROM obj WHERE ident IN("', paste(gsub(" ", "_", ColNames),
+		',"', unit, '")', sep = ''))
- 		<nowiki>collapse = '","'), '")', sep = ''))</nowiki>
+	actobj_id <- sqlQuery(db, paste('SELECT id FROM actobj WHERE act_id = ', act_id, sep = ''))[1,1]
- 	DimIds <- DimNames
- 	DimN <- 1:nind
+	#Write indexes
- 	names(DimN) <- tolower(gsub(" ", "_", ColNames))
+	for (i in ColNames) {
- 	for (i in 1:nrow(IndIds)) {
+		sqlQuery(db, paste('INSERT IGNORE INTO obj (ident, name, objtype_id) VALUES ("', gsub(' ', '_', i), '","',
- 		names(DimIds)[DimN[tolower(IndIds[i,2])]] <- IndIds[i, 1]
+			i, '", 6)', sep = ''))
- 	}
+	}
+	IndIds <- sqlQuery((if(use.utf8.read) db else db2), paste('SELECT id, ident FROM obj WHERE ident IN("', paste(gsub(" ", "_", ColNames),
- 	#Write locations
+		collapse = '","'), '")', sep = ''))
- 	y <- 1
+	if(utf8.2.latin1.conv.read) levels(IndIds$ident) <- iconv(levels(IndIds$ident), "UTF-8", "latin1")
- 	LocNames <- as.data.frame(matrix(rep(NA, 2*sum(indlengths)), sum(indlengths), 2))
+	IndIdMap <- IndIds$id
- 	for (i in 1:nind) {
+	names(IndIdMap) <- tolower(IndIds$ident)
- 		LocNames[y:(y + indlengths[i] - 1),1:2] <- matrix(c(rep(ColNames[i], indlengths[i]),
+	ColIds <- as.character(IndIdMap[tolower(gsub(" ", "_", ColNames))])
- 			levels(factor(dataframe[,ColNames[i]]))), indlengths[i], 2)
+	colnames(dataframe)[colnames(dataframe)%in%ColNames] <- ColIds
- 		y <- y + indlengths[i]
- 	}
+	#Write locations
- 	LocNames[,1] <- names(DimIds)[DimN[tolower(LocNames[,1])]]
+	for (i in ColIds) {
- 	for (i in 1:nrow(LocNames)) {
+		for (j in levels(dataframe[, i])) {
- 		<nowiki>sqlQuery(db, paste('INSERT IGNORE INTO loc (obj_id_i, location) VALUES (', LocNames[i,1], ',"', LocNames[i,2], '")', </nowiki>
+			sqlQuery(db, paste('INSERT IGNORE INTO loc (obj_id_i, location) VALUES (', i, ',"', j, '")',
- 			<nowiki>sep = ''))</nowiki>
+				sep = ''))
- 	}
+		}
- 	LocIds <- sqlQuery(db, paste('SELECT id, obj_id_i, location FROM loc WHERE location IN("', paste(LocNames[,2],
+	}
- 		<nowiki>collapse = '","'), '") AND obj_id_i IN(', paste(names(DimIds), collapse = ','), ')', sep = ''))</nowiki>
+	LocIds <- sqlQuery((if(use.utf8.read) db else db2), paste('SELECT id, obj_id_i, location FROM loc WHERE obj_id_i IN("', paste(ColIds, collapse = '","'),
- 	LocMap <- NA
+		'")', sep = ''))
- 	y <- 1
+	if(utf8.2.latin1.conv.read) levels(LocIds$location) <- iconv(levels(LocIds$location), "UTF-8", "latin1")
- 	for (i in 1:nind) {
- 		LocMap <- LocIds[grep(names(DimIds)[i], LocIds$obj_id_i), 1]
+	for (i in ColIds) {
- 		names(LocMap) <- tolower(LocIds[grep(names(DimIds)[i], LocIds$obj_id_i), 3])
+		LocIdMap <- LocIds[LocIds$obj_id_i == i, 1]
- 		<nowiki>DimIds[[i]] <- LocMap[tolower(DimNames[[i]])]</nowiki>
+		names(LocIdMap) <- gsub(" *$", "",gsub("^ *", "", tolower(LocIds[LocIds$obj_id_i == i, 3])))
- 	}
+		levels(dataframe[, i]) <- LocIdMap[tolower(levels(dataframe[, i]))]
+		if (sum(is.na(levels(dataframe[, i]))) != 0) stop("Faulty location matching. Usually caused by special characters.")
- 	#Writing actloc
- 	sqlQuery(db, paste("INSERT INTO actloc (actobj_id, loc_id) VALUES (", paste(actobj_id, LocIds[,1], sep = ",",
+		#Writing actloc
- 	collapse = "),("), ")", sep = ""))
+		sqlQuery(db, paste("INSERT INTO actloc (actobj_id, loc_id) VALUES (", paste(actobj_id, levels(dataframe[, i]),
+		sep = ",", collapse = "),("), ")", sep = ""))
- 	#Changing location names in table into ids
+	}
- 	for (i in 1:nind) {
- 		dataframe[,ColNames[i]] <- factor(dataframe[,ColNames[i]])
+	#Writing cell
- 		<nowiki>levels(dataframe[,ColNames[i]]) <- DimIds[[i]]</nowiki>
+	n <- tapply(dataframe[,rescol], dataframe[,ColIds], length)
- 	}
+	ncell <- sum(!is.na(n))
+	if (is.numeric(dataframe[,rescol])) means <- tapply(dataframe[,rescol], dataframe[,ColIds], mean) else means <- rep(0, ncell)
- 	#A hidden parameter for adjusting query packet sizes, the higher the faster, though crash becomes likelier
+	if (is.numeric(dataframe[,rescol])) {
- 	maxrows <- 50000
+		sds <- tapply(dataframe[,rescol], dataframe[,ColIds], sd); sds[] <- ifelse(n == 1, 0, sds)} else sds <- rep(0, ncell)
- 	#Writing cell
+	cellQuery <- paste(actobj_id, means[!is.na(means)], sds[!is.na(sds)], n[!is.na(n)], sep = ",")
- 	obscol <- colnames(dataframe) == "obs"
+	i <- 1
- 	if (length(obscol[obscol==TRUE]) >= 1) {
+	while (length(cellQuery) >= (i + maxrows - 1)) {
- 		obscol <- TRUE
+		sqlQuery(db, paste('INSERT INTO cell (actobj_id, mean, sd, n) VALUES (', paste(cellQuery[i:(i + maxrows - 1)],
- 		n <- length(levels(factor(dataframe[,"obs"])))} else {
+			collapse = '),('), ')', sep = ''))
- 		obscol <- FALSE
+		i <- i + maxrows
- 		n <- 1
+	}
- 	}
+	if (length(cellQuery) %% maxrows != 0) {
- 	if (obscol) if (sort) {
+		sqlQuery(db, paste('INSERT INTO cell (actobj_id, mean, sd, n) VALUES (',
- 		dataframe <- dataframe[order(dataframe[,"obs"]),]
+			paste(cellQuery[i:length(cellQuery)], collapse = '),('), ')', sep = ''))
- 		for (i in 1:nind) {
+	}
- 			dataframe <- dataframe[order(dataframe[,ColNames[i]]),]
- 		}
+	#Writing res
- 	}
+	cell_id <- sqlQuery(db, paste("SELECT id FROM cell WHERE actobj_id = ", actobj_id, " ORDER BY ID", sep = ""))[,1]
- 	ncell <- nrow(dataframe)/n
+	if (length(cell_id) != ncell) stop("number of written cells differs from given data")
- 	cellQuery <- NA
+	if (is.numeric(dataframe[,rescol])) ids <- means else ids <- n
- 	cellQuery[1:ncell*8-7] <- actobj_id
+	ids[!is.na(ids)] <- cell_id
- 	cellQuery[1:ncell*8-6] <- ","
+	dataframe[, ncol(dataframe) + 1] <- ids[as.matrix(dataframe[,ColIds])]
- 	if (n == 1) {
+	colnames(dataframe)[ncol(dataframe)] <- "cell_id"
- 		cellQuery[1:ncell*8-5] <- dataframe[,rescol]
- 	 	cellQuery[1:ncell*8-3] <- 0} else {
+	resQuery <- paste(dataframe[,"cell_id"], ',', if(sum(colnames(dataframe) == "obs") == 0) 1 else dataframe[,"obs"], ',',
- 		cellQuery[1:ncell*8-5] <- apply(matrix(dataframe[,rescol], n, ncell), 2, mean)
+		if (!is.numeric(dataframe[,rescol])) '"', dataframe[,rescol], if (!is.numeric(dataframe[,rescol])) '"', sep = "")
- 		cellQuery[1:ncell*8-3] <- sd(matrix(dataframe[,rescol], n, ncell))
+	i <- 1
- 	}
+	while (length(resQuery) >= (i + maxrows - 1)) {
- 	cellQuery[1:ncell*8-4] <- ","
+		sqlQuery(db, paste('INSERT INTO res (cell_id, obs, ', ifelse(is.numeric(dataframe[,rescol]), "result", "restext"), ') VALUES (',
- 	cellQuery[1:ncell*8-2] <- ","
+			paste(resQuery[i:(i + maxrows - 1)], collapse = '),('), ')', sep = ''))
- 	cellQuery[1:ncell*8-1] <- n
+		i <- i + maxrows
- 	cellQuery[1:(ncell-1)*8] <- "),("
+	}
- 	if (ncell >= maxrows) {
+	if (length(resQuery) %% maxrows != 0) {
- 		for (i in 1:(ncell%/%maxrows)) {
+		sqlQuery(db, paste('INSERT INTO res (cell_id, obs, ', ifelse(is.numeric(dataframe[,rescol]), "result", "restext"), ') VALUES (',
- 			sqlQuery(db, paste('INSERT INTO cell (actobj_id, mean, sd, n) VALUES (', paste(cellQuery[
+			paste(resQuery[i:length(resQuery)], collapse = '),('), ')', sep = ''))
- 				<nowiki>((i-1)*8*maxrows+1):(i*8*maxrows-1)], collapse = ''), ')', sep = ''))</nowiki>
+	}
- 		}
- 	}
+	#Writing loccell
- 	if (ncell%%maxrows != 0) sqlQuery(db, paste('INSERT INTO cell (actobj_id, mean, sd, n) VALUES (', paste(cellQuery[(ncell%/%
+	ids <- as.data.frame(as.table(ids))
- 		<nowiki>maxrows*8*maxrows+1):length(cellQuery)], collapse = ''), ')', sep = ''))</nowiki>
+	ids <- ids[!is.na(ids$Freq),]
+	loccellQuery <- paste(ids$Freq, unlist(ids[,-ncol(ids)]), sep = ",")
- 	#Writing res
+	i <- 1
- 	cell_id <- sqlQuery(db, paste("SELECT id FROM cell WHERE actobj_id = ", actobj_id, " ORDER BY ID", sep = ""))[,1]
+	while (length(loccellQuery) >= (i + maxrows - 1)) {
- 	if (ncell != length(cell_id)) stop("Number of cells written does not match with expected value")
+		sqlQuery(db, paste('INSERT INTO loccell (cell_id, loc_id) VALUES (', paste(loccellQuery[i:(i + maxrows - 1)], collapse = '),('), ')',
- 	resQuery <- NA
+			sep = ''))
- 	resQuery[1:nrow(dataframe)*6-5] <- rep(cell_id, each = n)
+		i <- i + maxrows
- 	resQuery[1:nrow(dataframe)*6-4] <- ","
+	}
- 	if (n==1) resQuery[1:nrow(dataframe)*6-3] <- "0" else resQuery[1:nrow(dataframe)*6-3] <- dataframe[,"obs"]
+	if (length(loccellQuery) %% maxrows != 0) {
- 	resQuery[1:nrow(dataframe)*6-2] <- ","
+		sqlQuery(db, paste('INSERT INTO loccell (cell_id, loc_id) VALUES (', paste(loccellQuery[i:length(loccellQuery)], collapse = '),('), ')',
- 	resQuery[1:nrow(dataframe)*6-1] <- dataframe[,rescol]
+			sep = ''))
- 	resQuery[1:(nrow(dataframe)-1)*6] <- "),("
+	}
- 	if (nrow(dataframe) >= maxrows) {
- 		for (i in 1:(nrow(dataframe)%/%maxrows)) {
+	#Close database connection
- 			sqlQuery(db, paste('INSERT INTO res (cell_id, obs, result) VALUES (', paste(resQuery[((i-1)
+	odbcClose(db)
- 				<nowiki>*6*maxrows+1):(i*6*maxrows-1)], collapse = ''), ')', sep = ''))</nowiki>
+	cat("Successful\n")
- 		}
+	return(character())
- 	}
+}
- 	if (nrow(dataframe)%%maxrows != 0) sqlQuery(db, paste('INSERT INTO res (cell_id, obs, result) VALUES (', paste(resQuery[
+</rcode>
- 		<nowiki>(nrow(dataframe)%/%maxrows*6*maxrows+1):length(resQuery)], collapse = ''), ')', sep = ''))</nowiki>
- 	#Writing loccell
- 	locidmatrix <- rep(NA, nind*ncell)
- 	dim(locidmatrix) <- c(nind,ncell)
- 	for (i in 1:nind) {
- 		locidmatrix[i,] <- as.character(dataframe[1:ncell*n - n + 1,ColNames[i]])
- 	}
- 	loccellQuery <- rep(NA, 4*nind*ncell)
- 	dim(loccellQuery) <- c(4*nind, ncell)
- 	loccellQuery[1:nind*4-3, 1:ncell] <- rep(cell_id, each = nind)
- 	loccellQuery[1:nind*4-2, 1:ncell] <- ","
- 	loccellQuery[1:nind*4-1, 1:ncell] <- locidmatrix
- 	loccellQuery[1:nind*4, 1:ncell] <- "),("
- 	loccellQuery[nind*4, ncell] <- ""
- 	if (ncell >= maxrows) {
- 		for (i in 1:(ncell%/%maxrows)) {
- 			loccellQuery[nind*4, i*maxrows] <- ""
- 			sqlQuery(db, paste('INSERT INTO loccell (cell_id, loc_id) VALUES (', paste(loccellQuery[,
- 				<nowiki>((i-1)*maxrows+1):(i*maxrows)], collapse = ''), ')', sep = ''))</nowiki>
- 		}
- 	}
- 	if (ncell%%maxrows != 0) sqlQuery(db, paste('INSERT INTO loccell (cell_id, loc_id) VALUES (', paste(loccellQuery[,
- 		<nowiki>(ncell%/%maxrows*maxrows+1):ncell], collapse = ''), ')', sep = ''))</nowiki>
- 	#Close database connection
- 	odbcClose(db)
- }
-====Usage====
+===Usage===
-  op_baseWrite(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL, sort = TRUE)
+  op_baseWrite(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL, rescol = NULL)
 *dsn and input must be defined, the rest of the object and act parameters if not defined are prompted for by the function as needed.
 **For uploading the DSN defined must have writers permissions.
-*sort (default = TRUE), sorts the data in the required order
+*rescol defines the column from which the values are chosen from, both numerical and textual data are allowed, if left undefined the function will check column matches for "Freq", "Result" and "result" in that order.
-=====Restrictions=====
+====Restrictions====
 *Input may only be given in either array or data.frame form.
 **Indexes used may not exceed the character limit of 20.
 ***Indexes should preferably match an earlier entry: [[Special:OpasnetBaseIndices]].
-***Indexes are treated as identifiers for indexes in the database, spaces in the indexes are converted to _. This ensures maximum compatibility and ease in operations in which data is downloaded and uplaoded again. Names and more specific details can be edited into the indexes separately.
+***Indexes are treated as identifiers for indexes in the database, spaces in the indexes are converted to _. This ensures maximum compatibility and ease in operations in which data is downloaded and uploaded again. Names and more specific details can be edited into the indexes separately.
-**Input in data.frame form must contain a "Freq", "Result" or "result" column, this is where the numerical value is read.
-**Probabilistic input must contain a dimension (in case of array) or a column (in case of data.frame) named "obs". Also: all cells must have values for each "obs" value. By default the function sorts the data as it requires, however, for some performance boost this may be disabled if the sorting order is taken into account elsewhere: indices first, obs last; for arrays this means that the first index must be "obs". Data downloaded by the op_baseGetData function is sorted correctly by default.
+== Regarding special characters and character encoding ==
+Using special characters like ä and ö when our database is encoded in latin1 while wiki is in UTF-8 is a bit complicated. New parameters -- for forcing the odbc connection to use UTF-8 and in-R conversion of one encoding to the other prior to writing or after reading -- have been created. The defaults have been made to work with opasnet_base. For use with heande_base, forcing UTF-8 on the odbc connection when reading should be disabled, meaning that ''use.utf8'' should be set to ''FALSE'' when using ''op_baseGetData'' and ''use.utf8.read'' should be set to ''FALSE'' when using ''op_baseWrite''.
+== See also ==
+{{Opasnet Base}}
+*[[A Tutorial on R]]