Talk:Opasnet base structure: Difference between revisions
Revision as of 11:02, 27 May 2011
New structure for Opasnet Base
The following four discussions will replace the previous discussion Opasnet base should be restructured (below). This is because that discussion has several topics in one, but they can and should be dealt with separately.
Description of the problem with data slicing
Add description.
Discussion about solutions
Fact discussion: . |
---|
Opening statement: Opasnet Base needs more efficient solutions for slicing data, especially for the loccell table.
Closing statement: Accepted. (A closing statement, when resolved, should be updated to the main page.) |
Argumentation: |
Fact discussion: . |
---|
Opening statement: The loccell table should be kept in Opasnet Base.
Closing statement: Accepted (A closing statement, when resolved, should be updated to the main page.) |
Argumentation:
←--#: . Loccell makes the database flexible enough for any kind of data. --Jouni 06:28, 25 March 2011 (EET) (type: truth; paradigms: science: defence)
⇤--#: . Large objects' performance will suffer if indices, whose locations are often filtered, are implemented in loccell. --Teemu R 16:05, 28 March 2011 (EEST) (type: truth; paradigms: science: attack)
Fact discussion: . |
---|
Opening statement: X % of the largest variables should be indexed with separate tables instead of with loccell, because the slicing of data is so slow. A good value for X is a part of this discussion.
Closing statement: Under discussion (to be changed when a conclusion is found) (A closing statement, when resolved, should be updated to the main page.) |
Argumentation:
←--#: . Slicing wouldn't require a tedious subquery. --Teemu R 16:05, 28 March 2011 (EEST) (type: truth; paradigms: science: defence)
←--#: . Enables fast and easy intelligent array operation on the database level. (It is possible with the current structure also, but it requires extremely heavy subquerying and is very memory intensive for the server.) --Teemu R 16:05, 28 March 2011 (EEST) (type: truth; paradigms: science: defence)
←--#: . Reduces number of joins used in data fetching queries -> faster. --Teemu R 16:05, 28 March 2011 (EEST) (type: truth; paradigms: science: defence)
----#: . I still support 100 as the value of X. I would trade simplicity for functionality and efficiency any day... though in my opinion the current structure is actually more complex than what I'm proposing (think about the difficulty of explaining it to an outsider; if all objects had their own table it would be a lot more intuitive). --Teemu R 16:05, 28 March 2011 (EEST) (type: truth; paradigms: science: comment)
----#: . If separate tables are created, they should be specific to actobj. They should be named as actobj_id or alternatively (if humans need to operate manually with tables) as obj_identifier+act.id. If found useful, tables could be located in folders named as obj_identifier. --Jouni 06:28, 25 March 2011 (EET) (type: truth; paradigms: science: comment)
Fact discussion: . |
---|
Opening statement: The cell table should be reorganised, and fields called ind1, ind2, ..., indn should be added. These fields hold the loc_id of the first n indices.
Closing statement: Under discussion (to be changed when a conclusion is found) (A closing statement, when resolved, should be updated to the main page.) |
Argumentation:
←--#: . Some of the same as above (slicing, intelligent arrays, less joins for most tables). --Teemu R 16:05, 28 March 2011 (EEST) (type: truth; paradigms: science: defence)
←--#: . If n is large enough to hold all of the indices that are going to be sliced, the rest of the indices can be put into loccell where slicing is costly, but never done. --Teemu R 16:12, 28 March 2011 (EEST) (type: truth; paradigms: science: defence)
|
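To make the trade-off in the discussions above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (a cell table with ind1 and ind2 columns next to a two-column loccell table) and the sample location ids are assumptions made for illustration only; they are not the actual Opasnet Base schema. The sketch slices the same data both ways: through loccell, which needs one subquery per sliced index, and through the proposed ind columns, which needs only a plain WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables; not the real Opasnet Base schema.
CREATE TABLE cell (id INTEGER PRIMARY KEY, ind1 INTEGER, ind2 INTEGER, mean REAL);
CREATE TABLE loccell (cell_id INTEGER, loc_id INTEGER);
""")

# Assumed location ids: 1 and 2 belong to the first index, 10 and 11 to the second.
cells = [(1, 1, 10, 5.0), (2, 1, 11, 6.0), (3, 2, 10, 7.0), (4, 2, 11, 8.0)]
conn.executemany("INSERT INTO cell VALUES (?, ?, ?, ?)", cells)

# Store the same cell-to-location links in loccell, one row per link.
links = [(cell_id, loc) for cell_id, a, b, _ in cells for loc in (a, b)]
conn.executemany("INSERT INTO loccell VALUES (?, ?)", links)

# Slicing through loccell: one subquery for every index that is sliced.
via_loccell = conn.execute("""
    SELECT id, mean FROM cell
    WHERE id IN (SELECT cell_id FROM loccell WHERE loc_id = 2)
      AND id IN (SELECT cell_id FROM loccell WHERE loc_id = 10)
    ORDER BY id
""").fetchall()

# Slicing through the ind columns: a plain WHERE clause, no subqueries.
via_columns = conn.execute("""
    SELECT id, mean FROM cell
    WHERE ind1 = 2 AND ind2 = 10
    ORDER BY id
""").fetchall()

print(via_loccell, via_columns)
```

Both queries return the same cell. The difference is in how the query grows: the loccell version gains a subquery per sliced index, whereas the column version only gains a condition.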
Continuous indices
Currently, indices with real values are problematic because each value would need its own location. Being able to slice data by selecting a range of values from a continuous index would be very useful.
Fact discussion: . |
---|
Opening statement: Put the continuous indices as results under the index Observation and location Continuous Index X. Then build tools for slicing data by result value.
Closing statement: Resolution not yet found. (A closing statement, when resolved, should be updated to the main page.) |
Argumentation:
←--#: . Would not require much change to the database structure. --Teemu R 09:38, 12 May 2011 (EEST) (type: truth; paradigms: science: defence)
⇤--#: . Might require a tedious query. --Teemu R 09:38, 12 May 2011 (EEST) (type: truth; paradigms: science: attack)
⇤--#: . Some faulty thinking... Implementation would require a running ID achieved by using a new index or similar. It'd be probably better to adjust the database structure. --Teemu R 10:21, 12 May 2011 (EEST) (type: truth; paradigms: science: attack)
←--#: . If the Observation index location links were placed in the res table instead of loccell, a cell could have multiple results, which would be interpreted as different measurements (not obs as in iteration number) in the same index locations and a unique iteration number. --Teemu R 14:00, 27 May 2011 (EEST) (type: truth; paradigms: science: defence)
Fact discussion: . |
---|
Opening statement: Add a column with a numeric value to the loccell table.
Closing statement: Resolution not yet found. (A closing statement, when resolved, should be updated to the main page.) |
Argumentation:
|
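As a rough illustration of the numeric-column suggestion, the sketch below adds a hypothetical num_value column to a simplified loccell table and selects a range from a continuous index. The column name, the "Depth" index, its location id and the sample values are all assumptions for the sake of the example; the real tables have more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical tables; num_value is the suggested new numeric column.
CREATE TABLE cell (id INTEGER PRIMARY KEY, mean REAL);
CREATE TABLE loccell (cell_id INTEGER, loc_id INTEGER, num_value REAL);
""")

DEPTH_LOC = 100  # assumed location id that stands for the continuous index "Depth"

# (cell id, depth value on the continuous index, result mean); made-up sample data
measurements = [(1, 0.5, 12.0), (2, 1.2, 10.5), (3, 3.7, 9.1), (4, 8.0, 4.3)]
for cell_id, depth, mean in measurements:
    conn.execute("INSERT INTO cell VALUES (?, ?)", (cell_id, mean))
    conn.execute("INSERT INTO loccell VALUES (?, ?, ?)", (cell_id, DEPTH_LOC, depth))


def slice_by_range(conn, loc_id, low, high):
    """Return (cell id, index value, mean) for cells whose value lies in [low, high]."""
    return conn.execute(
        """
        SELECT cell.id, loccell.num_value, cell.mean
        FROM cell JOIN loccell ON loccell.cell_id = cell.id
        WHERE loccell.loc_id = ? AND loccell.num_value BETWEEN ? AND ?
        ORDER BY loccell.num_value
        """,
        (loc_id, low, high),
    ).fetchall()


rows = slice_by_range(conn, DEPTH_LOC, 1.0, 4.0)
print(rows)
```

With this layout a single location represents the whole continuous index, so real values no longer need one location each.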
Should all variables go to result distribution database?
Fact discussion: . |
---|
Opening statement: Not all variables should go to the result distribution database
Closing statement: Not accepted. (A closing statement, when resolved, should be updated to the main page.) |
Argumentation:
←--1P: . There should be two levels of variables: 1) The results of important variables are uploaded in the result database, and they should be coherent with each other. 2) Other variables that are less important are used in case-specific assessments. They don't need to be coherent with all variables in the result database, only with those within the same assessment. --Jouni 23:52, 20 August 2007 (EEST) (type: truth; paradigms: science: defence)
|
Standardisation of indices
Currently, any indices can be uploaded to the database. If an index with the same name already exists, the locations are added to it. If a location with the same name already exists, it is used. But if an index or location already exists that has the same content under a different name, this is not detected in any way. This is a big problem, because it prevents variables from being linked to each other efficiently through shared locations.
Solution: create a system of standard indices and standard locations. These are used whenever possible. An administrator is needed to monitor new indices and recognise when their content is the same as that of an existing index. When such a case is found, one of the indices is designated as the standard (or a new one is created if necessary). The other indices are linked to it, and output uses the values of the standard index, not the original ones.
Technically, this would be implemented with a new table. It contains a list of locations, and a standard location is given for each location. This also determines unambiguously which index is used. The list is unique with respect to location but not with respect to standard location. In fact, it follows from this that no new table is needed: the Loc table only needs a new field for the standard location, which is a much nicer solution. Instead of the original Loc.id, Loc.Std_id is used, with the condition
<anacode> SELECT Rawloc.id, Loc.Obj_id_i, Loc.Location, Loc.Roww, Loc.Description FROM Loc AS Rawloc, Loc WHERE Rawloc.Std_id = Loc.id </anacode>
- ----1: . This change has already been made to Opasnet Base. --Jouni 16:01, 22 May 2009 (EEST) (type: truth; paradigms: science: comment)
This works if and only if every location has a standard location. That is achieved if each location's default standard location is the location itself. Changes to standard locations are made manually by the administrator afterwards.
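The standard-location mechanism can be sketched as follows with Python's sqlite3 module. This is only an illustration under assumptions: the Loc table is reduced to three fields, and the helper add_location and the sample locations are hypothetical. Each new location defaults to being its own standard, the administrator later re-points a duplicate, and the self-join from the query above resolves every raw location to its standard one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the Loc table: only the fields needed here.
conn.execute("CREATE TABLE Loc (id INTEGER PRIMARY KEY, Location TEXT, Std_id INTEGER)")


def add_location(name):
    """Insert a location whose standard location defaults to itself."""
    cur = conn.execute("INSERT INTO Loc (Location) VALUES (?)", (name,))
    conn.execute("UPDATE Loc SET Std_id = id WHERE id = ?", (cur.lastrowid,))
    return cur.lastrowid


finland = add_location("Finland")
suomi = add_location("Suomi")    # same content as "Finland" under a different name
sweden = add_location("Sweden")

# Manual administrator step: declare "Finland" the standard location for "Suomi".
conn.execute("UPDATE Loc SET Std_id = ? WHERE id = ?", (finland, suomi))

# Resolve each raw location to its standard location via a self-join on Std_id.
rows = conn.execute("""
    SELECT Rawloc.id, Std.id, Std.Location
    FROM Loc AS Rawloc JOIN Loc AS Std ON Rawloc.Std_id = Std.id
    ORDER BY Rawloc.id
""").fetchall()
print(rows)
```

Because every location starts as its own standard, the join never drops a row, which is the "if and only if" condition stated above.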
SD
The problem with using secret data is that it is secret. Still, it would be important to find out how valuable the data is without revealing what the data is. A solution to this occurred to me:
Assume that although the data itself is secret, its standard deviation is public information. A field SD can then be added to the Cell table to hold this standard deviation. For secret data, the Cell.Mean field is left empty instead.
- ----1: . This change has already been made to Opasnet Base. --Jouni 16:01, 22 May 2009 (EEST) (type: truth; paradigms: science: comment)
Now imagine a situation where we have a public estimate of a variable, which is uncertain, and a secret additional study, which is informative. If we know the standard deviation of the additional study, we can calculate the EVPII (the value of information for partial imperfect information). This is done by assuming that we gain access to the secret study, so that the information becomes more informative, i.e. the standard deviation narrows. EVPII compares the original public distribution with a situation where the new information is a distribution whose mean is drawn at random from the original distribution but whose standard deviation comes from the secret study. The EVPPI for perfect information is calculated in otherwise the same way, but assuming that the SD of the new information is 0.
EVPII is a very powerful tool for showing how costly it is for society when a particular piece of information is withheld. If we can find out the standard deviation of the withheld information, we can demonstrate this quantitatively. One day this approach will be turned into a paper for Science...
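The calculation described above can be sketched as a small Monte Carlo simulation. The text does not specify a decision problem, so everything about the decision here is an assumption made purely for illustration: a single yes/no action that gains `gain` if the variable exceeds `threshold` and loses `loss` otherwise, normal distributions throughout, and hypothetical function names. Following the description, the mean of the new information is drawn from the public distribution and its standard deviation is taken from the secret study; setting that standard deviation to 0 gives the perfect-information case.

```python
import random
from statistics import NormalDist


def expected_utility_of_acting(mu, sd, threshold, gain, loss):
    """Expected utility of acting when the variable is Normal(mu, sd)."""
    if sd == 0:
        p_above = 1.0 if mu > threshold else 0.0  # no uncertainty left
    else:
        p_above = 1.0 - NormalDist(mu, sd).cdf(threshold)
    return gain * p_above - loss * (1.0 - p_above)


def value_of_information(public_mu, public_sd, study_sd,
                         threshold=0.0, gain=1.0, loss=1.0, n=20000, seed=1):
    """Monte Carlo estimate of the value of a study whose SD is study_sd.

    Not acting is assumed to have utility 0, so the best choice in any
    situation is worth max(0, expected utility of acting).
    """
    rng = random.Random(seed)
    # Value of the best decision based on the public estimate alone.
    baseline = max(0.0, expected_utility_of_acting(
        public_mu, public_sd, threshold, gain, loss))
    total = 0.0
    for _ in range(n):
        # Mean of the new information is drawn from the public distribution;
        # its spread is the (public) standard deviation of the secret study.
        new_mu = rng.gauss(public_mu, public_sd)
        total += max(0.0, expected_utility_of_acting(
            new_mu, study_sd, threshold, gain, loss))
    return total / n - baseline


evpii = value_of_information(0.0, 1.0, study_sd=0.5)  # imperfect information
evppi = value_of_information(0.0, 1.0, study_sd=0.0)  # perfect information
print(evpii, evppi)
```

In this toy setting the study is worth less than perfect information but more than nothing, and the gap between the two shrinks as the study's standard deviation approaches 0.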
Res and Resinfo -tables should be merged
Fact discussion: . |
---|
Opening statement: Res and Resinfo -tables should be merged
Closing statement: Tables will be merged (A closing statement, when resolved, should be updated to the main page.) |
Argumentation:
←--1: . Merging these tables makes some queries faster because we can get rid of at least one join-query --Juha Villman 12:51, 7 September 2009 (EEST) (type: truth; paradigms: science: defence)
----2: . Merging makes Res-table slightly larger (approx. 2 %) because Restext, Who and When -fields require some amount of space even if they are empty (10 bits). --Juha Villman 12:51, 7 September 2009 (EEST) (type: truth; paradigms: science: comment)