SIPs and SLURPs
Revision as of 07:31, 16 August 2011
Moderator: Jouni
A stochastic information packet (SIP) is a format for describing random samples from probability distributions. A SIP is essentially a Monte Carlo sample of possible values, using a standard sample size, with a distribution representative of the possible outcomes. Importantly, the SIP is treated as the representation of the value and uncertainty of the quantity. To capture relationships between quantities, multiple SIPs are bundled into a SLURP (Stochastic Library Unit with Relationship Preserved). SIPs and SLURPs may be exchanged between people within an organization and used directly in decision models. By managing a standardized set of SIPs and SLURPs, probabilistic estimates from different groups within an organization can be combined within models in a coherent fashion.
The sample values within SIPs and SLURPs appear in a random order, as would be the case in a Monte Carlo sample, but the specific ordering of the samples is critical: it captures the relationships between quantities. Suppose one SIP represents the remaining cost to complete a construction project and another SIP is the remaining time to completion. In scenarios, or samples, with an exceptionally low cost, the remaining time will also usually be small. Likewise, cost overruns usually coincide with delays. These two SIPs are coherent when the ordering of samples captures this relationship, meaning that the nth point of remaining_cost should correspond to the same scenario as the nth point of remaining_time. Coherence in this fashion captures correlation between the quantities as well as other, more subtle dependencies that may not be apparent from correlations alone. Remaining cost and time are SIPs that should be bundled within the same SLURP. [1]
Scope
SIPs and SLURPs are based on the commercial DIST 1.1 Standard by ProbiliTech. However, the same idea of packing random samples while retaining the original order of samples can be implemented by other means. What is a good way of packing random samples that is not bound by commercial standards?
Definition
Input
The method should take in a random sample of values (or text) and pack it efficiently with minimal loss of information. The user should be able to adjust the critical parameters, for example:
- The rounding precision prec (2 = two decimal digits, 0 = integer, -1 = rounded to tens)
- The smallest value sampled min
- The largest value sampled max
- Number n of bins used. The default is 256 (2^8), which is used if n is omitted. However, prec, min, and max may constrain the number of possible values, and if that number is smaller than n, the smaller number will be used.
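The interaction between n and the other parameters can be sketched as follows. This is only an illustration, not part of the method definition: the function name effective_bins and the parameter names lo and hi (used instead of min and max to avoid shadowing Python built-ins) are hypothetical, and it assumes that the possible values lie on a grid with step 10^(-prec) between min and max.

```python
def effective_bins(prec, lo, hi, n=256):
    """Return the number of bins actually used (hypothetical helper).

    Assumption: after rounding to `prec` decimals, the possible values
    form a grid with step 10**(-prec) from `lo` to `hi` inclusive.
    """
    step = 10.0 ** (-prec)
    possible = int(round((hi - lo) / step)) + 1  # grid points, both ends included
    return min(n, possible)


# Integers from 1 to 6 allow only six distinct values, so n shrinks to 6.
print(effective_bins(prec=0, lo=1, hi=6))
# Two decimals between 260.47 and 294.37 allow far more than 16 values.
print(effective_bins(prec=2, lo=260.47, hi=294.37, n=16))
```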
Output
A text string with all necessary information to unpack the sample should be the output.
- An identifier for a sip: SAMPLE
- The parameter values for the four parameters.
- If the distribution is a probability table with text values, a list of all possible values is given with sequence numbers.
- The packed sequence of random draws. If n is 256, 8 bits are used for each draw. The draws are converted into characters with the character codes 33-288, which are unambiguously understood by most character encoding systems. There may be some exceptions: for example, asc(256) and "|" should not be used. For efficient packing, n should be equal to or slightly smaller than some power of 2, as n = 257 and n = 512 both take 9 bits.
For example, the output can look like this:
SAMPLE|prec=2|min=260.47|max=294.37|n=16||eKu8W)=εñ"-$§▼eT4i.Mî║|
With n = 16, each draw takes 4 bits, i.e. two draws per character. This example has 23 characters and therefore contains 46 draws. The bar | is used as a separator between parameters.
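The arithmetic and the layout of the string can be illustrated with a small sketch. The function names below are hypothetical, and the sketch assumes 8 bits per character and a double bar || between the parameters and the packed draws, as in the example above. It does not decode the draws themselves, because the exact bit layout is not specified here.

```python
def bits_per_draw(n):
    """Bits needed to distinguish n bins (4 for n = 16, 9 for n = 257 or 512)."""
    return max(1, (n - 1).bit_length())


def draws_in_payload(n_chars, n, bits_per_char=8):
    """Number of whole draws that fit into n_chars characters (assumes 8-bit characters)."""
    return n_chars * bits_per_char // bits_per_draw(n)


def parse_sip(text):
    """Split a SAMPLE string into a parameter dict and the packed payload."""
    header, sep, rest = text.partition("||")
    if not sep:
        raise ValueError("missing '||' between parameters and packed draws")
    fields = header.split("|")
    if fields[0] != "SAMPLE":
        raise ValueError("not a SAMPLE string")
    params = dict(field.split("=", 1) for field in fields[1:])
    payload = rest[:-1] if rest.endswith("|") else rest  # drop the closing bar
    return params, payload


params, payload = parse_sip("SAMPLE|prec=2|min=260.47|max=294.37|n=16||eKu8W)|")
print(params, payload, draws_in_payload(len(payload), int(params["n"])))
```

With 23 characters and n = 16, draws_in_payload gives 46, which agrees with the count stated above.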
Procedure
The method is based on an assumption of Latin hypercube sampling. This means that the values are not drawn at random; instead, the whole distribution is divided into n bins that are equally spaced and have different probabilities. In effect, the distribution is treated as a frequency distribution with x1 observations from bin 1, x2 from bin 2, ... and xn from bin n. These counts are deterministic given the distribution, but the order of the observations is shuffled randomly. When the minimum, the maximum, and the number of bins are known, the values can be deduced. The packed part of the SIP therefore contains only the order in which values from the different bins occur.
Probability distributions are located in the cell table. Currently, there is a sip field, but this should perhaps be extended to have a separate field for each parameter (prec, min, max, n, sample=sip).
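A minimal sketch of this idea is shown below. It is an illustration under stated assumptions, not the implementation used here: the function names are hypothetical, the bins are assumed to sit at n equally spaced points from min to max (both ends included), and n is assumed to be at least 2. Only the bin indices need to be stored, in their original order; the values are rebuilt from min, max, n, and prec.

```python
def to_bin_indices(sample, lo, hi, n):
    """Replace each value by the index (0..n-1) of the nearest bin, keeping the order."""
    width = (hi - lo) / (n - 1)  # distance between neighbouring bins
    return [min(n - 1, max(0, round((x - lo) / width))) for x in sample]


def from_bin_indices(indices, lo, hi, n, prec):
    """Rebuild approximate values from bin indices, rounded to prec decimals."""
    width = (hi - lo) / (n - 1)
    return [round(lo + k * width, prec) for k in indices]


sample = [260.47, 294.37, 270.0]
indices = to_bin_indices(sample, 260.47, 294.37, 16)
print(indices)
print(from_bin_indices(indices, 260.47, 294.37, 16, prec=2))
```

The rebuilt values differ from the originals by at most half a bin width, which is the information lost in packing.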
If the probability distribution is a classified distribution with text values, then each possible value (i.e., the result range) should be stored in the res table. Then, the sample only contains the obs value for the particular text result, and that value is used to pick the right result from the res table.
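As a rough illustration of the lookup, assuming the res table can be represented as a mapping from obs values to text results (the table contents and the function name below are hypothetical):

```python
# Hypothetical res table: obs value -> text result (illustrative values only).
res_table = {1: "low", 2: "medium", 3: "high"}


def unpack_text(obs_sequence, res):
    """Map each obs value in the sample to its text result, keeping the order."""
    return [res[obs] for obs in obs_sequence]


print(unpack_text([2, 1, 3, 2], res_table))
```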
See also
- SIPs and SLURPs with Analytica
- Sam Savage: Probability management Part 1 Part 2 Part 1 as PDF
- Dist standard
- SipEncode SipDecode (In Analytica wiki, requires password)
- 21st Century Risk Modeling