Sudoku solver: Difference between revisions
Jump to navigation
Jump to search
mNo edit summary |
(→Rationale: improvement ideas for later) |
||
(28 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
[[Category:Decision analysis]] | [[Category:Decision analysis]] | ||
[[Category:Contains R code]] | |||
{{method|moderator=Jouni|stub=Yes}} | {{method|moderator=Jouni|stub=Yes}} | ||
Line 8: | Line 9: | ||
== Answer == | == Answer == | ||
===Procedure=== | |||
{ | The following terms are used: | ||
* The possible space of solutions is described by a logical vector A, which contains all possible values given the current information. A is indexed by h, i, j, k, and l. A develops in time, when more information occurs or is processed. In the beginning, all hypotheses are considered as potentially TRUE. | |||
* h = {1, 2, ..., 9}, i = {1, 2, ..., 9}, j = {1, 2, ..., 9}, k = {1, 2, ..., 9}, l = {1, 2, ..., 81} are indices for hypothesis, row, column, area, and cell, respectively. Note l is just another way to say (i,j), as l = i + (j-1)*9. Also, k is known if (i,j) is known, as k = ceiling(i/3) + (ceiling(j/3)-1)*3. However, k does not contain all information that (i,j) or l contains. | |||
* a<sub>h</sub> and b<sub>h</sub> are sub-vectors of A in such a way that a<sub>h</sub> = A<sub>h,m</sub> and b<sub>h</sub> = A<sub>h,n</sub>, where m and n (m < n) are values from index l. | |||
# Expand the missing index values of the Hypotheses table to create the full A. | |||
# Take the sudoku data table and replace hypotheses with data, if available. | |||
# Compare all two-cell pairs at once using matrices. | |||
## Create matrices of all critical properties: SudRow, SudCol, SudArea, and Hypothesis. These are compared pairwise, two cells at a time. | |||
## Use rules to deduce if a pair is incompatible or not. | |||
## Aggregate plausible hypotheses in the second cell across all plausible values in the first cell. This gives a set of hypotheses that are plausible in at least some conditions. | |||
## Aggregate along the first cell: each second cell must be compatible with all other (first) cells. | |||
## Do not apply these rules if the first and second cells are the same. | |||
# Go through every row, column, and area and find hypotheses where there is exactly one cell where it is plausible. | |||
## Remove all other hypotheses from these cells. | |||
# Do steps 3 and 4 until the sudoku does not improve (i.e., further hypotheses are not falsified). | |||
# Take a user-defined list of cells for which a random sample from plausible hypotheses is taken. It would be elegant to do this stepwise, but for simplicity, let's do it at once. Therefore, the cells that the user selects is critical, and a wise user will not select cells where uncertainties are clearly interdependent. | |||
# Solve all sudokus that are created in the sampling. | |||
# If a sudoku results in cells with zero plausible hypotheses, remove that iteration. | |||
# Calculate the number of different solutions still plausible and print it. | |||
# If the number is smaller than 100, print also the solutions. | |||
== Rationale == | |||
{{defend|# |Development needs to improve the solver: | |||
# Change sudoku input string in a way that it can have also other possibilities than a single number or any number. For example, syntax [3578] would mean that any of those four numbers could be in a particular position. This makes the interpretation of the sudoku string a bit problematic, because then its length is no longer fixed to 81. However, this way it is possible to restrict possibilities iterativerly. | |||
# When guessing, a new rule should be used: if a particular hypothesis is falsified in ALL scenarios, it can be falsified always. This rule can be extended in a way that when no unique solution is found, the solver would make guesses automatically and gradually falsify all wrong hypotheses.|--[[User:Jouni|Jouni]] ([[User talk:Jouni|talk]]) 09:36, 31 January 2016 (UTC)}} | |||
=== Data === | |||
This table will be expanded by fillna to be a 9*9*9 array (formatted as data.frame). As default, each hypothesis is assumed to be true unless shown otherwise. | |||
; List of all possible hypotheses, which are ''a priori'' assumed to be true. | |||
<t2b name="Hypotheses" index="SudRow,SudCol,Hypothesis" obs="Result" unit="Boolean"> | |||
||1|TRUE | |||
||2|TRUE | |||
||3|TRUE | |||
||4|TRUE | |||
||5|TRUE | |||
||6|TRUE | |||
||7|TRUE | |||
||8|TRUE | |||
||9|TRUE | |||
|1||TRUE | |||
|2||TRUE | |||
|3||TRUE | |||
|4||TRUE | |||
|5||TRUE | |||
|6||TRUE | |||
|7||TRUE | |||
|8||TRUE | |||
|9||TRUE | |||
1|||TRUE | |||
2|||TRUE | |||
3|||TRUE | |||
4|||TRUE | |||
5|||TRUE | |||
6|||TRUE | |||
7|||TRUE | |||
8|||TRUE | |||
9|||TRUE | |||
</t2b> | |||
; Rules of inference: the table is actually for illustration only because the code is too complex to implement from a table entry. Rules 1-4 come directly from the rules of the sudoku game. All other rules are logically derived from them. | |||
<t2b name="Rules" index="Rule name" obs="Rule" desc="Description" unit="-"> | |||
rule1|If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same row as A.| | |||
rule2|If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same column as A.| | |||
rule3|If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same area as A.| | |||
rule4|If cells A and B are actually the same cell, no inferences are made.|This is a stronger rule than others. | |||
rule5|If a hypothesis in cell B is TRUE at least once given all hypotheses in cell A, the hypothesis in B is considered TRUE.|This rule loses a lot of information about dependencies between cells, but it saves huge amounts of memory. In any case, even then, the set of rules effectively narrow down the potential hypothesis space. | |||
rule6|Rules 1-5 apply to all pairs of cells A and B.| | |||
rule7|If a hypothesis in cell B is FALSE after applying rule 5 for even one cell A, it is FALSE always.| | |||
rule8|If a hypothesis is TRUE in exactly one cell in a particular row, all other hypothesis in that cell are FALSE.| | |||
rule9|If a hypothesis is TRUE in exactly one cell in a particular column, all other hypothesis in that cell are FALSE.| | |||
rule10|If a hypothesis is TRUE in exactly one cell in a particular area, all other hypothesis in that cell are FALSE.| | |||
rule11|If two cells on the same row contain exactly two TRUE hypotheses that are the same in both cells, these hypotheses cannot be TRUE in any other cell on that row.|This rule is analogous to rule1 but with two cells. However, this rule (and the respective rules for columns and areas) are not implemented in the code. | |||
</t2b> | |||
;Sudoku data (this is "the most difficult sudoku in thr world"). | |||
=== | <t2b name="Sudoku" index="SudRow,SudCol" locations="1,2,3,4,5,6,7,8,9" unit="-"> | ||
1|8|||||||| | |||
2|||3|6||||| | |||
3||7|||9||2|| | |||
4||5||||7||| | |||
5|||||4|5|7|| | |||
6||||1||||3| | |||
7|||1|||||6|8 | |||
8|||8|5||||1| | |||
9||9|||||4|| | |||
</t2b> | |||
=== Formula === | === Formula === | ||
<rcode variables=" | |||
name:userdata|description:Enter sudoku as a string, with spaces in empty cells|type:text| | |||
name:cellguess|description:If you want to guess, enter the (list of) cell(s) here| | |||
name:verbose|description:Do you want to see intermediate results?|type:selection|options:FALSE;No;TRUE;Yes|default:FALSE | |||
"> | |||
library(OpasnetUtils) | |||
hypotheses <- tidy(opbase.data("Op_en5817.hypotheses")) | |||
for(i in 1:3) { | |||
hypotheses[[i]] <- ifelse(hypotheses[[i]] == "", NA, as.integer(as.character(hypotheses[[i]]))) | |||
} | |||
hypotheses <- unique(fillna(hypotheses, marginals = c(1, 2, 3))) | |||
# Get the sudoku data | |||
# data <- tidy(opbase.data("Op_en5817.sudoku"), objname="data") | |||
# data$SudRow <- as.numeric(data$SudRow) | |||
# data$SudCol <- as.numeric(data$SudCol) | |||
# Maailman vaikein | |||
data <- " | |||
8 | |||
36 | |||
7 9 2 | |||
5 7 | |||
457 | |||
1 3 | |||
1 68 | |||
85 1 | |||
9 4 | |||
" | |||
data <- " | |||
4 2 9 | |||
58 6 7 | |||
5 8 | |||
751 8 4 | |||
4 | |||
8 2 765 | |||
7 2 | |||
2 1 68 | |||
1 3 9 | |||
" | |||
data <- " | |||
3 8 2 | |||
5 8 | |||
6 71 | |||
4 2 | |||
124 967 | |||
9 4 | |||
35 4 | |||
2 6 | |||
3 2 6 | |||
" | |||
data <- " | |||
45 81 | |||
2 6 5 4 | |||
9 1 | |||
4 9 8 | |||
89 2 45 | |||
5 8 9 | |||
6 2 | |||
1 7 9 3 | |||
35 41 | |||
" | |||
# Neljä tähteä | |||
data <- " | |||
23 9 | |||
5 6 | |||
9 76 5 | |||
6 38 | |||
4 523 7 | |||
39 4 | |||
2 94 8 | |||
4 3 | |||
5 72 | |||
" | |||
if(!is.null(userdata)) data <- userdata | |||
hypotheses <- data.frame( | |||
SudRow = rep(rep(1:9, each = 9), times = 9), | |||
SudCol = rep(rep(1:9, each = 9), each = 9), | |||
Hypothesis = rep(rep(1:9, times = 9), times = 9), | |||
Result = TRUE | |||
) | |||
SudMake <- function(hypotheses, data) { | |||
hypotheses$SudRow <- as.numeric(hypotheses$SudRow) | |||
hypotheses$SudCol <- as.numeric(hypotheses$SudCol) | |||
hypotheses$SudCell = hypotheses$SudRow + (hypotheses$SudCol - 1) * 9 | |||
hypotheses$SudArea = ceiling(hypotheses$SudRow / 3) + (ceiling(hypotheses$SudCol / 3) - 1) * 3 | |||
hypotheses$Result <- as.logical(hypotheses$Result) | |||
data <- data.frame( | |||
SudRow = rep(1:9, each = 9), | |||
SudCol = rep(1:9, times = 9), | |||
dataResult = gsub(" ", "", strsplit(gsub("\n", "", data), split = "")[[1]]) | |||
) | |||
# oprint(tapply(data["Hypothesis"], data[c("SudRow", "SudCol")], function(x) paste(x, sep = "", collapse = ""))) | |||
out <- merge(hypotheses, data) | |||
out$dataResult <- as.character(out$dataResult) | |||
out$Result <- ifelse(out$dataResult == "", out$Result, (as.character(out$dataResult) == as.character(out$Hypothesis))) | |||
return(out) | |||
} | |||
ykköskarsinta <- function(out2, condition) { | |||
valinta <- as.data.frame(as.table(tapply(out2$Result, out2[c("Hypothesis", condition)], sum) == 1)) # haetaan ehto-hypoteesikombinaatiot | |||
valinta <- valinta[valinta$Freq, colnames(valinta) != "Freq"] # rajataan ainoisiin ratkaisuihin | |||
valinta <- merge(valinta, out2) # yhdistetään SudCell-tietoon | |||
valinta <- valinta[valinta$Result , ] # poistetaan turhat | |||
#print(valinta$SudCell) | |||
out2[out2$SudCell %in% valinta$SudCell , ]$Result <- FALSE #hylätään aluksi kaikki hypoteesit ykkösruuduista | |||
#print(paste(out2$Hypothesis, out2$SudCell) %in% paste(valinta$Hypothesis, valinta$SudCell)) | |||
out2[paste(out2$Hypothesis, out2$SudCell) %in% paste(valinta$Hypothesis, valinta$SudCell) , ]$Result <- TRUE # palautetaan oikea vastaus ykkösruutuihin | |||
return(out2) | |||
} | |||
SudSolve <- function(sudoku, verbose = FALSE) { | |||
iteraatio <- 1 | |||
repeat{ | |||
if(verbose) {print(SudShow(sudoku))} | |||
samerow <- matrix(sudoku$SudRow, nrow = nrow(sudoku), ncol = nrow(sudoku)) # rule6 | |||
samerow <- samerow == t(samerow) | |||
samecol <- matrix(sudoku$SudCol, nrow = nrow(sudoku), ncol = nrow(sudoku)) | |||
samecol <- samecol == t(samecol) | |||
samearea <- matrix(sudoku$SudArea, nrow = nrow(sudoku), ncol = nrow(sudoku)) | |||
samearea <- samearea == t(samearea) | |||
samehypo <- matrix(sudoku$Hypothesis, nrow = nrow(sudoku), ncol = nrow(sudoku)) | |||
samehypo <- samehypo == t(samehypo) | |||
samecell <- matrix(sudoku$SudCell, nrow = nrow(sudoku), ncol = nrow(sudoku)) | |||
samecell <- samecell == t(samecell) | |||
rule1 <- ! (samerow & samehypo) | |||
rule2 <- ! (samecol & samehypo) | |||
rule3 <- ! (samearea & samehypo) | |||
# rule4 <- ifelse(samecell, NA, TRUE) | |||
temp <- sudoku$Result & rule1 & rule2 & rule3 #& rule4 | |||
temp <- ifelse(samecell, NA, temp) # Tämän säännön täytyy ajaa yli kaikkien muiden. | |||
temp2 <- apply(temp, 2, function(x) tapply(x, sudoku$SudCell, any)) # rule5 | |||
temp3 <- apply(temp2, 2, function(x) { # rule7 | |||
x <- ifelse(is.na(x), TRUE, x) | |||
return(all(x)) | |||
}) | |||
sudtemp <- sudoku | |||
sudtemp$Result <- ifelse(sudtemp$Result, temp3, sudtemp$Result) | |||
if(any(tapply(sudtemp$Result, sudtemp["SudCell"], sum) == 0)) { | |||
if(verbose) {warning("Implausible solution\n")} | |||
return(sudtemp) | |||
} | |||
sudtemp <- ykköskarsinta(sudtemp, "SudCol") # rule8 | |||
sudtemp <- ykköskarsinta(sudtemp, "SudRow") # rule9 | |||
sudtemp <- ykköskarsinta(sudtemp, "SudArea") # rule10 | |||
iteraatio <- iteraatio + 1 | |||
test <- all(sudtemp$Result == sudoku$Result) | |||
if(verbose) cat("Has the solution changed during iteration", iteraatio, "?", !test, "\n") | |||
sudoku <- sudtemp | |||
if(test | iteraatio > 15) {break} | |||
} | |||
return(sudoku) | |||
} | |||
SudShow <- function(sudoku) { | |||
if(class(sudoku) == "data.frame") {sudoku <- list(sudoku)} | |||
for(i in 1:length(sudoku)) { | |||
out <- sudoku[[i]] | |||
out <- out[out$Result , ] | |||
out <- tapply(out$Hypothesis, out[c("SudRow", "SudCol")], function(x) paste(x, sep = "", collapse = "")) | |||
if(exists("oprint")) oprint(out) else print(out) | |||
} | |||
} | |||
SudGuess <- function(sudoku, cellguess, verbose = FALSE) { | |||
if(class(sudoku) == "data.frame") {sudokulist <- list(sudoku)} else {sudokulist <- sudoku} | |||
for(i in cellguess) { # Jokainen lisäruutu erikseen | |||
if(verbose) cat("Guessing at cell", i, "\n") | |||
lap <- 0 | |||
templist <- list() | |||
for(j in 1:length(sudokulist)) { # Jokainen sudokuskenaario erikseen. | |||
sudo <- sudokulist[[j]] | |||
nscen <- nrow(sudo[sudo$SudCell %in% i & sudo$Result , ]) | |||
for(k in 1:nscen) { # Jokainen valitun ruudun mahdollinen vaihtoehto erikseen. | |||
faagi <- rep(FALSE, nscen) | |||
faagi[k] <- TRUE | |||
temp <- sudo | |||
temp[temp$SudCell %in% i & temp$Result, "Result"] <- faagi | |||
temp <- SudSolve(temp) | |||
if(verbose) {print(SudShow(temp))} | |||
if(!any(tapply(temp$Result, temp["SudCell"], sum) == 0)) {templist[[lap + k]] <- temp} | |||
} | |||
lap <- lap + k | |||
} | |||
templist <- templist[!sapply(templist, is.null)] | |||
sudokulist <- templist | |||
} | |||
if(length(sudokulist) == 1) {sudokulist <- sudokulist[[1]]} | |||
return(sudokulist) | |||
} | |||
example <- SudMake(hypotheses, data) | |||
SudShow(example) | |||
example <- SudSolve(example, verbose = verbose) | |||
SudShow(example) | |||
if(!is.null(cellguess)) { | |||
example <- SudGuess(example, cellguess, verbose = verbose) | |||
SudShow(example) | |||
} | |||
</rcode> | |||
==See also== | ==See also== |
Latest revision as of 09:36, 31 January 2016
Moderator:Jouni (see all) |
This page is a stub. You may improve it into a full page. |
Upload data
|
Question
How to describe a sudoku and the sudoku rules in Opasnet so that it can be solved automatically?
Answer
Procedure
The following terms are used:
- The possible space of solutions is described by a logical vector A, which contains all possible values given the current information. A is indexed by h, i, j, k, and l. A develops in time, when more information occurs or is processed. In the beginning, all hypotheses are considered as potentially TRUE.
- h = {1, 2, ..., 9}, i = {1, 2, ..., 9}, j = {1, 2, ..., 9}, k = {1, 2, ..., 9}, l = {1, 2, ..., 81} are indices for hypothesis, row, column, area, and cell, respectively. Note l is just another way to say (i,j), as l = i + (j-1)*9. Also, k is known if (i,j) is known, as k = ceiling(i/3) + (ceiling(j/3)-1)*3. However, k does not contain all information that (i,j) or l contains.
- ah and bh are sub-vectors of A in such a way that ah = Ah,m and bh = Ah,n, where m and n (m < n) are values from index l.
- Expand the missing index values of the Hypotheses table to create the full A.
- Take the sudoku data table and replace hypotheses with data, if available.
- Compare all two-cell pairs at once using matrices.
- Create matrices of all critical properties: SudRow, SudCol, SudArea, and Hypothesis. These are compared pairwise, two cells at a time.
- Use rules to deduce if a pair is incompatible or not.
- Aggregate plausible hypotheses in the second cell across all plausible values in the first cell. This gives a set of hypotheses that are plausible in at least some conditions.
- Aggregate along the first cell: each second cell must be compatible with all other (first) cells.
- Do not apply these rules if the first and second cells are the same.
- Go through every row, column, and area and find hypotheses where there is exactly one cell where it is plausible.
- Remove all other hypotheses from these cells.
- Do steps 3 and 4 until the sudoku does not improve (i.e., further hypotheses are not falsified).
- Take a user-defined list of cells for which a random sample from plausible hypotheses is taken. It would be elegant to do this stepwise, but for simplicity, let's do it at once. Therefore, the cells that the user selects is critical, and a wise user will not select cells where uncertainties are clearly interdependent.
- Solve all sudokus that are created in the sampling.
- If a sudoku results in cells with zero plausible hypotheses, remove that iteration.
- Calculate the number of different solutions still plausible and print it.
- If the number is smaller than 100, print also the solutions.
Rationale
←--#: . Development needs to improve the solver:
- Change sudoku input string in a way that it can have also other possibilities than a single number or any number. For example, syntax [3578] would mean that any of those four numbers could be in a particular position. This makes the interpretation of the sudoku string a bit problematic, because then its length is no longer fixed to 81. However, this way it is possible to restrict possibilities iterativerly.
- When guessing, a new rule should be used: if a particular hypothesis is falsified in ALL scenarios, it can be falsified always. This rule can be extended in a way that when no unique solution is found, the solver would make guesses automatically and gradually falsify all wrong hypotheses. --Jouni (talk) 09:36, 31 January 2016 (UTC) (type: truth; paradigms: science: defence)
Data
This table will be expanded by fillna to be a 9*9*9 array (formatted as data.frame). As default, each hypothesis is assumed to be true unless shown otherwise.
- List of all possible hypotheses, which are a priori assumed to be true.
Obs | SudRow | SudCol | Hypothesis | Result |
---|---|---|---|---|
1 | 1 | TRUE | ||
2 | 2 | TRUE | ||
3 | 3 | TRUE | ||
4 | 4 | TRUE | ||
5 | 5 | TRUE | ||
6 | 6 | TRUE | ||
7 | 7 | TRUE | ||
8 | 8 | TRUE | ||
9 | 9 | TRUE | ||
10 | 1 | TRUE | ||
11 | 2 | TRUE | ||
12 | 3 | TRUE | ||
13 | 4 | TRUE | ||
14 | 5 | TRUE | ||
15 | 6 | TRUE | ||
16 | 7 | TRUE | ||
17 | 8 | TRUE | ||
18 | 9 | TRUE | ||
19 | 1 | TRUE | ||
20 | 2 | TRUE | ||
21 | 3 | TRUE | ||
22 | 4 | TRUE | ||
23 | 5 | TRUE | ||
24 | 6 | TRUE | ||
25 | 7 | TRUE | ||
26 | 8 | TRUE | ||
27 | 9 | TRUE |
- Rules of inference
- the table is actually for illustration only because the code is too complex to implement from a table entry. Rules 1-4 come directly from the rules of the sudoku game. All other rules are logically derived from them.
Obs | Rule name | Rule | Description |
---|---|---|---|
1 | rule1 | If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same row as A. | |
2 | rule2 | If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same column as A. | |
3 | rule3 | If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same area as A. | |
4 | rule4 | If cells A and B are actually the same cell, no inferences are made. | This is a stronger rule than others. |
5 | rule5 | If a hypothesis in cell B is TRUE at least once given all hypotheses in cell A, the hypothesis in B is considered TRUE. | This rule loses a lot of information about dependencies between cells, but it saves huge amounts of memory. In any case, even then, the set of rules effectively narrow down the potential hypothesis space. |
6 | rule6 | Rules 1-5 apply to all pairs of cells A and B. | |
7 | rule7 | If a hypothesis in cell B is FALSE after applying rule 5 for even one cell A, it is FALSE always. | |
8 | rule8 | If a hypothesis is TRUE in exactly one cell in a particular row, all other hypothesis in that cell are FALSE. | |
9 | rule9 | If a hypothesis is TRUE in exactly one cell in a particular column, all other hypothesis in that cell are FALSE. | |
10 | rule10 | If a hypothesis is TRUE in exactly one cell in a particular area, all other hypothesis in that cell are FALSE. | |
11 | rule11 | If two cells on the same row contain exactly two TRUE hypotheses that are the same in both cells, these hypotheses cannot be TRUE in any other cell on that row. | This rule is analogous to rule1 but with two cells. However, this rule (and the respective rules for columns and areas) are not implemented in the code. |
- Sudoku data (this is "the most difficult sudoku in thr world").
Obs | SudRow | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 8 | ||||||||
2 | 2 | 3 | 6 | |||||||
3 | 3 | 7 | 9 | 2 | ||||||
4 | 4 | 5 | 7 | |||||||
5 | 5 | 4 | 5 | 7 | ||||||
6 | 6 | 1 | 3 | |||||||
7 | 7 | 1 | 6 | 8 | ||||||
8 | 8 | 8 | 5 | 1 | ||||||
9 | 9 | 9 | 4 |
Formula
See also
- file:Sudoku solver.ANA (a draft solver using Analytica)
Keywords
References
Related files
<mfanonymousfilelist></mfanonymousfilelist>