Sudoku solver: Difference between revisions

From Opasnet
No edit summary (minor edit)
(→Rationale: improvement ideas for later)
 
(28 intermediate revisions by 2 users not shown)
[[Category:Decision analysis]]
[[Category:Contains R code]]
{{method|moderator=Jouni|stub=Yes}}


== Answer ==


You need the following tables.

{| {{prettytable}}
|+ '''Hypotheses'''
! Row || Column || Result || Description
|----
| All || All || 1,2,3,4,5,6,7,8,9 || For all row and column locations, the plausible hypotheses are a single integer between 1 and 9 (unless more information is available).
|}


{| {{prettytable}}
|+ '''Area descriptions'''
! Row || Column || Area
|----
|| 1|| 1|| A
|----
|| 1|| 2|| A
|----
|| 1|| 3|| A
|----
|| 1|| 4|| B
|----
|| …|| ||
|----
|| 2|| 1|| A
|----
|| …|| ||
|----
|| 4|| 1|| D
|----
|| …|| ||
|----
|| 9|| 9|| I
|}


===Procedure===

The following terms are used:
* The possible space of solutions is described by a logical vector A, which contains all possible values given the current information. A is indexed by h, i, j, k, and l. A changes over time as more information becomes available or is processed. In the beginning, all hypotheses are considered potentially TRUE.
* h = {1, 2, ..., 9}, i = {1, 2, ..., 9}, j = {1, 2, ..., 9}, k = {1, 2, ..., 9}, and l = {1, 2, ..., 81} are the indices for hypothesis, row, column, area, and cell, respectively. Note that l is just another way to express (i,j), as l = i + (j-1)*9. Also, k is known if (i,j) is known, as k = ceiling(i/3) + (ceiling(j/3)-1)*3. However, k does not contain all the information that (i,j) or l contains.
* a<sub>h</sub> and b<sub>h</sub> are sub-vectors of A such that a<sub>h</sub> = A<sub>h,m</sub> and b<sub>h</sub> = A<sub>h,n</sub>, where m and n (m < n) are values of index l.

# Expand the missing index values of the Hypotheses table to create the full A.
# Take the sudoku data table and replace hypotheses with data, if available.
# Compare all two-cell pairs at once using matrices.
## Create matrices of all critical properties: SudRow, SudCol, SudArea, and Hypothesis. These are compared pairwise, two cells at a time.
## Use rules to deduce whether a pair is incompatible or not.
## Aggregate plausible hypotheses in the second cell across all plausible values in the first cell. This gives a set of hypotheses that are plausible in at least some conditions.
## Aggregate along the first cell: each second cell must be compatible with all other (first) cells.
## Do not apply these rules if the first and second cells are the same.
# Go through every row, column, and area and find hypotheses that are plausible in exactly one cell.
## Remove all other hypotheses from these cells.
# Repeat steps 3 and 4 until the sudoku does not improve (i.e., no further hypotheses are falsified).
# Take a user-defined list of cells for which a random sample from plausible hypotheses is taken. It would be elegant to do this stepwise, but for simplicity, it is done all at once. Therefore, the user's choice of cells is critical, and a wise user will not select cells whose uncertainties are clearly interdependent.
# Solve all sudokus that are created in the sampling.
# If a sudoku results in cells with zero plausible hypotheses, discard that scenario.
# Calculate the number of different solutions still plausible and print it.
# If the number is smaller than 100, also print the solutions.

== Rationale ==


{{defend|# |Development needs to improve the solver:
# Change the sudoku input string so that a cell can also contain other possibilities than a single number or any number. For example, the syntax [3578] would mean that any of those four numbers could be in a particular position. This makes the interpretation of the sudoku string a bit more complicated, because its length is then no longer fixed to 81. However, this way it is possible to restrict possibilities iteratively.
# When guessing, a new rule should be used: if a particular hypothesis is falsified in ALL scenarios, it can be falsified permanently. This rule can be extended so that, when no unique solution is found, the solver makes guesses automatically and gradually falsifies all wrong hypotheses.|--[[User:Jouni|Jouni]] ([[User talk:Jouni|talk]]) 09:36, 31 January 2016 (UTC)}}


{| {{prettytable}}
|+ '''Rules of exclusion when comparing two cells.'''
! Property1 || Condition1|| Property2|| Condition2|| Rule|| Description
|----
| Row
| Same
| Column
| Different
| Same integer not allowed
| Two cells with the same row and different columns are not allowed to have the same integer.
|----
| Row
| Different
| Column
| Same
| Same integer not allowed
| Two cells with different rows and the same column are not allowed to have the same integer.
|----
| Area
| Same
| Column
| Different
| Same integer not allowed
| Two cells with the same area and different columns are not allowed to have the same integer.
|----
| Area
| Same
| Row
| Different
| Same integer not allowed
| Two cells with the same area and different rows are not allowed to have the same integer.
|----
|}


{| {{prettytable}}
|+ '''The sudoku data (this example is "the most difficult sudoku in the world")'''
! rowspan="2"| Row
!colspan="9"| Column
|----
! 1|| 2|| 3|| 4|| 5|| 6|| 7|| 8|| 9
|----
|| '''1'''|| 8|| || || || || || || ||
|----
|| '''2'''|| || || 3|| 6|| || || || ||
|----
|| '''3'''|| || 7|| || || 9|| || 2|| ||
|----
|| '''4'''|| || 5|| || || || 7|| || ||
|----
|| '''5'''|| || || || || 4|| 5|| 7|| ||
|----
|| '''6'''|| || || || 1|| || || || 3||
|----
|| '''7'''|| || || 1|| || || || || 6|| 8
|----
|| '''8'''|| || || 8|| 5|| || || || 1||
|----
|| '''9'''|| || 9|| || || || || 4|| ||
|----
|}


===Procedure===
# Expand the "All" from the Hypothesis table to create a row for the hypothesis of each cell.
# Compare two cells in the sudoku. Make a for loop over the first cell: for(i in 1:nrow(hypothesis)).
## Make another for loop over the second cell: for(j in (i+1):nrow(hypothesis)).
### Make a third loop over all rules: for(k in 1:nrow(rules)).
### Test the rule with the pair of cells, creating a set of plausible hypotheses for one cell conditional on the other cell.
### If a set is empty, the condition is implausible; remove the condition and thus that hypothesis from the other cell.
### Take the union of plausible hypotheses (which then covers all plausible hypotheses unconditionally).
### Do the same comparison for the other cell conditional on the first one.
# If a unique solution was not found and if the current set of hypotheses is not the same as the previous set, save the current set as "previous set" and go to number 2.
# Calculate the number of different solutions still plausible and print it.
# If the number is smaller than 100, also print the solutions.


=== Data ===
This table will be expanded by fillna into a 9 × 9 × 9 array (formatted as a data.frame). By default, each hypothesis is assumed to be true unless shown otherwise.

; List of all possible hypotheses, which are ''a priori'' assumed to be true.

<t2b name="Hypotheses" index="SudRow,SudCol,Hypothesis" obs="Result" unit="Boolean">
||1|TRUE
||2|TRUE
||3|TRUE
||4|TRUE
||5|TRUE
||6|TRUE
||7|TRUE
||8|TRUE
||9|TRUE
|1||TRUE
|2||TRUE
|3||TRUE
|4||TRUE
|5||TRUE
|6||TRUE
|7||TRUE
|8||TRUE
|9||TRUE
1|||TRUE
2|||TRUE
3|||TRUE
4|||TRUE
5|||TRUE
6|||TRUE
7|||TRUE
8|||TRUE
9|||TRUE
</t2b>


== Rationale ==
; Rules of inference: the table is for illustration only, because the code is too complex to be implemented from table entries. Rules 1-4 come directly from the rules of the sudoku game; all other rules are derived logically from them.


<t2b name="Rules" index="Rule name" obs="Rule" desc="Description" unit="-">
rule1|If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B if B is in the same row as A.|
rule2|If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B if B is in the same column as A.|
rule3|If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B if B is in the same area as A.|
rule4|If cells A and B are actually the same cell, no inferences are made.|This rule overrides all others.
rule5|If a hypothesis in cell B is TRUE given at least one of the plausible hypotheses in cell A, the hypothesis in B is considered TRUE.|This rule loses a lot of information about dependencies between cells, but it saves huge amounts of memory. Even so, the set of rules effectively narrows down the potential hypothesis space.
rule6|Rules 1-5 apply to all pairs of cells A and B.|
rule7|If a hypothesis in cell B is FALSE after applying rule 5 for even one cell A, it is always FALSE.|
rule8|If a hypothesis is TRUE in exactly one cell in a particular row, all other hypotheses in that cell are FALSE.|
rule9|If a hypothesis is TRUE in exactly one cell in a particular column, all other hypotheses in that cell are FALSE.|
rule10|If a hypothesis is TRUE in exactly one cell in a particular area, all other hypotheses in that cell are FALSE.|
rule11|If two cells in the same row contain exactly two TRUE hypotheses, and these are the same in both cells, these hypotheses cannot be TRUE in any other cell in that row.|This rule is analogous to rule1 but with two cells. However, this rule (and the corresponding rules for columns and areas) is not implemented in the code.
</t2b>
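Rule 11 is described above but not implemented in the solver. The following is a hedged base-R sketch of how it might look for one kind of unit (row, column, or area); in sudoku jargon this pattern is usually called a "naked pair". The function name `naked_pairs` and the data-frame layout (one row per hypothesis, with columns `h`, `row`, `col`, `cell`, `ok`) are assumptions for illustration, not the data structure used by `SudSolve`.

```r
# Hypothetical sketch of rule 11 for one kind of unit ("row", "col" or "area").
# Assumed columns of hyp: h (number), row, col, cell, ok (still plausible?).
naked_pairs <- function(hyp, unit) {
  for (u in unique(hyp[[unit]])) {
    idx <- which(hyp[[unit]] == u & hyp$ok)       # plausible hypotheses in this unit
    cl <- hyp$cell[idx]
    cells <- sort(unique(cl))
    # Candidate set (e.g. "3,7") and candidate count of each cell in the unit.
    sets <- vapply(cells, function(x) paste(sort(hyp$h[idx][cl == x]), collapse = ","), character(1))
    sizes <- vapply(cells, function(x) sum(cl == x), integer(1))
    for (p in unique(sets[sizes == 2])) {
      pair_cells <- cells[sizes == 2 & sets == p]
      if (length(pair_cells) != 2) next           # rule 11 needs exactly two such cells
      digits <- as.integer(strsplit(p, ",")[[1]])
      # The two numbers cannot be TRUE in any other cell of this unit.
      drop <- hyp[[unit]] == u & !(hyp$cell %in% pair_cells) & hyp$h %in% digits
      hyp$ok[drop] <- FALSE
    }
  }
  hyp
}

# Example: cells (1, 1) and (1, 2) can only hold 3 or 7.
hyp <- expand.grid(h = 1:9, row = 1:9, col = 1:9)
hyp$cell <- hyp$row + (hyp$col - 1) * 9
hyp$ok <- TRUE
pair <- hyp$row == 1 & hyp$col %in% 1:2
hyp$ok[pair] <- hyp$h[pair] %in% c(3, 7)
hyp <- naked_pairs(hyp, "row")
print(sum(hyp$ok))
```

In the example, 3 and 7 are removed from the other seven cells of row 1, while all other rows are left untouched.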


;Sudoku data (this is "the most difficult sudoku in the world").


=== Dependencies ===
<t2b name="Sudoku" index="SudRow,SudCol" locations="1,2,3,4,5,6,7,8,9" unit="-">
1|8||||||||
2|||3|6|||||
3||7|||9||2||
4||5||||7|||
5|||||4|5|7||
6||||1||||3|
7|||1|||||6|8
8|||8|5||||1|
9||9|||||4||
</t2b>


=== Formula ===
<rcode variables="
name:userdata|description:Enter sudoku as a string, with spaces in empty cells|type:text|
name:cellguess|description:If you want to guess, enter the (list of) cell(s) here|
name:verbose|description:Do you want to see intermediate results?|type:selection|options:FALSE;No;TRUE;Yes|default:FALSE
">
library(OpasnetUtils)
hypotheses <- tidy(opbase.data("Op_en5817.hypotheses"))
for(i in 1:3) {
hypotheses[[i]] <- ifelse(hypotheses[[i]] == "", NA, as.integer(as.character(hypotheses[[i]])))
}
hypotheses <- unique(fillna(hypotheses, marginals = c(1, 2, 3)))
# Get the sudoku data
# data <- tidy(opbase.data("Op_en5817.sudoku"), objname="data")
# data$SudRow <- as.numeric(data$SudRow)
# data$SudCol <- as.numeric(data$SudCol)
# The world's most difficult sudoku
data <- "
8       
  36   
7  9 2 
5  7 
    457 
  1  3
  1    68
  85  1
9    4 
"
data <- "
  4 2  9
58 6  7
  5  8 
751 8  4
    4   
8  2 765
  7  2 
2  1 68
1  3 9 
"
data <- "
3  8 2 
  5  8
6 71   
4      2
124 967
9      4
    35 4
2  6 
  3 2  6
"
data <- "
  45  81
  2 6 5 4
9  1 
4  9  8 
89 2 45
  5  8  9
  6  2
1 7 9 3 
35  41 
"
# Four stars
data <- "
  23 9 
    5 6
9  76  5
  6  38
4  523  7
39  4 
2  94  8
4 3   
  5 72 
"
if(!is.null(userdata)) data <- userdata
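# Build all 9 x 9 x 9 = 729 hypotheses directly; this replaces the table read from the database above.
hypotheses <- data.frame(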
SudRow = rep(rep(1:9, each = 9), times = 9),
SudCol = rep(rep(1:9, each = 9), each = 9),
Hypothesis = rep(rep(1:9, times = 9), times = 9),
Result = TRUE
)
SudMake <- function(hypotheses, data) {
# Add the cell index (SudCell) and area index (SudArea) to every hypothesis.
hypotheses$SudRow <- as.numeric(hypotheses$SudRow)
hypotheses$SudCol <- as.numeric(hypotheses$SudCol)
hypotheses$SudCell <- hypotheses$SudRow + (hypotheses$SudCol - 1) * 9
hypotheses$SudArea <- ceiling(hypotheses$SudRow / 3) + (ceiling(hypotheses$SudCol / 3) - 1) * 3
hypotheses$Result <- as.logical(hypotheses$Result)
# The sudoku string is read row by row; a space marks an empty cell.
cells <- strsplit(gsub("\n", "", data), split = "")[[1]]
if(length(cells) != 81) stop("The sudoku string must contain exactly 81 characters (9 per row, including spaces), but it has ", length(cells), ".")
data <- data.frame(
SudRow = rep(1:9, each = 9),
SudCol = rep(1:9, times = 9),
dataResult = gsub(" ", "", cells),
stringsAsFactors = FALSE
)
# oprint(tapply(data["Hypothesis"], data[c("SudRow", "SudCol")], function(x) paste(x, sep = "", collapse = "")))
out <- merge(hypotheses, data)
out$dataResult <- as.character(out$dataResult)
# In a given cell, only the hypothesis equal to the given number stays TRUE.
out$Result <- ifelse(out$dataResult == "", out$Result, (as.character(out$dataResult) == as.character(out$Hypothesis)))
return(out)
}
ykköskarsinta <- function(out2, condition) {
# Rules 8-10: if a hypothesis is plausible in exactly one cell of a row, column, or area (given by condition), that cell must hold it.
valinta <- as.data.frame(as.table(tapply(out2$Result, out2[c("Hypothesis", condition)], sum) == 1)) # find condition-hypothesis combinations that are plausible in exactly one cell
valinta <- valinta[valinta$Freq, colnames(valinta) != "Freq"] # keep only these unique solutions
valinta <- merge(valinta, out2) # join with the SudCell information
valinta <- valinta[valinta$Result , ] # drop rows that are not plausible
#print(valinta$SudCell)
out2$Result[out2$SudCell %in% valinta$SudCell] <- FALSE # first reject all hypotheses in these single-candidate cells (safe also when there are none)
#print(paste(out2$Hypothesis, out2$SudCell) %in% paste(valinta$Hypothesis, valinta$SudCell))
out2$Result[paste(out2$Hypothesis, out2$SudCell) %in% paste(valinta$Hypothesis, valinta$SudCell)] <- TRUE # then restore the forced hypothesis in each of these cells
return(out2)
}
SudSolve <- function(sudoku, verbose = FALSE) {
iteraatio <- 1
repeat{
if(verbose) {SudShow(sudoku)}
samerow <- matrix(sudoku$SudRow, nrow = nrow(sudoku), ncol = nrow(sudoku)) # rule6
samerow <- samerow == t(samerow)
samecol <- matrix(sudoku$SudCol, nrow = nrow(sudoku), ncol = nrow(sudoku))
samecol <- samecol == t(samecol)
samearea <- matrix(sudoku$SudArea, nrow = nrow(sudoku), ncol = nrow(sudoku))
samearea <- samearea == t(samearea)
samehypo <- matrix(sudoku$Hypothesis, nrow = nrow(sudoku), ncol = nrow(sudoku))
samehypo <- samehypo == t(samehypo)
samecell <- matrix(sudoku$SudCell, nrow = nrow(sudoku), ncol = nrow(sudoku))
samecell <- samecell == t(samecell)
rule1 <- ! (samerow & samehypo)
rule2 <- ! (samecol & samehypo)
rule3 <- ! (samearea & samehypo)
# rule4 <- ifelse(samecell, NA, TRUE)
temp <- sudoku$Result & rule1 & rule2 & rule3 #& rule4
temp <- ifelse(samecell, NA, temp) # rule4: this rule must override all the others.
temp2 <- apply(temp, 2, function(x) tapply(x, sudoku$SudCell, any)) # rule5
temp3 <- apply(temp2, 2, function(x) { # rule7
x <- ifelse(is.na(x), TRUE, x)
return(all(x))
})
sudtemp <- sudoku
sudtemp$Result <- ifelse(sudtemp$Result, temp3, sudtemp$Result)
if(any(tapply(sudtemp$Result, sudtemp["SudCell"], sum) == 0)) {
if(verbose) {warning("Implausible solution\n")}
return(sudtemp)
}
sudtemp <- ykköskarsinta(sudtemp, "SudCol") # rule9
sudtemp <- ykköskarsinta(sudtemp, "SudRow") # rule8
sudtemp <- ykköskarsinta(sudtemp, "SudArea") # rule10
iteraatio <- iteraatio + 1
test <- all(sudtemp$Result == sudoku$Result)
if(verbose) cat("Has the solution changed during iteration", iteraatio, "?", !test, "\n")
sudoku <- sudtemp
if(test || iteraatio > 15) {break}
}
return(sudoku)
}
SudShow <- function(sudoku) {
if(is.data.frame(sudoku)) {sudoku <- list(sudoku)}
for(i in seq_along(sudoku)) {
out <- sudoku[[i]]
out <- out[out$Result , ]
out <- tapply(out$Hypothesis, out[c("SudRow", "SudCol")], function(x) paste(x, sep = "", collapse = ""))
if(exists("oprint")) oprint(out) else print(out)
}
}
SudGuess <- function(sudoku, cellguess, verbose = FALSE) {
if(is.data.frame(sudoku)) {sudokulist <- list(sudoku)} else {sudokulist <- sudoku}
for(i in cellguess) { # Each guessed cell separately
if(verbose) cat("Guessing at cell", i, "\n")
lap <- 0
templist <- list()
for(j in seq_along(sudokulist)) { # Each sudoku scenario separately
sudo <- sudokulist[[j]]
nscen <- nrow(sudo[sudo$SudCell %in% i & sudo$Result , ])
for(k in seq_len(nscen)) { # Each possible value of the selected cell separately
faagi <- rep(FALSE, nscen)
faagi[k] <- TRUE
temp <- sudo
temp[temp$SudCell %in% i & temp$Result, "Result"] <- faagi
temp <- SudSolve(temp)
if(verbose) {SudShow(temp)}
# Keep the scenario only if every cell still has at least one plausible hypothesis.
if(!any(tapply(temp$Result, temp["SudCell"], sum) == 0)) {templist[[lap + k]] <- temp}
}
lap <- lap + nscen
}
templist <- Filter(Negate(is.null), templist) # also works when no scenario survives
sudokulist <- templist
}
if(length(sudokulist) == 1) {sudokulist <- sudokulist[[1]]}
return(sudokulist)
}
example <- SudMake(hypotheses, data)
SudShow(example)
example <- SudSolve(example, verbose = verbose)
SudShow(example)
if(!is.null(cellguess)) {
example <- SudGuess(example, cellguess, verbose = verbose)
SudShow(example)
}
</rcode>


==See also==

Latest revision as of 09:36, 31 January 2016



Question

How to describe a sudoku and the sudoku rules in Opasnet so that it can be solved automatically?

Answer

Procedure

The following terms are used:

  • The possible space of solutions is described by a logical vector A, which contains all possible values given the current information. A is indexed by h, i, j, k, and l. A changes over time as more information becomes available or is processed. In the beginning, all hypotheses are considered potentially TRUE.
  • h = {1, 2, ..., 9}, i = {1, 2, ..., 9}, j = {1, 2, ..., 9}, k = {1, 2, ..., 9}, and l = {1, 2, ..., 81} are the indices for hypothesis, row, column, area, and cell, respectively. Note that l is just another way to express (i,j), as l = i + (j-1)*9. Also, k is known if (i,j) is known, as k = ceiling(i/3) + (ceiling(j/3)-1)*3. However, k does not contain all the information that (i,j) or l contains.
  • $a_h$ and $b_h$ are sub-vectors of A such that $a_h = A_{h,m}$ and $b_h = A_{h,n}$, where m and n (m < n) are values of index l.
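The index formulas above can be checked with a few lines of R. This is a minimal sketch; the helper names `cell_index` and `area_index` are hypothetical and only restate the formulas for l and k given in the text.

```r
# Hypothetical helpers restating the formulas for l (cell) and k (area).
cell_index <- function(i, j) i + (j - 1) * 9
area_index <- function(i, j) ceiling(i / 3) + (ceiling(j / 3) - 1) * 3

# All 81 (row i, column j) combinations.
grid <- expand.grid(i = 1:9, j = 1:9)
grid$l <- cell_index(grid$i, grid$j)
grid$k <- area_index(grid$i, grid$j)

# l can be inverted back to (i, j), so it carries the same information.
grid$i_back <- (grid$l - 1) %% 9 + 1
grid$j_back <- (grid$l - 1) %/% 9 + 1

# k cannot be inverted: each of the 9 areas contains 9 different cells.
print(table(grid$k))
```

With this formula the areas are numbered column-wise, so `area_index(1, 4)` is 4 and `area_index(4, 1)` is 2.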


  1. Expand the missing index values of the Hypotheses table to create the full A.
  2. Take the sudoku data table and replace hypotheses with data, if available.
  3. Compare all two-cell pairs at once using matrices.
    1. Create matrices of all critical properties: SudRow, SudCol, SudArea, and Hypothesis. These are compared pairwise, two cells at a time.
    2. Use rules to deduce whether a pair is incompatible or not.
    3. Aggregate plausible hypotheses in the second cell across all plausible values in the first cell. This gives a set of hypotheses that are plausible in at least some conditions.
    4. Aggregate along the first cell: each second cell must be compatible with all other (first) cells.
    5. Do not apply these rules if the first and second cells are the same.
  4. Go through every row, column, and area and find hypotheses that are plausible in exactly one cell.
    1. Remove all other hypotheses from these cells.
  5. Repeat steps 3 and 4 until the sudoku does not improve (i.e., no further hypotheses are falsified).
  6. Take a user-defined list of cells for which a random sample from plausible hypotheses is taken. It would be elegant to do this stepwise, but for simplicity, it is done all at once. Therefore, the user's choice of cells is critical, and a wise user will not select cells whose uncertainties are clearly interdependent.
  7. Solve all sudokus that are created in the sampling.
  8. If a sudoku results in cells with zero plausible hypotheses, discard that scenario.
  9. Calculate the number of different solutions still plausible and print it.
  10. If the number is smaller than 100, also print the solutions.
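Step 3 of the procedure can be sketched in base R as below. This is an illustrative sketch, not the solver's own `SudSolve`: the function name `eliminate_pairs` and the data-frame layout (columns `h`, `row`, `col`, `cell`, `area`, `ok`) are assumptions. As the procedure describes, it compares all hypotheses pairwise with 729 × 729 matrices built by `outer()`, and it keeps a hypothesis only if every other cell still has at least one plausible hypothesis compatible with it.

```r
# Sketch of step 3 (rules 1-7) on a 729-row table of hypotheses.
# Assumed columns: h (number), row, col, cell, area, ok (still plausible?).
eliminate_pairs <- function(hyp) {
  same_row  <- outer(hyp$row,  hyp$row,  "==")
  same_col  <- outer(hyp$col,  hyp$col,  "==")
  same_area <- outer(hyp$area, hyp$area, "==")
  same_h    <- outer(hyp$h,    hyp$h,    "==")
  # Rules 1-3: the same number may not appear twice in a row, column or area.
  conflict <- same_h & (same_row | same_col | same_area)
  # compat[a, b]: hypothesis a is plausible and does not exclude hypothesis b.
  compat <- hyp$ok & !conflict
  # Rule 5: is b compatible with at least one hypothesis of cell A? (cells x hypotheses)
  support <- rowsum(compat * 1, hyp$cell) > 0
  # Rule 4: a cell is never compared with itself.
  own_cell <- outer(sort(unique(hyp$cell)), hyp$cell, "==")
  support[own_cell] <- TRUE
  # Rule 7: b stays plausible only if every other cell supports it.
  hyp$ok <- hyp$ok & colSums(!support) == 0
  hyp
}

# Example: all 729 hypotheses, with 8 given in row 1, column 1.
hyp <- expand.grid(h = 1:9, row = 1:9, col = 1:9)
hyp$cell <- hyp$row + (hyp$col - 1) * 9
hyp$area <- ceiling(hyp$row / 3) + (ceiling(hyp$col / 3) - 1) * 3
hyp$ok <- TRUE
given <- hyp$row == 1 & hyp$col == 1
hyp$ok[given] <- hyp$h[given] == 8
hyp <- eliminate_pairs(hyp)
print(sum(hyp$ok))
```

In the example, 8 is removed from the 20 cells that share a row, column, or area with cell (1, 1), and the other eight numbers of that cell were already FALSE, leaving 701 plausible hypotheses. A single pass only removes such direct consequences, which is why the procedure repeats steps 3 and 4 until nothing changes.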

Rationale

Development needs to improve the solver:

  1. Change the sudoku input string so that a cell can also contain other possibilities than a single number or any number. For example, the syntax [3578] would mean that any of those four numbers could be in a particular position. This makes the interpretation of the sudoku string a bit more complicated, because its length is then no longer fixed to 81. However, this way it is possible to restrict possibilities iteratively.
  2. When guessing, a new rule should be used: if a particular hypothesis is falsified in ALL scenarios, it can be falsified permanently. This rule can be extended so that, when no unique solution is found, the solver makes guesses automatically and gradually falsifies all wrong hypotheses. --Jouni (talk) 09:36, 31 January 2016 (UTC) (type: truth; paradigms: science: defence)
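Both proposals could be prototyped with little code. The sketch below is hypothetical: `parse_candidates` and `combine_scenarios` are not part of the solver, and the scenario vectors are assumed to list the same hypotheses in the same order (for example, the `Result` columns of scenarios derived from one sudoku).

```r
# Proposal 1 (hypothetical): "[3578]" lists the candidates of one cell,
# a digit fixes a cell, and a space leaves it open (any of 1-9).
parse_candidates <- function(s) {
  s <- gsub("\n", "", s)
  tokens <- regmatches(s, gregexpr("\\[[1-9]+\\]|[1-9]| ", s))[[1]]
  if (length(tokens) != 81) stop("expected 81 cells, found ", length(tokens))
  lapply(tokens, function(tok) {
    if (tok == " ") return(1:9)
    as.integer(strsplit(gsub("\\[|\\]", "", tok), "")[[1]])
  })
}

# Proposal 2 (hypothetical): a hypothesis that is FALSE in every surviving
# scenario can be falsified permanently, so keep it only if some scenario has it TRUE.
combine_scenarios <- function(ok_list) Reduce(`|`, ok_list)

cand <- parse_candidates(paste0("[3578]8", strrep(" ", 79)))
str(cand[1:3])
print(combine_scenarios(list(c(TRUE, FALSE, FALSE), c(FALSE, FALSE, TRUE))))
```

Counting tokens rather than characters is what lets the string length vary once bracketed sets are allowed.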

Data

This table will be expanded by fillna into a 9 × 9 × 9 array (formatted as a data.frame). By default, each hypothesis is assumed to be true unless shown otherwise.
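The expansion and the overlay of the given numbers (steps 1 and 2 of the procedure) can also be sketched without OpasnetUtils, using base R's `expand.grid` in place of fillna. The function name `make_hypotheses` and its column names are assumptions; as in `SudMake`, the input string is read row by row and a space marks an empty cell.

```r
# Hypothetical sketch: build all 729 hypotheses and apply the given numbers.
make_hypotheses <- function(s) {
  cells <- strsplit(gsub("\n", "", s), "")[[1]]
  if (length(cells) != 81) stop("expected 81 characters, found ", length(cells))
  # All 9 x 9 x 9 combinations of hypothesis, row and column.
  hyp <- expand.grid(h = 1:9, row = 1:9, col = 1:9)
  hyp$ok <- TRUE                                  # a priori every hypothesis is TRUE
  given <- cells[(hyp$row - 1) * 9 + hyp$col]     # character of this cell in the string
  filled <- given != " "
  # In a given cell, only the given number stays TRUE.
  hyp$ok[filled] <- hyp$h[filled] == as.integer(given[filled])
  hyp
}

hyp <- make_hypotheses(paste0("8", strrep(" ", 80)))
print(nrow(hyp))    # 729
print(sum(hyp$ok))  # 721: the eight other numbers in cell (1, 1) are FALSE
```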

List of all possible hypotheses, which are a priori assumed to be true.
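Hypotheses (Boolean)
Obs | SudRow | SudCol | Hypothesis | Result
1 |  |  | 1 | TRUE
2 |  |  | 2 | TRUE
3 |  |  | 3 | TRUE
4 |  |  | 4 | TRUE
5 |  |  | 5 | TRUE
6 |  |  | 6 | TRUE
7 |  |  | 7 | TRUE
8 |  |  | 8 | TRUE
9 |  |  | 9 | TRUE
10 |  | 1 |  | TRUE
11 |  | 2 |  | TRUE
12 |  | 3 |  | TRUE
13 |  | 4 |  | TRUE
14 |  | 5 |  | TRUE
15 |  | 6 |  | TRUE
16 |  | 7 |  | TRUE
17 |  | 8 |  | TRUE
18 |  | 9 |  | TRUE
19 | 1 |  |  | TRUE
20 | 2 |  |  | TRUE
21 | 3 |  |  | TRUE
22 | 4 |  |  | TRUE
23 | 5 |  |  | TRUE
24 | 6 |  |  | TRUE
25 | 7 |  |  | TRUE
26 | 8 |  |  | TRUE
27 | 9 |  |  | TRUE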
Rules of inference: the table is for illustration only, because the code is too complex to be implemented from table entries. Rules 1-4 come directly from the rules of the sudoku game; all other rules are derived logically from them.
Rules (-)
Obs | Rule name | Rule | Description
1 | rule1 | If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B if B is in the same row as A. |
2 | rule2 | If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B if B is in the same column as A. |
3 | rule3 | If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B if B is in the same area as A. |
4 | rule4 | If cells A and B are actually the same cell, no inferences are made. | This rule overrides all others.
5 | rule5 | If a hypothesis in cell B is TRUE given at least one of the plausible hypotheses in cell A, the hypothesis in B is considered TRUE. | This rule loses a lot of information about dependencies between cells, but it saves huge amounts of memory. Even so, the set of rules effectively narrows down the potential hypothesis space.
6 | rule6 | Rules 1-5 apply to all pairs of cells A and B. |
7 | rule7 | If a hypothesis in cell B is FALSE after applying rule 5 for even one cell A, it is always FALSE. |
8 | rule8 | If a hypothesis is TRUE in exactly one cell in a particular row, all other hypotheses in that cell are FALSE. |
9 | rule9 | If a hypothesis is TRUE in exactly one cell in a particular column, all other hypotheses in that cell are FALSE. |
10 | rule10 | If a hypothesis is TRUE in exactly one cell in a particular area, all other hypotheses in that cell are FALSE. |
11 | rule11 | If two cells in the same row contain exactly two TRUE hypotheses, and these are the same in both cells, these hypotheses cannot be TRUE in any other cell in that row. | This rule is analogous to rule1 but with two cells. However, this rule (and the corresponding rules for columns and areas) is not implemented in the code.
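Rules 8-10 (often called "hidden singles") can be sketched compactly with base R's `ave()`. This is a hedged alternative to the solver's `ykköskarsinta`, not a copy of it; the function name `hidden_singles` and the data-frame layout (columns `h`, `row`, `col`, `cell`, `area`, `ok`) are assumptions.

```r
# Sketch of rules 8-10 for one kind of unit: "row" (rule 8), "col" (rule 9) or "area" (rule 10).
# Assumed columns of hyp: h, row, col, cell, area, ok.
hidden_singles <- function(hyp, unit) {
  # For each (number, unit) pair: in how many cells is the number still plausible?
  n_cells <- ave(as.integer(hyp$ok), hyp$h, hyp[[unit]], FUN = sum)
  forced <- hyp$ok & n_cells == 1
  # Cells that are the only place for some number in this unit ...
  fixed <- hyp$cell %in% hyp$cell[forced]
  # ... keep only the forced number(s) and drop every other hypothesis there.
  hyp$ok[fixed] <- forced[fixed]
  hyp
}

hyp <- expand.grid(h = 1:9, row = 1:9, col = 1:9)
hyp$cell <- hyp$row + (hyp$col - 1) * 9
hyp$area <- ceiling(hyp$row / 3) + (ceiling(hyp$col / 3) - 1) * 3
hyp$ok <- TRUE
# In row 1, number 5 is plausible only in column 3 (cell 19).
hyp$ok[hyp$row == 1 & hyp$h == 5 & hyp$col != 3] <- FALSE
hyp <- hidden_singles(hyp, "row")
print(hyp$h[hyp$ok & hyp$cell == 19])  # only 5 is left in cell 19
```

If a cell turns out to be the only place for two different numbers, both stay TRUE here; the contradiction then shows up in later iterations, as in the original function.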
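Sudoku data (this is "the most difficult sudoku in the world").
Sudoku (-)
Obs | SudRow | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1 | 1 | 8 |  |  |  |  |  |  |  |
2 | 2 |  |  | 3 | 6 |  |  |  |  |
3 | 3 |  | 7 |  |  | 9 |  | 2 |  |
4 | 4 |  | 5 |  |  |  | 7 |  |  |
5 | 5 |  |  |  |  | 4 | 5 | 7 |  |
6 | 6 |  |  |  | 1 |  |  |  | 3 |
7 | 7 |  |  | 1 |  |  |  |  | 6 | 8
8 | 8 |  |  | 8 | 5 |  |  |  | 1 |
9 | 9 |  | 9 |  |  |  |  | 4 |  |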
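The Formula section asks for the sudoku as a string with spaces in empty cells, 9 characters per row. Trailing spaces are easily lost when such a string is copied, so it can be safer to generate the string from a grid. A minimal sketch, assuming the grid is stored as a 9 × 9 matrix with NA for empty cells; the name `grid_to_string` is hypothetical, and the matrix below is the table above.

```r
# Hypothetical helper: 9 x 9 matrix (NA = empty cell) -> input string for the form.
grid_to_string <- function(m) {
  chars <- ifelse(is.na(m), " ", as.character(m))   # keeps the 9 x 9 shape
  paste(apply(chars, 1, paste, collapse = ""), collapse = "\n")
}

puzzle <- rbind(
  c( 8, NA, NA, NA, NA, NA, NA, NA, NA),
  c(NA, NA,  3,  6, NA, NA, NA, NA, NA),
  c(NA,  7, NA, NA,  9, NA,  2, NA, NA),
  c(NA,  5, NA, NA, NA,  7, NA, NA, NA),
  c(NA, NA, NA, NA,  4,  5,  7, NA, NA),
  c(NA, NA, NA,  1, NA, NA, NA,  3, NA),
  c(NA, NA,  1, NA, NA, NA, NA,  6,  8),
  c(NA, NA,  8,  5, NA, NA, NA,  1, NA),
  c(NA,  9, NA, NA, NA, NA,  4, NA, NA)
)
s <- grid_to_string(puzzle)
cat(s, "\n")
```

Newlines between rows are harmless, because the solver removes them before splitting the string into 81 cells.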

Formula

Enter sudoku as a string, with spaces in empty cells:

If you want to guess, enter the (list of) cell(s) here:

Do you want to see intermediate results?:


See also

Keywords

References


Related files
