Sudoku solver: Difference between revisions

Revision as of 19:44, 3 May 2013

This page is a knowledge crystal of subtype method. The page identifier is Op_en5817
Moderator:Jouni (see all)
This page is a stub. You may improve it into a full page.
Upload data {{#opasnet_base_link:Op_en5817}}

Question

How to describe a sudoku and the sudoku rules in Opasnet so that it can be solved automatically?

Answer

You need the following tables.

**Hypotheses**
Row	Column	Result	Description
All	All	1,2,3,4,5,6,7,8,9	For all row and column locations it applies that the plausible hypotheses are a single integer between 1 and 9 (unless more information is available).

**Area descriptions**
Row	Column	Area
1	1	A
1	2	A
1	3	A
1	4	B
…
2	1	A
…
4	1	D
…
9	9	I

**Rules of exclusion when comparing two cells.**
Property1	Condition1	Property2	Condition2	Rule	Description
Row	Same	Column	Different	Same integer not allowed	Two cells with the same row and different column are not allowed to have the same integer.
Row	Different	Column	Same	Same integer not allowed	Two cells with the different row and same column are not allowed to have the same integer.
Area	Same	Column	Different	Same integer not allowed	Two cells with the same area and different column are not allowed to have the same integer.
Area	Same	Row	Different	Same integer not allowed	Two cells with the same area and different row are not allowed to have the same integer.

**The sudoku data (this example is "the most difficult sudoku in the world")**
Row	Column
Row	1	2	3	4	5	6	7	8	9
1	8
2			3	6
3		7			9		2
4		5				7
5					4	5	7
6				1				3
7			1					6	8
8			8	5				1
9		9					4

Procedure

The following terms are used:

The possible space of solutions is described by a logical vector A, which contains all possible values given the current information. A is indexed by h, i, j, k, and l. A develops in time, when more information occurs or is processed. In the beginning, all hypotheses are considered as potentially TRUE.
h = {1, 2, ..., 9}, i = {1, 2, ..., 9}, j = {1, 2, ..., 9}, k = {1, 2, ..., 9}, l = {1, 2, ..., 81} are indices for hypothesis, row, column, area, and cell, respectively. Note l is just another way to say (i,j), as l = i + (j-1)*9. Also, k is known if (i,j) is known, as k = ceiling(i/3) + (ceiling(j/3)-1)*3. However, k does not contain all information that (i,j) or l contains.
a_h and b_h are sub-vectors of A in such a way that a_h = A_h,m and b_h = A_h,n, where m and n (m < n) are values from index l.

Expand the missing index values of the Hypotheses table to create the full A.
Take the sudoku data table and replace hypotheses with data, if available.
Compare two cells a and b in the sudoku. Make a for-loop for the first cell a: for(m in 1:nrow(l))).
1. Make another for-loop for the second cell b: for(n in (m+1):nrow(l)).
  1. Make a third for-loop for the hypotheses of a: for(h in 1:nrow(a))
    1. Make a fourth for-loop for the rules: for(r in 1:nrow(rules)).
      1. Test for the rule with the pair of cells, creating a set of plausible hypothesis for the other cell b conditional on the first cell a.
      2. If the set of plausible hypotheses for b is empty, the condition is implausible; remove the condition and thus that hypothesis from the first cell a.
      3. Take the union of plausible hypothesis for b (which then covers all plausible hypotheses unconditionally) and replace the current hypotheses of b with the new hypotheses.
  2. Do the third for-loop again but now with the other cell b.
Do the second and first loops for all values of m and n.
Do another set of loops for each row, column, and area to test whether there is exactly one cell where each hypothesis is plausible. (Remember, that in sudoku, unlike in most parts of the world, we know that each hypothesis is correct at exactly one cell). If a unique cell is found, remove all other hypotheses from that cell.
1. Make a loop for each row and then hypothesis.
2. Make a loop for each column and then hypothesis.
3. Make a loop for each area and then hypothesis.
If a unique solution was not found and if the current set of hypotheses is not the same as the previous set, save the current set as "previous set" and go to number 2.
Calculate the number of different solutions still plausible and print it.
If the number is smaller than 100, print also the solutions.

Rationale

Data

This table will be expanded by fillna to be a 9*9*9 array (formatted as data.frame). As default, each hypothesis is assumed to be true unless shown otherwise.

Hypotheses(Boolean)
Obs	SudRow	SudCol	Hypothesis	Result
1			1	TRUE
2			2	TRUE
3			3	TRUE
4			4	TRUE
5			5	TRUE
6			6	TRUE
7			7	TRUE
8			8	TRUE
9			9	TRUE
10		1		TRUE
11		2		TRUE
12		3		TRUE
13		4		TRUE
14		5		TRUE
15		6		TRUE
16		7		TRUE
17		8		TRUE
18		9		TRUE
19	1			TRUE
20	2			TRUE
21	3			TRUE
22	4			TRUE
23	5			TRUE
24	6			TRUE
25	7			TRUE
26	8			TRUE
27	9			TRUE

Sudoku data

Sudoku(-)
Obs	SudRow	1	2	3	4	5	6	7	8	9
1	1	8
2	2			3	6
3	3		7			9		2
4	4		5				7
5	5					4	5	7
6	6				1				3
7	7			1					6	8
8	8			8	5				1
9	9		9					4

Formula

+ Show code - Hide code

data <- "8           3 6       7   9  2    5    7        4 5 7      1    3    1     6 8   8 5    1   9     4   "


data <- data.frame(
	SudRow = rep(1:9, each = 9), 
	SudCol = rep(1:9, times = 9), 
	dataResult = strsplit(data, split=" ")[[1]]
)

data <- "
  4 2   9
58 6   7 
   5  8  
 751 8  4
    4    
8  2 765 
  7  2   
 2   1 68
1   3 9  
"

data <- "
 3  8 2  
   5   8 
6 71     
 4      2
 124 967 
9      4 
     35 4
 2   6   
  3 2  6 
"
data <- data.frame(
	SudRow = rep(1:9, each = 9), 
	SudCol = rep(1:9, times = 9), 
	dataResult = gsub(" ", "", strsplit(gsub("\n", "", data), split = "")[[1]])
)

hypotheses <- data.frame(
	SudRow = rep(rep(1:9, each = 9), times = 9),
	SudCol = rep(rep(1:9, each = 9), each = 9),
	Hypothesis = rep(rep(1:9, times = 9), times = 9),
	Result = TRUE
)

library(OpasnetUtils)

#hypotheses <- tidy(opbase.data("Op_en5817.hypotheses"))
#for(i in 1:3) {
#	hypotheses[[i]] <- ifelse(hypotheses[[i]] == "", NA, as.integer(as.character(hypotheses[[i]])))
#}

#hypotheses <- unique(fillna(hypotheses, marginals = c(1, 2, 3)))
hypotheses$SudCell = as.factor(as.numeric(hypotheses$SudRow) + (as.numeric(hypotheses$SudCol) - 1) * 9)
hypotheses$SudArea = as.factor(ceiling(as.numeric(hypotheses$SudRow) / 3) + (ceiling(as.numeric(hypotheses$SudCol) / 3) - 1) * 3)
hypotheses$Result <- as.logical(hypotheses$Result)

# Get the sudoku data

# data <- tidy(opbase.data("Op_en5817.sudoku"), objname="data")

out <- merge(hypotheses, data)
out$dataResult <- as.character(out$dataResult)
out$Result <- ifelse(out$dataResult == "", out$Result, (as.character(out$dataResult) == as.character(out$Hypothesis)))

SudSolve <- function(sudoku) {

	temp <- sudoku["Result"]
	for(m in 1:nrow(sudoku)) {
		a <- sudoku[m , ]
		if(a$Result) {
			temp[[m]] <- ifelse(a$SudRow == sudoku$SudRow & a$Hypothesis == sudoku$Hypothesis, FALSE, sudoku$Result)
			temp[[m]] <- ifelse(a$SudCol == sudoku$SudCol & a$Hypothesis == sudoku$Hypothesis, FALSE, temp[[m]])
			temp[[m]] <- ifelse(a$SudArea == sudoku$SudArea & a$Hypothesis == sudoku$Hypothesis, FALSE, temp[[m]])
		} else {
			temp[[m]] <- FALSE
		}
		temp[[m]] <- ifelse(a$SudCell == sudoku$SudCell, NA, temp[[m]])
	}

	temp2 <- data.frame(Result = rep(NA, 9^2))
	temp3 <- rep(NA, 9^3)

	for(m in 1:nrow(sudoku)) {temp2[m] <- tapply(temp[[m]], sudoku$SudCell, any)}
	for(n in 1:nrow(sudoku)) {
		temp2[[n]] <- ifelse(is.na(temp2[[n]]), TRUE, temp2[[n]])
		temp3[n] <- all(temp2[[n]])
	}
	sudoku$Result <- temp3
	return(sudoku)
}

SudShow <- function(sudoku) {

	pasthyp <- function(x) {
		return(paste(x, sep = "", collapse = ""))
	}
	out <- aggregate(
		ifelse(sudoku$Result, sudoku$Hypothesis, ""), 
		sudoku[c("SudRow", "SudCol", "SudCell")], 
		pasthyp
	)

	out <- array(out[[4]], dim = c(9,9))
	return(out)
}

oprint(SudShow(out))
out2 <- SudSolve(out)
oprint(SudShow(out2))

#    ah and bh are sub-vectors of A in such a way that ah = Ah,m and bh = Ah,n, where m and n (m < n) are values from index l. 


#    Compare two cells a and b in the sudoku. Make a for-loop for the first cell a: for(m in 1:nrow(l))).
#        Make another for-loop for the second cell b: for(n in (m+1):nrow(l)).
#            Make a third for-loop for the hypotheses of a: for(h in 1:nrow(a))
#                Make a fourth for-loop for the rules: for(r in 1:nrow(rules)).
#                    Test for the rule with the pair of cells, creating a set of plausible hypothesis for the other cell b conditional on the first cell a.
#                    If the set of plausible hypotheses for b is empty, the condition is implausible; remove the condition and thus that hypothesis from the first cell a.
#                    Take the union of plausible hypothesis for b (which then covers all plausible hypotheses unconditionally) and replace the current hypotheses of b with the new hypotheses. 
#            Do the third for-loop again but now with the other cell b. 
#    Do the second and first loops for all values of m and n.
#    Do another set of loops for each row, column, and area to test whether there is exactly one cell where each hypothesis is plausible. (Remember, that in sudoku, unlike in most parts of the world, we know that each hypothesis is correct at exactly one cell). If a unique cell is found, remove all other hypotheses from that cell.
#        Make a loop for each row and then hypothesis.
#        Make a loop for each column and then hypothesis.
#        Make a loop for each area and then hypothesis. 
#    If a unique solution was not found and if the current set of hypotheses is not the same as the previous set, save the current set as "previous set" and go to number 2.
#    Calculate the number of different solutions still plausible and print it.
#    If the number is smaller than 100, print also the solutions.

⇤--#: . Not all the loops are needed. One row can be compared to all rows simultaneously, and then plausible hypotheses can be searched for using tapply. --Jouni 23:57, 2 May 2013 (EEST) (type: truth; paradigms: science: attack)

Keywords

References

Related files