Sudoku solver

This page is a knowledge crystal of subtype method. The page identifier is Op_en5817
Moderator:Jouni (see all)
This page is a stub. You may improve it into a full page.
Upload data {{#opasnet_base_link:Op_en5817}}

Question

How to describe a sudoku and the sudoku rules in Opasnet so that it can be solved automatically?

Answer

Procedure

The following terms are used:

The possible space of solutions is described by a logical vector A, which contains all possible values given the current information. A is indexed by h, i, j, k, and l. A develops in time, when more information occurs or is processed. In the beginning, all hypotheses are considered as potentially TRUE.
h = {1, 2, ..., 9}, i = {1, 2, ..., 9}, j = {1, 2, ..., 9}, k = {1, 2, ..., 9}, l = {1, 2, ..., 81} are indices for hypothesis, row, column, area, and cell, respectively. Note l is just another way to say (i,j), as l = i + (j-1)*9. Also, k is known if (i,j) is known, as k = ceiling(i/3) + (ceiling(j/3)-1)*3. However, k does not contain all information that (i,j) or l contains.
a_h and b_h are sub-vectors of A in such a way that a_h = A_h,m and b_h = A_h,n, where m and n (m < n) are values from index l.

Expand the missing index values of the Hypotheses table to create the full A.
Take the sudoku data table and replace hypotheses with data, if available.
Compare all two-cell pairs at once using matrices.
1. Create matrices of all critical properties: SudRow, SudCol, SudArea, and Hypothesis. These are compared pairwise, two cells at a time.
2. Use rules to deduce if a pair is incompatible or not.
3. Aggregate plausible hypotheses in the second cell across all plausible values in the first cell. This gives a set of hypotheses that are plausible in at least some conditions.
4. Aggregate along the first cell: each second cell must be compatible with all other (first) cells.
5. Do not apply these rules if the first and second cells are the same.
Go through every row, column, and area and find hypotheses where there is exactly one cell where it is plausible.
1. Remove all other hypotheses from these cells.
Do steps 3 and 4 until the sudoku does not improve (i.e., further hypotheses are not falsified).
Take a user-defined list of cells for which a random sample from plausible hypotheses is taken. It would be elegant to do this stepwise, but for simplicity, let's do it at once. Therefore, the cells that the user selects is critical, and a wise user will not select cells where uncertainties are clearly interdependent.
Solve all sudokus that are created in the sampling.
If a sudoku results in cells with zero plausible hypotheses, remove that iteration.
Calculate the number of different solutions still plausible and print it.
If the number is smaller than 100, print also the solutions.

Rationale

←--#: . Development needs to improve the solver:

Change sudoku input string in a way that it can have also other possibilities than a single number or any number. For example, syntax [3578] would mean that any of those four numbers could be in a particular position. This makes the interpretation of the sudoku string a bit problematic, because then its length is no longer fixed to 81. However, this way it is possible to restrict possibilities iterativerly.

When guessing, a new rule should be used: if a particular hypothesis is falsified in ALL scenarios, it can be falsified always. This rule can be extended in a way that when no unique solution is found, the solver would make guesses automatically and gradually falsify all wrong hypotheses. --Jouni (talk) 09:36, 31 January 2016 (UTC) (type: truth; paradigms: science: defence)

Data

This table will be expanded by fillna to be a 9*9*9 array (formatted as data.frame). As default, each hypothesis is assumed to be true unless shown otherwise.

List of all possible hypotheses, which are a priori assumed to be true.

Hypotheses(Boolean)
Obs	SudRow	SudCol	Hypothesis	Result
1			1	TRUE
2			2	TRUE
3			3	TRUE
4			4	TRUE
5			5	TRUE
6			6	TRUE
7			7	TRUE
8			8	TRUE
9			9	TRUE
10		1		TRUE
11		2		TRUE
12		3		TRUE
13		4		TRUE
14		5		TRUE
15		6		TRUE
16		7		TRUE
17		8		TRUE
18		9		TRUE
19	1			TRUE
20	2			TRUE
21	3			TRUE
22	4			TRUE
23	5			TRUE
24	6			TRUE
25	7			TRUE
26	8			TRUE
27	9			TRUE

Rules of inference: the table is actually for illustration only because the code is too complex to implement from a table entry. Rules 1-4 come directly from the rules of the sudoku game. All other rules are logically derived from them.

Rules(-)
Obs	Rule name	Rule	Description
1	rule1	If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same row as A.
2	rule2	If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same column as A.
3	rule3	If a hypothesis is TRUE in cell A, that hypothesis must be FALSE in cell B, if B is on the same area as A.
4	rule4	If cells A and B are actually the same cell, no inferences are made.	This is a stronger rule than others.
5	rule5	If a hypothesis in cell B is TRUE at least once given all hypotheses in cell A, the hypothesis in B is considered TRUE.	This rule loses a lot of information about dependencies between cells, but it saves huge amounts of memory. In any case, even then, the set of rules effectively narrow down the potential hypothesis space.
6	rule6	Rules 1-5 apply to all pairs of cells A and B.
7	rule7	If a hypothesis in cell B is FALSE after applying rule 5 for even one cell A, it is FALSE always.
8	rule8	If a hypothesis is TRUE in exactly one cell in a particular row, all other hypothesis in that cell are FALSE.
9	rule9	If a hypothesis is TRUE in exactly one cell in a particular column, all other hypothesis in that cell are FALSE.
10	rule10	If a hypothesis is TRUE in exactly one cell in a particular area, all other hypothesis in that cell are FALSE.
11	rule11	If two cells on the same row contain exactly two TRUE hypotheses that are the same in both cells, these hypotheses cannot be TRUE in any other cell on that row.	This rule is analogous to rule1 but with two cells. However, this rule (and the respective rules for columns and areas) are not implemented in the code.

Sudoku data (this is "the most difficult sudoku in thr world").

Sudoku(-)
Obs	SudRow	1	2	3	4	5	6	7	8	9
1	1	8
2	2			3	6
3	3		7			9		2
4	4		5				7
5	5					4	5	7
6	6				1				3
7	7			1					6	8
8	8			8	5				1
9	9		9					4

Formula

+ Show code - Hide code


library(OpasnetUtils)

hypotheses <- tidy(opbase.data("Op_en5817.hypotheses"))
for(i in 1:3) {
	hypotheses[[i]] <- ifelse(hypotheses[[i]] == "", NA, as.integer(as.character(hypotheses[[i]])))
}

hypotheses <- unique(fillna(hypotheses, marginals = c(1, 2, 3)))

# Get the sudoku data

# data <- tidy(opbase.data("Op_en5817.sudoku"), objname="data")
# data$SudRow <- as.numeric(data$SudRow)
# data$SudCol <- as.numeric(data$SudCol)

# Maailman vaikein

data <- "
8        
  36     
 7  9 2  
 5   7   
    457  
   1   3 
  1    68
  85   1 
 9    4  
"

data <- "
  4 2   9
58 6   7 
   5  8  
 751 8  4
    4    
8  2 765 
  7  2   
 2   1 68
1   3 9  
"

data <- "
 3  8 2  
   5   8 
6 71     
 4      2
 124 967 
9      4 
     35 4
 2   6   
  3 2  6 
"

data <- "
  45   81
  2 6 5 4
 9   1   
4  9  8  
 89 2 45 
  5  8  9
   6   2 
1 7 9 3  
35   41  
"

# Neljä tähteä

data <- "
   23 9  
     5 6 
9  76  5 
  6   38 
4  523  7
 39   4  
 2  94  8
 4 3     
  5 72   
"

if(!is.null(userdata)) data <- userdata

hypotheses <- data.frame(
	SudRow = rep(rep(1:9, each = 9), times = 9),
	SudCol = rep(rep(1:9, each = 9), each = 9),
	Hypothesis = rep(rep(1:9, times = 9), times = 9),
	Result = TRUE
)

SudMake <- function(hypotheses, data) {

	hypotheses$SudRow <- as.numeric(hypotheses$SudRow)
	hypotheses$SudCol <- as.numeric(hypotheses$SudCol)
	hypotheses$SudCell = hypotheses$SudRow + (hypotheses$SudCol - 1) * 9
	hypotheses$SudArea = ceiling(hypotheses$SudRow / 3) + (ceiling(hypotheses$SudCol / 3) - 1) * 3
	hypotheses$Result <- as.logical(hypotheses$Result)

	data <- data.frame(
		SudRow = rep(1:9, each = 9), 
		SudCol = rep(1:9, times = 9), 
		dataResult = gsub(" ", "", strsplit(gsub("\n", "", data), split = "")[[1]])
	)

#	oprint(tapply(data["Hypothesis"], data[c("SudRow", "SudCol")], function(x) paste(x, sep = "", collapse = "")))

	out <- merge(hypotheses, data)
	out$dataResult <- as.character(out$dataResult)
	out$Result <- ifelse(out$dataResult == "", out$Result, (as.character(out$dataResult) == as.character(out$Hypothesis)))
	return(out)
}

ykköskarsinta <- function(out2, condition) {

		valinta <- as.data.frame(as.table(tapply(out2$Result, out2[c("Hypothesis", condition)], sum) == 1)) # haetaan ehto-hypoteesikombinaatiot
		valinta <- valinta[valinta$Freq, colnames(valinta) != "Freq"] # rajataan ainoisiin ratkaisuihin
		valinta <- merge(valinta, out2) # yhdistetään SudCell-tietoon
		valinta <- valinta[valinta$Result , ] # poistetaan turhat
#print(valinta$SudCell)
		out2[out2$SudCell %in% valinta$SudCell , ]$Result <- FALSE #hylätään aluksi kaikki hypoteesit ykkösruuduista
#print(paste(out2$Hypothesis, out2$SudCell) %in% paste(valinta$Hypothesis, valinta$SudCell))
		out2[paste(out2$Hypothesis, out2$SudCell) %in% paste(valinta$Hypothesis, valinta$SudCell) , ]$Result <- TRUE # palautetaan oikea vastaus ykkösruutuihin
		return(out2)
	}

SudSolve <- function(sudoku, verbose = FALSE) {

	iteraatio <- 1
	repeat{
		if(verbose) {print(SudShow(sudoku))}
		samerow <- matrix(sudoku$SudRow, nrow = nrow(sudoku), ncol = nrow(sudoku)) # rule6
		samerow <- samerow == t(samerow)

		samecol <- matrix(sudoku$SudCol, nrow = nrow(sudoku), ncol = nrow(sudoku))
		samecol <- samecol == t(samecol)

		samearea <- matrix(sudoku$SudArea, nrow = nrow(sudoku), ncol = nrow(sudoku))
		samearea <- samearea == t(samearea)

		samehypo <- matrix(sudoku$Hypothesis, nrow = nrow(sudoku), ncol = nrow(sudoku))
		samehypo <- samehypo == t(samehypo)

		samecell <- matrix(sudoku$SudCell, nrow = nrow(sudoku), ncol = nrow(sudoku))
		samecell <- samecell == t(samecell)

		rule1 <- ! (samerow & samehypo)
		rule2 <- ! (samecol & samehypo)
		rule3 <- ! (samearea & samehypo)
	#	rule4 <- ifelse(samecell, NA, TRUE)

		temp <- sudoku$Result & rule1 & rule2 & rule3 #& rule4
		temp <- ifelse(samecell, NA, temp) # Tämän säännön täytyy ajaa yli kaikkien muiden.

		temp2 <- apply(temp, 2, function(x) tapply(x, sudoku$SudCell, any)) # rule5

		temp3 <- apply(temp2, 2, function(x) { # rule7
			x <- ifelse(is.na(x), TRUE, x)
			return(all(x))
		})
		
		sudtemp <- sudoku
		sudtemp$Result <- ifelse(sudtemp$Result, temp3, sudtemp$Result)
		
		if(any(tapply(sudtemp$Result, sudtemp["SudCell"], sum) == 0)) {
			if(verbose) {warning("Implausible solution\n")}
			return(sudtemp)
		}
		
		sudtemp <- ykköskarsinta(sudtemp, "SudCol") # rule8
		sudtemp <- ykköskarsinta(sudtemp, "SudRow") # rule9
		sudtemp <- ykköskarsinta(sudtemp, "SudArea") # rule10
		iteraatio <- iteraatio + 1
		test <- all(sudtemp$Result == sudoku$Result)
		if(verbose) cat("Has the solution changed during iteration", iteraatio, "?", !test, "\n")
		sudoku <- sudtemp
		if(test | iteraatio > 15) {break}
	}
	return(sudoku)
}

SudShow <- function(sudoku) {

	if(class(sudoku) == "data.frame") {sudoku <- list(sudoku)}
	for(i in 1:length(sudoku)) {
		out <- sudoku[[i]]
		out <- out[out$Result , ]
		out <- tapply(out$Hypothesis, out[c("SudRow", "SudCol")], function(x) paste(x, sep = "", collapse = ""))
		if(exists("oprint")) oprint(out) else print(out)
	}	
}

SudGuess <- function(sudoku, cellguess, verbose = FALSE) {
	if(class(sudoku) == "data.frame") {sudokulist <- list(sudoku)} else {sudokulist <- sudoku}
	for(i in cellguess) { # Jokainen lisäruutu erikseen
		if(verbose) cat("Guessing at cell", i, "\n")
		lap <- 0
		templist <- list()
		for(j in 1:length(sudokulist)) { # Jokainen sudokuskenaario erikseen.
			sudo <- sudokulist[[j]]
			nscen <- nrow(sudo[sudo$SudCell %in% i & sudo$Result , ])
			for(k in 1:nscen) { # Jokainen valitun ruudun mahdollinen vaihtoehto erikseen.
				faagi <- rep(FALSE, nscen)
				faagi[k] <- TRUE
				temp <- sudo
				temp[temp$SudCell %in% i & temp$Result, "Result"] <- faagi
				temp <- SudSolve(temp)
				if(verbose) {print(SudShow(temp))}
				if(!any(tapply(temp$Result, temp["SudCell"], sum) == 0)) {templist[[lap + k]] <- temp}
			}
			lap <- lap + k
		}
		templist <- templist[!sapply(templist, is.null)]
		sudokulist <- templist
	}
	if(length(sudokulist) == 1) {sudokulist <- sudokulist[[1]]}
	return(sudokulist)
}

example <- SudMake(hypotheses, data)
SudShow(example)
example <- SudSolve(example, verbose = verbose)
SudShow(example)

if(!is.null(cellguess)) {
	example <- SudGuess(example, cellguess, verbose = verbose)
	SudShow(example)
}

Keywords

References

Related files