3 files changed, 40 insertions, 365 deletions
diff --git a/Readme.md b/Readme.md
index 268bb15..f55abbe 100644
--- a/Readme.md
+++ b/Readme.md
@@ -1,77 +1,37 @@
-# FISTA Reconstruction (Daniil Kazantsev)
+# CCPi-Regularisation Toolkit (CCPi-RUT)
 
-# General Description
+**Iterative image reconstruction (IIR) methods normally require regularisation to stabilise convergence and make the reconstruction problem more well-posed. 
+CCPi-RUT is released under Apache 2.0 license and consists of 2D/3D regularisation methods which frequently used for IIR. 
+The core modules are written in C-OMP and CUDA languages and wrappers for Matlab and Python are provided.** 
 
-Software for reconstructing 2D/3D x-ray and neutron tomography datasets. The data can be undersampled, of poor contrast, noisy, and contain various artifacts.  This is Matlab and C-omp implementation of iterative model-based algorithms with unconventional data fidelities and with various regularization terms (TV and higher-order LLT). The main optimization problem is solved using FISTA framework [1]. The presented algorithms are FBP, FISTA (Least-Squares), FISTA-LS-TV(LS-Total Variation), FISTA-GH(Group-Huber)-TV, and FISTA-Student-TV.  More information about the algorithms can be found in papers [2,3]. Please cite [2] if the algorithms or data used in your research. 
-
-## Requirements/Dependencies: 
+## Prerequisites: 
 
  * MATLAB (www.mathworks.com/products/matlab/)
- * ASTRA toolbox (https://github.com/astra-toolbox/astra-toolbox)
- * C/C++ compiler (run compile_mex in Matlab first to compile C-functions)
-
-## Package Contents:
-
-### Demos:
- * Demo1: Synthetic phantom reconstruction with noise, stripes and zingers
- * DemoRD1: Real data reconstruction from  sino_basalt.mat (see Data)
- * DemoRD2: Real data reconstruction from  sino3D_dendrites.mat (see Data)
-
-### Data:
-  * phantom_bone512.mat - a synthetic 2D phantom obtained from high-resolution x-ray scan
-  * sino_basalt.mat – 2D neutron (PSI) tomography sinogram (slice across a pack of basalt beads)
-  * sino3D_dendrites.mat – 3D (20 slices) x-ray synchrotron dataset (DLS) of growing dendrites
+ * Python (ver. 3.5); Cython
+ * C/C++ compilers
+ * nvcc compilers
 
-### Main modules:
+## Package Contents :
 
-  * FISTA_REC.m – Matlab function to perform FISTA-based reconstruction
-  * FGP_TV.c – C-omp function to solve for the weighted TV term using FGP
-  * SplitBregman_TV.c – C-omp function to solve for the weighted TV term using Split-Bregman
-  * LLT_model.c – C-omp function to solve for the weighted LLT [3] term using explicit scheme
-  * studentst.m – Matlab function to calculate Students t penalty with 'auto-tuning'
+  * 1. Rudin-Osher-Fatemi Total Variation (explicit PDE minimisation scheme) 2D/3D GPU/CPU [1]
+  * 2. Fast-Gradient-Projection Total Variation 2D/3D GPU/CPU [2]
 
-### Supplementary:
-
- * zing_rings_add.m Matlab script to generate proj. data, add noise, zingers and stripes
-  * my_red_yellowMAP.mat – nice colormap for the phantom
- * RMSE.m – Matlab function to calculate Root Mean Square Error
- 
-### Practical advices:
- * Full 3D reconstruction provides much better results than 2D. In the case of ring artifacts, 3D is almost necessary
- * Depending on data it is better to use TV-LLT combination in order to achieve piecewise-smooth solution. The  DemoRD2 shows one possible example when smoother surfaces required.
- * L (Lipshitz constant) if tweaked can lead to faster convergence than automatic values
- * Students’t penalty is generally quite stable in practice, however some tweaking of L might require for the real data
- * You can choose between SplitBregman-TV and FISTA-TV modules. The former is slower but requires less  memory (for 3D volume U it can take up to 6 x U), the latter is faster but can take more memory (for 3D volume U it can take up to 11 x U). Also the SplitBregman is quite good in improving contrast. 
-
- ### Compiling:
-
- It is very important to check that OMP support is activated for the optimal use of all CPU cores.
-#### Windows
- In Windows enable OMP support, e.g.:
-edit C:\Users\Username\AppData\Roaming\MathWorks\MATLAB\R2012b\mexopts.bat
-In the file, edit 'OPTIMFLAGS' line as shown below (add /openmp entry at the end):
-OPTIMFLAGS = /O2 /Oy- /DNDEBUG /openmp
-define and set the number of CPU cores
-PROC = feature('numCores');
-PROCS = num2str(PROC);
-setenv('OMP_NUM_THREADS', PROCS); % you can check how many CPU's by: getenv
-OMP_NUM_THREADS
+### Demos:
+ * ---
 
-#### Linux
-In Linux in terminal: export OMP_NUM_THREADS = (the numer of CPU cores)
-then run Matlab as normal
-to compile with OMP support from Matlab in Windows run:
-mex *.cpp CFLAGS="\$CFLAGS -fopenmp -Wall" LDFLAGS="\$LDFLAGS -fopenmp"
-to compile with OMP support from Matlab in Linux ->rename *cpp to *c and compile:
-mex *.c CFLAGS="\$CFLAGS -fopenmp -Wall" LDFLAGS="\$LDFLAGS -fopenmp"
+### Installation:
 
+#### Python (conda-build)
+```
+     export CIL_VERSION=0.9.2
+```
+#### Matlab 
 
 ### References:
-[1] Beck, A. and Teboulle, M., 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences,2(1), pp.183-202.
-[2] Kazantsev, D., Bleichrodt, F., van Leeuwen, T., Kaestner, A., Withers, P.J., Batenburg, K.J., 2017. A novel tomographic reconstruction method based on the robust Student's t function for suppressing data outliers, IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING (to appear) 
+[1] Rudin, L.I., Osher, S. and Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4), pp.259-268.
+[2] Beck, A. and Teboulle, M., 2009. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11), pp.2419-2434.
 [3] Lysaker, M., Lundervold, A. and Tai, X.C., 2003. Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on image processing, 12(12), pp.1579-1590.
-[4] Paleo, P. and Mirone, A., 2015. Ring artifacts correction in compressed sensing tomographic reconstruction. Journal of synchrotron radiation, 22(5), pp.1268-1278.
-[5] Sidky, E.Y., Jakob, H.J. and Pan, X., 2012. Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm. Physics in medicine and biology, 57(10), p.3065.
 
-D. Kazantsev / Harwell Campus / 16.03.17
+### Acknowledgment:
+CCPi-RUT is a product of the [CCPi project](https://pages.github.com/)
 any questions/comments please e-mail to daniil.kazantsev@manchester.ac.uk
diff --git a/Wrappers/Python/src/cpu_regularizers.pyx b/Wrappers/Python/src/cpu_regularizers.pyx
index 60e8627..f993e54 100644
--- a/Wrappers/Python/src/cpu_regularizers.pyx
+++ b/Wrappers/Python/src/cpu_regularizers.pyx
@@ -21,6 +21,10 @@ cimport numpy as np
 cdef extern float TV_ROF_CPU_main(float *Input, float *Output, float lambdaPar, int iterationsNumb, float tau, int dimX, int dimY, int dimZ);
 cdef extern float TV_FGP_CPU_main(float *Input, float *Output, float lambdaPar, int iterationsNumb, float epsil, int methodTV, int nonneg, int printM, int dimX, int dimY, int dimZ);
 
+
+#****************************************************************#
+#********************** Total-variation ROF *********************#
+#****************************************************************#
 def TV_ROF_CPU(inputData, regularization_parameter, iterationsNumb, marching_step_parameter):
     if inputData.ndim == 2:
         return TV_ROF_2D(inputData, regularization_parameter, iterationsNumb, marching_step_parameter)
@@ -58,8 +62,12 @@ def TV_ROF_3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData,
     # Run ROF iterations for 3D data 
     TV_ROF_CPU_main(&inputData[0,0,0], &outputData[0,0,0], regularization_parameter, iterationsNumb, marching_step_parameter, dims[0], dims[1], dims[2])
 
-    return outputData    
-                     
+    return outputData
+
+#****************************************************************#
+#********************** Total-variation FGP *********************#
+#****************************************************************#
+#******** Total-variation Fast-Gradient-Projection (FGP)*********#
 def TV_FGP_CPU(inputData, regularization_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, printM):
     if inputData.ndim == 2:
         return TV_FGP_2D(inputData, regularization_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, printM)
diff --git a/Wrappers/Python/src/gpu_regularizers.pyx b/Wrappers/Python/src/gpu_regularizers.pyx
index f96156a..a44bd1d 100644
--- a/Wrappers/Python/src/gpu_regularizers.pyx
+++ b/Wrappers/Python/src/gpu_regularizers.pyx
@@ -18,59 +18,9 @@ import cython
 import numpy as np
 cimport numpy as np
 
-
-cdef extern void Diff4th_GPU_kernel(float* A, float* B, int N, int M, int Z,
-                            float sigma, int iter, float tau, float lambdaf);
-cdef extern void NLM_GPU_kernel(float *A, float* B, float *Eucl_Vec, 
-                                int N, int M,  int Z, int dimension, 
-                                int SearchW, int SimilW, 
-                                int SearchW_real, float denh2, float lambdaf);
 cdef extern void TV_ROF_GPU_main(float* Input, float* Output, float lambdaPar, int iter, float tau, int N, int M, int Z);
 cdef extern void TV_FGP_GPU_main(float *Input, float *Output, float lambdaPar, int iter, float epsil, int methodTV, int nonneg, int printM, int N, int M, int Z);
 
-# padding function
-cdef extern float pad_crop(float *A, float *Ap, 
-                           int OldSizeX, int OldSizeY, int OldSizeZ, 
-                           int NewSizeX, int NewSizeY, int NewSizeZ, 
-                           int padXY, int switchpad_crop);
-                           
-#Diffusion 4th order regularizer
-def Diff4thHajiaboli(inputData, 
-                     edge_preserv_parameter, 
-                     iterations, 
-                     time_marching_parameter,
-                     regularization_parameter):
-    if inputData.ndim == 2:
-        return Diff4thHajiaboli2D(inputData,  
-                     edge_preserv_parameter, 
-                     iterations, 
-                     time_marching_parameter,
-                     regularization_parameter)
-    elif inputData.ndim == 3:
-        return Diff4thHajiaboli3D(inputData,  
-                     edge_preserv_parameter, 
-                     iterations, 
-                     time_marching_parameter,
-                     regularization_parameter)
-# patch-based nonlocal regularization
-def NML(inputData, 
-                     SearchW_real, 
-                     SimilW, 
-                     h,
-                     lambdaf):
-    if inputData.ndim == 2:
-        return NML2D(inputData, 
-                     SearchW_real, 
-                     SimilW, 
-                     h,
-                     lambdaf)
-    elif inputData.ndim == 3:
-        return NML3D(inputData, 
-                     SearchW_real, 
-                     SimilW, 
-                     h,
-                     lambdaf)
-                     
 # Total-variation Rudin-Osher-Fatemi (ROF)
 def TV_ROF_GPU(inputData,
                      regularization_parameter,
@@ -111,255 +61,10 @@ def TV_FGP_GPU(inputData,
                      methodTV,
                      nonneg,
                      printM)
+                     
+#****************************************************************#
+#********************** Total-variation ROF *********************#
 #****************************************************************#
-def Diff4thHajiaboli2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData, 
-                     float edge_preserv_parameter, 
-                     int iterations, 
-                     float time_marching_parameter,
-                     float regularization_parameter):
-    
-    cdef long dims[2]
-    dims[0] = inputData.shape[0]
-    dims[1] = inputData.shape[1]
-    N = dims[0] + 2;
-    M = dims[1] + 2;
-    
-    #A_L = (float*)mxGetData(mxCreateNumericMatrix(N, M, mxSINGLE_CLASS, mxREAL));
-    #B_L = (float*)mxGetData(mxCreateNumericMatrix(N, M, mxSINGLE_CLASS, mxREAL));
-    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] A_L = \
-		    np.zeros([N,M], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] B_L = \
-		    np.zeros([N,M], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] B = \
-		    np.zeros([dims[0],dims[1]], dtype='float32')
-    #A = inputData
-
-    # copy A to the bigger A_L with boundaries
-    #pragma omp parallel for shared(A_L, A) private(i,j)
-    cdef int i, j;
-    for i in range(N):
-        for j in range(M):
-            if (((i > 0) and (i < N-1)) and  ((j > 0) and (j < M-1))):
-                #A_L[i*M+j] = A[(i-1)*(dims[1])+(j-1)]
-                A_L[i][j] = inputData[i-1][j-1]
-        
-    # Running CUDA code here
-    #Diff4th_GPU_kernel(A_L, B_L, N, M, Z, (float)sigma, iter, (float)tau, lambda);    
-    Diff4th_GPU_kernel(
-            #<float*> A_L.data, <float*> B_L.data,
-            &A_L[0,0], &B_L[0,0], 
-                       N, M, 0, 
-                       edge_preserv_parameter,
-                       iterations , 
-                       time_marching_parameter, 
-                       regularization_parameter);
-    # copy the processed B_L to a smaller B
-    for i in range(N):
-        for j in range(M):
-            if (((i > 0) and (i < N-1)) and ((j > 0) and (j < M-1))):
-                B[i-1][j-1] = B_L[i][j]
-    ##pragma omp parallel for shared(B_L, B) private(i,j)
-    #for (i=0; i < N; i++) {
-    #    for (j=0; j < M; j++) {
-    #        if (((i > 0) && (i < N-1)) &&  ((j > 0) && (j < M-1)))   B[(i-1)*(dims[1])+(j-1)] = B_L[i*M+j];
-    #    }}
-     
-    return B
-
-def Diff4thHajiaboli3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData, 
-                     float edge_preserv_parameter, 
-                     int iterations, 
-                     float time_marching_parameter,
-                     float regularization_parameter):
-    
-    cdef long dims[3]
-    dims[0] = inputData.shape[0]
-    dims[1] = inputData.shape[1]
-    dims[2] = inputData.shape[2]
-    N = dims[0] + 2
-    M = dims[1] + 2
-    Z = dims[2] + 2
-    
-    
-    #A_L = (float*)mxGetData(mxCreateNumericMatrix(N, M, mxSINGLE_CLASS, mxREAL));
-    #B_L = (float*)mxGetData(mxCreateNumericMatrix(N, M, mxSINGLE_CLASS, mxREAL));
-    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] A_L = \
-		    np.zeros([N,M,Z], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] B_L = \
-		    np.zeros([N,M,Z], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] B = \
-		    np.zeros([dims[0],dims[1],dims[2]], dtype='float32')
-    #A = inputData
-    
-    # copy A to the bigger A_L with boundaries
-    #pragma omp parallel for shared(A_L, A) private(i,j)
-    cdef int i, j, k;
-    for i in range(N):
-        for j in range(M):
-            for k in range(Z):
-                if (((i > 0) and (i < N-1)) and \
-                    ((j > 0) and (j < M-1)) and \
-                    ((k > 0) and (k < Z-1))):
-                        A_L[i][j][k] = inputData[i-1][j-1][k-1];
-        
-    # Running CUDA code here
-    #Diff4th_GPU_kernel(A_L, B_L, N, M, Z, (float)sigma, iter, (float)tau, lambda);    
-    Diff4th_GPU_kernel(
-            #<float*> A_L.data, <float*> B_L.data,
-            &A_L[0,0,0], &B_L[0,0,0], 
-                       N, M, Z, 
-                       edge_preserv_parameter,
-                       iterations , 
-                       time_marching_parameter, 
-                       regularization_parameter);
-    # copy the processed B_L to a smaller B
-    for i in range(N):
-        for j in range(M):
-            for k in range(Z):
-                if (((i > 0) and (i < N-1)) and \
-                    ((j > 0) and (j < M-1)) and \
-                    ((k > 0) and (k < Z-1))):
-                    B[i-1][j-1][k-1] = B_L[i][j][k]
-    
-     
-    return B
-
-
-def NML2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData, 
-                     SearchW_real, 
-                     SimilW, 
-                     h,
-                     lambdaf):    
-    N = inputData.shape[0]
-    M = inputData.shape[1]      
-    Z = 0
-    numdims = inputData.ndim
-     
-    if h < 0:
-        raise ValueError('Parameter for the PB filtering function must be > 0') 
-             
-    SearchW = SearchW_real + 2*SimilW;
-    
-    SearchW_full = 2*SearchW + 1; #/* the full searching window  size */
-    SimilW_full = 2*SimilW + 1;   #/* the full similarity window  size */
-    h2 = h*h;
-    
-    padXY = SearchW + 2*SimilW; #/* padding sizes */
-    newsizeX = N + 2*(padXY); #/* the X size of the padded array */
-    newsizeY = M + 2*(padXY); #/* the Y size of the padded array */
-    #newsizeZ = Z + 2*(padXY); #/* the Z size of the padded array */
-    
-    #output
-    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] B = \
-		    np.zeros([N,M], dtype='float32')
-    #/*allocating memory for the padded arrays */
-    
-    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] Ap = \
-		    np.zeros([newsizeX, newsizeY], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] Bp = \
-		    np.zeros([newsizeX, newsizeY], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] Eucl_Vec = \
-		    np.zeros([SimilW_full*SimilW_full], dtype='float32')
-    
-    #/*Gaussian kernel */
-    cdef int count, i_n, j_n;
-    cdef float val;
-    count = 0
-    for i_n in range (-SimilW, SimilW +1):
-        for j_n in range(-SimilW, SimilW +1):
-            val = (float)(i_n*i_n + j_n*j_n)/(2*SimilW*SimilW)
-            Eucl_Vec[count] = np.exp(-val)
-            count = count + 1
-    
-    #/*Perform padding of image A to the size of [newsizeX * newsizeY] */
-    switchpad_crop = 0; # /*padding*/
-    pad_crop(&inputData[0,0], &Ap[0,0], M, N, 0, newsizeY, newsizeX, 0, padXY,
-             switchpad_crop);
-    
-    #/* Do PB regularization with the padded array  */
-    NLM_GPU_kernel(&Ap[0,0], &Bp[0,0], &Eucl_Vec[0], newsizeY, newsizeX, 0,
-                   numdims, SearchW, SimilW, SearchW_real, 
-                   h2, lambdaf);
-    
-    switchpad_crop = 1; #/*cropping*/
-    pad_crop(&Bp[0,0], &B[0,0], M, N, 0, newsizeX, newsizeY, 0, padXY, 
-             switchpad_crop)
-    
-    return B
-
-def NML3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData, 
-                     SearchW_real, 
-                     SimilW, 
-                     h,
-                     lambdaf):    
-    N = inputData.shape[0]
-    M = inputData.shape[1]      
-    Z = inputData.shape[2]     
-    numdims = inputData.ndim
-
-    if h < 0:
-        raise ValueError('Parameter for the PB filtering function must be > 0') 
-             
-    SearchW = SearchW_real + 2*SimilW;
-    
-    SearchW_full = 2*SearchW + 1; #/* the full searching window  size */
-    SimilW_full = 2*SimilW + 1;   #/* the full similarity window  size */
-    h2 = h*h;
-    
-    padXY = SearchW + 2*SimilW; #/* padding sizes */
-    newsizeX = N + 2*(padXY); #/* the X size of the padded array */
-    newsizeY = M + 2*(padXY); #/* the Y size of the padded array */
-    newsizeZ = Z + 2*(padXY); #/* the Z size of the padded array */
-    
-    #output
-    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] B = \
-		    np.zeros([N,M,Z], dtype='float32')
-    #/*allocating memory for the padded arrays */
-    
-    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] Ap = \
-		    np.zeros([newsizeX, newsizeY, newsizeZ], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] Bp = \
-		    np.zeros([newsizeX, newsizeY, newsizeZ], dtype='float32')
-    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] Eucl_Vec = \
-		    np.zeros([SimilW_full*SimilW_full*SimilW_full],
-				    dtype='float32')
-    
-    
-    #/*Gaussian kernel */
-    cdef int count, i_n, j_n, k_n;
-    cdef float val;
-    count = 0
-    for i_n in range (-SimilW, SimilW +1):
-        for j_n in range(-SimilW, SimilW +1):
-            for k_n in range(-SimilW, SimilW+1):
-                val = (i_n*i_n + j_n*j_n + k_n*k_n)/(2*SimilW*SimilW*SimilW)
-                Eucl_Vec[count] = np.exp(-val)
-                count = count + 1
-    
-    #/*Perform padding of image A to the size of [newsizeX * newsizeY] */
-    switchpad_crop = 0; # /*padding*/
-    pad_crop(&inputData[0,0,0], &Ap[0,0,0], 
-             M, N, Z, 
-             newsizeY, newsizeX, newsizeZ, 
-             padXY,
-             switchpad_crop);
-    
-    #/* Do PB regularization with the padded array  */
-    NLM_GPU_kernel(&Ap[0,0,0], &Bp[0,0,0], &Eucl_Vec[0], 
-                   newsizeY, newsizeX, newsizeZ,
-                   numdims, SearchW, SimilW, SearchW_real, 
-                   h2, lambdaf);
-        
-    switchpad_crop = 1; #/*cropping*/
-    pad_crop(&Bp[0,0,0], &B[0,0,0],
-             M, N, Z, 
-             newsizeX, newsizeY, newsizeZ,
-             padXY, 
-             switchpad_crop)
-    
-    return B                   
-  
-# Total-variation ROF 
 def ROFTV2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData, 
                      float regularization_parameter,
                      int iterations, 
@@ -404,8 +109,10 @@ def ROFTV3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData,
                        dims[0], dims[1], dims[2]);   
      
     return outputData
-
-# Total-variation Fast-Gradient-Projection (FGP)
+#****************************************************************#
+#********************** Total-variation FGP *********************#
+#****************************************************************#
+#******** Total-variation Fast-Gradient-Projection (FGP)*********#
 def FGPTV2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData, 
                      float regularization_parameter,
                      int iterations,