31 files changed, 1867 insertions, 274 deletions
diff --git a/Readme.md b/Readme.md
index 7a33fb3..4d8e7ca 100644
--- a/Readme.md
+++ b/Readme.md
@@ -24,11 +24,12 @@
 1. Rudin-Osher-Fatemi (ROF) Total Variation (explicit PDE minimisation scheme) **2D/3D CPU/GPU** (Ref. *1*)
 2. Fast-Gradient-Projection (FGP) Total Variation **2D/3D CPU/GPU** (Ref. *2*)
 3. Split-Bregman (SB) Total Variation **2D/3D CPU/GPU** (Ref. *5*)
-4. Total Generalised Variation (TGV) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *6*)
-5. Linear and nonlinear diffusion (explicit PDE minimisation scheme) **2D/3D CPU/GPU** (Ref. *8*)
-6. Anisotropic Fourth-Order Diffusion (explicit PDE minimisation) **2D/3D CPU/GPU** (Ref. *9*)
-7. A joint ROF-LLT (Lysaker-Lundervold-Tai) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *10,11*)
-8. Nonlocal Total Variation regularisation (GS fixed point iteration) **2D CPU/GPU** (Ref. *12*)
+4. Primal-Dual (PD) Total Variation **2D/3D CPU/GPU** (Ref. *13*)
+5. Total Generalised Variation (TGV) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *6,13*)
+6. Linear and nonlinear diffusion (explicit PDE minimisation scheme) **2D/3D CPU/GPU** (Ref. *8*)
+7. Anisotropic Fourth-Order Diffusion (explicit PDE minimisation) **2D/3D CPU/GPU** (Ref. *9*)
+8. A joint ROF-LLT (Lysaker-Lundervold-Tai) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *10,11*)
+9. Nonlocal Total Variation regularisation (GS fixed point iteration) **2D CPU/GPU** (Ref. *12*)
 
 ### Multi-channel (vectorial):
 1. Fast-Gradient-Projection (FGP) Directional Total Variation **2D/3D CPU/GPU** (Ref. *3,4,2*)
@@ -145,29 +146,32 @@ addpath(/path/to/library);
 ```
 
 ### References to implemented methods:
-1. [Rudin, L.I., Osher, S. and Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4), pp.259-268.](https://www.sciencedirect.com/science/article/pii/016727899290242F)
+1. [Rudin, L.I., Osher, S. and Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4)](https://www.sciencedirect.com/science/article/pii/016727899290242F)
 
-2. [Beck, A. and Teboulle, M., 2009. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11), pp.2419-2434.](https://doi.org/10.1109/TIP.2009.2028250)
+2. [Beck, A. and Teboulle, M., 2009. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11)](https://doi.org/10.1109/TIP.2009.2028250)
 
-3. [Ehrhardt, M.J. and Betcke, M.M., 2016. Multicontrast MRI reconstruction with structure-guided total variation. SIAM Journal on Imaging Sciences, 9(3), pp.1084-1106.](https://doi.org/10.1137/15M1047325)
+3. [Ehrhardt, M.J. and Betcke, M.M., 2016. Multicontrast MRI reconstruction with structure-guided total variation. SIAM Journal on Imaging Sciences, 9(3)](https://doi.org/10.1137/15M1047325)
 
 4. [Kazantsev, D., Jørgensen, J.S., Andersen, M., Lionheart, W.R., Lee, P.D. and Withers, P.J., 2018. Joint image reconstruction method with correlative multi-channel prior for X-ray spectral computed tomography. Inverse Problems, 34(6)](https://doi.org/10.1088/1361-6420/aaba86) **Results can be reproduced using the following** [SOFTWARE](https://github.com/dkazanc/multi-channel-X-ray-CT)
 
-5. [Goldstein, T. and Osher, S., 2009. The split Bregman method for L1-regularized problems. SIAM journal on imaging sciences, 2(2), pp.323-343.](https://doi.org/10.1137/080725891)
+5. [Goldstein, T. and Osher, S., 2009. The split Bregman method for L1-regularized problems. SIAM journal on imaging sciences, 2(2)](https://doi.org/10.1137/080725891)
 
-6. [Bredies, K., Kunisch, K. and Pock, T., 2010. Total generalized variation. SIAM Journal on Imaging Sciences, 3(3), pp.492-526.](https://doi.org/10.1137/090769521)
+6. [Bredies, K., Kunisch, K. and Pock, T., 2010. Total generalized variation. SIAM Journal on Imaging Sciences, 3(3)](https://doi.org/10.1137/090769521)
 
-7. [Duran, J., Moeller, M., Sbert, C. and Cremers, D., 2016. Collaborative total variation: a general framework for vectorial TV models. SIAM Journal on Imaging Sciences, 9(1), pp.116-151.](https://doi.org/10.1137/15M102873X)
+7. [Duran, J., Moeller, M., Sbert, C. and Cremers, D., 2016. Collaborative total variation: a general framework for vectorial TV models. SIAM Journal on Imaging Sciences, 9(1)](https://doi.org/10.1137/15M102873X)
 
-8. [Black, M.J., Sapiro, G., Marimont, D.H. and Heeger, D., 1998. Robust anisotropic diffusion. IEEE Transactions on image processing, 7(3), pp.421-432.](https://doi.org/10.1109/83.661192)
+8. [Black, M.J., Sapiro, G., Marimont, D.H. and Heeger, D., 1998. Robust anisotropic diffusion. IEEE Transactions on image processing, 7(3)](https://doi.org/10.1109/83.661192)
 
-9. [Hajiaboli, M.R., 2011. An anisotropic fourth-order diffusion filter for image noise removal. International Journal of Computer Vision, 92(2), pp.177-191.](https://doi.org/10.1007/s11263-010-0330-1)
+9. [Hajiaboli, M.R., 2011. An anisotropic fourth-order diffusion filter for image noise removal. International Journal of Computer Vision, 92(2)](https://doi.org/10.1007/s11263-010-0330-1)
 
-10. [Lysaker, M., Lundervold, A. and Tai, X.C., 2003. Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on image processing, 12(12), pp.1579-1590.](https://doi.org/10.1109/TIP.2003.819229)
+10. [Lysaker, M., Lundervold, A. and Tai, X.C., 2003. Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on image processing, 12(12)](https://doi.org/10.1109/TIP.2003.819229)
 
-11. [Kazantsev, D., Guo, E., Phillion, A.B., Withers, P.J. and Lee, P.D., 2017. Model-based iterative reconstruction using higher-order regularization of dynamic synchrotron data. Measurement Science and Technology, 28(9), p.094004.](https://doi.org/10.1088/1361-6501/aa7fa8)
+11. [Kazantsev, D., Guo, E., Phillion, A.B., Withers, P.J. and Lee, P.D., 2017. Model-based iterative reconstruction using higher-order regularization of dynamic synchrotron data. Measurement Science and Technology, 28(9)](https://doi.org/10.1088/1361-6501/aa7fa8)
+
+12. [Abderrahim E., Lezoray O. and Bougleux S. 2008. Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing. IEEE Trans. Image Processing 17(7), pp. 1047-1060.](https://ieeexplore.ieee.org/document/4526700)
+
+13. [Chambolle, A. and Pock, T., 2010. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision 40(1)](https://doi.org/10.1007/s10851-010-0251-1)
 
-12. [Abderrahim E., Lezoray O. and Bougleux S. 2008. Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing." IEEE Trans. Image Processing 17(7), pp. 1047-1060.](https://ieeexplore.ieee.org/document/4526700)
 
 ### References to software (please cite if used):
 * [Kazantsev, D., Pasca, E., Turner, M.J. and Withers, P.J., 2019. CCPi-Regularisation toolkit for computed tomographic image reconstruction with proximal splitting algorithms. SoftwareX, 9, pp.317-323.](https://www.sciencedirect.com/science/article/pii/S2352711018301912)
diff --git a/build/run.sh b/build/run.sh
index 285476b..f0b3d3e 100755
--- a/build/run.sh
+++ b/build/run.sh
@@ -6,7 +6,7 @@ rm -r ../build_proj
 mkdir ../build_proj
 cd ../build_proj/
 #make clean
-export CIL_VERSION=19.09
+export CIL_VERSION=19.10
 # install Python modules without CUDA
 #cmake ../ -DBUILD_PYTHON_WRAPPER=ON -DBUILD_MATLAB_WRAPPER=OFF -DBUILD_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install
 # install Python modules with CUDA
diff --git a/demos/Matlab_demos/demoMatlab_3Ddenoise.m b/demos/Matlab_demos/demoMatlab_3Ddenoise.m
index f018327..b7f92cb 100644
--- a/demos/Matlab_demos/demoMatlab_3Ddenoise.m
+++ b/demos/Matlab_demos/demoMatlab_3Ddenoise.m
@@ -62,6 +62,25 @@ fprintf('Denoise a volume using the FGP-TV model (GPU) \n');
 % fprintf('%s %f \n', 'RMSE error for FGP-TV is:', rmse_fgpG);
 % figure; imshow(u_fgpG(:,:,7), [0 1]); title('FGP-TV denoised volume (GPU)');
 %%
+fprintf('Denoise a volume using the PD-TV model (CPU) \n');
+lambda_reg = 0.03; % regularsation parameter for all methods
+iter_pd = 300; % number of FGP iterations
+epsil_tol =  0.0; % tolerance
+tic; [u_pd,infovec] = PD_TV(single(vol3D), lambda_reg, iter_pd, epsil_tol); toc; 
+energyfunc_val_fgp = TV_energy(single(u_pd),single(vol3D),lambda_reg, 1); % get energy function value
+rmse_pd = (RMSE(Ideal3D(:),u_pd(:)));
+fprintf('%s %f \n', 'RMSE error for PD-TV is:', rmse_pd);
+figure; imshow(u_pd(:,:,7), [0 1]); title('PD-TV denoised volume (CPU)');
+%%
+% fprintf('Denoise a volume using the PD-TV model (GPU) \n');
+% lambda_reg = 0.03; % regularsation parameter for all methods
+% iter_pd = 300; % number of FGP iterations
+% epsil_tol =  0.0; % tolerance
+% tic; u_pdG = PD_TV_GPU(single(vol3D), lambda_reg, iter_pd, epsil_tol); toc; 
+% rmse_pdG = (RMSE(Ideal3D(:),u_pdG(:)));
+% fprintf('%s %f \n', 'RMSE error for PD-TV is:', rmse_pdG);
+% figure; imshow(u_pdG(:,:,7), [0 1]); title('PD-TV denoised volume (GPU)');
+%%
 fprintf('Denoise a volume using the SB-TV model (CPU) \n');
 iter_sb = 150; % number of SB iterations
 epsil_tol =  0.0; % tolerance
diff --git a/demos/Matlab_demos/demoMatlab_denoise.m b/demos/Matlab_demos/demoMatlab_denoise.m
index b50eaf5..3d93cb6 100644
--- a/demos/Matlab_demos/demoMatlab_denoise.m
+++ b/demos/Matlab_demos/demoMatlab_denoise.m
@@ -46,6 +46,22 @@ figure; imshow(u_fgp, [0 1]); title('FGP-TV denoised image (CPU)');
 % tic; u_fgpG = FGP_TV_GPU(single(u0), lambda_reg, iter_fgp, epsil_tol); toc; 
 % figure; imshow(u_fgpG, [0 1]); title('FGP-TV denoised image (GPU)');
 %%
+fprintf('Denoise using the PD-TV model (CPU) \n');
+lambda_reg = 0.03;
+iter_pd = 500; % number of FGP iterations
+epsil_tol =  0.0; % tolerance
+tic; [u_pd,infovec] = PD_TV(single(u0), lambda_reg, iter_pd, epsil_tol); toc; 
+energyfunc_val_pd = TV_energy(single(u_pd),single(u0),lambda_reg, 1); % get energy function value
+rmsePD = (RMSE(u_pd(:),Im(:)));
+fprintf('%s %f \n', 'RMSE error for PD-TV is:', rmsePD);
+[ssimval] = ssim(u_pd*255,single(Im)*255);
+fprintf('%s %f \n', 'MSSIM error for PD-TV is:', ssimval);
+figure; imshow(u_pd, [0 1]); title('PD-TV denoised image (CPU)');
+%%
+% fprintf('Denoise using the PD-TV model (GPU) \n');
+% tic; u_pdG = PD_TV_GPU(single(u0), lambda_reg, iter_pd, epsil_tol); toc; 
+% figure; imshow(u_pdG, [0 1]); title('PD-TV denoised image (GPU)');
+%%
 fprintf('Denoise using the SB-TV model (CPU) \n');
 lambda_reg = 0.03;
 iter_sb = 200; % number of SB iterations
diff --git a/demos/demo_cpu_regularisers.py b/demos/demo_cpu_regularisers.py
index 8655623..7d66b7f 100644
--- a/demos/demo_cpu_regularisers.py
+++ b/demos/demo_cpu_regularisers.py
@@ -12,7 +12,7 @@ import matplotlib.pyplot as plt
 import numpy as np
 import os
 import timeit
-from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, TNV, NDF, Diff4th
+from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, TNV, NDF, Diff4th
 from ccpi.filters.regularisers import PatchSelect, NLTV
 from ccpi.supp.qualitymetrics import QualityTools
 ###############################################################################
@@ -129,8 +129,8 @@ imgplot = plt.imshow(u0,cmap="gray")
 pars = {'algorithm' : FGP_TV, \
         'input' : u0,\
         'regularisation_parameter':0.02, \
-        'number_of_iterations' :400 ,\
-        'tolerance_constant':1e-06,\
+        'number_of_iterations' :1500 ,\
+        'tolerance_constant':1e-08,\
         'methodTV': 0 ,\
         'nonneg': 0}
         
@@ -161,6 +161,55 @@ plt.title('{}'.format('CPU results'))
 
 #%%
 print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+print ("_______________PD-TV (2D)__________________")
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+
+## plot 
+fig = plt.figure()
+plt.suptitle('Performance of PD-TV regulariser using the CPU')
+a=fig.add_subplot(1,2,1)
+a.set_title('Noisy Image')
+imgplot = plt.imshow(u0,cmap="gray")
+
+# set parameters
+pars = {'algorithm' : PD_TV, \
+        'input' : u0,\
+        'regularisation_parameter':0.02, \
+        'number_of_iterations' :1500 ,\
+        'tolerance_constant':1e-08,\
+        'methodTV': 0 ,\
+        'nonneg': 1,
+        'lipschitz_const' : 8,
+        'tau' : 0.0025}
+        
+print ("#############PD TV CPU####################")
+start_time = timeit.default_timer()
+(pd_cpu,info_vec_cpu) = PD_TV(pars['input'], 
+              pars['regularisation_parameter'],
+              pars['number_of_iterations'],
+              pars['tolerance_constant'], 
+              pars['methodTV'],
+              pars['nonneg'],
+              pars['lipschitz_const'],
+              pars['tau'],'cpu')
+
+Qtools = QualityTools(Im, pd_cpu)
+pars['rmse'] = Qtools.rmse()
+
+txtstr = printParametersToString(pars)
+txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+print (txtstr)
+a=fig.add_subplot(1,2,2)
+
+# these are matplotlib.patch.Patch properties
+props = dict(boxstyle='round', facecolor='wheat', alpha=0.75)
+# place a text box in upper left in axes coords
+a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
+         verticalalignment='top', bbox=props)
+imgplot = plt.imshow(pd_cpu, cmap="gray")
+plt.title('{}'.format('CPU results'))
+#%%
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 print ("_______________SB-TV (2D)__________________")
 print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 
diff --git a/demos/demo_cpu_regularisers3D.py b/demos/demo_cpu_regularisers3D.py
index fc1e8e6..cfdd2d4 100644
--- a/demos/demo_cpu_regularisers3D.py
+++ b/demos/demo_cpu_regularisers3D.py
@@ -12,7 +12,7 @@ import matplotlib.pyplot as plt
 import numpy as np
 import os
 import timeit
-from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
+from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
 from ccpi.supp.qualitymetrics import QualityTools
 ###############################################################################
 def printParametersToString(pars):
@@ -30,8 +30,7 @@ def printParametersToString(pars):
         return txt
 ###############################################################################
 
-# filename = os.path.join( "data" ,"lena_gray_512.tif")
-filename = "/home/algol/Documents/DEV/CCPi-Regularisation-Toolkit/test/lena_gray_512.tif"
+filename = os.path.join( "data" ,"lena_gray_512.tif")
 
 # read image
 Im = plt.imread(filename)
@@ -169,7 +168,55 @@ a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
          verticalalignment='top', bbox=props)
 imgplot = plt.imshow(fgp_cpu3D[10,:,:], cmap="gray")
 plt.title('{}'.format('Recovered volume on the CPU using FGP-TV'))
+#%%
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+print ("_______________PD-TV (3D)__________________")
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+
+## plot 
+fig = plt.figure()
+plt.suptitle('Performance of PD-TV regulariser using the CPU')
+a=fig.add_subplot(1,2,1)
+a.set_title('Noisy Image')
+imgplot = plt.imshow(noisyVol[10,:,:],cmap="gray")
 
+# set parameters
+pars = {'algorithm' : PD_TV, \
+        'input' : noisyVol,\
+        'regularisation_parameter':0.02, \
+        'number_of_iterations' :1000 ,\
+        'tolerance_constant':1e-06,\
+        'methodTV': 0 ,\
+        'nonneg': 0,
+        'lipschitz_const' : 8,
+        'tau' : 0.0025}
+
+print ("#############FGP TV GPU####################")
+start_time = timeit.default_timer()
+(pd_cpu3D,info_vec_cpu) = PD_TV(pars['input'], 
+              pars['regularisation_parameter'],
+              pars['number_of_iterations'],
+              pars['tolerance_constant'], 
+              pars['methodTV'],
+              pars['nonneg'],
+              pars['lipschitz_const'],
+              pars['tau'],'cpu')
+             
+Qtools = QualityTools(idealVol, pd_cpu3D)
+pars['rmse'] = Qtools.rmse()
+
+txtstr = printParametersToString(pars)
+txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+print (txtstr)
+a=fig.add_subplot(1,2,2)
+
+# these are matplotlib.patch.Patch properties
+props = dict(boxstyle='round', facecolor='wheat', alpha=0.75)
+# place a text box in upper left in axes coords
+a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
+         verticalalignment='top', bbox=props)
+imgplot = plt.imshow(pd_cpu3D[10,:,:], cmap="gray")
+plt.title('{}'.format('Recovered volume on the CPU using PD-TV'))
 #%%
 print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 print ("_______________SB-TV (3D)_________________")
diff --git a/demos/demo_cpu_vs_gpu_regularisers.py b/demos/demo_cpu_vs_gpu_regularisers.py
index 21e3899..015dfc6 100644
--- a/demos/demo_cpu_vs_gpu_regularisers.py
+++ b/demos/demo_cpu_vs_gpu_regularisers.py
@@ -12,7 +12,7 @@ import matplotlib.pyplot as plt
 import numpy as np
 import os
 import timeit
-from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
+from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
 from ccpi.filters.regularisers import PatchSelect
 from ccpi.supp.qualitymetrics import QualityTools
 ###############################################################################
@@ -220,6 +220,108 @@ if (diff_im.sum() > 1):
 else:
     print ("Arrays match")
 
+
+#%%
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+print ("____________PD-TV bench___________________")
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+
+## plot 
+fig = plt.figure()
+plt.suptitle('Comparison of PD-TV regulariser using CPU and GPU implementations')
+a=fig.add_subplot(1,4,1)
+a.set_title('Noisy Image')
+imgplot = plt.imshow(u0,cmap="gray")
+
+# set parameters
+pars = {'algorithm' : PD_TV, \
+        'input' : u0,\
+        'regularisation_parameter':0.02, \
+        'number_of_iterations' :1500 ,\
+        'tolerance_constant':0.0,\
+        'methodTV': 0 ,\
+        'nonneg': 0,
+        'lipschitz_const' : 8,
+        'tau' : 0.0025}
+        
+print ("#############PD TV CPU####################")
+start_time = timeit.default_timer()
+(pd_cpu,info_vec_cpu) = PD_TV(pars['input'], 
+              pars['regularisation_parameter'],
+              pars['number_of_iterations'],
+              pars['tolerance_constant'], 
+              pars['methodTV'],
+              pars['nonneg'],
+              pars['lipschitz_const'],
+              pars['tau'],'cpu')
+
+Qtools = QualityTools(Im, pd_cpu)
+pars['rmse'] = Qtools.rmse()
+
+txtstr = printParametersToString(pars)
+txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+print (txtstr)
+a=fig.add_subplot(1,4,2)
+
+# these are matplotlib.patch.Patch properties
+props = dict(boxstyle='round', facecolor='wheat', alpha=0.75)
+# place a text box in upper left in axes coords
+a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
+         verticalalignment='top', bbox=props)
+imgplot = plt.imshow(pd_cpu, cmap="gray")
+plt.title('{}'.format('CPU results'))
+
+# set parameters
+pars = {'algorithm' : PD_TV, \
+        'input' : u0,\
+        'regularisation_parameter':0.02, \
+        'number_of_iterations' :1500 ,\
+        'tolerance_constant':0.0,\
+        'methodTV': 0 ,\
+        'nonneg': 0,
+        'lipschitz_const' : 8,
+        'tau' : 0.0025}
+        
+print ("#############PD TV CPU####################")
+start_time = timeit.default_timer()
+(pd_gpu,info_vec_gpu) = PD_TV(pars['input'], 
+              pars['regularisation_parameter'],
+              pars['number_of_iterations'],
+              pars['tolerance_constant'], 
+              pars['methodTV'],
+              pars['nonneg'],
+              pars['lipschitz_const'],
+              pars['tau'],'gpu')
+
+Qtools = QualityTools(Im, pd_gpu)
+pars['rmse'] = Qtools.rmse()
+
+txtstr = printParametersToString(pars)
+txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+print (txtstr)
+a=fig.add_subplot(1,4,3)
+
+# these are matplotlib.patch.Patch properties
+props = dict(boxstyle='round', facecolor='wheat', alpha=0.75)
+# place a text box in upper left in axes coords
+a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
+         verticalalignment='top', bbox=props)
+imgplot = plt.imshow(pd_gpu, cmap="gray")
+plt.title('{}'.format('GPU results'))
+
+
+print ("--------Compare the results--------")
+tolerance = 1e-05
+diff_im = np.zeros(np.shape(pd_cpu))
+diff_im = abs(pd_cpu - pd_gpu)
+diff_im[diff_im > tolerance] = 1
+a=fig.add_subplot(1,4,4)
+imgplot = plt.imshow(diff_im, vmin=0, vmax=1, cmap="gray")
+plt.title('{}'.format('Pixels larger threshold difference'))
+if (diff_im.sum() > 1):
+    print ("Arrays do not match!")
+else:
+    print ("Arrays match")
 #%%
 print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 print ("____________SB-TV bench___________________")
diff --git a/demos/demo_gpu_regularisers.py b/demos/demo_gpu_regularisers.py
index 3efcfce..5131847 100644
--- a/demos/demo_gpu_regularisers.py
+++ b/demos/demo_gpu_regularisers.py
@@ -12,7 +12,7 @@ import matplotlib.pyplot as plt
 import numpy as np
 import os
 import timeit
-from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
+from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
 from ccpi.filters.regularisers import PatchSelect, NLTV
 from ccpi.supp.qualitymetrics import QualityTools
 ###############################################################################
@@ -158,6 +158,55 @@ imgplot = plt.imshow(fgp_gpu, cmap="gray")
 plt.title('{}'.format('GPU results'))
 #%%
 print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+print ("_______________PD-TV (2D)__________________")
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+
+## plot 
+fig = plt.figure()
+plt.suptitle('Performance of PD-TV regulariser using the GPU')
+a=fig.add_subplot(1,2,1)
+a.set_title('Noisy Image')
+imgplot = plt.imshow(u0,cmap="gray")
+
+# set parameters
+pars = {'algorithm' : PD_TV, \
+        'input' : u0,\
+        'regularisation_parameter':0.02, \
+        'number_of_iterations' :1500 ,\
+        'tolerance_constant':1e-06,\
+        'methodTV': 0 ,\
+        'nonneg': 1,
+        'lipschitz_const' : 8,
+        'tau' : 0.0025}
+        
+print ("#############PD TV CPU####################")
+start_time = timeit.default_timer()
+(pd_gpu,info_vec_gpu) = PD_TV(pars['input'], 
+              pars['regularisation_parameter'],
+              pars['number_of_iterations'],
+              pars['tolerance_constant'], 
+              pars['methodTV'],
+              pars['nonneg'],
+              pars['lipschitz_const'],
+              pars['tau'],'gpu')
+
+Qtools = QualityTools(Im, pd_gpu)
+pars['rmse'] = Qtools.rmse()
+
+txtstr = printParametersToString(pars)
+txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+print (txtstr)
+a=fig.add_subplot(1,2,2)
+
+# these are matplotlib.patch.Patch properties
+props = dict(boxstyle='round', facecolor='wheat', alpha=0.75)
+# place a text box in upper left in axes coords
+a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
+         verticalalignment='top', bbox=props)
+imgplot = plt.imshow(pd_gpu, cmap="gray")
+plt.title('{}'.format('GPU results'))
+#%%
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 print ("____________SB-TV regulariser______________")
 print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 
diff --git a/demos/demo_gpu_regularisers3D.py b/demos/demo_gpu_regularisers3D.py
index ccf9694..2c25d01 100644
--- a/demos/demo_gpu_regularisers3D.py
+++ b/demos/demo_gpu_regularisers3D.py
@@ -12,7 +12,7 @@ import matplotlib.pyplot as plt
 import numpy as np
 import os
 import timeit
-from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
+from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
 from ccpi.supp.qualitymetrics import QualityTools
 ###############################################################################
 def printParametersToString(pars):
@@ -172,7 +172,54 @@ a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
          verticalalignment='top', bbox=props)
 imgplot = plt.imshow(fgp_gpu3D[10,:,:], cmap="gray")
 plt.title('{}'.format('Recovered volume on the GPU using FGP-TV'))
+#%%
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+print ("_______________PD-TV (3D)__________________")
+print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+
+## plot 
+fig = plt.figure()
+plt.suptitle('Performance of PD-TV regulariser using the GPU')
+a=fig.add_subplot(1,2,1)
+a.set_title('Noisy Image')
+imgplot = plt.imshow(noisyVol[10,:,:],cmap="gray")
 
+# set parameters
+pars = {'algorithm' : PD_TV, \
+        'input' : noisyVol,\
+        'regularisation_parameter':0.02, \
+        'number_of_iterations' :1000 ,\
+        'tolerance_constant':1e-06,\
+        'methodTV': 0 ,\
+        'nonneg': 0,
+        'lipschitz_const' : 8,
+        'tau' : 0.0025}
+
+print ("#############PD TV GPU####################")
+start_time = timeit.default_timer()
+(pd_gpu3D, info_vec_gpu)  = PD_TV(pars['input'], 
+              pars['regularisation_parameter'],
+              pars['number_of_iterations'],
+              pars['tolerance_constant'], 
+              pars['methodTV'],
+              pars['nonneg'],
+              pars['lipschitz_const'],
+              pars['tau'],'gpu')
+
+Qtools = QualityTools(idealVol, pd_gpu3D)
+pars['rmse'] = Qtools.rmse()
+txtstr = printParametersToString(pars)
+txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+print (txtstr)
+a=fig.add_subplot(1,2,2)
+
+# these are matplotlib.patch.Patch properties
+props = dict(boxstyle='round', facecolor='wheat', alpha=0.75)
+# place a text box in upper left in axes coords
+a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,
+         verticalalignment='top', bbox=props)
+imgplot = plt.imshow(pd_gpu3D[10,:,:], cmap="gray")
+plt.title('{}'.format('Recovered volume on the GPU using PD-TV'))
 #%%
 print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 print ("_______________SB-TV (3D)__________________")
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 5fe1a57..f93c19a 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -17,4 +17,12 @@ if (BUILD_MATLAB_WRAPPER)
 endif()
 if (BUILD_PYTHON_WRAPPER)
     add_subdirectory(Python)
-endif()
-\ No newline at end of file
+endif()
+find_package(OpenMP REQUIRED)
+if (OPENMP_FOUND)
+    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
+    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
+    set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}")
+   set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_SHARED_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}")
+   set (CMAKE_STATIC_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_STATIC_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}")
+endif()
diff --git a/src/Core/CMakeLists.txt b/src/Core/CMakeLists.txt
index eea0d63..4e43b5e 100644
--- a/src/Core/CMakeLists.txt
+++ b/src/Core/CMakeLists.txt
@@ -12,20 +12,18 @@ set (CIL_VERSION $ENV{CIL_VERSION} CACHE INTERNAL "Core Imaging Library version"
 message("CIL_VERSION ${CIL_VERSION}")
 #include (GenerateExportHeader)
 
+## Build the regularisers package as a library
+message("Creating Regularisers as a shared library")
 
-find_package(OpenMP)
+find_package(OpenMP REQUIRED)
 if (OPENMP_FOUND)
     set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
     set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
     set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}")
    set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_SHARED_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}")
    set (CMAKE_STATIC_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_STATIC_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}")
-   
 endif()
 
-## Build the regularisers package as a library
-message("Creating Regularisers as a shared library")
-
 message("CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS}")
 message("CMAKE_C_FLAGS ${CMAKE_C_FLAGS}")
 message("CMAKE_EXE_LINKER_FLAGS ${CMAKE_EXE_LINKER_FLAGS}")
@@ -39,21 +37,21 @@ if(WIN32)
   set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${FLAGS}")
   set (CMAKE_C_FLAGS "${CMAKE_CXX_FLAGS} ${FLAGS}")
   set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} /NODEFAULTLIB:MSVCRT.lib")
-  
+
   set (EXTRA_LIBRARIES)
-		
+
   message("library lib: ${LIBRARY_LIB}")
-  
+
 elseif(UNIX)
-   set (FLAGS "-O2 -funsigned-char -Wall  -Wl,--no-undefined  -DCCPiReconstructionIterative_EXPORTS ")  
+   set (FLAGS "-O2 -funsigned-char -Wall  -Wl,--no-undefined  -DCCPiReconstructionIterative_EXPORTS ")
    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${FLAGS}")
    set (CMAKE_C_FLAGS "${CMAKE_CXX_FLAGS} ${FLAGS}")
-  
-   set (EXTRA_LIBRARIES 
+
+   set (EXTRA_LIBRARIES
 		"gomp"
 		"m"
 		)
-   
+
 endif()
 message("CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS}")
 
@@ -66,6 +64,7 @@ message("Adding regularisers as a shared library")
 add_library(cilreg SHARED
 	    ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_CPU/FGP_TV_core.c
 	    ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_CPU/SB_TV_core.c
+            ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_CPU/PD_TV_core.c
 	    ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_CPU/TGV_core.c
 	    ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_CPU/Diffusion_core.c
 	    ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_CPU/Diffus4th_order_core.c
@@ -80,8 +79,8 @@ add_library(cilreg SHARED
 	    ${CMAKE_CURRENT_SOURCE_DIR}/inpainters_CPU/NonlocalMarching_Inpaint_core.c
 	    )
 target_link_libraries(cilreg ${EXTRA_LIBRARIES} )
-include_directories(cilreg PUBLIC 
-                      ${LIBRARY_INC}/include 
+include_directories(cilreg PUBLIC
+                      ${LIBRARY_INC}/include
 					  ${CMAKE_CURRENT_SOURCE_DIR}
 		              ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_CPU/
 		              ${CMAKE_CURRENT_SOURCE_DIR}/inpainters_CPU/  )
@@ -92,14 +91,14 @@ if (UNIX)
 message ("I'd install into ${CMAKE_INSTALL_PREFIX}/lib")
 install(TARGETS cilreg
 	LIBRARY DESTINATION lib
-	CONFIGURATIONS ${CMAKE_BUILD_TYPE} 
+	CONFIGURATIONS ${CMAKE_BUILD_TYPE}
 	)
 elseif(WIN32)
 message ("I'd install into ${CMAKE_INSTALL_PREFIX} lib bin")
-  install(TARGETS cilreg 
+  install(TARGETS cilreg
 	RUNTIME DESTINATION bin
 	ARCHIVE DESTINATION lib
-	CONFIGURATIONS ${CMAKE_BUILD_TYPE} 
+	CONFIGURATIONS ${CMAKE_BUILD_TYPE}
 	)
 endif()
 
@@ -114,6 +113,7 @@ if (BUILD_CUDA)
       CUDA_ADD_LIBRARY(cilregcuda SHARED
         ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_ROF_GPU_core.cu
         ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_FGP_GPU_core.cu
+        ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_PD_GPU_core.cu
         ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_SB_GPU_core.cu
         ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/LLT_ROF_GPU_core.cu
         ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TGV_GPU_core.cu
@@ -126,14 +126,14 @@ if (BUILD_CUDA)
         message ("I'd install into ${CMAKE_INSTALL_PREFIX}/lib")
         install(TARGETS cilregcuda
         LIBRARY DESTINATION lib
-        CONFIGURATIONS ${CMAKE_BUILD_TYPE} 
+        CONFIGURATIONS ${CMAKE_BUILD_TYPE}
         )
       elseif(WIN32)
         message ("I'd install into ${CMAKE_INSTALL_PREFIX} lib bin")
         install(TARGETS cilregcuda
         RUNTIME DESTINATION bin
         ARCHIVE DESTINATION lib
-        CONFIGURATIONS ${CMAKE_BUILD_TYPE} 
+        CONFIGURATIONS ${CMAKE_BUILD_TYPE}
         )
       endif()
     else()
diff --git a/src/Core/regularisers_CPU/FGP_TV_core.c b/src/Core/regularisers_CPU/FGP_TV_core.c
index a16a2e5..12f2254 100644
--- a/src/Core/regularisers_CPU/FGP_TV_core.c
+++ b/src/Core/regularisers_CPU/FGP_TV_core.c
@@ -46,12 +46,12 @@ float TV_FGP_CPU_main(float *Input, float *Output, float *infovector, float lamb
     float tk = 1.0f;
     float tkp1 =1.0f;
     int count = 0;
-    
+
     if (dimZ <= 1) {
         /*2D case */
         float *Output_prev=NULL, *P1=NULL, *P2=NULL, *P1_prev=NULL, *P2_prev=NULL, *R1=NULL, *R2=NULL;
         DimTotal = (long)(dimX*dimY);
-        
+
         if (epsil != 0.0f) Output_prev = calloc(DimTotal, sizeof(float));
         P1 = calloc(DimTotal, sizeof(float));
         P2 = calloc(DimTotal, sizeof(float));
@@ -59,32 +59,32 @@ float TV_FGP_CPU_main(float *Input, float *Output, float *infovector, float lamb
         P2_prev = calloc(DimTotal, sizeof(float));
         R1 = calloc(DimTotal, sizeof(float));
         R2 = calloc(DimTotal, sizeof(float));
-        
+
         /* begin iterations */
         for(ll=0; ll<iterationsNumb; ll++) {
-            
+
             if ((epsil != 0.0f)  && (ll % 5 == 0)) copyIm(Output, Output_prev, (long)(dimX), (long)(dimY), 1l);
             /* computing the gradient of the objective function */
             Obj_func2D(Input, Output, R1, R2, lambdaPar, (long)(dimX), (long)(dimY));
-            
+
             /* apply nonnegativity */
             if (nonneg == 1) for(j=0; j<DimTotal; j++) {if (Output[j] < 0.0f) Output[j] = 0.0f;}
-            
+
             /*Taking a step towards minus of the gradient*/
             Grad_func2D(P1, P2, Output, R1, R2, lambdaPar, (long)(dimX), (long)(dimY));
-            
+
             /* projection step */
             Proj_func2D(P1, P2, methodTV, DimTotal);
-            
+
             /*updating R and t*/
             tkp1 = (1.0f + sqrtf(1.0f + 4.0f*tk*tk))*0.5f;
             Rupd_func2D(P1, P1_prev, P2, P2_prev, R1, R2, tkp1, tk, DimTotal);
-            
+
             /*storing old values*/
             copyIm(P1, P1_prev, (long)(dimX), (long)(dimY), 1l);
             copyIm(P2, P2_prev, (long)(dimX), (long)(dimY), 1l);
             tk = tkp1;
-            
+
             /* check early stopping criteria */
             if ((epsil != 0.0f)  && (ll % 5 == 0)) {
                 re = 0.0f; re1 = 0.0f;
@@ -105,7 +105,7 @@ float TV_FGP_CPU_main(float *Input, float *Output, float *infovector, float lamb
         /*3D case*/
         float *Output_prev=NULL, *P1=NULL, *P2=NULL, *P3=NULL, *P1_prev=NULL, *P2_prev=NULL, *P3_prev=NULL, *R1=NULL, *R2=NULL, *R3=NULL;
         DimTotal = (long)(dimX*dimY*dimZ);
-        
+
         if (epsil != 0.0f) Output_prev = calloc(DimTotal, sizeof(float));
         P1 = calloc(DimTotal, sizeof(float));
         P2 = calloc(DimTotal, sizeof(float));
@@ -116,28 +116,28 @@ float TV_FGP_CPU_main(float *Input, float *Output, float *infovector, float lamb
         R1 = calloc(DimTotal, sizeof(float));
         R2 = calloc(DimTotal, sizeof(float));
         R3 = calloc(DimTotal, sizeof(float));
-        
+
         /* begin iterations */
         for(ll=0; ll<iterationsNumb; ll++) {
-            
+
             if ((epsil != 0.0f)  && (ll % 5 == 0)) copyIm(Output, Output_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
-            
+
             /* computing the gradient of the objective function */
             Obj_func3D(Input, Output, R1, R2, R3, lambdaPar, (long)(dimX), (long)(dimY), (long)(dimZ));
-            
+
             /* apply nonnegativity */
             if (nonneg == 1) for(j=0; j<DimTotal; j++) {if (Output[j] < 0.0f) Output[j] = 0.0f;}
-            
+
             /*Taking a step towards minus of the gradient*/
             Grad_func3D(P1, P2, P3, Output, R1, R2, R3, lambdaPar, (long)(dimX), (long)(dimY), (long)(dimZ));
-            
+
             /* projection step */
             Proj_func3D(P1, P2, P3, methodTV, DimTotal);
-            
+
             /*updating R and t*/
             tkp1 = (1.0f + sqrtf(1.0f + 4.0f*tk*tk))*0.5f;
             Rupd_func3D(P1, P1_prev, P2, P2_prev, P3, P3_prev, R1, R2, R3, tkp1, tk, DimTotal);
-            
+
             /* calculate norm - stopping rules*/
             if ((epsil != 0.0f)  && (ll % 5 == 0)) {
                 re = 0.0f; re1 = 0.0f;
@@ -151,22 +151,22 @@ float TV_FGP_CPU_main(float *Input, float *Output, float *infovector, float lamb
                 if (re < epsil)  count++;
                 if (count > 3) break;
             }
-            
+
             /*storing old values*/
             copyIm(P1, P1_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
             copyIm(P2, P2_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
             copyIm(P3, P3_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
             tk = tkp1;
         }
-        
+
         if (epsil != 0.0f) free(Output_prev);
         free(P1); free(P2); free(P3); free(P1_prev); free(P2_prev); free(P3_prev); free(R1); free(R2); free(R3);
     }
-    
+
     /*adding info into info_vector */
     infovector[0] = (float)(ll);  /*iterations number (if stopped earlier based on tolerance)*/
     infovector[1] = re;  /* reached tolerance */
-    
+
     return 0;
 }
 
@@ -202,36 +202,6 @@ float Grad_func2D(float *P1, float *P2, float *D, float *R1, float *R2, float la
         }}
     return 1;
 }
-float Proj_func2D(float *P1, float *P2, int methTV, long DimTotal)
-{
-    float val1, val2, denom, sq_denom;
-    long i;
-    if (methTV == 0) {
-        /* isotropic TV*/
-#pragma omp parallel for shared(P1,P2) private(i,denom,sq_denom)
-        for(i=0; i<DimTotal; i++) {
-            denom = powf(P1[i],2) +  powf(P2[i],2);
-            if (denom > 1.0f) {
-                sq_denom = 1.0f/sqrtf(denom);
-                P1[i] = P1[i]*sq_denom;
-                P2[i] = P2[i]*sq_denom;
-            }
-        }
-    }
-    else {
-        /* anisotropic TV*/
-#pragma omp parallel for shared(P1,P2) private(i,val1,val2)
-        for(i=0; i<DimTotal; i++) {
-            val1 = fabs(P1[i]);
-            val2 = fabs(P2[i]);
-            if (val1 < 1.0f) {val1 = 1.0f;}
-            if (val2 < 1.0f) {val2 = 1.0f;}
-            P1[i] = P1[i]/val1;
-            P2[i] = P2[i]/val2;
-        }
-    }
-    return 1;
-}
 float Rupd_func2D(float *P1, float *P1_old, float *P2, float *P2_old, float *R1, float *R2, float tkp1, float tk, long DimTotal)
 {
     long i;
@@ -284,40 +254,6 @@ float Grad_func3D(float *P1, float *P2, float *P3, float *D, float *R1, float *R
             }}}
     return 1;
 }
-float Proj_func3D(float *P1, float *P2, float *P3, int methTV, long DimTotal)
-{
-    float val1, val2, val3, denom, sq_denom;
-    long i;
-    if (methTV == 0) {
-        /* isotropic TV*/
-#pragma omp parallel for shared(P1,P2,P3) private(i,val1,val2,val3,sq_denom)
-        for(i=0; i<DimTotal; i++) {
-            denom = powf(P1[i],2) + powf(P2[i],2) + powf(P3[i],2);
-            if (denom > 1.0f) {
-                sq_denom = 1.0f/sqrtf(denom);
-                P1[i] = P1[i]*sq_denom;
-                P2[i] = P2[i]*sq_denom;
-                P3[i] = P3[i]*sq_denom;
-            }
-        }
-    }
-    else {
-        /* anisotropic TV*/
-#pragma omp parallel for shared(P1,P2,P3) private(i,val1,val2,val3)
-        for(i=0; i<DimTotal; i++) {
-            val1 = fabs(P1[i]);
-            val2 = fabs(P2[i]);
-            val3 = fabs(P3[i]);
-            if (val1 < 1.0f) {val1 = 1.0f;}
-            if (val2 < 1.0f) {val2 = 1.0f;}
-            if (val3 < 1.0f) {val3 = 1.0f;}
-            P1[i] = P1[i]/val1;
-            P2[i] = P2[i]/val2;
-            P3[i] = P3[i]/val3;
-        }
-    }
-    return 1;
-}
 float Rupd_func3D(float *P1, float *P1_old, float *P2, float *P2_old, float *P3, float *P3_old, float *R1, float *R2, float *R3, float tkp1, float tk, long DimTotal)
 {
     long i;
diff --git a/src/Core/regularisers_CPU/FGP_TV_core.h b/src/Core/regularisers_CPU/FGP_TV_core.h
index 04e6e80..e9c3cde 100644
--- a/src/Core/regularisers_CPU/FGP_TV_core.h
+++ b/src/Core/regularisers_CPU/FGP_TV_core.h
@@ -29,12 +29,12 @@ limitations under the License.
 /* C-OMP implementation of FGP-TV [1] denoising/regularization model (2D/3D case)
  *
  * Input Parameters:
- * 1. Noisy image/volume 
- * 2. lambda - regularization parameter 
+ * 1. Noisy image/volume
+ * 2. lambda - regularization parameter
  * 3. Number of iterations
- * 4. eplsilon: tolerance constant 
+ * 4. eplsilon: tolerance constant
  * 5. TV-type: methodTV - 'iso' (0) or 'l1' (1)
- * 6. nonneg: 'nonnegativity (0 is OFF by default) 
+ * 6. nonneg: 'nonnegativity (0 is OFF by default)
  *
  * Output:
  * [1] Filtered/regularized image/volume
@@ -44,7 +44,7 @@ limitations under the License.
  * This function is based on the Matlab's code and paper by
  * [1] Amir Beck and Marc Teboulle, "Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems"
  */
- 
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -52,12 +52,10 @@ CCPI_EXPORT float TV_FGP_CPU_main(float *Input, float *Output, float *infovector
 
 CCPI_EXPORT float Obj_func2D(float *A, float *D, float *R1, float *R2, float lambda, long dimX, long dimY);
 CCPI_EXPORT float Grad_func2D(float *P1, float *P2, float *D, float *R1, float *R2, float lambda, long dimX, long dimY);
-CCPI_EXPORT float Proj_func2D(float *P1, float *P2, int methTV, long DimTotal);
 CCPI_EXPORT float Rupd_func2D(float *P1, float *P1_old, float *P2, float *P2_old, float *R1, float *R2, float tkp1, float tk, long DimTotal);
 
 CCPI_EXPORT float Obj_func3D(float *A, float *D, float *R1, float *R2, float *R3, float lambda, long dimX, long dimY, long dimZ);
 CCPI_EXPORT float Grad_func3D(float *P1, float *P2, float *P3, float *D, float *R1, float *R2, float *R3, float lambda, long dimX, long dimY, long dimZ);
-CCPI_EXPORT float Proj_func3D(float *P1, float *P2, float *P3, int methTV, long DimTotal);
 CCPI_EXPORT float Rupd_func3D(float *P1, float *P1_old, float *P2, float *P2_old, float *P3, float *P3_old, float *R1, float *R2, float *R3, float tkp1, float tk, long DimTotal);
 #ifdef __cplusplus
 }
diff --git a/src/Core/regularisers_CPU/FGP_dTV_core.c b/src/Core/regularisers_CPU/FGP_dTV_core.c
index afd7264..e828be6 100644
--- a/src/Core/regularisers_CPU/FGP_dTV_core.c
+++ b/src/Core/regularisers_CPU/FGP_dTV_core.c
@@ -52,11 +52,11 @@ float dTV_FGP_CPU_main(float *Input, float *InputRef, float *Output, float *info
     float tk = 1.0f;
     float tkp1=1.0f;
     int count = 0;
-    
-    
+
+
     float *Output_prev=NULL, *P1=NULL, *P2=NULL, *P1_prev=NULL, *P2_prev=NULL, *R1=NULL, *R2=NULL, *InputRef_x=NULL, *InputRef_y=NULL;
     DimTotal = (long)(dimX*dimY*dimZ);
-    
+
     if (epsil != 0.0f) Output_prev = calloc(DimTotal, sizeof(float));
     P1 = calloc(DimTotal, sizeof(float));
     P2 = calloc(DimTotal, sizeof(float));
@@ -66,39 +66,39 @@ float dTV_FGP_CPU_main(float *Input, float *InputRef, float *Output, float *info
     R2 = calloc(DimTotal, sizeof(float));
     InputRef_x = calloc(DimTotal, sizeof(float));
     InputRef_y = calloc(DimTotal, sizeof(float));
-    
+
     if (dimZ <= 1) {
         /*2D case */
         /* calculate gradient field (smoothed) for the reference image */
         GradNorm_func2D(InputRef, InputRef_x, InputRef_y, eta, (long)(dimX), (long)(dimY));
-        
+
         /* begin iterations */
         for(ll=0; ll<iterationsNumb; ll++) {
-            
+
             if ((epsil != 0.0f)  && (ll % 5 == 0)) copyIm(Output, Output_prev, (long)(dimX), (long)(dimY), 1l);
             /*projects a 2D vector field R-1,2 onto the orthogonal complement of another 2D vector field InputRef_xy*/
             ProjectVect_func2D(R1, R2, InputRef_x, InputRef_y, (long)(dimX), (long)(dimY));
-            
+
             /* computing the gradient of the objective function */
             Obj_dfunc2D(Input, Output, R1, R2, lambdaPar, (long)(dimX), (long)(dimY));
-            
+
             /* apply nonnegativity */
             if (nonneg == 1) for(j=0; j<DimTotal; j++) {if (Output[j] < 0.0f) Output[j] = 0.0f;}
-            
+
             /*Taking a step towards minus of the gradient*/
             Grad_dfunc2D(P1, P2, Output, R1, R2, InputRef_x, InputRef_y, lambdaPar, (long)(dimX), (long)(dimY));
-            
+
             /* projection step */
-            Proj_dfunc2D(P1, P2, methodTV, DimTotal);
-            
+            Proj_func2D(P1, P2, methodTV, DimTotal);
+
             /*updating R and t*/
             tkp1 = (1.0f + sqrt(1.0f + 4.0f*tk*tk))*0.5f;
             Rupd_dfunc2D(P1, P1_prev, P2, P2_prev, R1, R2, tkp1, tk, DimTotal);
-            
+
             copyIm(P1, P1_prev, (long)(dimX), (long)(dimY), 1l);
             copyIm(P2, P2_prev, (long)(dimX), (long)(dimY), 1l);
             tk = tkp1;
-            
+
             /* check early stopping criteria */
             if ((epsil != 0.0f)  && (ll % 5 == 0)) {
                 re = 0.0f; re1 = 0.0f;
@@ -116,45 +116,45 @@ float dTV_FGP_CPU_main(float *Input, float *InputRef, float *Output, float *info
     else {
         /*3D case*/
         float *P3=NULL, *P3_prev=NULL, *R3=NULL, *InputRef_z=NULL;
-        
+
         P3 = calloc(DimTotal, sizeof(float));
         P3_prev = calloc(DimTotal, sizeof(float));
         R3 = calloc(DimTotal, sizeof(float));
         InputRef_z = calloc(DimTotal, sizeof(float));
-        
+
         /* calculate gradient field (smoothed) for the reference volume */
         GradNorm_func3D(InputRef, InputRef_x, InputRef_y, InputRef_z, eta, (long)(dimX), (long)(dimY), (long)(dimZ));
-        
+
         /* begin iterations */
         for(ll=0; ll<iterationsNumb; ll++) {
-            
+
             if ((epsil != 0.0f)  && (ll % 5 == 0)) copyIm(Output, Output_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
-            
+
             /*projects a 3D vector field R-1,2,3 onto the orthogonal complement of another 3D vector field InputRef_xyz*/
             ProjectVect_func3D(R1, R2, R3, InputRef_x, InputRef_y, InputRef_z, (long)(dimX), (long)(dimY), (long)(dimZ));
-            
+
             /* computing the gradient of the objective function */
             Obj_dfunc3D(Input, Output, R1, R2, R3, lambdaPar, (long)(dimX), (long)(dimY), (long)(dimZ));
-            
+
             /* apply nonnegativity */
             if (nonneg == 1) for(j=0; j<DimTotal; j++) {if (Output[j] < 0.0f) Output[j] = 0.0f;}
-            
+
             /*Taking a step towards minus of the gradient*/
             Grad_dfunc3D(P1, P2, P3, Output, R1, R2, R3, InputRef_x, InputRef_y, InputRef_z, lambdaPar, (long)(dimX), (long)(dimY), (long)(dimZ));
-            
+
             /* projection step */
-            Proj_dfunc3D(P1, P2, P3, methodTV, DimTotal);
-            
+            Proj_func3D(P1, P2, P3, methodTV, DimTotal);
+
             /*updating R and t*/
             tkp1 = (1.0f + sqrt(1.0f + 4.0f*tk*tk))*0.5f;
             Rupd_dfunc3D(P1, P1_prev, P2, P2_prev, P3, P3_prev, R1, R2, R3, tkp1, tk, DimTotal);
-            
+
             /*storing old values*/
             copyIm(P1, P1_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
             copyIm(P2, P2_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
             copyIm(P3, P3_prev, (long)(dimX), (long)(dimY), (long)(dimZ));
             tk = tkp1;
-            
+
             /* check early stopping criteria */
             if ((epsil != 0.0f)  && (ll % 5 == 0)) {
                 re = 0.0f; re1 = 0.0f;
@@ -168,16 +168,16 @@ float dTV_FGP_CPU_main(float *Input, float *InputRef, float *Output, float *info
                 if (count > 3) break;
             }
         }
-        
+
         free(P3); free(P3_prev); free(R3); free(InputRef_z);
     }
     if (epsil != 0.0f) free(Output_prev);
     free(P1); free(P2); free(P1_prev); free(P2_prev); free(R1); free(R2); free(InputRef_x); free(InputRef_y);
-    
+
     /*adding info into info_vector */
     infovector[0] = (float)(ll);  /*iterations number (if stopped earlier based on tolerance)*/
     infovector[1] = re;  /* reached tolerance */
-    
+
     return 0;
 }
 
@@ -185,7 +185,6 @@ float dTV_FGP_CPU_main(float *Input, float *InputRef, float *Output, float *info
 /********************************************************************/
 /***************************2D Functions*****************************/
 /********************************************************************/
-
 float GradNorm_func2D(float *B, float *B_x, float *B_y, float eta, long dimX, long dimY)
 {
     long i,j,index;
@@ -249,47 +248,17 @@ float Grad_dfunc2D(float *P1, float *P2, float *D, float *R1, float *R2, float *
             /* boundary conditions */
             if (i == dimX-1) val1 = 0.0f; else val1 = D[index] - D[j*dimX + (i+1)];
             if (j == dimY-1) val2 = 0.0f; else val2 = D[index] - D[(j+1)*dimX + i];
-            
+
             in_prod = val1*B_x[index] + val2*B_y[index];   /* calculate inner product */
             val1 = val1 - in_prod*B_x[index];
             val2 = val2 - in_prod*B_y[index];
-            
+
             P1[index] = R1[index] + multip*val1;
             P2[index] = R2[index] + multip*val2;
-            
+
         }}
     return 1;
 }
-float Proj_dfunc2D(float *P1, float *P2, int methTV, long DimTotal)
-{
-    float val1, val2, denom, sq_denom;
-    long i;
-    if (methTV == 0) {
-        /* isotropic TV*/
-#pragma omp parallel for shared(P1,P2) private(i,denom,sq_denom)
-        for(i=0; i<DimTotal; i++) {
-            denom = powf(P1[i],2) +  powf(P2[i],2);
-            if (denom > 1.0f) {
-                sq_denom = 1.0f/sqrtf(denom);
-                P1[i] = P1[i]*sq_denom;
-                P2[i] = P2[i]*sq_denom;
-            }
-        }
-    }
-    else {
-        /* anisotropic TV*/
-#pragma omp parallel for shared(P1,P2) private(i,val1,val2)
-        for(i=0; i<DimTotal; i++) {
-            val1 = fabs(P1[i]);
-            val2 = fabs(P2[i]);
-            if (val1 < 1.0f) {val1 = 1.0f;}
-            if (val2 < 1.0f) {val2 = 1.0f;}
-            P1[i] = P1[i]/val1;
-            P2[i] = P2[i]/val2;
-        }
-    }
-    return 1;
-}
 float Rupd_dfunc2D(float *P1, float *P1_old, float *P2, float *P2_old, float *R1, float *R2, float tkp1, float tk, long DimTotal)
 {
     long i;
@@ -314,14 +283,14 @@ float GradNorm_func3D(float *B, float *B_x, float *B_y, float *B_z, float eta, l
     for(k=0; k<dimZ; k++) {
         for(j=0; j<dimY; j++) {
             for(i=0; i<dimX; i++) {
-                
+
                 index = (dimX*dimY)*k + j*dimX+i;
-                
+
                 /* zero boundary conditions */
                 if (i == dimX-1) {val1 = 0.0f;} else {val1 = B[(dimX*dimY)*k + j*dimX+(i+1)];}
                 if (j == dimY-1) {val2 = 0.0f;} else {val2 = B[(dimX*dimY)*k + (j+1)*dimX+i];}
                 if (k == dimZ-1) {val3 = 0.0f;} else {val3 = B[(dimX*dimY)*(k+1) + (j)*dimX+i];}
-                
+
                 gradX = val1 - B[index];
                 gradY = val2 - B[index];
                 gradZ = val3 - B[index];
@@ -382,52 +351,18 @@ float Grad_dfunc3D(float *P1, float *P2, float *P3, float *D, float *R1, float *
                 if (i == dimX-1) val1 = 0.0f; else val1 = D[index] - D[(dimX*dimY)*k + j*dimX + (i+1)];
                 if (j == dimY-1) val2 = 0.0f; else val2 = D[index] - D[(dimX*dimY)*k + (j+1)*dimX + i];
                 if (k == dimZ-1) val3 = 0.0f; else val3 = D[index] - D[(dimX*dimY)*(k+1) + j*dimX + i];
-                
+
                 in_prod = val1*B_x[index] + val2*B_y[index] + val3*B_z[index];   /* calculate inner product */
                 val1 = val1 - in_prod*B_x[index];
                 val2 = val2 - in_prod*B_y[index];
                 val3 = val3 - in_prod*B_z[index];
-                
+
                 P1[index] = R1[index] + multip*val1;
                 P2[index] = R2[index] + multip*val2;
                 P3[index] = R3[index] + multip*val3;
             }}}
     return 1;
 }
-float Proj_dfunc3D(float *P1, float *P2, float *P3, int methTV, long DimTotal)
-{
-    float val1, val2, val3, denom, sq_denom;
-    long i;
-    if (methTV == 0) {
-        /* isotropic TV*/
-#pragma omp parallel for shared(P1,P2,P3) private(i,val1,val2,val3,sq_denom)
-        for(i=0; i<DimTotal; i++) {
-            denom = powf(P1[i],2) + powf(P2[i],2) + powf(P3[i],2);
-            if (denom > 1.0f) {
-                sq_denom = 1.0f/sqrtf(denom);
-                P1[i] = P1[i]*sq_denom;
-                P2[i] = P2[i]*sq_denom;
-                P3[i] = P3[i]*sq_denom;
-            }
-        }
-    }
-    else {
-        /* anisotropic TV*/
-#pragma omp parallel for shared(P1,P2,P3) private(i,val1,val2,val3)
-        for(i=0; i<DimTotal; i++) {
-            val1 = fabs(P1[i]);
-            val2 = fabs(P2[i]);
-            val3 = fabs(P3[i]);
-            if (val1 < 1.0f) {val1 = 1.0f;}
-            if (val2 < 1.0f) {val2 = 1.0f;}
-            if (val3 < 1.0f) {val3 = 1.0f;}
-            P1[i] = P1[i]/val1;
-            P2[i] = P2[i]/val2;
-            P3[i] = P3[i]/val3;
-        }
-    }
-    return 1;
-}
 float Rupd_dfunc3D(float *P1, float *P1_old, float *P2, float *P2_old, float *P3, float *P3_old, float *R1, float *R2, float *R3, float tkp1, float tk, long DimTotal)
 {
     long i;
diff --git a/src/Core/regularisers_CPU/FGP_dTV_core.h b/src/Core/regularisers_CPU/FGP_dTV_core.h
index 9ace06d..8582170 100644
--- a/src/Core/regularisers_CPU/FGP_dTV_core.h
+++ b/src/Core/regularisers_CPU/FGP_dTV_core.h
@@ -59,14 +59,12 @@ CCPI_EXPORT float GradNorm_func2D(float *B, float *B_x, float *B_y, float eta, l
 CCPI_EXPORT float ProjectVect_func2D(float *R1, float *R2, float *B_x, float *B_y, long dimX, long dimY);
 CCPI_EXPORT float Obj_dfunc2D(float *A, float *D, float *R1, float *R2, float lambda, long dimX, long dimY);
 CCPI_EXPORT float Grad_dfunc2D(float *P1, float *P2, float *D, float *R1, float *R2, float *B_x, float *B_y, float lambda, long dimX, long dimY);
-CCPI_EXPORT float Proj_dfunc2D(float *P1, float *P2, int methTV, long DimTotal);
 CCPI_EXPORT float Rupd_dfunc2D(float *P1, float *P1_old, float *P2, float *P2_old, float *R1, float *R2, float tkp1, float tk, long DimTotal);
 
 CCPI_EXPORT float GradNorm_func3D(float *B, float *B_x, float *B_y, float *B_z, float eta, long dimX, long dimY, long dimZ);
 CCPI_EXPORT float ProjectVect_func3D(float *R1, float *R2, float *R3, float *B_x, float *B_y, float *B_z, long dimX, long dimY, long dimZ);
 CCPI_EXPORT float Obj_dfunc3D(float *A, float *D, float *R1, float *R2, float *R3, float lambda, long dimX, long dimY, long dimZ);
 CCPI_EXPORT float Grad_dfunc3D(float *P1, float *P2, float *P3, float *D, float *R1, float *R2, float *R3, float *B_x, float *B_y, float *B_z, float lambda, long dimX, long dimY, long dimZ);
-CCPI_EXPORT float Proj_dfunc3D(float *P1, float *P2, float *P3, int methTV, long DimTotal);
 CCPI_EXPORT float Rupd_dfunc3D(float *P1, float *P1_old, float *P2, float *P2_old, float *P3, float *P3_old, float *R1, float *R2, float *R3, float tkp1, float tk, long DimTotal);
 #ifdef __cplusplus
 }
diff --git a/src/Core/regularisers_CPU/PD_TV_core.c b/src/Core/regularisers_CPU/PD_TV_core.c
new file mode 100644
index 0000000..534091b
--- /dev/null
+++ b/src/Core/regularisers_CPU/PD_TV_core.c
@@ -0,0 +1,252 @@
+/*
+ * This work is part of the Core Imaging Library developed by
+ * Visual Analytics and Imaging System Group of the Science Technology
+ * Facilities Council, STFC
+ *
+ * Copyright 2019 Daniil Kazantsev
+ * Copyright 2019 Srikanth Nagella, Edoardo Pasca
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "PD_TV_core.h"
+
+/* C-OMP implementation of Primal-Dual TV [1] by Chambolle Pock denoising/regularization model (2D/3D case)
+ *
+ * Input Parameters:
+ * 1. Noisy image/volume
+ * 2. lambdaPar - regularization parameter
+ * 3. Number of iterations
+ * 4. eplsilon: tolerance constant
+ * 5. lipschitz_const: convergence related parameter
+ * 6. TV-type: methodTV - 'iso' (0) or 'l1' (1)
+ * 7. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON)
+ * 8. tau: time marching parameter
+
+ * Output:
+ * [1] TV - Filtered/regularized image/volume
+ * [2] Information vector which contains [iteration no., reached tolerance]
+ *
+ * [1] Antonin Chambolle, Thomas Pock. "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging", 2010
+ */
+
+float PDTV_CPU_main(float *Input, float *U, float *infovector, float lambdaPar, int iterationsNumb, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ)
+{
+    int ll;
+    long j, DimTotal;
+    float re, re1, sigma, theta, lt;
+    re = 0.0f; re1 = 0.0f;
+    int count = 0;
+
+    //tau = 1.0/powf(lipschitz_const,0.5);
+    //sigma = 1.0/powf(lipschitz_const,0.5);
+    sigma = 1.0/(lipschitz_const*tau);
+    theta = 1.0f;
+    lt = tau/lambdaPar;
+    ll = 0;
+
+
+    DimTotal = (long)(dimX*dimY*dimZ);
+
+    copyIm(Input, U, (long)(dimX), (long)(dimY), (long)(dimZ));
+
+    if (dimZ <= 1) {
+        /*2D case */
+        float *U_old=NULL, *P1=NULL, *P2=NULL;
+
+        U_old = calloc(DimTotal, sizeof(float));
+        P1 = calloc(DimTotal, sizeof(float));
+        P2 = calloc(DimTotal, sizeof(float));
+
+        /* begin iterations */
+        for(ll=0; ll<iterationsNumb; ll++) {
+
+            /* computing the the dual P variable */
+            DualP2D(U, P1, P2, (long)(dimX), (long)(dimY), sigma);
+
+            /* apply nonnegativity */
+            if (nonneg == 1) for(j=0; j<DimTotal; j++) {if (U[j] < 0.0f) U[j] = 0.0f;}
+
+            /* projection step */
+            Proj_func2D(P1, P2, methodTV, DimTotal);
+
+            /* copy U to U_old */
+            copyIm(U, U_old, (long)(dimX), (long)(dimY), 1l);
+
+            /* calculate divergence */
+            DivProj2D(U, Input, P1, P2,(long)(dimX), (long)(dimY), lt, tau);
+
+            /* check early stopping criteria */
+            if ((epsil != 0.0f)  && (ll % 5 == 0)) {
+                re = 0.0f; re1 = 0.0f;
+                for(j=0; j<DimTotal; j++)
+                {
+                    re += powf(U[j] - U_old[j],2);
+                    re1 += powf(U[j],2);
+                }
+                re = sqrtf(re)/sqrtf(re1);
+                if (re < epsil)  count++;
+                if (count > 3) break;
+            }
+            /*get updated solution*/
+
+            getX(U, U_old, theta, DimTotal);
+        }
+        free(P1); free(P2); free(U_old);
+    }
+    else {
+          /*3D case*/
+        float *U_old=NULL, *P1=NULL, *P2=NULL, *P3=NULL;
+        U_old = calloc(DimTotal, sizeof(float));
+        P1 = calloc(DimTotal, sizeof(float));
+        P2 = calloc(DimTotal, sizeof(float));
+        P3 = calloc(DimTotal, sizeof(float));
+
+        /* begin iterations */
+        for(ll=0; ll<iterationsNumb; ll++) {
+
+         /* computing the the dual P variable */
+            DualP3D(U, P1, P2, P3, (long)(dimX), (long)(dimY),  (long)(dimZ), sigma);
+
+            /* apply nonnegativity */
+            if (nonneg == 1) for(j=0; j<DimTotal; j++) {if (U[j] < 0.0f) U[j] = 0.0f;}
+
+            /* projection step */
+            Proj_func3D(P1, P2, P3, methodTV, DimTotal);
+
+            /* copy U to U_old */
+            copyIm(U, U_old, (long)(dimX), (long)(dimY), (long)(dimZ));
+
+            DivProj3D(U, Input, P1, P2, P3, (long)(dimX), (long)(dimY), (long)(dimZ), lt, tau);
+
+            /* check early stopping criteria */
+            if ((epsil != 0.0f)  && (ll % 5 == 0)) {
+                re = 0.0f; re1 = 0.0f;
+                for(j=0; j<DimTotal; j++)
+                {
+                    re += powf(U[j] - U_old[j],2);
+                    re1 += powf(U[j],2);
+                }
+                re = sqrtf(re)/sqrtf(re1);
+                if (re < epsil)  count++;
+                if (count > 3) break;
+            }
+            /*get updated solution*/
+
+            getX(U, U_old, theta, DimTotal);
+        }
+        free(P1); free(P2); free(P3); free(U_old);
+    }
+    /*adding info into info_vector */
+    infovector[0] = (float)(ll);  /*iterations number (if stopped earlier based on tolerance)*/
+    infovector[1] = re;  /* reached tolerance */
+
+    return 0;
+}
+
+/*****************************************************************/
+/************************2D-case related Functions */
+/*****************************************************************/
+
+/*Calculating dual variable (using forward differences)*/
+float DualP2D(float *U, float *P1, float *P2, long dimX, long dimY, float sigma)
+{
+     long i,j,index;
+     #pragma omp parallel for shared(U,P1,P2) private(index,i,j)
+     for(j=0; j<dimY; j++) {
+       for(i=0; i<dimX; i++) {
+          index = j*dimX+i;
+          /* symmetric boundary conditions (Neuman) */
+          if (i == dimX-1) P1[index] += sigma*(U[j*dimX+(i-1)] - U[index]);
+          else P1[index] += sigma*(U[j*dimX+(i+1)] - U[index]);
+          if (j == dimY-1) P2[index] += sigma*(U[(j-1)*dimX+i] - U[index]);
+          else  P2[index] += sigma*(U[(j+1)*dimX+i] - U[index]);
+        }}
+     return 1;
+}
+
+/* Divergence for P dual */
+float DivProj2D(float *U, float *Input, float *P1, float *P2, long dimX, long dimY, float lt, float tau)
+{
+  long i,j,index;
+  float P_v1, P_v2, div_var;
+  #pragma omp parallel for shared(U,Input,P1,P2) private(index, i, j, P_v1, P_v2, div_var)
+  for(j=0; j<dimY; j++) {
+    for(i=0; i<dimX; i++) {
+            index = j*dimX+i;
+            /* symmetric boundary conditions (Neuman) */
+            if (i == 0) P_v1 = -P1[index];
+            else P_v1 = -(P1[index] - P1[j*dimX+(i-1)]);
+            if (j == 0) P_v2 = -P2[index];
+            else  P_v2 = -(P2[index] - P2[(j-1)*dimX+i]);
+            div_var = P_v1 + P_v2;
+            U[index] = (U[index] - tau*div_var + lt*Input[index])/(1.0 + lt);
+          }}
+  return *U;
+}
+
+/*get the updated solution*/
+float getX(float *U, float *U_old, float theta, long DimTotal)
+{
+    long i;
+    #pragma omp parallel for shared(U,U_old) private(i)
+    for(i=0; i<DimTotal; i++) {
+          U[i] +=  theta*(U[i] - U_old[i]);
+          }
+    return *U;
+}
+
+
+/*****************************************************************/
+/************************3D-case related Functions */
+/*****************************************************************/
+/*Calculating dual variable (using forward differences)*/
+float DualP3D(float *U, float *P1, float *P2, float *P3, long dimX, long dimY, long dimZ, float sigma)
+{
+     long i,j,k,index;
+     #pragma omp parallel for shared(U,P1,P2,P3) private(index,i,j,k)
+     for(k=0; k<dimZ; k++) {
+         for(j=0; j<dimY; j++) {
+           for(i=0; i<dimX; i++) {
+          index = (dimX*dimY)*k + j*dimX+i;
+          /* symmetric boundary conditions (Neuman) */
+          if (i == dimX-1) P1[index] += sigma*(U[(dimX*dimY)*k + j*dimX+(i-1)] - U[index]);
+          else P1[index] += sigma*(U[(dimX*dimY)*k + j*dimX+(i+1)] - U[index]);
+          if (j == dimY-1) P2[index] += sigma*(U[(dimX*dimY)*k + (j-1)*dimX+i] - U[index]);
+          else  P2[index] += sigma*(U[(dimX*dimY)*k + (j+1)*dimX+i] - U[index]);
+          if (k == dimZ-1) P3[index] += sigma*(U[(dimX*dimY)*(k-1) + j*dimX+i] - U[index]);
+          else  P3[index] += sigma*(U[(dimX*dimY)*(k+1) + j*dimX+i] - U[index]);
+        }}}
+     return 1;
+}
+
+/* Divergence for P dual */
+float DivProj3D(float *U, float *Input, float *P1, float *P2, float *P3, long dimX, long dimY, long dimZ, float lt, float tau)
+{
+  long i,j,k,index;
+  float P_v1, P_v2, P_v3, div_var;
+  #pragma omp parallel for shared(U,Input,P1,P2) private(index, i, j, k, P_v1, P_v2, P_v3, div_var)
+  for(k=0; k<dimZ; k++) {
+      for(j=0; j<dimY; j++) {
+        for(i=0; i<dimX; i++) {
+            index = (dimX*dimY)*k + j*dimX+i;
+            /* symmetric boundary conditions (Neuman) */
+            if (i == 0) P_v1 = -P1[index];
+            else P_v1 = -(P1[index] - P1[(dimX*dimY)*k + j*dimX+(i-1)]);
+            if (j == 0) P_v2 = -P2[index];
+            else  P_v2 = -(P2[index] - P2[(dimX*dimY)*k + (j-1)*dimX+i]);
+            if (k == 0) P_v3 = -P3[index];
+            else  P_v3 = -(P3[index] - P3[(dimX*dimY)*(k-1) + j*dimX+i]);
+            div_var = P_v1 + P_v2 + P_v3;
+            U[index] = (U[index] - tau*div_var + lt*Input[index])/(1.0 + lt);
+   }}}
+  return *U;
+}
diff --git a/src/Core/regularisers_CPU/PD_TV_core.h b/src/Core/regularisers_CPU/PD_TV_core.h
new file mode 100644
index 0000000..97edc05
--- /dev/null
+++ b/src/Core/regularisers_CPU/PD_TV_core.h
@@ -0,0 +1,60 @@
+/*
+This work is part of the Core Imaging Library developed by
+Visual Analytics and Imaging System Group of the Science Technology
+Facilities Council, STFC
+
+Copyright 2019 Daniil Kazantsev
+Copyright 2019 Srikanth Nagella, Edoardo Pasca
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+//#include <matrix.h>
+#include <math.h>
+#include <stdlib.h>
+#include <memory.h>
+#include <stdio.h>
+#include "omp.h"
+#include "utils.h"
+#include "CCPiDefines.h"
+
+/* C-OMP implementation of Primal-Dual TV [1] by Chambolle Pock denoising/regularization model (2D/3D case)
+ *
+ * Input Parameters:
+ * 1. Noisy image/volume
+ * 2. lambdaPar - regularization parameter
+ * 3. Number of iterations
+ * 4. eplsilon: tolerance constant
+ * 5. lipschitz_const: convergence related parameter
+ * 6. TV-type: methodTV - 'iso' (0) or 'l1' (1)
+ * 7. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON)
+
+ * Output:
+ * [1] TV - Filtered/regularized image/volume
+ * [2] Information vector which contains [iteration no., reached tolerance]
+ *
+ * [1] Antonin Chambolle, Thomas Pock. "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging", 2010
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+float PDTV_CPU_main(float *Input, float *U, float *infovector, float lambdaPar, int iterationsNumb, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ);
+
+CCPI_EXPORT float DualP2D(float *U, float *P1, float *P2, long dimX, long dimY, float sigma);
+CCPI_EXPORT float DivProj2D(float *U, float *Input, float *P1, float *P2, long dimX, long dimY, float lt, float tau);
+CCPI_EXPORT float getX(float *U, float *U_old, float theta, long DimTotal);
+
+CCPI_EXPORT float DualP3D(float *U, float *P1, float *P2, float *P3, long dimX, long dimY, long dimZ, float sigma);
+CCPI_EXPORT float DivProj3D(float *U, float *Input, float *P1, float *P2, float *P3, long dimX, long dimY, long dimZ, float lt, float tau);
+#ifdef __cplusplus
+}
+#endif
diff --git a/src/Core/regularisers_CPU/utils.c b/src/Core/regularisers_CPU/utils.c
index 5bb3a5c..088ace9 100644
--- a/src/Core/regularisers_CPU/utils.c
+++ b/src/Core/regularisers_CPU/utils.c
@@ -65,7 +65,7 @@ float TV_energy2D(float *U, float *U0, float *E_val, float lambda, int type, int
 {
     int i, j, i1, j1, index;
     float NOMx_2, NOMy_2, E_Grad=0.0f, E_Data=0.0f;
-    
+
     /* first calculate \grad U_xy*/
     for(j=0; j<dimY; j++) {
         for(i=0; i<dimX; i++) {
@@ -73,7 +73,7 @@ float TV_energy2D(float *U, float *U0, float *E_val, float lambda, int type, int
             /* boundary conditions */
             i1 = i + 1; if (i == dimX-1) i1 = i;
             j1 = j + 1; if (j == dimY-1) j1 = j;
-            
+
             /* Forward differences */
             NOMx_2 = powf((float)(U[j1*dimX + i] - U[index]),2); /* x+ */
             NOMy_2 = powf((float)(U[j*dimX + i1] - U[index]),2); /* y+ */
@@ -90,7 +90,7 @@ float TV_energy3D(float *U, float *U0, float *E_val, float lambda, int type, int
 {
     long i, j, k, i1, j1, k1, index;
     float NOMx_2, NOMy_2, NOMz_2, E_Grad=0.0f, E_Data=0.0f;
-    
+
     /* first calculate \grad U_xy*/
     for(j=0; j<(long)(dimY); j++) {
         for(i=0; i<(long)(dimX); i++) {
@@ -100,12 +100,12 @@ float TV_energy3D(float *U, float *U0, float *E_val, float lambda, int type, int
                 i1 = i + 1; if (i == (long)(dimX-1)) i1 = i;
                 j1 = j + 1; if (j == (long)(dimY-1)) j1 = j;
                 k1 = k + 1; if (k == (long)(dimZ-1)) k1 = k;
-                
+
                 /* Forward differences */
                 NOMx_2 = powf((float)(U[(dimX*dimY)*k + j1*dimX+i] - U[index]),2); /* x+ */
                 NOMy_2 = powf((float)(U[(dimX*dimY)*k + j*dimX+i1] - U[index]),2); /* y+ */
                 NOMz_2 = powf((float)(U[(dimX*dimY)*k1 + j*dimX+i] - U[index]),2); /* z+ */
-                
+
                 E_Grad += 2.0f*lambda*sqrtf((float)(NOMx_2) + (float)(NOMy_2) + (float)(NOMz_2)); /* gradient term energy */
                 E_Data += (powf((float)(U[index]-U0[index]),2)); /* fidelity term energy */
             }
@@ -131,12 +131,12 @@ float Im_scale2D(float *Input, float *Scaled, int w, int h, int w2, int h2)
             x_diff = (x_ratio * j) - x;
             y_diff = (y_ratio * i) - y;
             index = y*w+x ;
-            
+
             A = Input[index];
             B = Input[index+1];
             C = Input[index+w];
             D = Input[index+w+1];
-            
+
             gray = (float)(A*(1.0 - x_diff)*(1.0 - y_diff) +  B*(x_diff)*(1.0 - y_diff) +
                     C*(y_diff)*(1.0 - x_diff) +  D*(x_diff*y_diff));
 
@@ -144,3 +144,70 @@ float Im_scale2D(float *Input, float *Scaled, int w, int h, int w2, int h2)
         }}
     return *Scaled;
 }
+
+/*2D Projection onto convex set for P (called in PD_TV, FGP_dTV and FGP_TV methods)*/
+float Proj_func2D(float *P1, float *P2, int methTV, long DimTotal)
+{
+    float val1, val2, denom, sq_denom;
+    long i;
+    if (methTV == 0) {
+        /* isotropic TV*/
+#pragma omp parallel for shared(P1,P2) private(i,denom,sq_denom)
+        for(i=0; i<DimTotal; i++) {
+            denom = powf(P1[i],2) +  powf(P2[i],2);
+            if (denom > 1.0f) {
+                sq_denom = 1.0f/sqrtf(denom);
+                P1[i] = P1[i]*sq_denom;
+                P2[i] = P2[i]*sq_denom;
+            }
+        }
+    }
+    else {
+        /* anisotropic TV*/
+#pragma omp parallel for shared(P1,P2) private(i,val1,val2)
+        for(i=0; i<DimTotal; i++) {
+            val1 = fabs(P1[i]);
+            val2 = fabs(P2[i]);
+            if (val1 < 1.0f) {val1 = 1.0f;}
+            if (val2 < 1.0f) {val2 = 1.0f;}
+            P1[i] = P1[i]/val1;
+            P2[i] = P2[i]/val2;
+        }
+    }
+    return 1;
+}
+/*3D Projection onto convex set for P (called in PD_TV, FGP_TV, FGP_dTV methods)*/
+float Proj_func3D(float *P1, float *P2, float *P3, int methTV, long DimTotal)
+{
+    float val1, val2, val3, denom, sq_denom;
+    long i;
+    if (methTV == 0) {
+        /* isotropic TV*/
+#pragma omp parallel for shared(P1,P2,P3) private(i,val1,val2,val3,sq_denom)
+        for(i=0; i<DimTotal; i++) {
+            denom = powf(P1[i],2) + powf(P2[i],2) + powf(P3[i],2);
+            if (denom > 1.0f) {
+                sq_denom = 1.0f/sqrtf(denom);
+                P1[i] = P1[i]*sq_denom;
+                P2[i] = P2[i]*sq_denom;
+                P3[i] = P3[i]*sq_denom;
+            }
+        }
+    }
+    else {
+        /* anisotropic TV*/
+#pragma omp parallel for shared(P1,P2,P3) private(i,val1,val2,val3)
+        for(i=0; i<DimTotal; i++) {
+            val1 = fabs(P1[i]);
+            val2 = fabs(P2[i]);
+            val3 = fabs(P3[i]);
+            if (val1 < 1.0f) {val1 = 1.0f;}
+            if (val2 < 1.0f) {val2 = 1.0f;}
+            if (val3 < 1.0f) {val3 = 1.0f;}
+            P1[i] = P1[i]/val1;
+            P2[i] = P2[i]/val2;
+            P3[i] = P3[i]/val3;
+        }
+    }
+    return 1;
+}
diff --git a/src/Core/regularisers_CPU/utils.h b/src/Core/regularisers_CPU/utils.h
index 8f0ba59..4b57f71 100644
--- a/src/Core/regularisers_CPU/utils.h
+++ b/src/Core/regularisers_CPU/utils.h
@@ -31,6 +31,8 @@ CCPI_EXPORT float TV_energy2D(float *U, float *U0, float *E_val, float lambda, i
 CCPI_EXPORT float TV_energy3D(float *U, float *U0, float *E_val, float lambda, int type, int dimX, int dimY, int dimZ);
 CCPI_EXPORT float TV_energy3D(float *U, float *U0, float *E_val, float lambda, int type, int dimX, int dimY, int dimZ);
 CCPI_EXPORT float Im_scale2D(float *Input, float *Scaled, int w, int h, int w2, int h2);
+CCPI_EXPORT float Proj_func2D(float *P1, float *P2, int methTV, long DimTotal);
+CCPI_EXPORT float Proj_func3D(float *P1, float *P2, float *P3, int methTV, long DimTotal);
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/Core/regularisers_GPU/TV_PD_GPU_core.cu b/src/Core/regularisers_GPU/TV_PD_GPU_core.cu
new file mode 100644
index 0000000..59fda3b
--- /dev/null
+++ b/src/Core/regularisers_GPU/TV_PD_GPU_core.cu
@@ -0,0 +1,522 @@
+/*
+This work is part of the Core Imaging Library developed by
+Visual Analytics and Imaging System Group of the Science Technology
+Facilities Council, STFC
+
+Copyright 2017 Daniil Kazantsev
+Copyright 2017 Srikanth Nagella, Edoardo Pasca
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+#include "TV_PD_GPU_core.h"
+#include "shared.h"
+#include <thrust/functional.h>
+#include <thrust/device_vector.h>
+#include <thrust/transform_reduce.h>
+
+/* CUDA implementation of Primal-Dual TV [1] by Chambolle Pock denoising/regularization model (2D/3D case)
+ *
+ * Input Parameters:
+ * 1. Noisy image/volume
+ * 2. lambdaPar - regularization parameter
+ * 3. Number of iterations
+ * 4. eplsilon: tolerance constant
+ * 5. lipschitz_const: convergence related parameter
+ * 6. TV-type: methodTV - 'iso' (0) or 'l1' (1)
+ * 7. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON)
+ * 8. tau: time marching parameter
+
+ * Output:
+ * [1] TV - Filtered/regularized image/volume
+ * [2] Information vector which contains [iteration no., reached tolerance]
+ *
+ * [1] Antonin Chambolle, Thomas Pock. "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging", 2010
+ */
+
+#define BLKXSIZE2D 16
+#define BLKYSIZE2D 16
+
+#define BLKXSIZE 8
+#define BLKYSIZE 8
+#define BLKZSIZE 8
+
+#define idivup(a, b) ( ((a)%(b) != 0) ? (a)/(b)+1 : (a)/(b) )
+// struct square { __host__ __device__ float operator()(float x) { return x * x; } };
+
+/************************************************/
+/*****************2D modules*********************/
+/************************************************/
+
+__global__ void dualPD_kernel(float *U, float *P1, float *P2, float sigma, int N, int M)
+{
+
+   //calculate each thread global index
+   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x;
+   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y;
+
+   int index = xIndex + N*yIndex;
+
+   if ((xIndex < N) && (yIndex < M)) {
+     if (xIndex == N-1) P1[index] += sigma*(U[(xIndex-1) + N*yIndex] - U[index]);
+     else P1[index] += sigma*(U[(xIndex+1) + N*yIndex] - U[index]);
+     if (yIndex == M-1) P2[index] += sigma*(U[xIndex + N*(yIndex-1)] - U[index]);
+     else  P2[index] += sigma*(U[xIndex + N*(yIndex+1)] - U[index]);
+   }
+   return;
+}
+__global__ void Proj_funcPD2D_iso_kernel(float *P1, float *P2, int N, int M, int ImSize)
+{
+
+   float denom;
+   //calculate each thread global index
+   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x;
+   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y;
+
+   int index = xIndex + N*yIndex;
+
+   if ((xIndex < N) && (yIndex < M)) {
+       denom = pow(P1[index],2) +  pow(P2[index],2);
+       if (denom > 1.0f) {
+           P1[index] = P1[index]/sqrt(denom);
+           P2[index] = P2[index]/sqrt(denom);
+       }
+   }
+   return;
+}
+__global__ void Proj_funcPD2D_aniso_kernel(float *P1, float *P2, int N, int M, int ImSize)
+{
+
+   float val1, val2;
+   //calculate each thread global index
+   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x;
+   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y;
+
+   int index = xIndex + N*yIndex;
+
+   if ((xIndex < N) && (yIndex < M)) {
+               val1 = abs(P1[index]);
+               val2 = abs(P2[index]);
+               if (val1 < 1.0f) {val1 = 1.0f;}
+               if (val2 < 1.0f) {val2 = 1.0f;}
+               P1[index] = P1[index]/val1;
+               P2[index] = P2[index]/val2;
+   }
+   return;
+}
+__global__ void DivProj2D_kernel(float *U, float *Input, float *P1, float *P2, float lt, float tau, int N, int M)
+{
+   float P_v1, P_v2, div_var;
+
+   //calculate each thread global index
+   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x;
+   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y;
+
+   int index = xIndex + N*yIndex;
+
+   if ((xIndex < N) && (yIndex < M)) {
+     if (xIndex == 0) P_v1 = -P1[index];
+     else P_v1 = -(P1[index] - P1[(xIndex-1) + N*yIndex]);
+     if (yIndex == 0) P_v2 = -P2[index];
+     else  P_v2 = -(P2[index] - P2[xIndex + N*(yIndex-1)]);
+     div_var = P_v1 + P_v2;
+     U[index] = (U[index] - tau*div_var + lt*Input[index])/(1.0 + lt);
+   }
+   return;
+}
+__global__ void PDnonneg2D_kernel(float* Output, int N, int M, int num_total)
+{
+   int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
+   int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
+
+   int index = xIndex + N*yIndex;
+
+   if (index < num_total)	{
+       if (Output[index] < 0.0f) Output[index] = 0.0f;
+   }
+}
+/************************************************/
+/*****************3D modules*********************/
+/************************************************/
+__global__ void dualPD3D_kernel(float *U, float *P1, float *P2, float *P3, float sigma, int N, int M, int Z)
+{
+
+  //calculate each thread global index
+  int i = blockDim.x * blockIdx.x + threadIdx.x;
+  int j = blockDim.y * blockIdx.y + threadIdx.y;
+  int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+  int index = (N*M)*k + i + N*j;
+
+  if ((i < N) && (j < M) && (k < Z)) {
+     if (i == N-1) P1[index] += sigma*(U[(N*M)*k + (i-1) + N*j] - U[index]);
+     else P1[index] += sigma*(U[(N*M)*k + (i+1) + N*j] - U[index]);
+     if (j == M-1) P2[index] += sigma*(U[(N*M)*k + i + N*(j-1)] - U[index]);
+     else  P2[index] += sigma*(U[(N*M)*k + i + N*(j+1)] - U[index]);
+     if (k == Z-1) P3[index] += sigma*(U[(N*M)*(k-1) + i + N*j] - U[index]);
+     else  P3[index] += sigma*(U[(N*M)*(k+1) + i + N*j] - U[index]);
+   }
+   return;
+}
+__global__ void Proj_funcPD3D_iso_kernel(float *P1, float *P2, float *P3, int N, int M, int Z, int ImSize)
+{
+
+   float denom,sq_denom;
+   //calculate each thread global index
+   int i = blockDim.x * blockIdx.x + threadIdx.x;
+   int j = blockDim.y * blockIdx.y + threadIdx.y;
+   int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+   int index = (N*M)*k + i + N*j;
+
+   if ((i < N) && (j < M) && (k <  Z)) {
+       denom = pow(P1[index],2) +  pow(P2[index],2) + pow(P3[index],2);
+       if (denom > 1.0f) {
+           sq_denom = 1.0f/sqrt(denom);
+           P1[index] *= sq_denom;
+           P2[index] *= sq_denom;
+           P3[index] *= sq_denom;
+       }
+   }
+   return;
+}
+__global__ void Proj_funcPD3D_aniso_kernel(float *P1, float *P2, float *P3, int N, int M, int Z, int ImSize)
+{
+
+   float val1, val2, val3;
+   //calculate each thread global index
+   int i = blockDim.x * blockIdx.x + threadIdx.x;
+   int j = blockDim.y * blockIdx.y + threadIdx.y;
+   int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+   int index = (N*M)*k + i + N*j;
+
+   if ((i < N) && (j < M) && (k <  Z)) {
+               val1 = abs(P1[index]);
+               val2 = abs(P2[index]);
+               val3 = abs(P3[index]);
+               if (val1 < 1.0f) {val1 = 1.0f;}
+               if (val2 < 1.0f) {val2 = 1.0f;}
+               if (val3 < 1.0f) {val3 = 1.0f;}
+               P1[index] /= val1;
+               P2[index] /= val2;
+               P3[index] /= val3;
+   }
+   return;
+}
+__global__ void DivProj3D_kernel(float *U, float *Input, float *P1, float *P2, float *P3, float lt, float tau, int N, int M, int Z)
+{
+   float P_v1, P_v2, P_v3, div_var;
+
+   //calculate each thread global index
+   int i = blockDim.x * blockIdx.x + threadIdx.x;
+   int j = blockDim.y * blockIdx.y + threadIdx.y;
+   int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+   int index = (N*M)*k + i + N*j;
+
+   if ((i < N) && (j < M) && (k <  Z)) {
+     if (i == 0) P_v1 = -P1[index];
+     else P_v1 = -(P1[index] - P1[(N*M)*k + (i-1) + N*j]);
+     if (j == 0) P_v2 = -P2[index];
+     else  P_v2 = -(P2[index] - P2[(N*M)*k + i + N*(j-1)]);
+     if (k == 0) P_v3 = -P3[index];
+     else  P_v3 = -(P3[index] - P3[(N*M)*(k-1) + i + N*j]);
+     div_var = P_v1 + P_v2 + P_v3;
+     U[index] = (U[index] - tau*div_var + lt*Input[index])/(1.0 + lt);
+   }
+   return;
+}
+
+__global__ void PDnonneg3D_kernel(float* Output, int N, int M, int Z, int num_total)
+{
+   int i = blockDim.x * blockIdx.x + threadIdx.x;
+   int j = blockDim.y * blockIdx.y + threadIdx.y;
+   int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+   int index = (N*M)*k + i + N*j;
+
+   if (index < num_total)	{
+       if (Output[index] < 0.0f) Output[index] = 0.0f;
+   }
+}
+__global__ void PDcopy_kernel2D(float *Input, float* Output, int N, int M, int num_total)
+{
+   int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
+   int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
+
+   int index = xIndex + N*yIndex;
+
+   if (index < num_total)	{
+       Output[index] = Input[index];
+   }
+}
+
+__global__ void PDcopy_kernel3D(float *Input, float* Output, int N, int M, int Z, int num_total)
+{
+   int i = blockDim.x * blockIdx.x + threadIdx.x;
+   int j = blockDim.y * blockIdx.y + threadIdx.y;
+   int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+   int index = (N*M)*k + i + N*j;
+
+   if (index < num_total)	{
+       Output[index] = Input[index];
+   }
+}
+
+__global__ void getU2D_kernel(float *Input, float *Input_old, float theta, int N, int M, int num_total)
+{
+   int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
+   int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
+
+   int index = xIndex + N*yIndex;
+
+   if (index < num_total)	{
+       Input[index] += theta*(Input[index] - Input_old[index]);
+   }
+}
+
+__global__ void getU3D_kernel(float *Input, float *Input_old, float theta, int N, int M, int Z, int num_total)
+{
+  int i = blockDim.x * blockIdx.x + threadIdx.x;
+  int j = blockDim.y * blockIdx.y + threadIdx.y;
+  int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+  int index = (N*M)*k + i + N*j;
+
+   if (index < num_total)	{
+       Input[index] += theta*(Input[index] - Input_old[index]);
+   }
+}
+
+__global__ void PDResidCalc2D_kernel(float *Input1, float *Input2, float* Output, int N, int M, int num_total)
+{
+    int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
+    int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
+
+    int index = xIndex + N*yIndex;
+
+    if (index < num_total)	{
+        Output[index] = Input1[index] - Input2[index];
+    }
+}
+
+__global__ void PDResidCalc3D_kernel(float *Input1, float *Input2, float* Output, int N, int M, int Z, int num_total)
+{
+   	int i = blockDim.x * blockIdx.x + threadIdx.x;
+    int j = blockDim.y * blockIdx.y + threadIdx.y;
+    int k = blockDim.z * blockIdx.z + threadIdx.z;
+
+    int index = (N*M)*k + i + N*j;
+
+    if (index < num_total)	{
+        Output[index] = Input1[index] - Input2[index];
+    }
+}
+
+
+/*%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*/
+
+////////////MAIN HOST FUNCTION ///////////////
+extern "C" int TV_PD_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ)
+{
+   int deviceCount = -1; // number of devices
+   cudaGetDeviceCount(&deviceCount);
+   if (deviceCount == 0) {
+       fprintf(stderr, "No CUDA devices found\n");
+       return -1;
+   }
+   int count = 0, i;
+   float re, sigma, theta, lt;
+   re = 0.0f;
+
+   sigma = 1.0/(lipschitz_const*tau);
+   theta = 1.0f;
+   lt = tau/lambdaPar;
+
+   if (dimZ <= 1) {
+   /*2D verson*/
+     int ImSize = dimX*dimY;
+     float *d_input, *d_update, *d_old=NULL, *P1=NULL, *P2=NULL;
+
+     dim3 dimBlock(BLKXSIZE2D,BLKYSIZE2D);
+     dim3 dimGrid(idivup(dimX,BLKXSIZE2D), idivup(dimY,BLKYSIZE2D));
+
+      /*allocate space for images on device*/
+      checkCudaErrors( cudaMalloc((void**)&d_input,ImSize*sizeof(float)) );
+      checkCudaErrors( cudaMalloc((void**)&d_update,ImSize*sizeof(float)) );
+      checkCudaErrors( cudaMalloc((void**)&d_old,ImSize*sizeof(float)) );
+      checkCudaErrors( cudaMalloc((void**)&P1,ImSize*sizeof(float)) );
+      checkCudaErrors( cudaMalloc((void**)&P2,ImSize*sizeof(float)) );
+
+       checkCudaErrors( cudaMemcpy(d_input,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice));
+       checkCudaErrors( cudaMemcpy(d_update,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice));
+       cudaMemset(P1, 0, ImSize*sizeof(float));
+       cudaMemset(P2, 0, ImSize*sizeof(float));
+
+       /********************** Run CUDA 2D kernel here ********************/
+       /* The main kernel */
+       for (i = 0; i < iter; i++) {
+
+           /* computing the the dual P variable */
+           dualPD_kernel<<<dimGrid,dimBlock>>>(d_update, P1, P2, sigma, dimX, dimY);
+           checkCudaErrors( cudaDeviceSynchronize() );
+           checkCudaErrors(cudaPeekAtLastError() );
+
+           if (nonneg != 0) {
+           PDnonneg2D_kernel<<<dimGrid,dimBlock>>>(d_update, dimX, dimY, ImSize);
+           checkCudaErrors( cudaDeviceSynchronize() );
+           checkCudaErrors(cudaPeekAtLastError() ); }
+
+           /* projection step */
+           if (methodTV == 0) Proj_funcPD2D_iso_kernel<<<dimGrid,dimBlock>>>(P1, P2, dimX, dimY, ImSize); /*isotropic TV*/
+           else Proj_funcPD2D_aniso_kernel<<<dimGrid,dimBlock>>>(P1, P2, dimX, dimY, ImSize); /*anisotropic TV*/
+           checkCudaErrors( cudaDeviceSynchronize() );
+           checkCudaErrors(cudaPeekAtLastError() );
+
+           /* copy U to U_old */
+           PDcopy_kernel2D<<<dimGrid,dimBlock>>>(d_update, d_old, dimX, dimY, ImSize);
+           checkCudaErrors( cudaDeviceSynchronize() );
+           checkCudaErrors(cudaPeekAtLastError() );
+
+           /* calculate divergence */
+           DivProj2D_kernel<<<dimGrid,dimBlock>>>(d_update, d_input, P1, P2, lt, tau, dimX, dimY);
+           checkCudaErrors( cudaDeviceSynchronize() );
+           checkCudaErrors(cudaPeekAtLastError() );
+
+           if ((epsil != 0.0f) && (i % 5 == 0)) {
+               /* calculate norm - stopping rules using the Thrust library */
+               PDResidCalc2D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, P1, dimX, dimY, ImSize);
+               checkCudaErrors( cudaDeviceSynchronize() );
+               checkCudaErrors(cudaPeekAtLastError() );
+
+               // setup arguments
+               square<float>        unary_op;
+               thrust::plus<float> binary_op;
+               thrust::device_vector<float> d_vec(P1, P1 + ImSize);
+               float reduction = std::sqrt(thrust::transform_reduce(d_vec.begin(), d_vec.end(), unary_op, 0.0f, binary_op));
+               thrust::device_vector<float> d_vec2(d_update, d_update + ImSize);
+               float reduction2 = std::sqrt(thrust::transform_reduce(d_vec2.begin(), d_vec2.end(), unary_op, 0.0f, binary_op));
+
+               // compute norm
+               re = (reduction/reduction2);
+               if (re < epsil)  count++;
+               if (count > 3) break;
+             }
+
+           getU2D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, theta, dimX, dimY, ImSize);
+           checkCudaErrors( cudaDeviceSynchronize() );
+           checkCudaErrors(cudaPeekAtLastError() );
+          }
+           //copy result matrix from device to host memory
+           cudaMemcpy(Output,d_update,ImSize*sizeof(float),cudaMemcpyDeviceToHost);
+
+           cudaFree(d_input);
+           cudaFree(d_update);
+           cudaFree(d_old);
+           cudaFree(P1);
+           cudaFree(P2);
+
+   }
+   else {
+           /*3D verson*/
+           int ImSize = dimX*dimY*dimZ;
+           float *d_input, *d_update, *d_old=NULL, *P1=NULL, *P2=NULL, *P3=NULL;
+
+           dim3 dimBlock(BLKXSIZE,BLKYSIZE,BLKZSIZE);
+           dim3 dimGrid(idivup(dimX,BLKXSIZE), idivup(dimY,BLKYSIZE),idivup(dimZ,BLKZSIZE));
+
+           /*allocate space for images on device*/
+           checkCudaErrors( cudaMalloc((void**)&d_input,ImSize*sizeof(float)) );
+           checkCudaErrors( cudaMalloc((void**)&d_update,ImSize*sizeof(float)) );
+           checkCudaErrors( cudaMalloc((void**)&d_old,ImSize*sizeof(float)) );
+           checkCudaErrors( cudaMalloc((void**)&P1,ImSize*sizeof(float)) );
+           checkCudaErrors( cudaMalloc((void**)&P2,ImSize*sizeof(float)) );
+           checkCudaErrors( cudaMalloc((void**)&P3,ImSize*sizeof(float)) );
+
+            checkCudaErrors( cudaMemcpy(d_input,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice));
+            checkCudaErrors( cudaMemcpy(d_update,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice));
+            cudaMemset(P1, 0, ImSize*sizeof(float));
+            cudaMemset(P2, 0, ImSize*sizeof(float));
+            cudaMemset(P3, 0, ImSize*sizeof(float));
+           /********************** Run CUDA 3D kernel here ********************/
+       for (i = 0; i < iter; i++) {
+
+         /* computing the the dual P variable */
+         dualPD3D_kernel<<<dimGrid,dimBlock>>>(d_update, P1, P2, P3, sigma, dimX, dimY, dimZ);
+         checkCudaErrors( cudaDeviceSynchronize() );
+         checkCudaErrors(cudaPeekAtLastError() );
+
+         if (nonneg != 0) {
+         PDnonneg3D_kernel<<<dimGrid,dimBlock>>>(d_update, dimX, dimY, dimZ, ImSize);
+         checkCudaErrors( cudaDeviceSynchronize() );
+         checkCudaErrors(cudaPeekAtLastError() ); }
+
+         /* projection step */
+         if (methodTV == 0) Proj_funcPD3D_iso_kernel<<<dimGrid,dimBlock>>>(P1, P2, P3, dimX, dimY, dimZ, ImSize); /*isotropic TV*/
+         else Proj_funcPD3D_aniso_kernel<<<dimGrid,dimBlock>>>(P1, P2, P3, dimX, dimY, dimZ, ImSize); /*anisotropic TV*/
+         checkCudaErrors( cudaDeviceSynchronize() );
+         checkCudaErrors(cudaPeekAtLastError() );
+
+         /* copy U to U_old */
+         PDcopy_kernel3D<<<dimGrid,dimBlock>>>(d_update, d_old, dimX, dimY, dimZ, ImSize);
+         checkCudaErrors( cudaDeviceSynchronize() );
+         checkCudaErrors(cudaPeekAtLastError() );
+
+         /* calculate divergence */
+         DivProj3D_kernel<<<dimGrid,dimBlock>>>(d_update, d_input, P1, P2, P3, lt, tau, dimX, dimY, dimZ);
+         checkCudaErrors( cudaDeviceSynchronize() );
+         checkCudaErrors(cudaPeekAtLastError() );
+
+         if ((epsil != 0.0f) && (i % 5 == 0)) {
+         /* calculate norm - stopping rules using the Thrust library */
+         PDResidCalc3D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, P1, dimX, dimY, dimZ, ImSize);
+         checkCudaErrors( cudaDeviceSynchronize() );
+         checkCudaErrors(cudaPeekAtLastError() );
+
+        // setup arguments
+         square<float>        unary_op;
+         thrust::plus<float> binary_op;
+         thrust::device_vector<float> d_vec(P1, P1 + ImSize);
+         float reduction = std::sqrt(thrust::transform_reduce(d_vec.begin(), d_vec.end(), unary_op, 0.0f, binary_op));
+         thrust::device_vector<float> d_vec2(d_update, d_update + ImSize);
+         float reduction2 = std::sqrt(thrust::transform_reduce(d_vec2.begin(), d_vec2.end(), unary_op, 0.0f, binary_op));
+
+           // compute norm
+           re = (reduction/reduction2);
+           if (re < epsil)  count++;
+           if (count > 3) break;
+           }
+
+           /* get U*/
+           getU3D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, theta, dimX, dimY, dimZ, ImSize);
+           checkCudaErrors( cudaDeviceSynchronize() );
+           checkCudaErrors(cudaPeekAtLastError() );
+         }
+           /***************************************************************/
+           //copy result matrix from device to host memory
+           cudaMemcpy(Output,d_update,ImSize*sizeof(float),cudaMemcpyDeviceToHost);
+
+           cudaFree(d_input);
+           cudaFree(d_update);
+           cudaFree(d_old);
+           cudaFree(P1);
+           cudaFree(P2);
+           cudaFree(P3);
+
+   }
+   //cudaDeviceReset();
+   /*adding info into info_vector */
+   infovector[0] = (float)(i);  /*iterations number (if stopped earlier based on tolerance)*/
+   infovector[1] = re;  /* reached tolerance */
+   return 0;
+}
diff --git a/src/Core/regularisers_GPU/TV_PD_GPU_core.h b/src/Core/regularisers_GPU/TV_PD_GPU_core.h
new file mode 100644
index 0000000..2b123d9
--- /dev/null
+++ b/src/Core/regularisers_GPU/TV_PD_GPU_core.h
@@ -0,0 +1,9 @@
+#ifndef _TV_PD_GPU_
+#define _TV_PD_GPU_
+
+#include "CCPiDefines.h"
+#include <memory.h>
+
+extern "C" CCPI_EXPORT int TV_PD_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ);
+
+#endif
diff --git a/src/Matlab/mex_compile/compileCPU_mex_Linux.m b/src/Matlab/mex_compile/compileCPU_mex_Linux.m
index 2ed7ea6..5330d7f 100644
--- a/src/Matlab/mex_compile/compileCPU_mex_Linux.m
+++ b/src/Matlab/mex_compile/compileCPU_mex_Linux.m
@@ -28,6 +28,10 @@ movefile('FGP_TV.mex*',Pathmove);
 fprintf('%s \n', 'Compiling SB-TV...');
 mex SB_TV.c SB_TV_core.c utils.c CFLAGS="\$CFLAGS -fopenmp -Wall -std=c99" LDFLAGS="\$LDFLAGS -fopenmp"
 movefile('SB_TV.mex*',Pathmove);
+
+fprintf('%s \n', 'Compiling PD-TV...');
+mex PD_TV.c PD_TV_core.c utils.c CFLAGS="\$CFLAGS -fopenmp -Wall -std=c99" LDFLAGS="\$LDFLAGS -fopenmp"
+movefile('PD_TV.mex*',Pathmove);
  
 fprintf('%s \n', 'Compiling dFGP-TV...');
 mex FGP_dTV.c FGP_dTV_core.c utils.c CFLAGS="\$CFLAGS -fopenmp -Wall -std=c99" LDFLAGS="\$LDFLAGS -fopenmp"
@@ -75,7 +79,8 @@ movefile('NonlocalMarching_Inpaint.mex*',Pathmove);
 delete SB_TV_core* ROF_TV_core* FGP_TV_core* FGP_dTV_core* TNV_core* utils* Diffusion_core* Diffus4th_order_core* TGV_core* LLT_ROF_core* CCPiDefines.h
 delete PatchSelect_core* Nonlocal_TV_core*
 delete Diffusion_Inpaint_core* NonlocalMarching_Inpaint_core*
-fprintf('%s \n', '<<<<<<< Regularisers successfully compiled! >>>>>>>');
+delete PD_TV_core*
+fprintf('%s \n', '<<<<<<< CPU regularisers were successfully compiled! >>>>>>>');
 
 pathA2 = sprintf(['..' fsep '..' fsep '..' fsep '..' fsep 'demos' fsep 'Matlab_demos'], 1i);
 cd(pathA2);
 \ No newline at end of file
diff --git a/src/Matlab/mex_compile/compileGPU_mex.m b/src/Matlab/mex_compile/compileGPU_mex.m
index 56fcd38..577630f 100644
--- a/src/Matlab/mex_compile/compileGPU_mex.m
+++ b/src/Matlab/mex_compile/compileGPU_mex.m
@@ -41,6 +41,11 @@ fprintf('%s \n', 'Compiling SB-TV...');
 mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcufft -lmwgpu SB_TV_GPU.cpp TV_SB_GPU_core.o
 movefile('SB_TV_GPU.mex*',Pathmove);
 
+fprintf('%s \n', 'Compiling PD-TV...');
+!/usr/local/cuda/bin/nvcc -O0 -c TV_PD_GPU_core.cu -Xcompiler -fPIC -I~/SOFT/MATLAB9/extern/include/
+mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcufft -lmwgpu PD_TV_GPU.cpp TV_PD_GPU_core.o
+movefile('PD_TV_GPU.mex*',Pathmove);
+
 fprintf('%s \n', 'Compiling TGV...');
 !/usr/local/cuda/bin/nvcc -O0 -c TGV_GPU_core.cu -Xcompiler -fPIC -I~/SOFT/MATLAB9/extern/include/
 mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcufft -lmwgpu TGV_GPU.cpp TGV_GPU_core.o
@@ -72,6 +77,7 @@ mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcu
 movefile('PatchSelect_GPU.mex*',Pathmove);
 
 delete TV_ROF_GPU_core* TV_FGP_GPU_core* TV_SB_GPU_core* dTV_FGP_GPU_core* NonlDiff_GPU_core* Diffus_4thO_GPU_core* TGV_GPU_core* LLT_ROF_GPU_core* CCPiDefines.h
+delete TV_PD_GPU_core*
 delete PatchSelect_GPU_core* Nonlocal_TV_core* shared.h
 fprintf('%s \n', 'All successfully compiled!');
 
diff --git a/src/Matlab/mex_compile/regularisers_CPU/PD_TV.c b/src/Matlab/mex_compile/regularisers_CPU/PD_TV.c
new file mode 100644
index 0000000..e5ab1e4
--- /dev/null
+++ b/src/Matlab/mex_compile/regularisers_CPU/PD_TV.c
@@ -0,0 +1,100 @@
+/*
+ * This work is part of the Core Imaging Library developed by
+ * Visual Analytics and Imaging System Group of the Science Technology
+ * Facilities Council, STFC
+ *
+ * Copyright 2019 Daniil Kazantsev
+ * Copyright 2019 Srikanth Nagella, Edoardo Pasca
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "matrix.h"
+#include "mex.h"
+#include "PD_TV_core.h"
+
+/* C-OMP implementation of Primal-Dual TV [1] by Chambolle Pock denoising/regularization model (2D/3D case)
+ *
+ * Input Parameters:
+ * 1. Noisy image/volume
+ * 2. lambdaPar - regularization parameter
+ * 3. Number of iterations
+ * 4. eplsilon: tolerance constant
+ * 5. TV-type: methodTV - 'iso' (0) or 'l1' (1)
+ * 6. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON)
+ * 7. lipschitz_const: convergence related parameter
+ * 8. tau: convergence related parameter
+ 
+ * Output:
+ * [1] TV - Filtered/regularized image/volume
+ * [2] Information vector which contains [iteration no., reached tolerance]
+ *
+ * [1] Antonin Chambolle, Thomas Pock. "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging", 2010
+ */
+void mexFunction(
+        int nlhs, mxArray *plhs[],
+        int nrhs, const mxArray *prhs[])
+        
+{
+    int number_of_dims, iter, methTV, nonneg;
+    mwSize dimX, dimY, dimZ;
+    const mwSize *dim_array;
+    float *Input, *infovec=NULL, *Output=NULL, lambda, epsil, lipschitz_const, tau;
+    
+    number_of_dims = mxGetNumberOfDimensions(prhs[0]);
+    dim_array = mxGetDimensions(prhs[0]);
+    
+    /*Handling Matlab input data*/
+    if ((nrhs < 2) || (nrhs > 7)) mexErrMsgTxt("At least 2 parameters is required, all parameters are: Image(2D/3D), Regularization parameter, iterations number, tolerance, penalty type ('iso' or 'l1'), nonnegativity switch, lipschitz_const");
+    
+    Input  = (float *) mxGetData(prhs[0]); /*noisy image (2D/3D) */
+    lambda =  (float) mxGetScalar(prhs[1]); /* regularization parameter */
+    iter = 500; /* default iterations number */
+    epsil = 1.0e-06; /* default tolerance constant */
+    methTV = 0;  /* default isotropic TV penalty */
+    nonneg = 0; /* default nonnegativity switch, off - 0 */
+    lipschitz_const = 8.0; /* lipschitz_const */
+    tau = 0.0025; /* tau convergence const */
+    
+    if (mxGetClassID(prhs[0]) != mxSINGLE_CLASS) {mexErrMsgTxt("The input image must be in a single precision"); }
+    
+    if ((nrhs == 3) || (nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  iter = (int) mxGetScalar(prhs[2]); /* iterations number */
+    if ((nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  epsil =  (float) mxGetScalar(prhs[3]); /* tolerance constant */
+    if ((nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  {
+        char *penalty_type;
+        penalty_type = mxArrayToString(prhs[4]); /* choosing TV penalty: 'iso' or 'l1', 'iso' is the default */
+        if ((strcmp(penalty_type, "l1") != 0) && (strcmp(penalty_type, "iso") != 0)) mexErrMsgTxt("Choose TV type: 'iso' or 'l1',");
+        if (strcmp(penalty_type, "l1") == 0)  methTV = 1;  /* enable 'l1' penalty */
+        mxFree(penalty_type);
+    }
+    if ((nrhs == 6) || (nrhs == 7) || (nrhs == 8))  {
+        nonneg = (int) mxGetScalar(prhs[5]);
+        if ((nonneg != 0) && (nonneg != 1)) mexErrMsgTxt("Nonnegativity constraint can be enabled by choosing 1 or off - 0");
+    }
+    if ((nrhs == 7) || (nrhs == 8)) lipschitz_const = (float) mxGetScalar(prhs[6]);
+    if (nrhs == 8) tau = (float) mxGetScalar(prhs[7]);   
+    
+    /*Handling Matlab output data*/
+    dimX = dim_array[0]; dimY = dim_array[1]; dimZ = dim_array[2];    
+       
+    if (number_of_dims == 2) {
+        dimZ = 1; /*2D case*/
+        Output = (float*)mxGetPr(plhs[0] = mxCreateNumericArray(2, dim_array, mxSINGLE_CLASS, mxREAL));        
+    }
+    if (number_of_dims == 3) {    
+        Output = (float*)mxGetPr(plhs[0] = mxCreateNumericArray(3, dim_array, mxSINGLE_CLASS, mxREAL));
+    }    
+    mwSize vecdim[1];
+    vecdim[0] = 2;
+    infovec = (float*)mxGetPr(plhs[1] = mxCreateNumericArray(1, vecdim, mxSINGLE_CLASS, mxREAL));
+    
+    /* running the function */    
+    PDTV_CPU_main(Input, Output, infovec, lambda, iter,  epsil, lipschitz_const, methTV, nonneg, tau, dimX, dimY, dimZ);
+}
diff --git a/src/Matlab/mex_compile/regularisers_GPU/PD_TV_GPU.cpp b/src/Matlab/mex_compile/regularisers_GPU/PD_TV_GPU.cpp
new file mode 100644
index 0000000..e853dd3
--- /dev/null
+++ b/src/Matlab/mex_compile/regularisers_GPU/PD_TV_GPU.cpp
@@ -0,0 +1,101 @@
+/*
+ * This work is part of the Core Imaging Library developed by
+ * Visual Analytics and Imaging System Group of the Science Technology
+ * Facilities Council, STFC
+ *
+ * Copyright 2019 Daniil Kazantsev
+ * Copyright 2019 Srikanth Nagella, Edoardo Pasca
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "matrix.h"
+#include "mex.h"
+#include "TV_PD_GPU_core.h"
+
+/* GPU implementation of Primal-Dual TV [1] by Chambolle Pock denoising/regularization model (2D/3D case)
+ *
+ * Input Parameters:
+ * 1. Noisy image/volume
+ * 2. lambdaPar - regularization parameter
+ * 3. Number of iterations
+ * 4. eplsilon: tolerance constant
+ * 5. TV-type: methodTV - 'iso' (0) or 'l1' (1)
+ * 6. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON)
+ * 7. lipschitz_const: convergence related parameter
+ * 8. tau: convergence related parameter
+ 
+ * Output:
+ * [1] TV - Filtered/regularized image/volume
+ * [2] Information vector which contains [iteration no., reached tolerance]
+ *
+ * [1] Antonin Chambolle, Thomas Pock. "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging", 2010
+ */
+
+void mexFunction(
+        int nlhs, mxArray *plhs[],
+        int nrhs, const mxArray *prhs[])
+        
+{
+    int number_of_dims, iter, methTV, nonneg;
+    mwSize dimX, dimY, dimZ;
+    const mwSize *dim_array;
+    float *Input, *infovec=NULL, *Output=NULL, lambda, epsil, lipschitz_const, tau;
+    
+    number_of_dims = mxGetNumberOfDimensions(prhs[0]);
+    dim_array = mxGetDimensions(prhs[0]);
+    
+    /*Handling Matlab input data*/
+    if ((nrhs < 2) || (nrhs > 7)) mexErrMsgTxt("At least 2 parameters is required, all parameters are: Image(2D/3D), Regularization parameter, iterations number, tolerance, penalty type ('iso' or 'l1'), nonnegativity switch, lipschitz_const");
+    
+    Input  = (float *) mxGetData(prhs[0]); /*noisy image (2D/3D) */
+    lambda =  (float) mxGetScalar(prhs[1]); /* regularization parameter */
+    iter = 500; /* default iterations number */
+    epsil = 1.0e-06; /* default tolerance constant */
+    methTV = 0;  /* default isotropic TV penalty */
+    nonneg = 0; /* default nonnegativity switch, off - 0 */
+    lipschitz_const = 8.0; /* lipschitz_const */
+    tau = 0.0025; /* tau convergence const */
+    
+    if (mxGetClassID(prhs[0]) != mxSINGLE_CLASS) {mexErrMsgTxt("The input image must be in a single precision"); }
+    
+    if ((nrhs == 3) || (nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  iter = (int) mxGetScalar(prhs[2]); /* iterations number */
+    if ((nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  epsil =  (float) mxGetScalar(prhs[3]); /* tolerance constant */
+    if ((nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  {
+        char *penalty_type;
+        penalty_type = mxArrayToString(prhs[4]); /* choosing TV penalty: 'iso' or 'l1', 'iso' is the default */
+        if ((strcmp(penalty_type, "l1") != 0) && (strcmp(penalty_type, "iso") != 0)) mexErrMsgTxt("Choose TV type: 'iso' or 'l1',");
+        if (strcmp(penalty_type, "l1") == 0)  methTV = 1;  /* enable 'l1' penalty */
+        mxFree(penalty_type);
+    }
+    if ((nrhs == 6) || (nrhs == 7) || (nrhs == 8))  {
+        nonneg = (int) mxGetScalar(prhs[5]);
+        if ((nonneg != 0) && (nonneg != 1)) mexErrMsgTxt("Nonnegativity constraint can be enabled by choosing 1 or off - 0");
+    }
+    if ((nrhs == 7) || (nrhs == 8)) lipschitz_const = (float) mxGetScalar(prhs[6]);
+    if (nrhs == 8) tau = (float) mxGetScalar(prhs[7]);   
+    
+    /*Handling Matlab output data*/
+    dimX = dim_array[0]; dimY = dim_array[1]; dimZ = dim_array[2];    
+       
+    if (number_of_dims == 2) {
+        dimZ = 1; /*2D case*/
+        Output = (float*)mxGetPr(plhs[0] = mxCreateNumericArray(2, dim_array, mxSINGLE_CLASS, mxREAL));        
+    }
+    if (number_of_dims == 3) {    
+        Output = (float*)mxGetPr(plhs[0] = mxCreateNumericArray(3, dim_array, mxSINGLE_CLASS, mxREAL));
+    }    
+    mwSize vecdim[1];
+    vecdim[0] = 2;
+    infovec = (float*)mxGetPr(plhs[1] = mxCreateNumericArray(1, vecdim, mxSINGLE_CLASS, mxREAL));
+    
+    /* running the function */    
+    TV_PD_GPU_main(Input, Output, infovec, lambda, iter,  epsil, lipschitz_const, methTV, nonneg, tau, dimX, dimY, dimZ);
+}
diff --git a/src/Python/ccpi/filters/regularisers.py b/src/Python/ccpi/filters/regularisers.py
index 0b5b2ee..5f4001a 100644
--- a/src/Python/ccpi/filters/regularisers.py
+++ b/src/Python/ccpi/filters/regularisers.py
@@ -2,9 +2,9 @@
 script which assigns a proper device core function based on a flag ('cpu' or 'gpu')
 """
 
-from ccpi.filters.cpu_regularisers import TV_ROF_CPU, TV_FGP_CPU, TV_SB_CPU, dTV_FGP_CPU, TNV_CPU, NDF_CPU, Diff4th_CPU, TGV_CPU, LLT_ROF_CPU, PATCHSEL_CPU, NLTV_CPU
+from ccpi.filters.cpu_regularisers import TV_ROF_CPU, TV_FGP_CPU, TV_PD_CPU, TV_SB_CPU, dTV_FGP_CPU, TNV_CPU, NDF_CPU, Diff4th_CPU, TGV_CPU, LLT_ROF_CPU, PATCHSEL_CPU, NLTV_CPU
 try:
-    from ccpi.filters.gpu_regularisers import TV_ROF_GPU, TV_FGP_GPU, TV_SB_GPU, dTV_FGP_GPU, NDF_GPU, Diff4th_GPU, TGV_GPU, LLT_ROF_GPU, PATCHSEL_GPU
+    from ccpi.filters.gpu_regularisers import TV_ROF_GPU, TV_FGP_GPU, TV_PD_GPU, TV_SB_GPU, dTV_FGP_GPU, NDF_GPU, Diff4th_GPU, TGV_GPU, LLT_ROF_GPU, PATCHSEL_GPU
     gpu_enabled = True
 except ImportError:
     gpu_enabled = False
@@ -51,6 +51,33 @@ def FGP_TV(inputData, regularisation_parameter,iterations,
             raise ValueError ('GPU is not available')
         raise ValueError('Unknown device {0}. Expecting gpu or cpu'\
                          .format(device))
+
+def PD_TV(inputData, regularisation_parameter, iterations,
+                     tolerance_param, methodTV, nonneg, lipschitz_const, tau, device='cpu'):
+    if device == 'cpu':
+        return TV_PD_CPU(inputData,
+                     regularisation_parameter,
+                     iterations,
+                     tolerance_param,
+                     methodTV,
+                     nonneg,
+                     lipschitz_const,
+                     tau)
+    elif device == 'gpu' and gpu_enabled:
+        return TV_PD_GPU(inputData,
+                     regularisation_parameter,
+                     iterations,
+                     tolerance_param,
+                     methodTV,
+                     nonneg,
+                     lipschitz_const,
+                     tau)
+    else:
+        if not gpu_enabled and device == 'gpu':
+            raise ValueError ('GPU is not available')
+        raise ValueError('Unknown device {0}. Expecting gpu or cpu'\
+                         .format(device))
+
 def SB_TV(inputData, regularisation_parameter, iterations,
                      tolerance_param, methodTV, device='cpu'):
     if device == 'cpu':
@@ -212,4 +239,3 @@ def NDF_INP(inputData, maskData, regularisation_parameter, edge_parameter, itera
 
 def NVM_INP(inputData, maskData, SW_increment, iterations):
         return NVM_INPAINT_CPU(inputData, maskData, SW_increment, iterations)
-
diff --git a/src/Python/setup-regularisers.py.in b/src/Python/setup-regularisers.py.in
index 4c578e3..c4ad143 100644
--- a/src/Python/setup-regularisers.py.in
+++ b/src/Python/setup-regularisers.py.in
@@ -8,13 +8,13 @@ from Cython.Distutils import build_ext
 import os
 import sys
 import numpy
-import platform	
+import platform
 
 cil_version=os.environ['CIL_VERSION']
 if  cil_version == '':
     print("Please set the environmental variable CIL_VERSION")
     sys.exit(1)
-	
+
 library_include_path = ""
 library_lib_path = ""
 try:
@@ -23,7 +23,7 @@ try:
 except:
     library_include_path = os.environ['PREFIX']+'/include'
     pass
-    
+
 extra_include_dirs = [numpy.get_include(), library_include_path]
 #extra_library_dirs = [os.path.join(library_include_path, "..", "lib")]
 extra_compile_args = []
@@ -39,6 +39,7 @@ extra_include_dirs += [os.path.join(".." , "Core"),
                        os.path.join(".." , "Core",  "inpainters_CPU"),
                        os.path.join(".." , "Core",  "regularisers_GPU" , "TV_FGP" ) ,
                        os.path.join(".." , "Core",  "regularisers_GPU" , "TV_ROF" ) ,
+                       os.path.join(".." , "Core",  "regularisers_GPU" , "TV_PD" ) ,
                        os.path.join(".." , "Core",  "regularisers_GPU" , "TV_SB" ) ,
                        os.path.join(".." , "Core",  "regularisers_GPU" , "TGV" ) ,
                        os.path.join(".." , "Core",  "regularisers_GPU" , "LLTROF" ) ,
@@ -48,12 +49,12 @@ extra_include_dirs += [os.path.join(".." , "Core"),
                        os.path.join(".." , "Core",  "regularisers_GPU" , "PatchSelect" ) ,
 						   "."]
 
-if platform.system() == 'Windows':				   
-    extra_compile_args[0:] = ['/DWIN32','/EHsc','/DBOOST_ALL_NO_LIB' , '/openmp' ]   
+if platform.system() == 'Windows':
+    extra_compile_args[0:] = ['/DWIN32','/EHsc','/DBOOST_ALL_NO_LIB' , '/openmp' ]
 else:
     extra_compile_args = ['-fopenmp','-O2', '-funsigned-char', '-Wall', '-std=c++0x']
     extra_libraries += [@EXTRA_OMP_LIB@]
-    
+
 setup(
     name='ccpi',
 	description='CCPi Core Imaging Library - Image regularisers',
@@ -61,13 +62,13 @@ setup(
     cmdclass = {'build_ext': build_ext},
     ext_modules = [Extension("ccpi.filters.cpu_regularisers",
                              sources=[os.path.join("." , "src", "cpu_regularisers.pyx" ) ],
-                             include_dirs=extra_include_dirs, 
-							 library_dirs=extra_library_dirs, 
-							 extra_compile_args=extra_compile_args, 
-							 libraries=extra_libraries ), 
-    
+                             include_dirs=extra_include_dirs,
+							 library_dirs=extra_library_dirs,
+							 extra_compile_args=extra_compile_args,
+							 libraries=extra_libraries ),
+
     ],
-	zip_safe = False,	
+	zip_safe = False,
 	packages = {'ccpi', 'ccpi.filters', 'ccpi.supp'},
 )
 
diff --git a/src/Python/src/cpu_regularisers.pyx b/src/Python/src/cpu_regularisers.pyx
index 4917d06..8de6aea 100644
--- a/src/Python/src/cpu_regularisers.pyx
+++ b/src/Python/src/cpu_regularisers.pyx
@@ -20,6 +20,7 @@ cimport numpy as np
 
 cdef extern float TV_ROF_CPU_main(float *Input, float *Output, float *infovector, float *lambdaPar, int lambda_is_arr, int iterationsNumb, float tau, float epsil, int dimX, int dimY, int dimZ);
 cdef extern float TV_FGP_CPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iterationsNumb, float epsil, int methodTV, int nonneg, int dimX, int dimY, int dimZ);
+cdef extern float PDTV_CPU_main(float *Input, float *U, float *infovector, float lambdaPar, int iterationsNumb, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ);
 cdef extern float SB_TV_CPU_main(float *Input, float *Output, float *infovector, float mu, int iter, float epsil, int methodTV, int dimX, int dimY, int dimZ);
 cdef extern float LLT_ROF_CPU_main(float *Input, float *Output, float *infovector, float lambdaROF, float lambdaLLT, int iterationsNumb, float tau, float epsil, int dimX, int dimY, int dimZ);
 cdef extern float TGV_main(float *Input, float *Output, float *infovector, float lambdaPar, float alpha1, float alpha0, int iterationsNumb, float L2, float epsil, int dimX, int dimY, int dimZ);
@@ -58,9 +59,6 @@ def TV_ROF_2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData,
     cdef np.ndarray[np.float32_t, ndim=1, mode="c"] infovec = \
             np.ones([2], dtype='float32')
 
-    # Run ROF iterations for 2D data
-    # TV_ROF_CPU_main(&inputData[0,0], &outputData[0,0], &infovec[0], regularisation_parameter, iterationsNumb, marching_step_parameter, tolerance_param, dims[1], dims[0], 1)
-     # Run ROF iterations for 2D data
     if isinstance (regularisation_parameter, np.ndarray):
         reg = regularisation_parameter.copy()
         TV_ROF_CPU_main(&inputData[0,0], &outputData[0,0], &infovec[0], &reg[0,0],  1, iterationsNumb, marching_step_parameter, tolerance_param, dims[1], dims[0], 1)
@@ -158,6 +156,75 @@ def TV_FGP_3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData,
                        dims[2], dims[1], dims[0])
     return (outputData,infovec)
 
+#****************************************************************#
+#****************** Total-variation Primal-dual *****************#
+#****************************************************************#
+def TV_PD_CPU(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau):
+    if inputData.ndim == 2:
+        return TV_PD_2D(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau)
+    elif inputData.ndim == 3:
+        return TV_PD_3D(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau)
+
+def TV_PD_2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData,
+                     float regularisation_parameter,
+                     int iterationsNumb,
+                     float tolerance_param,
+                     int methodTV,
+                     int nonneg,
+                     float lipschitz_const,
+                     float tau):
+
+    cdef long dims[2]
+    dims[0] = inputData.shape[0]
+    dims[1] = inputData.shape[1]
+
+    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] outputData = \
+            np.zeros([dims[0],dims[1]], dtype='float32')
+
+    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] infovec = \
+            np.ones([2], dtype='float32')
+
+    #/* Run FGP-TV iterations for 2D data */
+    PDTV_CPU_main(&inputData[0,0], &outputData[0,0], &infovec[0], regularisation_parameter,
+                       iterationsNumb,
+                       tolerance_param,
+                       lipschitz_const,
+                       methodTV,
+                       nonneg,
+                       tau,
+                       dims[1],dims[0], 1)
+    return (outputData,infovec)
+
+def TV_PD_3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData,
+                     float regularisation_parameter,
+                     int iterationsNumb,
+                     float tolerance_param,
+                     int methodTV,
+                     int nonneg,                     
+                     float lipschitz_const,
+                     float tau):
+
+    cdef long dims[3]
+    dims[0] = inputData.shape[0]
+    dims[1] = inputData.shape[1]
+    dims[2] = inputData.shape[2]
+
+    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] outputData = \
+            np.zeros([dims[0], dims[1], dims[2]], dtype='float32')
+    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] infovec = \
+            np.zeros([2], dtype='float32')
+
+    #/* Run FGP-TV iterations for 3D data */
+    PDTV_CPU_main(&inputData[0,0,0], &outputData[0,0,0], &infovec[0], regularisation_parameter,
+                       iterationsNumb,
+                       tolerance_param,
+                       lipschitz_const,
+                       methodTV,
+                       nonneg,
+                       tau,
+                       dims[2], dims[1], dims[0])
+    return (outputData,infovec)
+
 #***************************************************************#
 #********************** Total-variation SB *********************#
 #***************************************************************#
diff --git a/src/Python/src/gpu_regularisers.pyx b/src/Python/src/gpu_regularisers.pyx
index 8cd8c93..b22d15e 100644
--- a/src/Python/src/gpu_regularisers.pyx
+++ b/src/Python/src/gpu_regularisers.pyx
@@ -22,6 +22,7 @@ CUDAErrorMessage = 'CUDA error'
 
 cdef extern int TV_ROF_GPU_main(float* Input, float* Output, float *infovector, float *lambdaPar, int lambda_is_arr, int iter, float tau, float epsil, int N, int M, int Z);
 cdef extern int TV_FGP_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, int methodTV, int nonneg, int N, int M, int Z);
+cdef extern int TV_PD_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ);
 cdef extern int TV_SB_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, int methodTV, int N, int M, int Z);
 cdef extern int LLT_ROF_GPU_main(float *Input, float *Output, float *infovector, float lambdaROF, float lambdaLLT, int iterationsNumb, float tau,  float epsil, int N, int M, int Z);
 cdef extern int TGV_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, float alpha1, float alpha0, int iterationsNumb, float L2, float epsil, int dimX, int dimY, int dimZ);
@@ -70,6 +71,75 @@ def TV_FGP_GPU(inputData,
                      tolerance_param,
                      methodTV,
                      nonneg)
+# Total-variation Primal-Dual (PD)
+def TV_PD_GPU(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau):
+    if inputData.ndim == 2:
+        return TVPD2D(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau)
+    elif inputData.ndim == 3:
+        return TVPD3D(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau)
+
+def TVPD2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData,
+                     float regularisation_parameter,
+                     int iterationsNumb,
+                     float tolerance_param,
+                     int methodTV,
+                     int nonneg,
+                     float lipschitz_const,
+                     float tau):
+
+    cdef long dims[2]
+    dims[0] = inputData.shape[0]
+    dims[1] = inputData.shape[1]
+
+    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] outputData = \
+            np.zeros([dims[0],dims[1]], dtype='float32')
+
+    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] infovec = \
+            np.ones([2], dtype='float32')
+
+    if (TV_PD_GPU_main(&inputData[0,0], &outputData[0,0], &infovec[0], regularisation_parameter,
+                       iterationsNumb,
+                       tolerance_param,
+                       lipschitz_const,
+                       methodTV,
+                       nonneg,
+                       tau,
+                       dims[1],dims[0], 1) ==0):
+        return (outputData,infovec)
+    else:
+        raise ValueError(CUDAErrorMessage);
+
+def TVPD3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData,
+                     float regularisation_parameter,
+                     int iterationsNumb,
+                     float tolerance_param,
+                     int methodTV,
+                     int nonneg,
+                     float lipschitz_const,
+                     float tau):
+
+    cdef long dims[3]
+    dims[0] = inputData.shape[0]
+    dims[1] = inputData.shape[1]
+    dims[2] = inputData.shape[2]
+
+    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] outputData = \
+            np.zeros([dims[0], dims[1], dims[2]], dtype='float32')
+    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] infovec = \
+            np.zeros([2], dtype='float32')
+
+    if (TV_PD_GPU_main(&inputData[0,0,0], &outputData[0,0,0], &infovec[0], regularisation_parameter,
+                       iterationsNumb,
+                       tolerance_param,
+                       lipschitz_const,
+                       methodTV,
+                       nonneg,
+                       tau,
+                       dims[2], dims[1], dims[0]) ==0):
+        return (outputData,infovec)
+    else:
+        raise ValueError(CUDAErrorMessage);
+
 # Total-variation Split Bregman (SB)
 def TV_SB_GPU(inputData,
                      regularisation_parameter,
@@ -195,7 +265,7 @@ def ROFTV2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData,
     if isinstance (regularisation_parameter, np.ndarray):
         reg = regularisation_parameter.copy()
         # Running CUDA code here
-        if (TV_ROF_GPU_main(&inputData[0,0], &outputData[0,0], &infovec[0], 
+        if (TV_ROF_GPU_main(&inputData[0,0], &outputData[0,0], &infovec[0],
                        &reg[0,0], 1,
                        iterations,
                        time_marching_parameter,
diff --git a/test/test_CPU_regularisers.py b/test/test_CPU_regularisers.py
index 95e2a3f..266ca8a 100644
--- a/test/test_CPU_regularisers.py
+++ b/test/test_CPU_regularisers.py
@@ -3,7 +3,7 @@ import unittest
 import os
 #import timeit
 import numpy as np
-from ccpi.filters.regularisers import FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th, ROF_TV
+from ccpi.filters.regularisers import FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th, ROF_TV, PD_TV
 from testroutines import BinReader, rmse 
 ###############################################################################
 
@@ -39,6 +39,15 @@ class TestRegularisers(unittest.TestCase):
 
         self.assertAlmostEqual(rms,0.02,delta=0.01)
 
+    def test_PD_TV_CPU(self):
+        Im,input,ref = self.getPars()
+
+        pd_cpu,info = PD_TV(input, 0.02, 300, 0.0, 0, 0, 8, 0.0025, 'cpu');
+
+        rms = rmse(Im, pd_cpu)
+        
+        self.assertAlmostEqual(rms,0.02,delta=0.01)
+
     def test_TV_ROF_CPU(self):
         # set parameters
         Im, input,ref = self.getPars()
diff --git a/test/test_run_test.py b/test/test_run_test.py
index f27e7fc..1707aec 100755
--- a/test/test_run_test.py
+++ b/test/test_run_test.py
@@ -2,7 +2,7 @@ import unittest
 import numpy as np
 import os
 import timeit
-from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
+from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th
 #from PIL import Image
 from testroutines import BinReader, rmse, printParametersToString
 
@@ -164,6 +164,94 @@ class TestRegularisers(unittest.TestCase):
 
         self.assertLessEqual(diff_im.sum() , 1)
 
+    def test_PD_TV_CPU_vs_GPU(self):
+        print(__name__)
+        #filename = os.path.join("test","lena_gray_512.tif")
+        #plt = TiffReader()
+        filename = os.path.join("test","test_imageLena.bin")
+        plt = BinReader()
+        # read image
+        Im = plt.imread(filename)
+        Im = np.asarray(Im, dtype='float32')
+
+        Im = Im/255
+        perc = 0.05
+        u0 = Im + np.random.normal(loc = 0 ,
+                                          scale = perc * Im ,
+                                          size = np.shape(Im))
+        u_ref = Im + np.random.normal(loc = 0 ,
+                                          scale = 0.01 * Im ,
+                                          size = np.shape(Im))
+
+        # map the u0 u0->u0>0
+        # f = np.frompyfunc(lambda x: 0 if x < 0 else x, 1,1)
+        u0 = u0.astype('float32')
+        u_ref = u_ref.astype('float32')
+
+        print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+        print ("____________PD-TV bench___________________")
+        print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
+
+
+        pars = {'algorithm' : PD_TV, \
+                'input' : u0,\
+                'regularisation_parameter':0.02, \
+                'number_of_iterations' :1500 ,\
+                'tolerance_constant':0.0,\
+                'methodTV': 0 ,\
+                'nonneg': 0,
+                'lipschitz_const' : 8,
+                'tau' : 0.0025}
+                
+        print ("#############PD TV CPU####################")
+        start_time = timeit.default_timer()
+        (pd_cpu,info_vec_cpu) = PD_TV(pars['input'], 
+                      pars['regularisation_parameter'],
+                      pars['number_of_iterations'],
+                      pars['tolerance_constant'], 
+                      pars['methodTV'],
+                      pars['nonneg'],
+                      pars['lipschitz_const'],
+                      pars['tau'],'cpu')
+
+        rms = rmse(Im, pd_cpu)
+        pars['rmse'] = rms
+
+        txtstr = printParametersToString(pars)
+        txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+        print (txtstr)
+
+        print ("##############PD TV GPU##################")
+        start_time = timeit.default_timer()
+        try:
+            (pd_gpu,info_vec_gpu) = PD_TV(pars['input'], 
+              pars['regularisation_parameter'],
+              pars['number_of_iterations'],
+              pars['tolerance_constant'], 
+              pars['methodTV'],
+              pars['nonneg'],
+              pars['lipschitz_const'],
+              pars['tau'],'gpu')
+
+        except ValueError as ve:
+            self.skipTest("Results not comparable. GPU computing error.")
+
+        rms = rmse(Im, pd_gpu)
+        pars['rmse'] = rms
+        pars['algorithm'] = PD_TV
+        txtstr = printParametersToString(pars)
+        txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
+        print (txtstr)
+
+        print ("--------Compare the results--------")
+        tolerance = 1e-05
+        diff_im = np.zeros(np.shape(pd_cpu))
+        diff_im = abs(pd_cpu - pd_gpu)
+        diff_im[diff_im > tolerance] = 1
+
+        self.assertLessEqual(diff_im.sum() , 1)
+
+
     def test_SB_TV_CPU_vs_GPU(self):
         print(__name__)
         #filename = os.path.join("test","lena_gray_512.tif")