Wed 7 Jun 2023 : new makefile_windows2 for Microsoft Visual Studio
  Community 2022 and NVIDIA CUDA Toolkit 12.1 on new computer.

Thu 15 Dec 2022 : to write_dbl_bstimeflops and write_dbl_qrtimeflops added
  the arithmetic intensity, updated dbl_{tabs, baqr}_testers.cpp.
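
  Note on the arithmetic intensity: the sketch below assumes the usual
  roofline definition, the number of floating-point operations divided
  by the number of bytes moved.  The name arithmetic_intensity and its
  parameters are hypothetical, not copied from the write_dbl_*timeflops
  sources, which may count bytes differently.

```cpp
// Hypothetical helper (an assumption, not the code of this folder):
// arithmetic intensity as flops per byte, following the roofline model.
// Both counts are long long int so large counts do not overflow.
double arithmetic_intensity
 ( long long int flopcnt, long long int bytecnt )
{
   if(bytecnt <= 0) return 0.0; // no data moved, intensity is undefined

   return ((double) flopcnt)/((double) bytecnt);
}
```

  For example, 16 operations on 8 bytes give an intensity of 2 flops
  per byte.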

Wed 14 Dec 2022 : new write_{dbl, dbl2, dbl4, dbl8}_bstimeflops, with code
  extracted from {dbl, dbl2, dbl4, dbl8}_tabs_testers.cpp.  Updated makefiles.
  New write_{dbl, dbl2, dbl4, dbl8}_qrtimeflops, with code extracted 
  from {dbl, dbl2, dbl4, dbl8}_baqr_testers.cpp.  Updated makefiles.

Tue 27 Sep 2022 : in {dbl2,dbl4,dbl8}_tabs_kernels.cu fixed the zero tests
  in the invert_tiles kernels, fixing the wrong last test on real data.
  In dbl8_tabs_kernels.cu, moved the __syncthreads() in the invert tiles
  on complex data to happen outside the if tests.

Mon 26 Sep 2022 : fixed bugs in dbl2_baqr_kernels.cu in case beta is zero,
  for the small_house kernels.  Applied similar updates to the analogue
  kernels in dbl4_baqr_kernels.cu and dbl8_baqr_kernels.cu.
  In {dbl2,dbl4,dbl8}_tabs_kernels.cu, omitted the last test in the
  invert_tiles on real data ...

Sun 25 Sep 2022 : bug fix in dbl_tabs_kernels.cu for sparse matrices,
  added extra tests and initializations to deal with zero entries.
  Applied the same modifications to dbl2_tabs_kernels.cu,
  also for complex data and in {dbl4,dbl8}_tabs_kernels.cu as well.

Mon 19 Sep 2022 : in {dbl2,dbl4,dbl8}_factorizations.cpp, fixed an index
  error in the last function.

Mon 12 Sep 2022 : extended dbl8_factorizations with a function on complex data.

Wed 7 Sep 2022 : extended {dbl2,dbl4}_factorizations, with a function for
  the back substitution after qr on complex data.

Tue 6 Sep 2022 : to dbl_factorizations, added function for back substitution
  after qr on complex data.

Sun 4 Sep 2022 : tested on zero beta in dbl2_baqr_kernels.cu.

Fri 2 Sep 2022 : added back substitution to dbl4_factorizations and
  to dbl8_factorizations.

Wed 31 Aug 2022 : fixed type error in dbl_factorizations.h,
  added back substitution function to dbl2_factorizations.

Thu 18 Aug 2022 : bug fixed in dbl_baqr_kernels.cu for cases with zero beta,
  as generated by code added to dbl_baqr_testers.cpp.
  Fixed bugs in makefiles_* to build test_dbl_tabs.  Patched division 
  by zero in dbl_tabs_kernels.cu, update in dbl_tabs_testers.cpp.
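
  Note on zero beta: the sketch below is the textbook Householder vector
  computation in the style of Golub and Van Loan, written on the host in
  double precision.  It is an illustration of why a zero beta needs its
  own branch, under the assumption that the kernels follow the same
  convention; house_sketch is a hypothetical name and the code of
  dbl_baqr_kernels.cu may differ.

```cpp
#include <cmath>
#include <vector>

// Textbook Householder vector, hypothetical name house_sketch.
// Assumes x has at least one entry.  On return, v[0] = 1 and
// H = I - beta*v*v^T maps x to a multiple of the first unit vector.
double house_sketch ( const std::vector<double>& x, std::vector<double>& v )
{
   const int n = (int) x.size();
   double sigma = 0.0;

   v.assign(x.begin(), x.end());
   v[0] = 1.0;
   for(int i=1; i<n; i++) sigma += x[i]*x[i];

   // All entries below the first are zero: the column is already
   // reduced, beta is zero, and the division by v0 must be skipped.
   if(sigma == 0.0) return 0.0;

   const double mu = std::sqrt(x[0]*x[0] + sigma);
   const double v0 = (x[0] <= 0.0) ? x[0] - mu : -sigma/(x[0] + mu);
   const double v0p2 = v0*v0;

   for(int i=1; i<n; i++) v[i] = x[i]/v0;

   return 2.0*v0p2/(sigma + v0p2);
}
```

  A column with zeros below its first entry, as happens for sparse or
  already triangular inputs, takes the early return.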

Wed 10 Aug 2022 : to dbl_factorizations added a function to compute the
  least squares solution after a QR decomposition.

Fri 17 Dec 2021 : added the computation of the arithmetic intensity
  to {dbl,dbl2,dbl4,dbl8}_baqr_testers.cpp.

Wed 15 Dec 2021 : added the computation of the arithmetic intensity
  to {dbl,dbl2,dbl4,dbl8}_tabs_testers.cpp.

Fri 26 Nov 2021 : fixed bug in dbl8_baqr_kernels and split the kernels
  used to update R.  The code on complex octo double data is now correct.

Wed 24 Nov 2021 : in dbl8_baqr_kernels, split the cmplx8_normalize
  into four different kernels.

Tue 16 Nov 2021 : in cmplx8_normalize of dbl8_baqr_kernels.cu, loaded the
  inverses for the multipliers into new local variables.

Fri 12 Nov 2021 : in cmplx8_normalize of dbl8_baqr_kernels.cu, eliminated
  many local variables.  Bug fix in call to odf_mlt_d in dbl8_baqr_kernels.cu.

Mon 8 Nov 2021 : in dbl8_baqr_kernels.cu, improved one kernel and defined
  other kernels for complex data.  It works now for one tile.
  Split kernels in dbl8_baqr_kernels.h and dbl8_baqr_kernels.cu
  for real and imaginary parts, for RHdotv and medium_subvbetaRHv.

Sat 6 Nov 2021 : improved cmplx8_small_leftRupdate in dbl8_baqr_kernels.cu.
  Bug fixed in call to cmplx8_small_house kernel in dbl8_baqr_kernels.cu.

Thu 4 Nov 2021 : in dbl4_baqr_kernels, split the kernel for RHdotv in two,
  one for the real and another for the imaginary parts.
  Split the cmplx4_medium_subvbetaRHv into a real and an imaginary parts kernel
  and updated the test_dbl4_testers.cpp.  It now works on all complex data.

Wed 3 Nov 2021 : corrected output of a complex Q in dbl4_baqr_kernels.cu.
  Removed four local variables from the cmplx4_sum_betaRHdotv kernel
  in dbl4_baqr_kernels.cu.

Sat 30 Oct 2021 : more improvements in dbl4_baqr_kernels.cu for complex data.
  Improved the documentation in dbl4_baqr_kernels.h.
  Reformatting of code and rewritings in dbl4_baqr_kernels.cu.

Fri 29 Oct 2021 : improved dbl4_baqr_kernels.cu so one kernel function
  on complex data uses 8 fewer local variables.  Applied the same type
  of improvement to three other kernel functions in dbl4_baqr_kernels.cu.

Wed 27 Oct 2021 : improvements in dbl4_baqr_kernels.cu to use fewer local
  variables in two functions on complex data.  Made similar improvements
  to dbl8_baqr_kernels.cu.

Mon 25 Oct 2021 : updated dbl4_baqr_kernels.cu to use cqd_shmemsize in the
  cmplx4_ kernel functions.

Mon 11 Oct 2021 : updated dbl8_tabs_testers.cpp with the option to read
  the test matrix from file.  Added the computation of the kernel time
  flops to dbl_tabs_testers.cpp.

Sat 9 Oct 2021 : added return 0; at the end of the functions in
  make_data_files.cpp. 

Wed 6 Oct 2021 : bug fix in dbl8_tabs_kernels.cu.

Mon 4 Oct 2021 : bug fixed in dbl4_tabs_kernels.cu, with minor edit in
  dbl4_tabs_testers.cpp.

Sun 3 Oct 2021 : new dbl8_qrbs_testers and test_dbl8_qrbs,
  updated the makefiles.  Added to {dbl,dbl2,dbl4,dbl8}_qrbs_testers.cpp
  the computation of the total flops.

Sat 2 Oct 2021 : fixed a bug in dbl8_baqr_kernels.cu.
  New dbl4_qrbs_testers and test_dbl4_qrbs, updated the makefiles.

Fri 1 Oct 2021 : updated {dbl4,dbl8}_tabs_testers.cpp with the option to
  read in the test matrix.  Updated the makefiles.
  In {dbl4,dbl8}_baqr_kernels.cu, initialized v properly in the function
  to call the kernels for large Householder vector computations.
  Fixed the prototype of large_sum_of_squares in {dbl4,dbl8}_baqr_kernels.

Thu 30 Sep 2021 : completed a first version of the kernels on complex data
  in dbl8_baqr_kernels.cu.

Wed 29 Sep 2021 : in makefile_unix, added the -O3 flag to all nvcc calls.
  To dbl8_baqr_kernels.cu, added all functions that call the kernels on
  complex data.  Updated the flop counts in dbl8_baqr_testers.cpp and
  fixed the calls to the GPU version of the blocked QR on complex data.

Tue 28 Sep 2021 : fixed the initialization of Q in dbl8_baqr_kernels.cu.

Mon 27 Sep 2021 : completed a first version in dbl8_baqr_kernels.cu, of
  the kernels on real data.  Updated dbl8_baqr_testers.cpp and makefile_unix.

Sun 26 Sep 2021 : added functions for complex data in dbl_data_files
  and to make_data_files.cpp.  Updated dbl4_tabs_testers.cpp so complex
  matrices can be read from file.  To dbl8_baqr_kernels.cu, defined the
  functions that call the kernels on real data.

Sat 25 Sep 2021 : new dbl_data_files and make_data_files to write random
  upper triangular matrices to file, updated the makefiles.
  Extended dbl_tabs_flopcounts and dbl4_tabs_testers to try to deal with
  the overflow of the flop counts, updated the makefiles again.
  Fixed name of a function in dbl_data_files.

Fri 24 Sep 2021 : moved flopcounts from dbl2_baqr_kernels into
  dbl_baqr_flopcounts.  Did the same for dbl4_baqr_kernels.
  Fixed many prototypes in dbl8_baqr_kernels.h, defined one function
  in dbl8_baqr_kernels.cu.

Thu 23 Sep 2021 : new dbl8_baqr_kernels.h with the prototypes for the
  kernels to accelerate the blocked Householder QR in octo double precision.

Wed 22 Sep 2021 : removed superfluous __syncthreads() in dbl4_tabs_kernels.cu,
  minor edits in the documentation of dbl4_tabs_kernels.h.
  Improved dbl4_tabs_kernels.cu and dbl8_tabs_kernels.cu.

Mon 20 Sep 2021 : new dbl8_baqr_testers and test_dbl8_baqr.cpp,
  updated the makefiles.

Sun 19 Sep 2021 : fixed two bugs in dbl8_tabs_kernels.cu so inverting small
  complex matrices is correct.  Fixed another two bugs in dbl8_tabs_kernels.cu
  so complex matrices of up to dimension 169 are inverted correctly.
  Fixed last(!) bug in dbl8_tabs_kernels.cu and generated complex upper
  triangular matrices better in dbl8_tabs_testers.cpp.

Sat 18 Sep 2021 : fixed dbl8_tabs_testers.cpp by copying A into A_d before 
  the CPU computes the inverse.  Generated the random upper triangular matrix
  to test tiling in double precision in dbl8_tabs_testers.cpp; updated
  the makefiles.  Adjusted the operation counts in dbl8_tabs_testers.cpp.

Fri 17 Sep 2021 : in dbl4_tabs_testers.cpp, generated the upper triangular
  test matrices in double precision.  Fixed the bound on the shared memory
  size in dbl8_tabs_kernels.  To dbl8_tabs_kernels.cu, added many
  __syncthreads() statements ...

Thu 16 Sep 2021 : completed dbl8_tabs_kernels, updated makefiles.
  Updated dbl8_tabs_testers.cpp with the calls to the tiled kernels.

Wed 15 Sep 2021 : extended dbl8_tabs_kernels with the kernel to invert one
  tile of complex numbers, extended dbl8_tabs_testers.cpp.
  In dbl2_tabs_testers.cpp, generated the random matrices for the tiling
  in double precision, updated the makefiles.

Tue 14 Sep 2021 : new dbl8_test_utilities for better tests on the tiled
  back substitution, applied in dbl8_tabs_testers.cpp.  Updated makefiles.
  Fixed bugs in dbl8_tabs_host.cpp and dbl8_tabs_testers.cpp.
  New dbl8_tabs_kernels, updated the makefiles.

Mon 13 Sep 2021 : new dbl8_tabs_host with code for a tiled back substitution
  in octo double precision on the host.  Added basic tests in the new
  dbl8_tabs_testers, called by test_dbl8_tabs.cpp, updated the makefiles.

Sun 12 Sep 2021 : completed dbl8_factors_testers.cpp and test_dbl8_factors.cpp
  with tests on the Householder QR decomposition.
  Corrected dbl8_factors_testers.cpp.

Sat 11 Sep 2021 : fixed bugs in dbl8_factorizations.cpp and
  dbl8_factors_testers.cpp so the complex LU solver works.
  Completed the code for the Householder QR in dbl8_factorizations.

Fri 10 Sep 2021 : new dbl8_factorizations, dbl8_factors_testers,
  and test_dbl8_factors.cpp, updated the makefiles.

Thu 9 Sep 2021 : completed the corrections in the documentation of
  dbl4_factorizations.h.

Wed 8 Sep 2021 : edited the documentation in dbl4_factorizations.h.

Tue 7 Sep 2021 : small edits in the documentation of random4_matrices.h,
  added the new file random8_matrices.h with the specifications of functions
  to generate random matrices of octo doubles.

Wed 1 Sep 2021 : added __syncthreads() statements in dbl4_baqr_kernels.cu.

Tue 31 Aug 2021 : corrected errors with allocation and deallocation
  in dbl4_baqr_kernels.cu.  Fixed another error in dbl4_baqr_kernels.cu.
  Removed many double free() in dbl4_baqr_kernels.cu.

Mon 30 Aug 2021 : completed the code transformation in dbl4_baqr_kernels.
  Added calls to the GPU code in dbl4_baqr_testers.cpp, and updated 
  the makefiles for unix and windows.

Sun 29 Aug 2021 : fixed more errors in dbl4_baqr_kernels.h and defined
  all kernel wrappers in dbl4_baqr_kernels.cu.  Defined several kernels
  in dbl4_baqr_kernels.cu.

Sat 28 Aug 2021 : fixed errors in dbl4_baqr_kernels.h and defined the
  two main functions in dbl4_baqr_kernels.cu.

Fri 27 Aug 2021 : defined the specifications of the kernels for the
  blocked Householder QR in quad double precision, in dbl4_baqr_kernels.h.

Thu 26 Aug 2021 : new dbl4_baqr_testers and test_dbl4_baqr.cpp.
  Fixed bugs in dbl4_baqr_host.cpp and dbl4_factors_testers.cpp.
  Updated the makefiles.  Bug fixed in dbl4_baqr_host.cpp so now
  also the complex case works correctly.

Wed 25 Aug 2021 : added dbl4_baqr_host, a blocked Householder QR on the host.

Tue 24 Aug 2021 : updated dbl4_tabs_testers.cpp with the calls to the GPU
  code, applied some modifications to dbl4_tabs_kernels.cu.
  Corrected prototype of dbl4_back_substitute in dbl4_tabs_kernels.

Mon 23 Aug 2021 : new dbl4_{test_utilities,tabs_host,tabs_testers} and
  test_dbl4_tabs.cpp for a tiled back substitution with quad doubles.
  Updated the makefiles.  New dbl4_tabs_kernels, updated makefiles.

Sun 22 Aug 2021 : fixed two errors in random4_matrices.cpp.
  Fixed compilation errors with dbl4_factorizations.
  New dbl4_factors_testers and test_dbl4_factors.cpp.  Updated the makefiles.
  Fixed dbl4_factorizations.cpp and dbl4_factors_testers.cpp.

Sat 21 Aug 2021 : new random4_matrices to generate real and complex random
  matrices of quad doubles.  Minor edit in dbl2_factorizations.h.
  New dbl4_factorizations to factor matrices in quad double precision.

Fri 20 Aug 2021 : renamed the bound on the shared memory size in
  dbl_tabs_kernels and applied the GPU tiled back substitution in
  dbl_qrbs_testers.cpp, updated the makefile_unix.
  New gettimeofday4win.h and gettimeofday4win.cpp for separate compilation
  to avoid redefinitions in dbl_baqr_kernels.cu and dbl_tabs_kernels.cu.
  Updated makefile_windows.  Modified dbl_qrbs_testers.cpp.
  Updated dbl2_baqr_kernels.cu and dbl2_tabs_kernels.cu for the new
  gettimeofday4win.  Renamed dd_shmemsize in dbl2_tabs_kernels.h.
  Updated dbl2_qrbs_testers.cpp to work on complex data,
  updated makefile_unix and makefile_windows.

Thu 19 Aug 2021 : edited dbl_baqr_testers.h.  New dbl_qrbs_testers and
  test_dbl_qrbs.cpp, to test QR + back substitution on the CPU in double
  precision, with updated makefiles.  New dbl2_test_utilities with code
  extracted from dbl2_tabs_testers.  To test QR + back substitution in
  double double precision, added dbl2_qrbs_testers and test_dbl2_qrbs.cpp,
  updated the makefiles.

Wed 18 Aug 2021 : to {dbl,dbl2}_baqr_testers.cpp, added total of all times
  spent by the kernels and computed also the "kernel time flops."

Tue 17 Aug 2021 : added flopcounts to dbl2_tabs_kernels, updated 
  dbl2_tabs_testers.cpp and the makefiles.  To dbl_tabs_testers.cpp,
  added the computation of the "kernel time flops."

Mon 16 Aug 2021 : added timers for each different kernel to
  {dbl,dbl2}_tabs_kernels, updated {dbl,dbl2}_tabs_testers.cpp.
  Added flopcount functions to dbl_tabs_kernels, updated dbl_tabs_testers.cpp
  to report those counts along with the flops.
  Moved the flopcount functions from dbl_tabs_kernels into the new
  dbl_tabs_flopcounts.

Sun 15 Aug 2021 : improved dbl_tabs_testers.cpp and test_dbl_tabs.cpp,
  with improved output messages on kind of tests and error types.
  Improved dbl2_tabs_testers.cpp and test_dbl2_tabs.cpp similarly.

Sat 14 Aug 2021 : added probing testers to dbl_factors_testers.cpp,
  applied in dbl_baqr_testers.cpp.

Fri 13 Aug 2021 : to dbl_baqr_testers.cpp and dbl2_baqr_testers.cpp,
  added the explicit computation of the flops at the end.

Thu 12 Aug 2021 : new dbl_baqr_flopcounts, with code factored out from
  the dbl_baqr_kernels.  Updated the makefiles.
  In dbl_baqr_kernels.cu, initialized the global flop counts properly.
  In dbl_baqr_kernels.h, ensured "long long int" instead of "long int"
  is used for all flop counters.  Improved the flopcount parameters in
  dbl_baqr_kernels and added flopcount parameters to dbl2_baqr_kernels.
  Added flopcount functions to dbl2_baqr_kernels, updated makefiles
  and dbl2_baqr_testers.cpp.  Fixed dbl2_factors_testers.cpp.

Wed 11 Aug 2021 : fixed bugs in dbl2_baqr_kernels.cu, added extra cout
  in dbl2_baqr_host.cpp.  Added the "-> Testing ..." banner after the GPU
  computation in dbl*_baqr_testers.cpp.  Extended dbl2_baqr_kernels
  with kernels for Householder vectors of larger, complex matrices.
  To dbl2_factors_testers, added test functions with random index probes,
  as more efficient alternatives to testing the complete matrix.
  Applied in dbl2_baqr_testers.cpp.

Tue 10 Aug 2021 : added new kernels to dbl2_baqr_kernels to compute the
  Householder vectors for larger matrices.  Added the value of the dimension
  to a kernel in dbl2_baqr_kernels, to fix the sum of squares computation.
  Modified the computation of the Householder vector for large matrices.

Mon 9 Aug 2021 : fixed the RTdotv kernel in dbl_baqr_kernels.cu.
  Added flopcount functions in dbl_baqr_kernels.  Added analogous kernels 
  and flopcounts to dbl_baqr_kernels for complex data.
  Used prompt_baqr_setup in test_dbl2_baqr.cpp, improved dbl2_baqr_testers,
  updated the makefiles.  Added better kernels to dbl2_baqr_kernels,
  and updated dbl2_baqr_testers.cpp for the extra time variable.

Sun 8 Aug 2021 : added separate value to time the beta*R^T*v kernel
  in dbl_baqr_kernels, updated dbl_baqr_testers.cpp.  In dbl_baqr_kernels,
  split the kernel to compute beta*R^T*v into two separate kernels.

Sat 7 Aug 2021 : new prompt_baqr_setup to set the parameters of the tests,
  updated dbl_baqr_testers, test_dbl_baqr.cpp, and the makefiles.
  Completed taking mode into account in dbl_baqr_testers.cpp.
  Added all flopcounts for complex data in dbl_baqr_kernels and extended
  dbl_baqr_testers.cpp.  In dbl_baqr_testers.cpp, dbl_baqr_kernels,
  replaced the "long int" by "long long int" as needed on windows
  for accurate flop counts.
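
  Note on "long long int": on 64-bit Windows, "long int" is 32 bits
  wide, while it is 64 bits wide on 64-bit Linux, so a counter declared
  as "long int" overflows much sooner on Windows.  The sketch below uses
  the 2*dim^3 operation count of a square matrix-matrix product only as
  an illustration; matmat_flops and fits_in_32_bits are hypothetical
  names, not functions of dbl_baqr_kernels.

```cpp
#include <cstdint>

// flop count of a dim-by-dim matrix-matrix product, used as illustration
long long int matmat_flops ( long long int dim )
{
   return 2*dim*dim*dim;
}

// true if the count fits in a 32-bit signed integer,
// which is the range of "long int" on 64-bit Windows
bool fits_in_32_bits ( long long int cnt )
{
   return (cnt >= INT32_MIN) && (cnt <= INT32_MAX);
}
```

  Already at dimension 1024 the count equals 2^31, one more than the
  largest 32-bit signed integer.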

Fri 6 Aug 2021 : first version of improved kernels to compute W on complex
  data in dbl2_baqr_kernels.  Bug fixed in dbl2_baqr_kernels.cu.
  Updated the flopcounts in dbl_baqr_kernels, for real data.

Thu 5 Aug 2021 : fixed call to _small_leftRupdate in dbl_baqr_kernels.cu
  so it now works for all dimensions.  Added first versions of kernels to
  compute the W matrix better for complex data.  Applied Hermitian
  transpose to V instead of to W in dbl_baqr_kernels.cu, which fixed
  the complex case for square matrices.  Fixed the naming of the functions
  in dbl_baqr_kernels, no longer calling the separate kernel for WYT.
  Improved the computation of W in dbl2_baqr_kernels, on real data.

Wed 4 Aug 2021 : added the count of calls to sqrt() in dbl_baqr_kernels,
  updated dbl_baqr_testers.cpp.  Increased the shared memory limits for
  real data in {dbl,dbl2}_baqr_kernels.  Edit in test_dbl_baqr_testers.cpp;
  added new kernels to dbl_baqr_kernels to compute the W matrix better.
  Corrected dbl_baqr_kernels.cu, taking into account the order of V and W
  in the computation of W*Y^T ...

Tue 3 Aug 2021 : to dbl_baqr_kernels, added functions to accumulate the
  number of floating-point operations for each kernel.  Updated the
  dbl_baqr_testers.cpp, for the extra arguments in the GPU_dbl function.

Mon 2 Aug 2021 : in dbl_baqr_kernels.cu, reduced the number of threads in
  the small update of R kernel.  Used the small kernel for w = beta*R^T*v
  in dbl_baqr_kernels.cu, to then use w in the medium leftRupdate kernel.
  Rewrote the leftRupdate kernel with multiple blocks of threads in
  dbl_baqr_kernels.cu.  In dbl_baqr_kernels, added kernels to reduce the 
  first tile more efficiently on complex data.  Improved the leftRupdate to
  work with multiple blocks in dbl2_baqr_kernels, updated test_dbl2_baqr.cpp.

Sun 1 Aug 2021 : removed ad hoc output statements in dbl2_factorizations.cpp,
  and from dbl2_baqr_kernels, fixed cmplx2_VB_to_W kernel, added printing
  of timers in dbl2_baqr_testers.cpp.  In dbl_baqr_kernels.cu, added the
  padding also in the allocation of Q and QWYH, to obtain the correct Q
  for a 129-by-128 matrix in one tile of size 128.
  Added the padding for Q and QWYT also in dbl2_baqr_kernels.cu.
  In dbl_baqr_testers.cpp and dbl2_baqr_testers.cpp, raised the
  tolerances to declare success or failure.  Added a verbose option
  to the leftRupdate in dbl_factorizations, updated dbl_baqr_host.cpp.
  To dbl_baqr_kernels, added two new kernels to attempt to reduce
  with multiple blocks of threads.

Sat 31 Jul 2021 : defined all complex kernels in dbl2_baqr_kernels
  and extended dbl2_baqr_testers.cpp.  Added output statements to
  dbl2_factorizations.cpp for debugging, extended dbl2_baqr_kernels 
  and dbl2_baqr_testers.cpp; swap tests in test_dbl2_baqr.cpp.

Fri 30 Jul 2021 : adding a __syncthreads() in the kernel to compute W
  in dbl2_baqr_kernels.cu improved the accuracy dramatically.
  Added __syncthreads() in dbl_baqr_kernels.cu in the kernel to compute W.
  Some edits in dbl_baqr_kernels, fixed a print in dbl_baqr_kernels.cu,
  added stubs for all complex kernels in dbl2_baqr_kernels.

Thu 29 Jul 2021 : added verbose options and messages to dbl2_baqr_host,
  fixed the kernel to compute the Householder vector in dbl2_baqr_kernels.cu.
  In dbl2_baqr_kernels.cu, fixed the leftRupdate kernel.
  Added more print messages to dbl2_baqr_host.cpp and fixed the call to
  the kernel to compute QWYT in dbl2_baqr_kernels.cu.

Wed 28 Jul 2021 : in dbl_baqr_kernels.cu, fixed the printing of matrices
  of real and imaginary parts of complex numbers.  Fixed the W*Y^H
  in dbl_baqr_kernels.cu.  In dbl_baqr_kernels.cu, fixed the computation
  of the W matrix for complex inputs.  New dbl2_baqr_kernels with stubs
  for the kernels in double double precision.  To dbl2_baqr_kernels,
  added kernels, edited dbl_baqr_kernels.h, updated dbl2_baqr_testers.cpp
  and test_dbl2_baqr.cpp.

Tue 27 Jul 2021 : added stubs to dbl_baqr_kernels, to prepare for the complex
  versions of the accelerated blocked Householder QR decomposition.
  To dbl_baqr_kernels, added stubs to all kernels on complex data.
  In dbl_baqr_host.cpp, fixed bug in printing V and W matrices.
  Updated dbl_baqr_testers.cpp, added more prints in dbl_baqr_host.cpp,
  provided kernels to dbl_baqr_kernels on complex data.

Mon 26 Jul 2021 : fixed the size of matrix V in dbl_baqr_kernels.cu.
  Passed rowdim in the update of Q in dbl_baqr_kernels.
  In dbl_baqr_kernels.cu, applied the correct dimensions of the V matrix
  in case of multiple tiles so it now works for 2 square tiles.
  Updated dbl_baqr_kernels.h and dbl_baqr_kernels.cu with an additional
  input parameter, which makes it work for all dimensions.

Sun 25 Jul 2021 : more messages in dbl_baqr_host.cpp and dbl_baqr_kernels.cu.
  Defined the case when nrows1 == 0 in dbl_baqr_kernels.cu.
  In dbl_baqr_kernels.cu, fixed dbl_small_leftRupdate with endcol index.
  Fixed the dimensions of V and W in dbl_baqr_kernels.cu.

Sat 24 Jul 2021 : improved the tests in dbl_factors_testers, updated the
  calls in dbl_baqr_testers.cpp.  Likewise, improved dbl2_factors_testers
  and updated dbl2_baqr_testers.cpp.  In dbl_baqr_kernels.cu, fixed the
  bugs so only the first tile is reduced and not the next tiles.
  In dbl_baqr_host.cpp, made the labels in the output of the matrices
  consistent with the output labels in dbl_baqr_kernels.cu.

Fri 23 Jul 2021 : eliminated the x0_d as a separate variable on the device
  in dbl_baqr_kernels, fixing a bug when verbose is set to false.

Thu 22 Jul 2021 : the kernel for W*Y^T also works for Y*W^T, updated
  dbl_baqr_kernels and dbl_baqr_testers.cpp.  First version of a kernel to
  multiply W*Y^T with C in dbl_baqr_kernels, updated dbl_baqr_testers.cpp.
  Added extra print of the reduced matrix in dbl_baqr_host.cpp,
  fixed many problems with the latest kernel in dbl_baqr_kernels.cu.
  Updated dbl_baqr_kernels.cu to fix the WYTC kernel.  Added the kernel
  to update R to dbl_baqr_kernels, updated dbl_baqr_testers.cpp.

Wed 21 Jul 2021 : kernels to compute W*Y^T in dbl_baqr_kernels.cu work for
  one square tile.  Updated dbl_baqr_kernels.cu to compute W*Y^T for tiles
  with more rows than columns, with extra memory padding.
  Improved offset and dimension computation in {dbl,dbl2}_baqr_host.cpp.
  To dbl_baqr_kernels, added a kernel to compute Q*W*Y^T.
  Added the update to Q kernel to dbl_baqr_kernels; tests added
  to dbl_baqr_testers.cpp show that it works for one square tile.
  Fixed dbl_baqr_kernels.cu so it works for any single tile.

Tue 20 Jul 2021 : fixed the W kernel in dbl_baqr_kernels.cu, with corrected
  printing of V and W in dbl_baqr_host.cpp, when verbose, for nrows > ncols.
  Adjusted printing of elapsed time in dbl_baqr_testers.cpp.
  Added new functions to dbl_baqr_kernels, to wrap the kernels.
  To dbl_baqr_kernels, added first version of a kernel to compute W*Y^T,
  updated dbl_baqr_testers.cpp.

Mon 19 Jul 2021 : in {dbl,dbl2}_factorizations.cpp, fixed the loop end to
  not skip the very last step for matrices with more rows than columns.
  Fixed a kernel in dbl_baqr_kernels.cu so it works for one tile, also
  with more rows than columns.  First version of a kernel to compute W
  added to dbl_baqr_kernels, updated dbl_baqr_testers.cpp.

Sun 18 Jul 2021 : new kernel to update one tile by one block of threads.

Sat 17 Jul 2021 : adjusted the column loop counter in dbl_factorizations.cpp
  to skip the very last step in the reduction to upper triangular form.

Fri 16 Jul 2021 : added messages to {dbl,dbl2}_tabs_testers.cpp to announce
  the separate computational stages.  Added second kernel to reduce a
  square tile to an upper triangular one.  To dbl_baqr_kernels, added
  an extra output variable to record time spent on separate kernels.
  Updated dbl_baqr_testers.cpp.

Thu 15 Jul 2021 : a first kernel in dbl_baqr_kernels, updated makefiles.
  Tested and corrected the kernel to compute the Householder vector in
  dbl_baqr_kernels, updated dbl_baqr_testers.cpp and dbl_baqr_host.cpp.

Wed 14 Jul 2021 : factored out test functions in dbl_factors_testers, for
  use in dbl_baqr_testers.cpp.  Updated makefile_unix and makefile_windows.
  Factored out test functions in dbl2_factors_testers.
  Fixed formatting in dbl2_factors_testers.cpp.  New dbl2_baqr_host,
  dbl2_baqr_testers, test_dbl2_baqr.cpp, with updated makefiles.
  Fixed specification of the complex house function in dbl2_factorizations.h.
  Edited function head in dbl2_factorizations.cpp.
  Added complex blocked Householder QR to dbl2_baqr_host, dbl2_baqr_testers,
  and test_dbl2_baqr.cpp.  Added to {dbl,dbl2}_baqr_host,
  updated {dbl,dbl2}_baqr_testers.cpp.

Tue 13 Jul 2021 : added blocked Householder QR to dbl_baqr_host,
  with updates in dbl_baqr_testers.cpp.  Added complex versions
  to dbl_baqr_{host, testers} and updated test_dbl_baqr.cpp.

Mon 12 Jul 2021 : factored out test function from dbl_tabs_testers into the
  new dbl_test_utilities, for use in dbl_baqr_testers, called by
  test_dbl_baqr, updated the makefiles.  New dbl_baqr_host, tested by
  dbl_baqr_testers.cpp, with updated makefiles.

Sun 11 Jul 2021 : added timers to dbl*_tabs_{host,kernels},
  and dbl*_tabs_testers.cpp, modifying 10 files.

Sat 10 Jul 2021 : small edits in dbl_factorizations.{h,cpp}.
  Extended dbl2_factorizations, dbl2_factors_testers, test_dbl2_factors
  with a complex Householder QR in double double precision.
  Added verbose level to {dbl,dbl2}_factors_testers.cpp.

Fri 9 Jul 2021 : extended dbl_factorizations with a Householder QR,
  extended dbl_factors_testers and test_dbl_factors.  Added Householder QR
  to dbl2_factorizations, dbl2_factors_testers, test_dbl2_factors.
  To dbl_factorizations, dbl_factors_testers and test_dbl_factors,
  added a Householder QR for complex inputs.

Thu 8 Jul 2021 : fixed bug in dbl2_factorizations.cpp.
  In dbl_tabs_host.cpp, used the back substitution of dbl_factorizations,
  instead of the dbl_matrices_host; updated the makefiles.
  Extended test_dbl2_tabs, random2_matrices, dbl2_tabs_host, and
  dbl2_tabs_testers with the inverse of one upper triangular matrix.
  Extended dbl2_tabs_kernels and dbl2_tabs_testers with functions for
  complex inverses of small and medium sized upper triangular matrices.
  Added tiled version to dbl2_tabs_host, extended dbl2_tabs_testers,
  and test_dbl2_tabs.cpp.  Updated dbl2_tabs_kernels and dbl2_tabs_testers
  for complex double double upper triangular matrices.

Wed 7 Jul 2021 : improved dbl_tabs_host.cpp and dbl2_tabs_host.cpp not to
  use the matrix-matrix multiplication with the inverse diagonal tile.
  Added writing of inverse diagonal tiles to dbl2_tabs_host.cpp, added
  code for inverse to dbl2_tabs_kernels, and tests to dbl2_tabs_testers.
  Bug fixed in CPU_cmplx_upper_lead_solver of dbl_matrices_host.cpp.
  Extended *dbl_factor* with complex versions of the LU solvers.
  Added function for complex matrices to random_matrices.
  Extended dbl_tabs_host, dbl_tabs_kernels, dbl_tabs_testers, and
  test_dbl_tabs.cpp with a direct method of the inverse of an upper
  triangular complex matrix.  Added a complex tiled upper triangular inverse
  to dbl_tabs_host, dbl_tabs_kernels, dbl_tabs_testers, with an update to
  test_dbl_tabs.cpp.  Extended random2_matrices, dbl2_factorizations,
  dbl2_factors_testers and test_dbl2_factors with complex versions.

Tue 6 Jul 2021 : to dbl_tabs_kernels, added code to invert the diagonal tiles,
  compared in dbl_tabs_testers with the result on the host, updated the code
  in dbl_tabs_host to have the diagonal tiles returned.
  To dbl_tabs_kernels added multiplication with the inverse,
  tested by dbl_tabs_testers.cpp.
  Added back substitution kernels to dbl_tabs_kernels.

Mon 5 Jul 2021 : new dbl_factorizations, dbl_factors_testers,
  test_dbl_factors, with an update to random_matrices; updated makefiles.
  Applied the LU factorization to generate a random upper triangular matrix,
  in the dbl_tabs_testers; updated makefile_unix and makefile_windows.
  New dbl2_factorizations, dbl2_factors_testers, test_dbl2_factors,
  updated random2_matrices and the makefiles.  Applied the LU factorization
  in dbl2_tabs_testers and updated the makefiles.
  Updated dbl_tabs_testers.cpp so also the tiled version runs on the
  upper factor of the LU on a random matrix.  Added a tiled upper inverse 
  to dbl2_tabs_host, extended dbl2_tabs_testers and test_dbl2_tabs.

Sun 4 Jul 2021 : added diagnostics to dbl2_tabs_testers.
  New dbl2_tabs_kernels, tested by dbl2_tabs_testers, updated makefiles.

Sat 3 Jul 2021 : added dbl_Matrix_Difference_Sum function to the
  dbl_tabs_testers.h and dbl_tabs_testers.cpp.
  Improved the documentation of dbl_tabs_kernels.
  Added a function to dbl_tabs_kernels for matrices of medium size.
  New random2_matrices, dbl2_tabs_host, dbl2_tabs_testers, test_dbl2_tabs,
  with updated makefiles.

Fri 2 Jul 2021 : new dbl_tabs_kernels, tested by dbl_tabs_testers.cpp;
  updated the makefiles.  Fixed bug in dbl_tabs_kernels.cu.

Thu 1 Jul 2021 : extended random_matrices with a function to generate
  a random upper triangular matrix.  New dbl_tabs_host, dbl_tabs_testers,
  and test_dbl_tabs, with updated makefiles.  Added the computation of
  the error and condition number to dbl_tabs_testers.
  Added a tiled solver to dbl_tabs_host, extended dbl_tabs_testers
  and test_dbl_tabs.

Tue 29 Jun 2021 : extended dbl_linearization with matrix versions,
  tested by test_dbl_linearization.  Added the linearized versions of the
  upper triangular solvers to dbl_matrices_host, with added tests in
  test_matrices_testers.cpp.  Updated makefile_unix and makefile_windows.

Mon 28 Jun 2021 : updated random_matrices with new versions of functions to
  generate random matrices based on the log(1+x) expansion.
  Added dbl_linearization and test_dbl_linearization, updated makefiles;
  added function to random_matrices to make random complex vectors.

Fri 25 Jun 2021 : added prompt for verbose level in dbl_vectors_testers.cpp,
  with a sum of errors computation on the inner products.
  Extended dbl_matrices_kernels with complex versions of the matrix-vector
  products, tested by dbl_vectors_testers.cpp.
  Added the log2(#cols) summation kernels to dbl_matrices_kernels.cu,
  with an update in dbl_vectors_testers.cpp.

Thu 24 Jun 2021 : fixed bug in the convolutions kernel of 
  dbl_matrices_kernels.cu, tested by dbl_vectors_testers.cpp.
  Added summation to dbl_matrices_kernels, tested by dbl_vectors_testers.cpp,
  okay for dimensions equal to powers of 2.
  Updated dbl_matrices_kernels and dbl_vectors_testers.cpp
  so the sum reduction now works for any dimension.
  Added complex versions to dbl_matrices_kernels, with added tests
  to dbl_vectors_testers.cpp.

Wed 23 Jun 2021 : more uniform and consistent naming of functions
  in dbl_matrices_host, dbl_matrices_kernels, which caused updates
  in dbl_vectors_testers.cpp and dbl_matrices_testers.cpp.

Tue 22 Jun 2021 : new dbl_matrices_kernels with kernels for inner products.
  Updated the makefile_unix and makefile_windows.

Mon 21 Jun 2021 : new dbl_vectors_testers with code from dbl_matrices_testers;
  split test_dbl_series_vectors into test_dbl_vectors, test_dbl_matrices.
  Updated all makefiles accordingly.  Shuffled tests in test_dbl_matrices.cpp.
  New test_upper_jobs.cpp, with updated makefiles.

Tue 2 Mar 2021 : added verbose flags in dbl_matrices_host to the lufac
  functions and the solvers.

Sat 27 Feb 2021 : added verbose mode to the forward substitution solvers in
  dbl_matrices_host; and did the same for the backward substitution solvers.

Sat 20 Feb 2021 : to dbl_matrices_host, added forward substitution and
  a function for a LU factorization solver.  Adjusted the forward
  substitution for systems with ones on the diagonal.  Added testers on
  the LU solver in dbl_matrices_testers, updated test_dbl_series_vectors.
  Added functions on complex data to dbl_matrices_{host,testers}, updated
  test_dbl_series_vectors.

Fri 19 Feb 2021 : extended dbl_matrices_host with a matrix-matrix product
  to test the code for the LU factorization in dbl_matrices_testers,
  called in test_dbl_series_vectors.cpp.  Updated makefiles because
  random_numbers are needed to randomize the exp(x) series.

Thu 18 Feb 2021 : added an inplace lu factorization to dbl_matrices_host.

Sun 14 Feb 2021 : added generation of random lower triangular matrices
  to random_matrices.

Sat 13 Feb 2021 : to dbl_matrices_host, added an upper triangular solver
  for series with complex coefficients, extended test_dbl_series_vectors.
  Moved functions from test_dbl_series_vectors into dbl_matrices_testers.

Fri 12 Feb 2021 : added an upper triangular solver to dbl_matrices_host,
  tested by test_dbl_series_vectors.

Thu 11 Feb 2021 : extended random_matrices with functions to generate
  upper triangular matrices of random power series.

Wed 10 Feb 2021 : new dbl_matrices_host with code extracted from the
  test_dbl_series_vectors.cpp.

Mon 8 Feb 2021 : extended random_matrices with the generation of complex
  series matrices and test_dbl_series_vectors with a test on the complex
  matrix-vector product.

Sun 7 Feb 2021 : extended random_matrices and test_dbl_series_vectors
  with functions to work with vectors of complex numbered series.

Sat 6 Feb 2021 : extended random_matrices with function to generate
  one random vector and one random matrix.  Added matrix vector products
  and tests in test_dbl_series_vectors.cpp.

Fri 5 Feb 2021 : added the inner product to test_dbl_series_vectors.cpp.

