|
1 | 1 | OpenBLAS ChangeLog
|
| 2 | +==================================================================== |
| 3 | +Version 0.3.3 |
| 4 | +31-Aug-2018 |
| 5 | + |
| 6 | +common: |
| 7 | + * thread memory allocation has been switched back to the method |
| 8 | + used before version 0.3.1 due to unexpected problems caused by |
| 9 | + the new code under some circumstances. A new compile-time option |
| 10 | + USE_TLS has been added to enable the new code, and it is hoped |
| 11 | + that this can become the default again in the next version. |
| 12 | + * LAPACK PR272 has been integrated, which fixes spurious errors |
| 13 | + in DSYEVR and related functions caused by missing conversion |
| 14 | + from ILAENV to ILAENV_2STAGE in several _2stage routines. |
| 15 | + * the cmake-generated OpenBLASConfig.cmake now uses correct case |
| 16 | + for the name of the library |
| 17 | + * added support for Haiku OS |
| 18 | + |
| 19 | +x86_64: |
| 20 | + * added AVX512 implementations of SDOT, DDOT, SAXPY, DAXPY, |
| 21 | + DSCAL, DGEMVN and DSYMVL |
| 22 | + * added a workaround for a cygwin issue that prevented compilation |
| 23 | + of AVX512 code |
| 24 | + |
| 25 | +IBM Z: |
| 26 | + * added autodetection of Z14 |
| 27 | + * fixed TRMM errors in the generic target |
| 28 | + |
| 29 | +==================================================================== |
| 30 | +Version 0.3.2 |
| 31 | +30-Jul-2018 |
| 32 | + |
| 33 | +common: |
| 34 | + * fixes for regressions caused by the rewrite of the thread |
| 35 | + initialization code in 0.3.1 |
| 36 | + |
| 37 | +POWER: |
| 38 | + * fixed cpu autodetection for the BSDs |
| 39 | + |
| 40 | +MIPS64: |
| 41 | + * fixed utest errors in AXPY, DSDOT, ROT and SWAP |
| 42 | + |
| 43 | +x86_64: |
| 44 | + * added autodetection of AMD Ryzen 2 |
| 45 | + * fixed build with older versions of MSVC |
| 46 | + |
| 47 | +==================================================================== |
| 48 | +Version 0.3.1 |
| 49 | +01-Jul-2018 |
| 50 | + |
| 51 | +common: |
| 52 | + * rewritten thread initialization code with significantly reduced overhead |
| 53 | + * added CBLAS interfaces to the IxAMIN BLAS extension functions |
| 54 | + * fixed the lapack-test target |
| 55 | + * CMAKE builds now create an OpenBLASConfig.cmake file |
| 56 | + * ZAXPY now uses a single thread for small input sizes |
| 57 | + * the LAPACK code was updated from Reference-LAPACK/lapack#253 |
| 58 | + (fixing LAPACKE interfaces to Aasen's functions) |
| 59 | + |
| 60 | +POWER: |
| 61 | + * corrected CROT and ZROT behaviour with zero INC_X |
| 62 | + |
| 63 | +ARMV7: |
| 64 | + * corrected xDOT behaviour with zero INC_X or INC_Y |
| 65 | + |
| 66 | +x86_64: |
| 67 | + * retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER; |
| 68 | + this affects PENRYN,DUNNINGTON,OPTERON,OPTERON_SSE3,BOBCAT,ATOM and NANO |
| 69 | + (which will still be supported via the slower PRESCOTT kernels when this option is not set) |
| 70 | + * added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows |
| 71 | + specifying the list of x86_64 targets to include. Any target not on the list will be supported |
| 72 | + by the Sandybridge or Nehalem kernels if available, or by Prescott. |
| 73 | + * improved SWITCH_RATIO on Haswell for increased GEMM throughput |
| 74 | + * added initial support for Intel Skylake X, including an AVX512 SGEMM kernel |
| 75 | + * added autodetection of Intel Cannon Lake series as Skylake X |
| 76 | + * added a default L2 cache size for hypervisors that return zero here (Chromebook) |
| 77 | + * fixed a name clash with recent Windows10 headers that broke the build with (at least) |
| 78 | + recent mingw from MSYS2 |
| 79 | + * fixed a link error in mixed clang/gfortran builds with OpenMP |
| 80 | + * updated the OSX deployment target to 10.8 |
| 81 | + * switched on parallel make for builds on MS Windows by default |
| 82 | + |
| 83 | +x86: |
| 84 | + * fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y |
| 85 | + |
| 86 | +==================================================================== |
| 87 | +Version 0.3.0 |
| 88 | +23-May-2018 |
| 89 | + |
| 90 | +common: |
| 91 | + * fixed some more thread race and locking bugs |
| 92 | + * added preliminary support for calling an OpenMP build of the library from multiple threads |
| 93 | + * removed performance impact of thread locks added in 0.2.20 on OpenMP code |
| 94 | + * general code cleanup |
| 95 | + * optimized DSDOT implementation |
| 96 | + * improved thread distribution for GEMM |
| 97 | + * corrected IMATCOPY/OMATCOPY implementation |
| 98 | + * fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations |
| 99 | + * cmake build improvements |
| 100 | + * pkgconfig file now contains build options |
| 101 | + * openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build |
| 102 | + * corrections and improvements for systems with more than 64 cpus |
| 103 | + * LAPACK code updated to 3.8.0 including later fixes |
| 104 | + * added ReLAPACK, a recursive implementation of several LAPACK functions |
| 105 | + * rewrote ROTMG to handle cases that the netlib code failed to address |
| 106 | + * disabled (broken) multithreading code for xTRMV |
| 107 | + * corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard |
| 108 | + * shared memory access failures on startup are now handled more gracefully |
| 109 | + * restored utests from earlier releases (and made them pass on all affected systems) |
| 110 | + |
| 111 | +SPARC: |
| 112 | + * several fixes for cpu autodetection |
| 113 | + |
| 114 | +POWER: |
| 115 | + * corrected vector register overwriting in several Power8 kernels |
| 116 | + * optimized additional BLAS functions |
| 117 | + |
| 118 | +ARM: |
| 119 | + * added support for CortexA53 and A72 |
| 120 | + * added autodetection for ThunderX2T99 |
| 121 | + * made most optimized kernels the default for generic ARMv8 targets |
| 122 | + |
| 123 | +x86_64: |
| 124 | + * parallelized DDOT kernel for Haswell |
| 125 | + * changed alignment directives in assembly kernels to boost performance on OSX |
| 126 | + * fixed register handling in the GEMV microkernels (bug exposed by gcc7) |
| 127 | + * added support for building on OpenBSD and Dragonfly |
| 128 | + * updated compiler options to work with Intel release 2018 |
| 129 | + * support fully optimized build with clang/flang on Microsoft Windows |
| 130 | + * fixed building on AIX |
| 131 | + |
| 132 | +IBM Z: |
| 133 | + * added optimized BLAS 1/2 functions |
| 134 | + |
| 135 | +MIPS: |
| 136 | + * fixed cpu autodetection helper code |
| 137 | + * added mips32 1004K cpu (Mediatek MT7621 and similar SoC) |
| 138 | + * added mips64 I6500 cpu |
| 139 | + |
2 | 140 | ====================================================================
|
3 | 141 | Version 0.2.20
|
4 | 142 | 24-Jul-2017
|