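BLAS (Basic Linear Algebra Subprograms) is a set of general-purpose routines for the basic operations of linear algebra computations [1]. The BLAS Technical (BLAST) Forum standardizes the BLAS function interfaces and publishes a BLAS library written in Fortran on its website [1]. This Fortran BLAS library is commonly called the BLAS reference library (the reference implementation). The algorithms in the reference library produce correct results efficiently, but they leave considerable room for optimization. For higher computational efficiency, use an optimized BLAS library.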

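BLAS is closely tied to LAPACK: LAPACK is a richer linear algebra library that is built on top of BLAS and relies on it for its basic vector and matrix operations.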

BLAS library implementations

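Vector and matrix operations are the foundation of numerical computing, and the BLAS library is often the decisive factor in a program's computational efficiency. Besides the reference library, there are many derived and optimized versions. Some of these implementations only provide BLAS interfaces for other programming languages; some translate the reference library's Fortran code into another programming language; some convert the reference library into another language by transforming its binary code; and some build on the reference library with further optimizations for specific hardware architectures such as CPUs and GPUs [4][5].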

ATLAS BLAS[3] 

The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.
 

OpenBLAS[4] 

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Intel® Math Kernel Library[5]

Intel® Math Kernel Library (Intel® MKL) accelerates math processing and neural network routines that increase application performance and reduce development time. Intel MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Neural Network, Vector Math and Statistics functions. The easiest way to take advantage of all of that processing power is to use a carefully optimized math library. Even the best compiler can’t compete with the level of performance possible from a hand-optimized library. If your application already relies on the BLAS or LAPACK functionality, simply re-link with Intel MKL to get better performance on Intel and compatible architectures.

cuBLAS[6]

The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library that delivers 6x to 17x faster performance than the latest MKL BLAS.

clBLAS[7]

This repository houses the code for the OpenCL™ BLAS portion of clMath. The complete set of BLAS level 1, 2 & 3 routines is implemented. 

BLIS[10]

BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, enable optimized implementations of most of its commonly used and computationally intensive operations. Select kernels have been optimized for the AMD EPYC™ processor family. The optimizations are done for single and double precision routines.

BLAS functions

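BLAS functions fall into three levels according to their operands:

  • Level 1 functions perform linear operations on a single vector and binary operations on two vectors. Level 1 functions first appeared in the BLAS library published in 1979.
  • Level 2 functions perform matrix-vector operations, including the solution of triangular systems of linear equations. Level 2 functions were published in 1988.
  • Level 3 functions perform matrix-matrix operations. Level 3 functions were published in 1990.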
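To make the three levels concrete, here is a minimal pure-Python sketch with one operation per level. The function names `axpy`, `gemv` and `gemm` borrow the BLAS operation abbreviations, but the signatures are simplified for illustration and are not the BLAS interfaces; matrices are plain lists of rows. It only shows the mathematics, not how any BLAS library implements it.

```python
# Illustrative only: one operation per BLAS level, written with plain lists.
# Real BLAS libraries use optimized, hardware-specific kernels instead.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, a, x, beta, y):
    """Level 2 (matrix-vector): return alpha*A*x + beta*y, A given as a list of rows."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a, y)]

def gemm(alpha, a, b, beta, c):
    """Level 3 (matrix-matrix): return alpha*A*B + beta*C."""
    cols_b = list(zip(*b))  # columns of B
    return [[alpha * sum(aik * bkj for aik, bkj in zip(row, col)) + beta * cij
             for col, cij in zip(cols_b, crow)]
            for row, crow in zip(a, c)]
```

For vectors of length n and n-by-n matrices, the three functions do on the order of n, n^2 and n^3 arithmetic operations respectively, which is why optimizing Level 3 routines tends to matter most.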

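BLAS function naming conventions

Level 1 function names have the form "prefix + operation abbreviation".
For example, in SROTG:

  • S    -- prefix giving the data type of the matrix or vector elements;
  • ROTG -- abbreviation of the vector operation.
     
    Prefix: the data type of the matrix or vector elements, one of:
  • S - single-precision real
  • D - double-precision real
  • C - single-precision complex
  • Z - double-precision complex (COMPLEX*16, that is 16 bytes, not 16 bits)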

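Level 2 and Level 3 functions operate on matrices, and their names have the form "prefix + matrix type + operation abbreviation".
For example, in SGEMV:

  • S     -- prefix giving the data type of the matrix or vector elements;
  • GE   -- matrix type
  • MV  -- abbreviation of the vector or matrix operation
     
    BLAS uses the following matrix types:
  • GE - GEneral  general (dense) matrix
  • GB - General Band  general band matrix

  • SY - SYmmetric    symmetric matrix
  • SB - Symmetric Band  symmetric band matrix
  • SP - Symmetric Packed  symmetric matrix in packed storage

  • HE - HErmitian     Hermitian (self-adjoint) matrix
  • HB - Hermitian Band   Hermitian band matrix
  • HP - Hermitian Packed  Hermitian matrix in packed storage

  • TR - TRiangular      triangular matrix
  • TB - Triangular Band  triangular band matrix
  • TP - Triangular Packed  triangular matrix in packed storage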
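The naming scheme above can be sketched as a small parser. `parse_blas_name` is a hypothetical helper written for this article, not part of any BLAS library, and it assumes the simple one-letter prefix pattern described above; names with mixed or shifted prefixes (for example DSDOT, SCNRM2 or ISAMAX) are outside what it handles correctly.

```python
# Hypothetical helper illustrating the "prefix + matrix type + operation" scheme.
PREFIXES = {
    "S": "single-precision real",
    "D": "double-precision real",
    "C": "single-precision complex",
    "Z": "double-precision complex",
}
MATRIX_TYPES = {"GE", "GB", "SY", "SB", "SP", "HE", "HB", "HP", "TR", "TB", "TP"}

def parse_blas_name(name):
    """Split a routine name into (prefix, matrix_type or None, operation)."""
    name = name.upper()
    prefix, rest = name[0], name[1:]
    if prefix not in PREFIXES:
        raise ValueError("unknown type prefix: " + prefix)
    # Level 2/3 names carry a two-letter matrix type after the prefix.
    if rest[:2] in MATRIX_TYPES and len(rest) > 2:
        return prefix, rest[:2], rest[2:]
    # Otherwise treat it as a Level 1 name: prefix + operation.
    return prefix, None, rest
```

For example, `parse_blas_name("SGEMV")` splits into the prefix `S`, the matrix type `GE` and the operation `MV`, while `parse_blas_name("SROTG")` has no matrix type.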

Level 1

PROTG
 - Description: generate plane rotation
 - Syntax: PROTG( A, B, C, S)
    - P: S(single float), D(double float)

PROTMG
 - Description: generate modified plane rotation
 - Syntax: PROTMG( D1, D2, A, B, PARAM)
    - P: S(single float), D(double float)

PROT
 - Description: apply plane rotation
 - Syntax: PROT( N, X, INCX, Y, INCY, C, S)
    - P: S(single float), D(double float)

PROTM
 - Description: apply modified plane rotation
 - Syntax: PROTM( N, X, INCX, Y, INCY, PARAM)
    - P: S(single float), D(double float)
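As a rough illustration of the plain (unmodified) rotation routines, the sketch below generates a Givens rotation that zeroes the second component of a pair and then applies it to two vectors. The names `rotg` and `rot` mirror the BLAS operations, but the signatures are simplified assumptions: the real ROTG overwrites its arguments and also returns a reconstruction value Z, which is omitted here, and increments are taken to be 1.

```python
import math

def rotg(a, b):
    """Return (r, c, s) with c*a + s*b = r and -s*a + c*b = 0."""
    if a == 0.0 and b == 0.0:
        return 0.0, 1.0, 0.0
    r = math.hypot(a, b)
    # The reference routine gives r the sign of whichever of a, b
    # is larger in magnitude.
    roe = a if abs(a) > abs(b) else b
    r = math.copysign(r, roe)
    return r, a / r, b / r

def rot(x, y, c, s):
    """Apply the rotation to every pair (x[i], y[i]); returns new lists."""
    new_x = [c * xi + s * yi for xi, yi in zip(x, y)]
    new_y = [c * yi - s * xi for xi, yi in zip(x, y)]
    return new_x, new_y
```

Applying the rotation produced by `rotg(a, b)` to the pair `(a, b)` itself yields `(r, 0)` up to rounding, which is how plane rotations are used to introduce zeros in QR-type algorithms.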

PSWAP
 - Description: swap x and y
 - Syntax: PSWAP( N, X, INCX, Y, INCY)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PSCAL
 - Description: x = a*x
 - Syntax: PSCAL( N, ALPHA, X, INCX)
    - P: S(single float), D(double float), C(complex), Z(complex*16), CS, ZD

PCOPY
 - Description: copy x into y
 - Syntax: PCOPY( N, X, INCX, Y, INCY)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PAXPY
 - Description: y = a*x + y
 - Syntax: PAXPY( N, ALPHA, X, INCX, Y, INCY)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PDOT
 - Description: dot product
 - Syntax: PDOT( N, X, INCX, Y, INCY)
    - P: S(single float), D(double float), DS

PNRM2
 - Description: Euclidean norm
 - Syntax: PNRM2( N, X, INCX)
    - P:  S(single float), D(double float), SC, DZ

PASUM
 - Description: sum of absolute values
 - Syntax: PASUM( N, X, INCX)
    - P:  S(single float), D(double float), SC, DZ

IXAMAX
 - Description: index of max absolute value
 - Syntax: IXAMAX( N, X, INCX)
    - X: S(single float), D(double float), C(complex), Z(complex*16)
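The `N, X, INCX` argument pattern used throughout Level 1 means "take N elements of X, stepping INCX positions each time". The sketch below shows that access pattern for a few of the operations listed above. It is a simplified illustration for real data that assumes positive increments; `strided` is a hypothetical helper, and the other names only mirror the BLAS operation abbreviations.

```python
import math

def strided(x, n, incx):
    """The n elements x[0], x[incx], x[2*incx], ... that a call would touch."""
    return [x[i * incx] for i in range(n)]

def dot(n, x, incx, y, incy):
    """Dot product of the two strided vectors."""
    return sum(a * b for a, b in zip(strided(x, n, incx), strided(y, n, incy)))

def nrm2(n, x, incx):
    """Euclidean norm. Reference implementations scale the data to avoid
    overflow and underflow; this sketch does not."""
    return math.sqrt(sum(v * v for v in strided(x, n, incx)))

def asum(n, x, incx):
    """Sum of absolute values."""
    return sum(abs(v) for v in strided(x, n, incx))

def iamax(n, x, incx):
    """1-based position (Fortran convention) of the first element with the
    largest absolute value; 0 if n < 1."""
    if n < 1:
        return 0
    vals = strided(x, n, incx)
    return max(range(n), key=lambda i: abs(vals[i])) + 1
```

With `x = [1.0, 99.0, -2.0, 99.0, 3.0]`, `N = 3` and `INCX = 2`, only the elements 1.0, -2.0 and 3.0 take part. Note that the index returned by `iamax` counts positions within that strided vector and starts at 1.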

Level 2

PGEMV
 - Description: matrix vector multiply
 - Syntax: PGEMV( TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)
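The `A, LDA` pair in the GEMV argument list reflects Fortran's column-major storage: element (i, j) of the matrix lives at offset `i + j*LDA` in a flat array (0-based here). A minimal sketch of the operation y := alpha*op(A)*x + beta*y under that layout follows. It assumes real data and unit increments, drops the INCX/INCY arguments for brevity, and returns a new list instead of overwriting Y.

```python
def gemv(trans, m, n, alpha, a, lda, x, beta, y):
    """y := alpha*op(A)*x + beta*y, with the m-by-n matrix A stored
    column-major in the flat list a (leading dimension lda >= m)."""
    def at(i, j):
        return a[i + j * lda]
    if trans.upper() == "N":
        # op(A) = A: result has m entries
        return [alpha * sum(at(i, j) * x[j] for j in range(n)) + beta * y[i]
                for i in range(m)]
    # op(A) = A transposed ('T'; for real data 'C' is the same): n entries
    return [alpha * sum(at(i, j) * x[i] for i in range(m)) + beta * y[j]
            for j in range(n)]

# The 2-by-3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column:
a = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
row_sums = gemv("N", 2, 3, 1.0, a, 2, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0])
col_sums = gemv("T", 2, 3, 1.0, a, 2, [1.0, 1.0], 0.0, [0.0, 0.0, 0.0])
```

Callers coming from C or Python often pass row-major data by mistake; under this layout that is equivalent to passing the transpose.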

PGBMV
 - Description: banded matrix vector multiply
 - Syntax: PGBMV( TRANS, M, N, KL, KU, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)
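The band routines expect the matrix in band storage rather than as a full array: with KL sub-diagonals and KU super-diagonals, element (i, j) of the matrix is kept in row `KU + i - j` of column j of a compact array with `KL + KU + 1` rows (0-based indices here). The sketch below packs a dense matrix into that layout and multiplies with it. `to_band` is a hypothetical helper, and `gbmv` is a simplified illustration (no transpose, unit increments, real data), not the library routine.

```python
def to_band(a, m, n, kl, ku):
    """Pack a dense m-by-n matrix (list of rows) into column-major band storage."""
    lda = kl + ku + 1
    ab = [0.0] * (lda * n)
    for j in range(n):
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            ab[(ku + i - j) + j * lda] = a[i][j]
    return ab, lda

def gbmv(m, n, kl, ku, alpha, ab, lda, x, beta, y):
    """y := alpha*A*x + beta*y using only the stored band of A."""
    out = [beta * yi for yi in y]
    for j in range(n):
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            out[i] += alpha * ab[(ku + i - j) + j * lda] * x[j]
    return out

# A tridiagonal matrix: one sub-diagonal and one super-diagonal.
dense = [[1.0, 2.0, 0.0],
         [3.0, 4.0, 5.0],
         [0.0, 6.0, 7.0]]
ab, lda = to_band(dense, 3, 3, 1, 1)
result = gbmv(3, 3, 1, 1, 1.0, ab, lda, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0, 0.0])
```

The unused corners of the compact array (here the first and last slots) are never read, so their contents do not matter.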

PSYMV
 - Description: symmetric matrix vector multiply
 - Syntax: PSYMV( UPLO, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float)

PSBMV
 - Description: symmetric banded matrix vector multiply
 - Syntax: PSBMV( UPLO, N, K, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float)

PSPMV
 - Description: symmetric packed matrix vector multiply
 - Syntax: PSPMV( UPLO, N, ALPHA, AP, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float)
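The `AP` argument of the packed routines holds only one triangle of the matrix, column by column, in a flat array of n*(n+1)/2 elements. For the upper triangle, element (i, j) with i <= j sits at offset `i + j*(j+1)/2` (0-based). A minimal sketch of that layout and of a symmetric packed multiply follows; `pack_upper` is a hypothetical helper, and `spmv_upper` is a simplified illustration (UPLO fixed to upper, unit increments, real data).

```python
def pack_upper(a, n):
    """Store the upper triangle of a (list of rows) column by column."""
    return [a[i][j] for j in range(n) for i in range(j + 1)]

def spmv_upper(n, alpha, ap, x, beta, y):
    """y := alpha*A*x + beta*y for symmetric A given by its packed upper triangle."""
    def at(i, j):
        if i > j:          # lower-triangle entry: use symmetry
            i, j = j, i
        return ap[i + j * (j + 1) // 2]
    return [alpha * sum(at(i, j) * x[j] for j in range(n)) + beta * y[i]
            for i in range(n)]

sym = [[1.0, 2.0, 3.0],
       [2.0, 4.0, 5.0],
       [3.0, 5.0, 6.0]]
ap = pack_upper(sym, 3)
result = spmv_upper(3, 1.0, ap, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0, 0.0])
```

Packed storage roughly halves the memory needed for a symmetric, Hermitian or triangular matrix, at the cost of less regular indexing.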

PTRMV
 - Description: triangular matrix vector multiply
 - Syntax: PTRMV( UPLO, TRANS, DIAG, N, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTBMV
 - Description: triangular banded matrix vector multiply
 - Syntax: PTBMV( UPLO, TRANS, DIAG, N, K, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTPMV
 - Description: triangular packed matrix vector multiply
 - Syntax: PTPMV( UPLO, TRANS, DIAG, N, AP, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTRSV
 - Description: solving triangular matrix problems
 - Syntax: PTRSV( UPLO, TRANS, DIAG, N, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)
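"Solving triangular matrix problems" means solving A*x = b by forward substitution (lower triangular A) or back substitution (upper triangular A). In the BLAS routine the right-hand side is passed in X and overwritten with the solution. The sketch below illustrates the UPLO and DIAG arguments under simplifying assumptions: no transpose, unit increment, real data, column-major storage, and a new list is returned instead of overwriting the input.

```python
def trsv(uplo, diag, n, a, lda, b):
    """Solve A*x = b for a triangular n-by-n matrix A stored column-major."""
    x = list(b)
    unit = diag.upper() == "U"   # 'U': diagonal is assumed to be 1 and not read
    if uplo.upper() == "L":
        for i in range(n):                      # forward substitution
            s = x[i] - sum(a[i + j * lda] * x[j] for j in range(i))
            x[i] = s if unit else s / a[i + i * lda]
    else:
        for i in range(n - 1, -1, -1):          # back substitution
            s = x[i] - sum(a[i + j * lda] * x[j] for j in range(i + 1, n))
            x[i] = s if unit else s / a[i + i * lda]
    return x

# Lower triangular [[2, 0], [1, 4]] and upper triangular [[2, 1], [0, 4]]
lower = [2.0, 1.0, 0.0, 4.0]
upper = [2.0, 0.0, 1.0, 4.0]
x_lower = trsv("L", "N", 2, lower, 2, [2.0, 9.0])
x_upper = trsv("U", "N", 2, upper, 2, [4.0, 8.0])
```

Only the triangle named by UPLO is read; whatever is stored in the other triangle is ignored. No singularity check is made here, so a zero on the diagonal raises a division error.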

PTBSV
 - Description: solving triangular banded matrix problems
 - Syntax: PTBSV( UPLO, TRANS, DIAG, N, K, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTPSV
 - Description: solving triangular packed matrix problems
 - Syntax: PTPSV( UPLO, TRANS, DIAG, N, AP, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PGER
 - Description: performs the rank 1 operation A := alpha*x*y' + A
 - Syntax: PGER( M, N, ALPHA, X, INCX, Y, INCY, A, LDA)
    - P:  S(single float), D(double float)

PSYR
 - Description: performs the symmetric rank 1 operation A := alpha*x*x' + A
 - Syntax: PSYR( UPLO,  N, ALPHA, X, INCX, A, LDA)
    - P: S(single float), D(double float)

PSPR
 - Description: symmetric packed rank 1 operation A := alpha*x*x' + A
 - Syntax: PSPR( UPLO,  N, ALPHA, X, INCX, AP)
    - P: S(single float), D(double float)

PSYR2
 - Description: performs the symmetric rank 2 operation, A := alpha*x*y' + alpha*y*x' + A
 - Syntax: PSYR2( UPLO,  N, ALPHA, X, INCX, Y, INCY, A, LDA)
    - P: S(single float), D(double float)

PSPR2
 - Description: performs the symmetric packed rank 2 operation, A := alpha*x*y' + alpha*y*x' + A
 - Syntax: PSPR2( UPLO,  N, ALPHA, X, INCX, Y, INCY, AP)
    - P: S(single float), D(double float)
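The rank-1 updates above add an outer product to a matrix. The sketch below contrasts the general update (GER) with the symmetric one (SYR), which touches only the triangle selected by UPLO. It is a simplified illustration assuming real data, unit increments and column-major storage; the INCX/INCY arguments are dropped and a new list is returned instead of updating A in place.

```python
def ger(m, n, alpha, x, y, a, lda):
    """A := alpha*x*y' + A for a general m-by-n matrix."""
    out = list(a)
    for j in range(n):
        for i in range(m):
            out[i + j * lda] += alpha * x[i] * y[j]
    return out

def syr(uplo, n, alpha, x, a, lda):
    """A := alpha*x*x' + A, updating only the upper or lower triangle."""
    out = list(a)
    upper = uplo.upper() == "U"
    for j in range(n):
        rows = range(j + 1) if upper else range(j, n)
        for i in rows:
            out[i + j * lda] += alpha * x[i] * x[j]
    return out

full = ger(2, 2, 1.0, [1.0, 2.0], [3.0, 4.0], [0.0] * 4, 2)
# The 9.0 sits in the lower triangle and is left untouched by an upper update.
tri = syr("U", 2, 1.0, [1.0, 2.0], [0.0, 9.0, 0.0, 0.0], 2)
```

Because the symmetric routines never write the other triangle, code that later reads the full matrix has to mirror the updated triangle itself.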

Level 3

PGEMM
 - Description: matrix matrix multiply
 - Syntax: PGEMM( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)
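GEMM computes C := alpha*op(A)*op(B) + beta*C, where op(A) is m-by-k, op(B) is k-by-n and each op is either the matrix itself or its transpose. The sketch below spells out how the TRANSA/TRANSB flags and the leading dimensions interact under column-major storage. It is a plain triple loop for real data that returns a new list; optimized libraries compute the same result with blocked, vectorized kernels.

```python
def gemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """C := alpha*op(A)*op(B) + beta*C with all matrices stored column-major."""
    def op(mat, ld, trans, i, j):
        # Element (i, j) of op(mat): swap indices when transposed.
        return mat[i + j * ld] if trans.upper() == "N" else mat[j + i * ld]
    out = list(c)
    for j in range(n):
        for i in range(m):
            acc = sum(op(a, lda, transa, i, p) * op(b, ldb, transb, p, j)
                      for p in range(k))
            out[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return out

a = [1.0, 3.0, 2.0, 4.0]   # [[1, 2], [3, 4]] stored by columns
b = [5.0, 7.0, 6.0, 8.0]   # [[5, 6], [7, 8]] stored by columns
zero = [0.0] * 4
ab = gemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, zero, 2)
atb = gemm("T", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, zero, 2)
```

Here `ab` holds A*B and `atb` holds A'*B, both in column-major order. Passing a transpose flag avoids having to form the transposed matrix in memory.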

PSYMM
 - Description: symmetric matrix matrix multiply
 - Syntax: PSYMM( SIDE, UPLO, M, N, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PSYRK
 - Description: symmetric rank-k update to a matrix
 - Syntax: PSYRK( UPLO, TRANS, N, K, ALPHA, A, LDA, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PSYR2K
 - Description: symmetric rank-2k update to a matrix
 - Syntax: PSYR2K( UPLO, TRANS, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PTRMM
 - Description: triangular matrix matrix multiply
 - Syntax: PTRMM( SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PTRSM
 - Description: solving triangular matrix with multiple right hand sides
 - Syntax: PTRSM( SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

Other matrix computation libraries

SparseLib++ --- Numerical Sparse Matrix Classes in C++

http://math.nist.gov/sparselib

SparseLib++ is a C++ class library for efficient sparse matrix computations across various computational platforms. The software package consists of matrix objects representing several sparse storage formats currently in use (in this release: compressed row, compressed column and coordinate formats), providing basic functionality for managing sparse matrices, together with efficient kernel mathematical operations (e.g. sparse matrix-vector multiply). Routines based on the Sparse BLAS are used to enhance portability and performance. Included in the package are various preconditioners commonly used in iterative solvers for linear systems of equations. The focus is on computational support for iterative methods, but the sparse matrix objects presented here can be used on their own.

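The latest version of SparseLib++ is v. 1.7, last updated in 2008 (it has not been updated for a long time).

SparseLib++ 1.7 uses C99 features such as complex.h. It compiles with g++ v. 4.0.1 or later. Visual Studio does not support all C99 features, so SparseLib++ 1.7 cannot be built directly with VS (it can be built with MinGW).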

PETSc  

https://www.mcs.anl.gov/petsc/

PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. It supports MPI, and GPUs through CUDA or OpenCL, as well as hybrid MPI-GPU parallelism. PETSc (sometimes called PETSc/Tao) also contains the Tao optimization software library.

SuiteSparse

http://faculty.cse.tamu.edu/davis/suitesparse.html
SuiteSparse is a suite of sparse matrix algorithms.
In addition, the page at [8] lists many other matrix computation libraries.

References

[1] BLAS official site: http://www.netlib.org/blas/
[2] https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
[3] http://math-atlas.sourceforge.net/
[4] http://www.openblas.net/ 
[5] https://software.intel.com/en-us/intel-mkl/
[6] https://developer.nvidia.com/cublas 
[7] https://github.com/clMathLibraries/clBLAS   
[8] https://scicomp.stackexchange.com/questions/351/recommendations-for-a-usable-fast-c-matrix-library
[9] https://martin-thoma.com/solving-linear-equations-with-gaussian-elimination/
[10] https://developer.amd.com/amd-cpu-libraries/blas-library/

Original article: CSDN, cocoonyang

Appendix: related links

OpenBLAS on GitHub: https://github.com/xianyi/OpenBLAS/wiki/User-Manual
A talk by the OpenBLAS author: https://www.leiphone.com/news/201704/Puevv3ZWxn0heoEv.html
