Preface

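I recently needed to do some work in C++ that involves matrix operations. Among the math libraries recommended online, MKL looked quite good, so I tried it out in VS2013.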

As usual, here are the reference links:

Performance comparison of BLAS, CBLAS, OpenBLAS, ATLAS, LAPACK and MKL

Compiling and Linking Intel® Math Kernel Library with Microsoft* Visual C++*

Configuring Intel MKL in Visual Studio 2013

Notes on installing and configuring Intel MKL in VS

Getting Started with Intel® Math Kernel Library 2017 for Windows

Developer Reference for Intel® Math Kernel Library 2017 – C

Multiplying Matrices Using dgemm

Official MKL developer documentation

Installation

Download

MKL installer shared on Baidu Netdisk: link: http://pan.baidu.com/s/1qYRRIKs password: x9db

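During installation you still need to request a serial number from the official website; otherwise you only get a trial. My serial number is: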

33RM-RDRJWB75

After that, just keep clicking Next. Once installation finishes you will have the directory C:\Program Files (x86)\IntelSWTools

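My directory has a few more entries than a fresh install, mainly because I installed an update later: there are both compilers_and_libraries_2017.0.109 and compilers_and_libraries_2017.2.187. They share the prefix compilers_and_libraries_2017; the suffix appears to identify the update and build number of each release (2017.0 being the initial release and 2017.2 a later update).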

Configuration

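For configuration I mainly followed the official tutorial, which offers two methods, Automatically and Manually. Here I try the first, automatic method, which takes just two steps.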

Create any C++ project and add a source file.

Then right-click the project (test1) -> Properties -> Intel Performance Libraries -> Use Intel MKL, and select Parallel.

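Under C/C++ -> Code Generation -> Runtime Library, simply choose Multi-threaded (/MT), which links the runtime statically from .lib files. With dynamic linking you would also have to add quite a few more .lib files; I will fill that in when I run into it. For background, look up the difference between dynamic and static libraries.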

Testing

I used the official sample code from Multiplying Matrices Using dgemm as-is.

The example computes the matrix product

$$C := \alpha A B + \beta C$$
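Written element by element (indices starting at 1), and assuming the no-transpose case that the sample uses, the same update reads:

$$c_{ij} \leftarrow \alpha \sum_{p=1}^{k} a_{ip}\, b_{pj} + \beta\, c_{ij}, \qquad i = 1,\dots,m,\quad j = 1,\dots,n$$

So with $\beta = 0$, as in the sample, the previous contents of C do not affect the result.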

The function called is cblas_dgemm; page 111 of the official documentation gives its parameter list:

void cblas_dgemm(const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
                 const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
                 const MKL_INT k, const double alpha, const double *a,
                 const MKL_INT lda, const double *b, const MKL_INT ldb,
                 const double beta, double *c, const MKL_INT ldc);

Page 112 explains each parameter in detail; here is a brief summary:

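Layout: whether the 2-D matrices are stored row-major (CblasRowMajor) or column-major (CblasColMajor)
transa: the operation applied to the first input matrix before it is multiplied by the second. Three values are available: CblasNoTrans uses the matrix as is, CblasTrans transposes it, and CblasConjTrans takes its conjugate transpose
transb: same as transa, but for matrix B
m: number of rows of matrix A (strictly, of op(A) after the transa operation) and of C
n: number of columns of matrix B and of C (an m×k matrix times a k×n matrix gives an m×n result)
k: number of columns of A and rows of B
alpha: scaling factor applied to the product A*B
a, lda, b, ldb: the input arrays and their leading dimensions; the value each leading dimension must take depends on Layout and on transa/transb (four combinations per matrix, see the documentation)
beta: scaling factor applied to the original C
c: the output array, overwritten with the result and laid out according to Layout
ldc: leading dimension of C (for row-major storage, the length of each stored row of C)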

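Usage follows the basic C/C++ pattern: declare the variables, with the matrices as pointers; allocate memory with mkl_malloc; initialize the matrices in for loops; call cblas_dgemm; print the results and release the memory with mkl_free.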

/* C source code is found in dgemm_example.c */

#define min(x,y) (((x) < (y)) ? (x) : (y))

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

int main()
{
    double *A, *B, *C;
    int m, n, k, i, j;
    double alpha, beta;

    printf("\n This example computes real matrix C=alpha*A*B+beta*C using \n"
        " Intel(R) MKL function dgemm, where A, B, and  C are matrices and \n"
        " alpha and beta are double precision scalars\n\n");

    /* A is m x k, B is k x n, so C is m x n */
    m = 2000, k = 200, n = 1000;
    printf(" Initializing data for matrix multiplication C=A*B for matrix \n"
        " A(%ix%i) and matrix B(%ix%i)\n\n", m, k, k, n);
    alpha = 1.0; beta = 0.0;

    /* mkl_malloc(size, alignment): buffers aligned on a 64-byte boundary */
    printf(" Allocating memory for matrices aligned on 64-byte boundary for better \n"
        " performance \n\n");
    A = (double *)mkl_malloc(m*k*sizeof(double), 64);
    B = (double *)mkl_malloc(k*n*sizeof(double), 64);
    C = (double *)mkl_malloc(m*n*sizeof(double), 64);
    if (A == NULL || B == NULL || C == NULL) {
        printf("\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
        mkl_free(A);
        mkl_free(B);
        mkl_free(C);
        return 1;
    }

    /* Fill A with 1, 2, 3, ... and B with -1, -2, -3, ... (row-major) */
    printf(" Intializing matrix data \n\n");
    for (i = 0; i < (m*k); i++) {
        A[i] = (double)(i + 1);
    }

    for (i = 0; i < (k*n); i++) {
        B[i] = (double)(-i - 1);
    }

    for (i = 0; i < (m*n); i++) {
        C[i] = 0.0;
    }

    /* Row-major, no transposes: each leading dimension is the row length
       of the stored matrix, i.e. lda = k, ldb = n, ldc = n */
    printf(" Computing matrix product using Intel(R) MKL dgemm function via CBLAS interface \n\n");
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
        m, n, k, alpha, A, k, B, n, beta, C, n);
    printf("\n Computations completed.\n\n");

    /* Print at most the top-left 6x6 block of each matrix */
    printf(" Top left corner of matrix A: \n");
    for (i = 0; i<min(m, 6); i++) {
        for (j = 0; j<min(k, 6); j++) {
            printf("%12.0f", A[j + i*k]);
        }
        printf("\n");
    }

    printf("\n Top left corner of matrix B: \n");
    for (i = 0; i<min(k, 6); i++) {
        for (j = 0; j<min(n, 6); j++) {
            printf("%12.0f", B[j + i*n]);
        }
        printf("\n");
    }

    printf("\n Top left corner of matrix C: \n");
    for (i = 0; i<min(m, 6); i++) {
        for (j = 0; j<min(n, 6); j++) {
            printf("%12.5G", C[j + i*n]);
        }
        printf("\n");
    }

    printf("\n Deallocating memory \n\n");
    mkl_free(A);
    mkl_free(B);
    mkl_free(C);

    printf(" Example completed. \n\n");
    return 0;
}

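Before running, it is worth checking that #include "mkl.h" gets IntelliSense suggestions and is not underlined in red with errors such as the header or library not being found.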

Output

 This example computes real matrix C=alpha*A*B+beta*C using
 Intel(R) MKL function dgemm, where A, B, and  C are matrices and
 alpha and beta are double precision scalars

 Initializing data for matrix multiplication C=A*B for matrix
 A(2000x200) and matrix B(200x1000)

 Allocating memory for matrices aligned on 64-byte boundary for better
 performance

 Intializing matrix data

 Computing matrix product using Intel(R) MKL dgemm function via CBLAS int



 Computations completed.

 Top left corner of matrix A:
           1           2           3           4           5           6
         201         202         203         204         205         206
         401         402         403         404         405         406
         601         602         603         604         605         606
         801         802         803         804         805         806
        1001        1002        1003        1004        1005        1006

 Top left corner of matrix B:
          -1          -2          -3          -4          -5          -6
       -1001       -1002       -1003       -1004       -1005       -1006
       -2001       -2002       -2003       -2004       -2005       -2006
       -3001       -3002       -3003       -3004       -3005       -3006
       -4001       -4002       -4003       -4004       -4005       -4006
       -5001       -5002       -5003       -5004       -5005       -5006

 Top left corner of matrix C:
-2.6666E+009-2.6666E+009-2.6667E+009-2.6667E+009-2.6667E+009-2.6667E+009
-6.6467E+009-6.6467E+009-6.6468E+009-6.6468E+009-6.6469E+009 -6.647E+009
-1.0627E+010-1.0627E+010-1.0627E+010-1.0627E+010-1.0627E+010-1.0627E+010
-1.4607E+010-1.4607E+010-1.4607E+010-1.4607E+010-1.4607E+010-1.4607E+010
-1.8587E+010-1.8587E+010-1.8587E+010-1.8587E+010-1.8588E+010-1.8588E+010
-2.2567E+010-2.2567E+010-2.2567E+010-2.2567E+010-2.2568E+010-2.2568E+010

 Deallocating memory

 Example completed.

Press any key to continue . . .