Intel MIC
http://en.wikipedia.org/wiki/Intel_MIC
Designer | Intel |
---|---|
Design | manycore extended x86/x64 design |
Registers | |
General purpose | Intel Architecture registers |
Floating point | 512-bit SIMD vector registers |
Intel Many Integrated Core Architecture or Intel MIC (pronounced Mike) is a multiprocessor computer architecture developed by Intel, incorporating earlier work on the Larrabee many-core architecture, the Teraflops Research Chip multicore research project, and the Intel Single-chip Cloud Computer multicore microprocessor.
Prototype products codenamed Knights Ferry were announced and released to developers in 2010. A commercial release, codenamed Knights Corner and to be built on a 22 nm process, was scheduled to go into production in late 2012.
In September 2011, the Texas Advanced Computing Center (TACC) announced it would use Knights Corner cards in its 10 PetaFLOPS "Stampede" supercomputer, providing 8 PetaFLOPS of computing power.
At the International Supercomputing Conference (2012, Hamburg), Intel announced the branding of the processor product family as Intel Xeon Phi.
In November 2012, Intel formally announced the first products citing claims of CPU-like versatile programmability, high performance and power efficiency.[1] The Green 500 list placed a system using these new products as the most power efficient computer in the world.[2]
In June 2013, the Tianhe-2 supercomputer at the National Supercomputing Center in Guangzhou (NSCC-GZ) was announced[3] as the world's fastest supercomputer. It utilizes Intel Ivy Bridge-EP Xeon and Xeon Phi processors to achieve 33.86 PetaFLOPS.[4]
History
Background
The Larrabee microarchitecture (in development since 2006[5]) introduced very wide (512-bit) SIMD units to an x86 architecture based processor design, extended to a cache coherent multiprocessor system connected via a ring bus to memory; each core was capable of 4-way multi-threading. Because the design was intended for GPU as well as general purpose computing, the Larrabee chips also included specialised hardware for texture sampling.[6][7] The project to produce a GPU retail product directly from the Larrabee research project was terminated in May 2010.[8]
Another contemporary Intel research project implementing x86 architecture on a many-core processor was the 'Single-chip Cloud Computer' (prototype introduced 2009[9]), a design mimicking a cloud computing datacentre on a single chip with multiple independent cores. The prototype design included 48 cores per chip with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and incorporated a mesh network for inter-chip messaging. The design lacked cache coherent cores and focused on principles that would allow the design to scale to many more cores.[10]
The Teraflops Research Chip (prototype unveiled 2007[11]) was an experimental 80-core chip with two floating point units per core, implementing a 96-bit VLIW architecture rather than x86.[12] The project investigated intercore communication methods and per-chip power management, and achieved 1.01 TFLOPS at 3.16 GHz while consuming 62 W of power.[13][14]
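The Teraflops Research Chip figures can be cross-checked with a little arithmetic. The sketch below is a back-of-the-envelope check, not a description of how Intel derived its numbers. It assumes that each floating point unit retires one multiply-accumulate per cycle, counted as 2 floating point operations; that rate is an assumption and is not stated in the text. The function names are illustrative only.

```python
def peak_gflops(cores, fpus_per_core, flops_per_fpu_cycle, clock_ghz):
    """Theoretical peak in GFLOPS, with every FPU busy on every cycle."""
    return cores * fpus_per_core * flops_per_fpu_cycle * clock_ghz


def gflops_per_watt(gflops, watts):
    """Energy efficiency in GFLOPS per watt."""
    return gflops / watts


# Figures from the text: 80 cores, 2 FPUs per core, 3.16 GHz, 62 W.
# ASSUMPTION: 2 FLOPs per FPU per cycle (one multiply-accumulate).
teraflops_chip_peak = peak_gflops(80, 2, 2, 3.16)

# Reported throughput from the text: 1.01 TFLOPS = 1010 GFLOPS.
teraflops_chip_efficiency = gflops_per_watt(1010.0, 62.0)

print(f"peak: {teraflops_chip_peak:.1f} GFLOPS")
print(f"efficiency: {teraflops_chip_efficiency:.1f} GFLOPS/W")
```

Under that assumption the peak works out to about 1011 GFLOPS, consistent with the quoted 1.01 TFLOPS, and the efficiency to roughly 16 GFLOPS per watt.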
Knights Ferry
Intel's MIC prototype board, named Knights Ferry and incorporating a processor codenamed Aubrey Isle, was announced on 31 May 2010. The product was stated to be a derivative of the Larrabee project and other Intel research including the Single-chip Cloud Computer.[15][16]
The development product was offered as a PCIe card with 32 in-order cores at up to 1.2 GHz with 4 threads per core, 2 GB GDDR5 memory,[17] 8 MB coherent L2 cache (256 kB per core, with 32 kB L1 cache), and a power requirement of ~300 W,[17] built on a 45 nm process.[18] In the Aubrey Isle core, a 1,024-bit ring bus (512-bit bi-directional) connects processors to main memory.[19] Single board performance has exceeded 750 GFLOPS.[18] The prototype boards only support single precision floating point instructions.[20]
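The cache and throughput figures for Knights Ferry relate to each other in a simple way. The following is a rough sketch, assuming one 512-bit SIMD unit per core (the register width given elsewhere in this article) and one fused multiply-add per lane per cycle, counted as 2 FLOPs. The multiply-add rate is an assumption made here for illustration, so the resulting peak is hypothetical rather than an Intel figure.

```python
KIB_PER_MIB = 1024

cores = 32
threads_per_core = 4
l2_per_core_kib = 256
clock_ghz = 1.2
vector_bits = 512            # 512-bit SIMD unit per core
single_precision_bits = 32   # the prototype supports single precision only

# Aggregate L2: 32 cores x 256 kB each.
total_l2_mib = cores * l2_per_core_kib / KIB_PER_MIB

# Hardware threads exposed by the card.
hardware_threads = cores * threads_per_core

# Single-precision lanes per vector unit.
lanes = vector_bits // single_precision_bits

# ASSUMPTION: one fused multiply-add per lane per cycle = 2 FLOPs.
flops_per_lane_cycle = 2
peak_gflops = cores * lanes * flops_per_lane_cycle * clock_ghz

# The text reports "more than 750 GFLOPS", so this fraction is a lower bound.
reported_gflops = 750
fraction_of_peak = reported_gflops / peak_gflops

print(f"L2 total: {total_l2_mib:.0f} MiB, threads: {hardware_threads}, lanes: {lanes}")
print(f"hypothetical peak: {peak_gflops:.1f} GFLOPS, reported/peak >= {fraction_of_peak:.0%}")
```

The per-core caches add up to the quoted 8 MB, and the reported 750 GFLOPS would be a little over 60% of the hypothetical 1228.8 GFLOPS single-precision peak.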
Initial developers included CERN, Korea Institute of Science and Technology Information (KISTI) and Leibniz Supercomputing Centre. Hardware vendors for prototype boards included IBM, SGI, HP, Dell and others.[21]
Knights Corner
The Knights Corner product line is expected to be made at a 22 nm process size, using Intel's Tri-gate technology with more than 50 cores per chip, and is expected to lead to commercial products.[15][18]
In June 2011, SGI announced a partnership with Intel to utilize the MIC architecture in its high performance computing products.[22] In September 2011, it was announced that the Texas Advanced Computing Center (TACC) will use Knights Corner cards in their 10 PetaFLOPS "Stampede" supercomputer, providing 8 PetaFLOPS of the compute power.[23] According to "Stampede: A Comprehensive Petascale Computing Environment" the "second generation Intel (Knights Landing) MICs will be added when they become available, increasing Stampede's aggregate peak performance to at least 15 PetaFLOPS."[24]
On November 15, 2011, Intel showed an early silicon version of a Knights Corner processor.[25][26]
On June 5, 2012, Intel released open source software and documentation regarding Knights Corner.[27]
In June 2012, Cray announced it would be offering 22 nm 'Knights Corner' chips (branded as 'Xeon Phi') as a co-processor in its 'Cascade' systems.[28][29]
In June 2012, ScaleMP announced it would provide virtualization software that allows 'Knights Corner' chips (branded as 'Xeon Phi') to be used as a transparent extension of the main processor. The virtualization software will allow 'Knights Corner' to run legacy MMX/SSE code and access an unlimited amount of (host) memory without the need for code changes.[30]
The Knights Corner chip was announced as being rebranded as 'Xeon Phi' at the 2012 Hamburg International Supercomputing Conference.[31][32]
Knights Landing
Knights Landing is the code name for the second generation MIC architecture product from Intel.[24] Intel first officially revealed details of its second generation Intel Xeon Phi products on June 17, 2013.[4] Intel said that the next generation of Intel MIC Architecture-based products will be available in two forms, as a coprocessor or as a host processor (CPU), and will be manufactured using Intel's 14 nm process technology. Knights Landing products will include integrated on-package memory for significantly higher memory bandwidth, and will support AVX-512.[33]
Xeon Phi
On June 18, 2012, Intel announced that Xeon Phi will be the brand name used for all products based on their Many Integrated Core architecture.[34][35][36][37][38]
On September 11, 2012, it was announced that a supercomputer called Stampede will be based on the Xeon Phi.[39] Stampede will be capable of 10 petaflops.[39]
On November 12, 2012, Intel announced two Xeon Phi coprocessor families, the Xeon Phi 3100 and the Xeon Phi 5110P.[40][41][42] The Xeon Phi 3100 will be capable of more than 1 teraflops of double precision floating point performance with 240 GB/sec memory bandwidth at 300 W.[40][41][42] The Xeon Phi 5110P will be capable of 1.01 teraflops of double precision floating point performance with 320 GB/sec memory bandwidth at 225 W.[40][41][42] The Xeon Phi 7120P will be capable of 1.2 teraflops of double precision floating point performance with 352 GB/sec memory bandwidth at 300 W.
The Xeon Phi is built on a 22 nm process.[40][41][42] The Xeon Phi 3100 will be priced at under US$2,000, while the Xeon Phi 5110P will have a price of US$2,649 and the Xeon Phi 7120P a price of US$4,129.[40][41][42] On June 17, 2013, the Tianhe-2 supercomputer was announced[3] by TOP500 as the world's fastest. It uses Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 PetaFLOPS.
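The three sets of figures are easier to compare once they are normalised. The sketch below derives two common ratios from the quoted numbers only: GFLOPS per watt, and bytes of memory bandwidth per floating point operation. The class and field names are illustrative, and because the 3100 is quoted as "more than 1 teraflops", the value of 1.0 used for it is a lower bound rather than a specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeonPhiModel:
    name: str
    dp_teraflops: float     # double precision peak, as quoted
    bandwidth_gb_s: float   # memory bandwidth, as quoted
    watts: float            # power, as quoted

    @property
    def gflops_per_watt(self) -> float:
        return self.dp_teraflops * 1000 / self.watts

    @property
    def bytes_per_flop(self) -> float:
        return self.bandwidth_gb_s / (self.dp_teraflops * 1000)


models = [
    XeonPhiModel("Xeon Phi 3100", 1.0, 240, 300),    # 1.0 TFLOPS is a lower bound
    XeonPhiModel("Xeon Phi 5110P", 1.01, 320, 225),
    XeonPhiModel("Xeon Phi 7120P", 1.2, 352, 300),
]

for m in models:
    print(f"{m.name}: {m.gflops_per_watt:.2f} GFLOPS/W, "
          f"{m.bytes_per_flop:.3f} bytes/FLOP")

most_efficient = max(models, key=lambda m: m.gflops_per_watt)
print("highest GFLOPS/W on these figures:", most_efficient.name)
```

On the quoted figures the 5110P comes out ahead on both ratios, at roughly 4.5 GFLOPS per watt and about 0.32 bytes per FLOP, although the comparison with the 3100 depends on how far above 1 teraflops that part actually is.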
Design
The cores of Intel MIC are based on a modified version of the P54C design used in the original Pentium.[43] The basis of the Intel MIC architecture is to leverage x86 legacy by creating an x86-compatible multiprocessor architecture that can utilize existing parallelization software tools.[18] Programming tools include OpenMP, OpenCL,[44] Cilk/Cilk Plus and specialised versions of Intel's Fortran, C++[45] and math libraries.[46]
Design elements inherited from the Larrabee project include the x86 ISA, 4-way SMT per core, 512-bit SIMD units, a coherent L2 cache, and an ultra-wide ring bus connecting processors and memory.
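To make the combination of many cores, 4-way SMT and 512-bit SIMD more concrete, the sketch below shows how a loop might be divided first across hardware threads and then into vector-width chunks. It is only an illustrative model written in Python, not OpenMP or any Intel API: `split_static` and `vector_chunks` are hypothetical helper names, the loop trip count is arbitrary, and the core count is the Knights Ferry figure quoted earlier.

```python
def split_static(n, workers):
    """Split range(n) into `workers` contiguous, near-equal half-open ranges."""
    base, extra = divmod(n, workers)
    ranges = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional iteration each.
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def vector_chunks(start, stop, lanes):
    """Yield (chunk_start, active_lanes); the final chunk may be partial."""
    for i in range(start, stop, lanes):
        yield i, min(lanes, stop - i)


cores, threads_per_core = 32, 4   # Knights Ferry figures from the article
lanes = 512 // 32                 # single-precision lanes in a 512-bit vector
n = 100_000                       # hypothetical loop trip count

per_thread = split_static(n, cores * threads_per_core)
first_start, first_stop = per_thread[0]
chunks = list(vector_chunks(first_start, first_stop, lanes))

print(f"{len(per_thread)} threads, first range {per_thread[0]}, "
      f"{len(chunks)} vector chunks, last chunk {chunks[-1]}")
```

The partial final chunk is the case that masked vector operations are meant to handle; real toolchains make these scheduling and vectorisation decisions themselves, so this only shows the shape of the decomposition.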
The Knights Corner instruction set documentation is available from Intel.[47][48]
Competitors
- Nvidia Tesla, direct competitor in the HPC market.[49]
http://www.zdnet.com/sc13-intel-reveals-knights-landing-high-performance-cpu-7000023393/
SC13: Intel reveals Knights Landing high-performance CPU
Summary: Once a niche, high-performance computing has become a key growth area for the tech industry. Intel’s announcements at Supercomputing 13 today, including new details of a completely redesigned Many Integrated Core processor, show just how important technical computing has become.
By John Morris for Laptops & Desktops | November 19, 2013 -- 21:36 GMT (13:36 PST)
High-performance computing, once a niche area catering to academia and government, has become a key growth area for the tech industry as countries battle to develop the first exascale supercomputers and companies adopt the technology. Intel’s announcements at SC13 today, including new details of a completely redesigned Many Integrated Core processor, show just how important technical computing has become.
Intel released its first Xeon Phi in late 2012 and expanded the product line in June 2013. Known as Knights Corner and manufactured on a 22nm process, these are all co-processors, meaning they must be used with a host x86 processor (generally a Xeon server chip) connected over a PCI-Express bus much like Nvidia Tesla and AMD FirePro accelerators.
The Xeon Phi co-processor is already used in Tianhe-2, the world’s fastest supercomputer and one of 13 systems on the Top500 list that now employ Intel’s Knights Corner. Intel's Raj Hazra said that what is “perhaps more exciting” is that in addition to some wins on the Top500, Xeon Phi is also starting to be adopted more broadly in mainstream high-performance applications.
Intel is clearly fast-tracking the development of its Many-Integrated Core architecture. The next version, code-named Knights Landing, will not only be manufactured on a more advanced 14nm process, but it will also include significant changes to the core and other parts of the chip designed to increase performance and improve efficiency.
“It’s a major transition from Knights Corner,” said Raj Hazra, Vice President of the Data Center Group and General Manager of the Technical Computing Group. “You can think of Many-Core as a tock, tock, tock cadence,” he added, referring to the so-called tick-tock cadence that Intel uses to introduce major changes to its mainstream Core architecture every other year. To translate, Intel will use the extra transistors provided by Moore’s Law to make big changes.
The biggest of these is that Knights Landing will be a standalone many-core CPU that will fit into standard rack architecture and run its own operating system without needing a separate host CPU. That means Knights Landing can be used as a homogeneous, many-core processor in everything from workstations to massive supercomputer clusters without having to develop for heterogeneous systems that offload certain data to accelerators.
“It will have the performance of an accelerator but you will view it as a software developer as a CPU,” Hazra said. “It’s the best of both worlds.” Though Intel is clearly emphasizing its use as a CPU, Knights Landing will also be available in a PCI-Express card as a drop-in replacement for Knights Corner.
Near Memory
The second big change is in the memory architecture. Knights Landing will have a relatively large pool of high-bandwidth “Near Memory” in the CPU package, in addition to the standard DDR memory on the board (aka “Far Memory”). The addition of the Near Memory is meant to boost the performance on memory-bound workloads.
Hazra said this isn’t a new memory hierarchy since developers can treat it as one flat memory space and leave everything up to the system software, but Intel also plans to offer developer tools to further optimize applications for the extra high-bandwidth memory. Intel did not say exactly how much extra memory will be in the chip package, but Hazra said it will have “enough capacity to hold meaningful workloads.”
Hazra also talked a bit about how Intel is “opening the door” to customer requests for more customized products. This goes beyond system-level customization to the development of chips with different types of cores, operating frequencies or thermal envelopes designed for specific sorts of tasks.
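A simple model suggests why data placement may still matter even in a flat memory space. The sketch below estimates the average bandwidth a memory-bound workload would see when only part of its traffic is served from Near Memory. Both bandwidth values are hypothetical placeholders chosen purely for illustration, since no Knights Landing figures are given in this article, and `effective_bandwidth` is an illustrative helper rather than any Intel tool.

```python
def effective_bandwidth(total_gb, near_fraction, near_gb_s, far_gb_s):
    """Average bandwidth when a fraction of traffic is served from near memory.

    Transfers are modelled as sequential, so the total time is the time spent
    on near-memory bytes plus the time spent on far-memory bytes.
    """
    near_time = total_gb * near_fraction / near_gb_s
    far_time = total_gb * (1.0 - near_fraction) / far_gb_s
    return total_gb / (near_time + far_time)


# HYPOTHETICAL bandwidths, not Knights Landing specifications.
NEAR_GB_S = 400.0
FAR_GB_S = 100.0

for fraction in (0.0, 0.5, 0.9, 1.0):
    bw = effective_bandwidth(10.0, fraction, NEAR_GB_S, FAR_GB_S)
    print(f"{fraction:.0%} of traffic in near memory -> {bw:.0f} GB/s")
```

In this model the result is a weighted harmonic mean, so the slower memory dominates: with these placeholder values, serving half the traffic from Near Memory yields 160 GB/s rather than the 250 GB/s a simple average would suggest.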
Competitor AMD has also talked extensively about developing semi-custom SoCs, but so far neither company has provided examples of real-world products.
Software efforts
Intel has also intensified its efforts in software for high-performance computing. Intel has thousands of software engineers, and is already a big contributor to the Linux kernel and the Android ecosystem. Intel’s Boyd Davis, Vice President of the Data Center Group and General Manager of the Datacenter Software Division, said that modular hardware and open software is driving a lot of growth not only in the cloud and high-performance computing, but also “bleeding over” into the enterprise.
These open-source projects are so disruptive, Davis said, that Intel felt it had to develop its own software for the cloud and HPC. That began with the acquisition last year of Whamcloud, one of the key players behind the Lustre parallel file system used in many of the world's top supercomputers, and the release of Intel Enterprise Edition for Lustre.
At SC13 today Intel announced an HPC Distribution for Apache Hadoop (which runs on Intel Enterprise Edition for Lustre), Cloud Edition for Lustre running on Amazon Web Services Marketplace, and turnkey hardware and software solutions for Enterprise Edition for Lustre from several partners (Advanced HPC, Aeon Computing, Atipa, Boston Ltd., Colfax, E4 Computer Engineering, NOVATTE and System Fabric Works).
Earlier in the day, in the opening keynote address of SC13, Dr. Genevieve Bell, an Intel Fellow and Director of User Experience Research, gave the sort of wide-ranging talk on big data that you’d expect from an anthropologist. She defined big data as the combination of data, visualization, analytics and algorithms and talked about some of the earliest examples, reaching all the way back to the Domesday Book.
More powerful systems may enable us to analyze larger sets, but big data has been around a long time. “Computers didn’t invent big data. Humans did,” she said. “We are the people who build, we are the people make it, we are the people who use it.”
Bell said that big data holds extraordinary potential in areas such as climate change, energy, medicine, education, and it will be limited not by technology but only by the human imagination.