Center for Strategic Assessment and forecasts

Autonomous non-profit organization

Japan unveils a prototype processor for an exaflops supercomputer: what's inside the chip
Publication date: 30-08-2018
Earlier we talked about the most powerful Japanese supercomputer for research in nuclear physics. Japan is now building an exaflops supercomputer, Post-K, and the Japanese will be among the first to start up a machine with this computational capacity.

Commissioning is scheduled for 2021.

Last week, Fujitsu disclosed the technical characteristics of the A64FX chip, which will form the basis of the new machine. Here is more about the chip and its capabilities.

A64FX specifications

The computing power of Post-K is expected to be almost ten times that of the most powerful existing supercomputer, IBM Summit (as of June 2018).

The A64FX chip, built on the Arm architecture, is meant to give the supercomputer this performance. The chip has 48 cores for computation and four cores to manage them. All of them are divided equally among four groups, the Core Memory Groups (CMG).

Each group has 8 MB of L2 cache. The cache is connected to the memory controller and to the NoC ("network on chip") interface. The NoC connects the CMGs with the PCIe and Tofu controllers. The latter handles communication between the processor and the rest of the system. The Tofu controller has ten ports with a bandwidth of 12.5 GB/s.

[Figure: block diagram of the chip]

The processor has 32 GB of HBM2 memory in total, with a bandwidth of 1024 GB/s. Fujitsu says the processor's floating-point performance is 2.7 teraflops for 64-bit operations, 5.4 teraflops for 32-bit and 10.8 teraflops for 16-bit.

The development of Post-K has been followed by the editors of the Top500, who compile the list of the most powerful computing systems. By their estimate, reaching a performance of one exaflops would take more than 370 thousand A64FX processors.
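The 370 thousand figure is consistent with simple peak arithmetic: one exaflops divided by 2.7 teraflops per chip. The sketch below only illustrates that division. The function name chips_needed is hypothetical, the rates are expressed in gigaflops to keep the arithmetic in integers, and real systems never reach their theoretical peak, so this is a lower bound rather than a sizing method.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of chips whose combined peak reaches the target.
   Both rates are in gigaflops so the arithmetic stays in integers.
   chips_needed is a hypothetical helper, not part of any Fujitsu tool. */
uint64_t chips_needed(uint64_t target_gflops, uint64_t per_chip_gflops)
{
    assert(per_chip_gflops > 0);
    /* Ceiling division: round up so the target is actually reached. */
    return (target_gflops + per_chip_gflops - 1) / per_chip_gflops;
}
```

With a target of 1,000,000,000 gigaflops (one exaflops) and 2,700 gigaflops per chip (the 64-bit figure quoted above), the function returns 370,371.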

The device is the first to use the vector extension technology called Scalable Vector Extension (SVE). It differs from other SIMD architectures in that it does not fix the length of the vector registers but defines a valid range for them. SVE supports vectors from 128 to 2048 bits long. As a result, a program can run on other processors that support SVE without recompilation.

Because SVE is a SIMD extension, the processor can perform a calculation on several data elements at once. Here is an example of one such instruction from NEON, the extension used for vector calculations in other Arm processor architectures:
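To see why a variable vector length need not force recompilation, consider a loop that never hard-codes the register width. The sketch below is a plain-C model of that idea, not real SVE code: the function name vla_add and the lanes parameter are assumptions made for illustration. On real hardware the vector length is supplied by the processor and the inactive lanes are handled by predicate registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain-C model of a vector-length-agnostic loop (hypothetical helper).
   lanes is the number of 32-bit elements in one hardware vector:
   4 for a 128-bit vector, up to 64 for a 2048-bit vector. */
void vla_add(uint32_t *a, const uint32_t *b, const uint32_t *c,
             size_t n, size_t lanes)
{
    assert(lanes >= 4 && lanes <= 64 && lanes % 4 == 0);
    for (size_t base = 0; base < n; base += lanes) {
        /* Model of the predicate: lanes past the end of the array stay inactive. */
        size_t active = (n - base < lanes) ? n - base : lanes;
        for (size_t lane = 0; lane < active; lane++)
            a[base + lane] = b[base + lane] + c[base + lane];
    }
}
```

The same function gives identical results whether lanes is 4 or 64; only the number of passes through the outer loop changes. That is the property that lets one binary serve processors with different vector lengths.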

vadd.i32 q1, q2, q3

It adds the four 32-bit integers in the 128-bit register q2 to the corresponding integers in the 128-bit register q3 and writes the resulting array to q1. The equivalent of this operation in C looks like this:

for(i = 0; i < 4; i++) a[i] = b[i] + c[i];

Additionally, SVE supports automatic vectorization. The automatic vectorizer analyzes the loops in your code and, where possible, uses vector registers to execute them. This improves the performance of your code.

For example, a function in C:

/* Element-wise sum of b and c into a; SIZE is defined elsewhere. */
void vectorize_this(unsigned int *a, unsigned int *b, unsigned int *c)
{
    unsigned int i;
    for (i = 0; i < SIZE; i++)
        a[i] = b[i] + c[i];
}

It compiles to the following (for a 32-bit Arm processor):

104cc: ldr.w r3, [r4, #4]!
104d0: ldr.w r1, [r2, #4]!
104d4: cmp r4, r5 
104d6: add r3, r1 
104d8: str.w r3, [r0, #4]!
104dc: bne.n 104cc <vectorize_this+0xc> 

With automatic vectorization enabled, it looks like this:

10780: vld1.64 {d18-d19}, [r5,:64]
10784: adds r6, #1
10786: cmp r6, r7
10788: add.w r5, r5, #16
1078c: vld1.32 {d16-d17}, [r4]
10790: vadd.i32 q8, q8, q9
10794: add.w r4, r4, #16
10798: vst1.32 {d16-d17}, [r3]
1079c: add.w r3, r3, #16
107a0: bcc.n 10780 <vectorize_this+0x70>

Here the SIMD registers q9 and q8 are loaded with data from the arrays pointed to by r5 and r4, respectively. The vadd instruction then adds four 32-bit integers at a time. This makes the code larger, but far more data is processed in each iteration of the loop.
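The effect on the iteration count can be sketched in scalar C. Each pass of the first loop below stands in for one load, load, vadd.i32, store sequence from the listing. The function name add_by_fours is hypothetical, and the second loop for leftover elements is an assumption: the listing shows only the main vector loop, not how the compiler handles a SIZE that is not a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of the vectorized loop (hypothetical helper).
   Returns how many loop iterations were executed in total. */
size_t add_by_fours(unsigned int *a, const unsigned int *b,
                    const unsigned int *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t i = 0, iterations = 0;

    /* Main loop: four elements per iteration, as one vadd.i32 would do. */
    for (; i + 4 <= n; i += 4, iterations++) {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
        a[i + 2] = b[i + 2] + c[i + 2];
        a[i + 3] = b[i + 3] + c[i + 3];
    }

    /* Assumed tail: leftover elements are handled one at a time. */
    for (; i < n; i++, iterations++)
        a[i] = b[i] + c[i];

    return iterations;
}
```

For ten elements this runs two four-wide iterations and two single-element iterations, four in total, where the scalar loop needs ten.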

Who else is building exaflops supercomputers

Japan is not the only country building exaflops supercomputers. Work is also underway in China and the United States, for example.

China is building Tianhe-3. Its prototype is already being tested at the National Supercomputer Center in Tianjin. The final version of the computer is scheduled for completion in 2020.

/ photo O01326 CC / The Tianhe-2 supercomputer, predecessor of Tianhe-3

Tianhe-3 is based on Chinese Phytium processors. The chip contains 64 cores, delivers 512 gigaflops and has a memory bandwidth of 204.8 GB/s.

A working prototype has also been built for a machine from the Sunway family. It is being tested at the National Supercomputer Center in Jinan. According to the developers, about 35 applications currently run on the computer, including biomedical simulations, big-data processing applications and programs for studying climate change. Work on the computer is expected to be completed in the first half of 2021.

As for the United States, the Americans plan to build their own exaflops computer by 2021. The project is called Aurora A21, and it is being developed by the U.S. Department of Energy's Argonne National Laboratory together with Intel and Cray.

This year researchers selected ten projects for the Aurora Early Science Program, which will be the first to use the new high-performance system. They include a program to map the neurons of the brain, a study of dark matter and the development of a particle accelerator simulator.

Exaflops computers will make it possible to build complex models for research, so many research projects are waiting for such machines. One of the most ambitious is the Human Brain Project (HBP), which aims to build a full model of the human brain and to study neuromorphic computing. According to HBP scientists, the new exaflops systems will be put to use from the very first days of their existence.

