B&C: Hi Tamás, first of all, thank you for this interview. Can you explain to our readers what your position and roles are at FinalWire?
Tamás Miklós: I am the managing director of FinalWire, and I’m also responsible for leading the development of our flagship product, AIDA64.
B&C: Hyper Threading/Logical Core versus Native Core versus Clustered Multi Threading (CMT): what kind of problems have you encountered with the latest AIDA64 revision? What do you think about them? Which is the most programmer-friendly or the best performer?
TM: From a hardware diagnostic software perspective, it’s not a problem to handle either of them. But when it comes to processor benchmarks, coping with such differences could be a real challenge. We are in a fortunate position of having a number of veteran benchmark experts in our team, and for them, it’s quite normal to handle very different processor architectures, ranging from in-order single-threaded simple CPUs to multi-socket + multi-core + multi-threaded beasts like Intel Westmere-EX servers. As for best performance, as always, it depends on the actual workload, the type of application you are running. We do not really feel comfortable about judging either solution: we are trying to stay neutral, and squeeze out every possible bit of performance from AMD, Intel and VIA processors, regardless of their architecture.
B&C: Have you encountered some unexpected results during the development, testing and debugging of AIDA64 3.00 with different operating systems (Windows XP, Windows Vista, Windows 7, etc)?
TM: In AIDA64 v3.00, we have introduced multi-threaded cache and memory bandwidth benchmarks with an automatic calibration scheme. On several older systems, we faced difficulties during the development of this module, especially the calibration part. We had to make a lot of exceptions and use special code paths on such systems, which made the development process very long, spanning more than 18 months. But in the end, the whole module got very sophisticated, and thanks to the long testing hours, the final revision provides the expected results, in line with the theoretical performance of the measured systems. The only exception is using Intel Haswell processors under Windows XP, where AVX extensions cannot be activated. With Haswell you can only utilize the full potential of the cache and memory subsystem if you have AVX enabled, which requires Windows 7 SP1 or later.
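The Windows XP limitation comes from the fact that AVX needs both CPU support and an operating system that saves the wider YMM registers on context switches. A minimal sketch of how a benchmark might choose between code paths is shown here. The function name, the flag strings and the path labels are hypothetical and are not taken from AIDA64.

```python
def pick_code_path(cpu_flags, os_saves_ymm):
    """Return the widest usable vector code path (hypothetical labels).

    cpu_flags    -- set of lowercase feature names reported by the CPU
    os_saves_ymm -- True if the OS preserves YMM state (needed for AVX)
    """
    # AVX is only usable when the CPU has it AND the OS enables YMM state.
    avx_usable = "avx" in cpu_flags and os_saves_ymm
    if avx_usable and "fma" in cpu_flags:
        return "avx+fma"
    if avx_usable:
        return "avx"
    if "sse2" in cpu_flags:
        return "sse2"
    return "scalar"


# Assumed feature set for a Haswell-class CPU, for illustration only.
haswell = {"sse2", "avx", "avx2", "fma"}
print(pick_code_path(haswell, os_saves_ymm=True))   # avx+fma
print(pick_code_path(haswell, os_saves_ymm=False))  # sse2
```

With the same processor, the second call falls back to SSE2 because the operating system does not enable AVX state, which is the situation described for Haswell under Windows XP.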
B&C: In developing AIDA64, did you use preexisting tools or libraries, or did you create your own? If you used preexisting ones, can you tell us which is the most useful for taking advantage of modern CPU features?
TM: With AIDA64 we try to develop everything in-house to make sure that every little piece, every module in the software is properly done, and up to our quality standards. Very rarely do we have to make exceptions. For example, our CPU ZLib data compression benchmark is based on the ZLib library, and the FPU VP8 benchmark is based on Google’s WebM codec. Of those external libraries we are most impressed by the WebM codec, which is being constantly developed and improved by Google’s experts, and now even has a worthy successor in VP9.
B&C: Current software applications are barely optimized to take advantage of modern CPUs. Do you think new libraries or a new programming language are needed to help software developers achieve that? Is the HSA (Heterogeneous System Architecture) platform a good solution to this problem?
TM: Libraries are definitely not the way to go. HSA is an interesting concept, but it has a number of drawbacks which are rarely advertised. The major issue is latency: it is simply impossible to issue a short compute task and get the results without a considerable overhead caused by the OpenCL framework and the video driver. Then there’s the issue of the whole OpenCL approach: all video and CPU drivers have to include an OpenCL compiler, which is very convenient for developers, but it also means that expected performance and latencies can greatly vary between driver updates and between manufacturers. Things that work with the current video driver will not necessarily work with the next video driver release. With direct architecture programming – the way AIDA64 CPU and FPU benchmarks work – there are no drivers and latencies to worry about, so you can use extreme optimizations and utilize the latest technologies at their full potential.
With video games, as John Carmack suggested in an interview, we would have to go back to direct programming to eliminate the huge latencies caused by video drivers and DirectX. The original Doom was a perfect example of how you can utilize the full potential of the hardware to ensure maximum video experience – you simply cannot do that today, since it’s not possible to program the GPU directly. But, in my opinion, this will not change unless a major change happens in video rendering, or one of the big video companies comes up with a feasible direct programming solution, and stops revamping the whole GPU architecture at every second or third GPU generation update.
B&C: In your experience, what are the most problematic features to optimize in the modern CPUs?
TM: The biggest challenge would have to be multi-threading. You need to optimize very different workloads and tasks, and unfortunately not all of them can be easily broken up into 4 or more threads. Then there’s the issue of using vector extensions: if you cannot use AVX or FMA instructions in your code, and the compiler doesn’t automatically do the optimizations for you, then it’s very tough to make your software excel on modern CPUs. If your code still uses only SSE or SSE2 optimizations, or does not rely on such optimizations at all, then your software will not perform much better on a brand new Intel Haswell CPU than on a 3-year-old Lynnfield.
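One common way to reason about why hard-to-split workloads gain little from extra cores is Amdahl's law, which is not mentioned in the interview and is used here only as an illustration. The sketch below assumes a fixed fraction of the work can run in parallel. The function name and the example fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup when only part of a task can be spread over threads."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Example fractions chosen for illustration, not measured values.
for fraction in (0.5, 0.9, 1.0):
    print(f"{fraction:.0%} parallel on 4 threads: "
          f"{amdahl_speedup(fraction, 4):.2f}x")
```

Under this model a task that is only half parallel gains 1.6x from four threads, while a fully parallel one gains the full 4x.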
B&C: Do you think that hardware makers provide adequate documentation for their own products? For example, we can see that the nVIDIA developer site is almost abandoned.
TM: Since the recession began, and especially since the iPad was unveiled, the PC market has been in a sad and constant decline. With GPUs moving into the CPU package, consoles becoming ever more popular, and Intel capturing more and more video adapter market share from AMD and nVIDIA, those companies are quickly losing ground. They try desperately to protect their existing intellectual property by limiting access to documentation, and – although this is just a hunch – they also seem to spend less money on their online presence and maintaining their relationship with the development community. If those companies want to survive, they have to put more efforts into supporting developers.
B&C: We have seen that the cache and DRAM benchmark results in AIDA64 3.00 are much better than the results you got with AIDA64 2.85 or older versions. Does the much higher memory bandwidth you can measure with AIDA64 translate into performance gains in real-world applications (like Maya, Photoshop, etc)?
TM: It depends on the type of application in question. The classic, legacy AIDA64 cache and memory benchmarks used only a single thread, so the performance they reflected was typical of old single-threaded applications and many 3D games. These days, in the middle of the multi-core CPU era, we can see more and more applications and games optimized for multi-threading. The performance measured by the new multi-threaded benchmarks in AIDA64 could be easily utilized by such applications – while with single-threaded programs you simply cannot use up the whole memory bandwidth.
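As a rough yardstick for "the whole memory bandwidth", the theoretical peak of a DRAM setup is commonly estimated as transfer rate times bus width times channel count. The sketch below applies that standard formula. The function name is hypothetical, and the dual-channel DDR3-1600 configuration is an assumed example, not a system from the interview.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8  # 64-bit channel -> 8 bytes
    bytes_per_second = mega_transfers_per_s * 1e6 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


# Assumed example: DDR3-1600 (1600 MT/s) in single- and dual-channel mode.
print(f"single channel: {peak_bandwidth_gb_s(1600, 1):.1f} GB/s")
print(f"dual channel:   {peak_bandwidth_gb_s(1600, 2):.1f} GB/s")
```

A measured result can then be compared against this ceiling. A single thread typically lands well below it, which is the gap the multi-threaded benchmarks are meant to close.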
B&C: Do you think that AIDA64 could be a good tool to test CPUs for specific scopes (farming, cloud, hpc, etc)?
TM: We offer a diverse set of CPU benchmarks, so they cover many usage scenarios, although obviously not all possible usage models. Farming, cloud and HPC markets are quite special, so for such usage models AIDA64 may not offer useful benchmark methods. What we have in AIDA64 is a set of benchmarks that are useful for measuring mainstream desktop and mobile PCs, and 1 or 2 socket servers and workstations. We develop AIDA64 as easy-to-use software that both average PC users and enthusiasts can use to start benchmarks with only a few clicks and get the results in a few seconds. Specialized industry tools used to measure HPC and high-end server performance are usually much more complicated to set up and use, they run for a longer period of time, and they are simply too difficult and quirky for average PC users.
B&C: Do you think that in the future we'll see an integrated benchmark for GPGPU in AIDA64 (opencl, c++amp, cuda, d3dcompute, etc)?
TM: The AIDA64 OpenCL GPGPU benchmarks are already under development. We haven't specified a release date yet, simply because with those benchmarks we keep running into new issues, virtually every day. OpenCL compilers and video drivers, and even the underlying GPU hardware aren’t as mature and bulletproof as one would expect from such big companies as AMD, Intel or nVIDIA. The current set of AIDA64 GPGPU benchmarks is already capable of revealing such dirty secrets that the manufacturers would prefer to keep concealed. Overheating and throttling video cards, very slow and buggy OpenCL compilers are just a few examples :)
B&C: A question about the built-in AIDA64 stress test. When I use it my rig reaches 150 watts, but when I use Prime95 or OCCT my rig reaches 190 watts. What kind of stress test does AIDA64 perform?
TM: AIDA64 System Stability Test has a set of subtests, and you should carefully enable them, depending on the type of test you wish to run. If you’re looking for hardware errors, then you should enable all of them – which is what most users do. But if you want to put the most thermal stress on your system, then having only the FPU subtest enabled is the way to go. Thanks to SSE, SSE2, AVX and FMA optimizations, the FPU subtest alone will put the most demanding workload on the processor. You can also enable the GPU subtest if you’ve got a discrete video adapter or an AMD APU. In the upcoming releases of AIDA64 we will completely revamp the System Stability Test to make it easier to use, and to put even more stress on the system.
B&C: Today ARM is everywhere, and we are even starting to see notebooks with ARM chips. Will we see AIDA64 for Android or Windows RT in the future? Do you think that the results of the two different platforms (x86 and ARM) are comparable when using the same benchmarking software (3D Mark, for example)?
TM: Currently, we have no plans for an ARM port, since we believe Windows RT is already dead and buried. With Haswell, Intel continues closing the gap in power efficiency, and they will keep delivering better and better products this year (Silvermont) and especially next year (Broadwell). If you look back on the past 30 years, Intel and x86 have beaten almost everything out there, including DEC Alpha, MIPS, Motorola m68k, and put many companies – like Cyrix, Harris, NEC, SiS, TI, Transmeta, UMC – out of the CPU business. Thanks to the 64-bit extensions developed by AMD, x86 has even killed IA-64 and Itanium, although Intel will never admit it. Many people hate Intel for x86, and try to make it look like an outdated, previous-century technology, but there’s still a lot of potential in it, and millions of people stick to x86 for backwards compatibility. ARM may just become the next Alpha, or live next to Intel and x86 – we’ll see in the coming years.
B&C: Based on your experience, can you give some advice to operating system developers, to developers in general and to hardware manufacturers on how they can make better software?
TM: For every company there’s always some sort of constraint, either time, money or human resources. So when you develop software or hardware, it can never be perfect, since there’s always a specific deadline or limitation you have to live with. If you have the luxury of being able to roll out your product only “when it’s done”, then you have a chance to make something great. But lately we’ve seen too many compromises, bad products, buggy products, and judging by that trend, the future looks quite dim. For all those companies, I’d recommend spending more time on researching various alternatives, validating the technologies, and most importantly listening to consumer feedback before releasing their new products.
B&C: Thank you Tamás, it has been a pleasure talking with you.
TM: Thank you, it’s my pleasure.