Finally, after ~4 years of development, our code, Piernik MHD, has gone public. There is also a brand new site:
http://piernik.astri.umk.pl
powered by Lighty and driven by DokuWiki and Trac.
Sunday, November 1, 2009
Monday, August 17, 2009
Low-end vs High-End
Over the past few days I've been configuring and playing with my new toy, Janus (all our clusters bear the names of mythological polycephalous creatures). Janus will serve as a test bed and development platform for our MPI+CUDA codes. It consists of two nodes equipped with:
- mobo: ASRock X58 SC
- cpu: Intel core i7 920
- ram: Kingston DDR3 12288MB PC3-8500 1066MHz (6x2048)
- gpu: 2x Asus ENGTX295 Geforce GTX295 1792MB DDR3
- psu: Tagan 1100W PipeRock
- hdd: 2x WD Caviar 1.5TB SATA300 Green Power 32MB
- case: Chieftec BX-02B-B-B (black) ATX
- net: Mellanox InfiniHost III Lx (10Gb/s) (borrowed from N. Copernicus Astronomical Centre), 2x 1Gb/s
While googling today I found the site of NCSA's GPU cluster, along with results of a standard test from the CUDA SDK:
../../bin/linux/release/reduction --kernel=5 --n=16384
Reducing array of type int.
Using Device 0: "Tesla C1060"
16384 elements
128 threads (max)
64 blocks
Average time: 0.025320 ms
Bandwidth: 2.588309 GB/s

which we can compare with Janus:

../../bin/linux/release/reduction --kernel=5 --n=16384
Reducing array of type int.
Using Device 0: "GeForce GTX 295"
16384 elements
128 threads (max)
64 blocks
Average time: 0.021630 ms
Bandwidth: 3.029865 GB/s

It's better! (This is the moment when we can give a big yay for Nehalem technology :] )
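As a sanity check on the reported numbers: the effective bandwidth of the reduction is just the bytes moved divided by the kernel time. A minimal sketch in Python (the 4-byte element size is an assumption, matching the 32-bit ints the SDK test reduces):

```python
# Effective bandwidth of the reduction benchmark:
# bytes moved / kernel time, with 1 GB = 1e9 bytes.

def bandwidth_gbs(n_elements, elem_bytes, time_ms):
    """Return effective bandwidth in GB/s."""
    return n_elements * elem_bytes / (time_ms * 1e-3) / 1e9

tesla = bandwidth_gbs(16384, 4, 0.025320)   # NCSA's Tesla C1060
gtx295 = bandwidth_gbs(16384, 4, 0.021630)  # Janus' GTX 295

print(f"Tesla C1060: {tesla:.3f} GB/s")  # ~2.588 GB/s, as reported
print(f"GTX 295:     {gtx295:.3f} GB/s") # ~3.030 GB/s, as reported
```

Both values reproduce the benchmark's own "Bandwidth" lines, so the tool is simply reporting array size over average time.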
If we compare sheer power: the Tesla is capable of 936 Gflops using 180 W of energy under load, while the GTX 295 delivers 2×894 = 1788 Gflops using 330 W! Furthermore, the difference in price is enormous: the Tesla costs 1500€, the GTX 295 only 400€.
I'm slowly beginning to wonder why people use the Tesla C1060 at all. Maybe because it's easier to program a single GPU card with lots of memory (the Tesla has 4 GB DDR3) than to put a little effort into developing MPI+CUDA codes... Time will tell which strategy prevails.
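To put that comparison in numbers, a quick back-of-the-envelope calculation of Gflops per watt and per euro, using only the figures quoted above:

```python
# Back-of-the-envelope value comparison of the two cards,
# using the peak Gflops, load power, and prices quoted in the post.

cards = {
    "Tesla C1060": {"gflops": 936.0,     "watts": 180.0, "euros": 1500.0},
    "GTX 295":     {"gflops": 2 * 894.0, "watts": 330.0, "euros": 400.0},
}

for name, c in cards.items():
    per_watt = c["gflops"] / c["watts"]
    per_euro = c["gflops"] / c["euros"]
    print(f"{name}: {per_watt:.2f} Gflops/W, {per_euro:.2f} Gflops/EUR")
```

The GTX 295 comes out slightly ahead per watt (~5.4 vs ~5.2 Gflops/W) and roughly seven times ahead per euro.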
Wednesday, August 12, 2009
No more IDL...
Recently I got fed up with IDL 6.2. The list of things that eventually pushed me over the edge is as follows:
- lack of HDF5-1.8 support (not entirely true: if you browse the hdfgroup's server thoroughly, you can find updated DLMs)
- lack of HDF5 compression support (unfortunately, the previous workaround does not help here)
- IDL's direct graphics requires an archaic version of ''libGL.so'', which I can provide in 32-bit and still have a working x86_64 system, but as a result:
- I get a SegFault every time somebody exits IDL, which nicely contaminates the system logs
- I can allocate only 4GB of memory (a wall that Dominik hit just recently)
I had reached the point where I couldn't visualize my data anymore. But after some googling I found a solution: PyTables & Matplotlib. The results are astounding and come with ease, not to mention that I now have my plots as vector graphics... See for yourselves and sense the difference ;)
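For the curious, the whole PyTables + Matplotlib workflow boils down to a few lines. A minimal sketch (the file name, node name, and sample data are made up for illustration, not from my actual pipeline): write an array to HDF5, read it back, and save the plot straight to vector graphics:

```python
# Minimal PyTables + Matplotlib round trip. File and node names
# ("snapshot.h5", "density") are hypothetical examples.
import numpy as np
import tables
import matplotlib
matplotlib.use("Agg")  # headless backend: no X11, and no IDL libGL woes
import matplotlib.pyplot as plt

# Write a sample dataset (a stand-in for real simulation output).
with tables.open_file("snapshot.h5", mode="w") as h5:
    h5.create_array("/", "density", np.sin(np.linspace(0, 2 * np.pi, 100)))

# Read it back and plot.
with tables.open_file("snapshot.h5", mode="r") as h5:
    density = h5.root.density.read()

plt.plot(density)
plt.xlabel("cell")
plt.ylabel("density")
plt.savefig("density.pdf")  # vector output, unlike IDL's direct graphics
```

As a bonus, PyTables also gives you the on-the-fly HDF5 compression that IDL lacked, via `tables.Filters` on chunked arrays.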