
The Rocket HPC service is NOW LIVE

We are very happy to announce that the Rocket HPC service is NOW LIVE!

This means that, as described in our previous post, project PIs can now add more members to their groups, the service is no longer subject to interruptions at short notice, and queries and requests are handled as service desk tickets.

Many thanks to everyone who has filled in detailed project registrations, worked with us to solve teething problems and/or contributed feedback on the service. We are still working to improve the service, and recent updates include:

– Matlab is now available, including the Parallel Computing Toolbox (which permits multi-threaded jobs) and the other standard toolboxes. The Matlab Distributed Computing Server is not available.

– you may now have up to 10,000 ‘active’ jobs (queued, running or recently finished) in SLURM, and array jobs can contain up to 10,000 elements (0-9999).

– temporary storage on /scratch is now usable on all nodes and $TMPDIR is set to this location; a short example job script is shown after this list.
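
For anyone new to SLURM array jobs, here is a minimal sketch of how the new limits and $TMPDIR might be used together. It is only an illustration: the program name, the input and output paths, and the resource requests are placeholders rather than details of the Rocket configuration.

  #!/bin/bash
  # Illustrative SLURM array job with 10,000 elements (indices 0-9999).
  #SBATCH --job-name=example-array
  #SBATCH --array=0-9999
  #SBATCH --ntasks=1
  #SBATCH --time=00:10:00
  #SBATCH --output=example_%A_%a.out

  # $TMPDIR points at node-local temporary space on /scratch.
  cd "$TMPDIR"

  # Each array element handles one input file (placeholder paths and program).
  cp "$HOME/inputs/input_${SLURM_ARRAY_TASK_ID}.dat" .
  "$HOME/bin/my_program" "input_${SLURM_ARRAY_TASK_ID}.dat" > "result_${SLURM_ARRAY_TASK_ID}.txt"

  # Copy results back to permanent storage before the job ends,
  # because space on /scratch is temporary.
  cp "result_${SLURM_ARRAY_TASK_ID}.txt" "$HOME/results/"

Submitted with sbatch, each element of the array counts towards the 10,000 ‘active’ job limit while it is queued, running or recently finished.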

For projects that are waiting for their registration to be completed: we are working through registrations as quickly as we can, and most delays are related to software requests. Please bear with us; we know you are waiting.

As always, most of the information about Rocket is on the HPC web pages at http://www.ncl.ac.uk/itservice/research/hpc/ . Please get in touch via the service desk if you have any other queries.

Rocket HPC service due to be LIVE on 31st October

The Rocket HPC pilot phase is due to end on 31st October and the service will then be considered LIVE. This has been a team effort and we are more than grateful to all the NUIT staff, pilot users and other people who have helped us get to this point.

For projects that have not yet registered to use Rocket or that are waiting for their registration to be completed, there will be no change: we are working through registrations as quickly as we can, and most delays are related to software requests. Please bear with us; we know you are waiting.

For projects that have already been added to Rocket, the move to a live service means:

– project PIs will be able to add more members to their group and instructions will be available on http://www.ncl.ac.uk/itservice/research/hpc. Please do not try this just yet – it may not work well.

– the service will no longer be subject to interruptions at short notice.

– queries and requests will be handled as service desk tickets.

As always, most information about Rocket is on the web pages at http://www.ncl.ac.uk/itservice/research/hpc. Please get in touch if you have any other queries.

Pilot phase underway: time to register your projects for access

The Rocket HPC service now has its first pilot users registered on the system and starting to run jobs. Some issues have already come to light and we are working to resolve those. Our thanks go to the pilots for working with us on this.

The pilot phase is intended to be fairly brief and aims to assess the system’s performance as well as to unearth issues. We then expect to start widening access towards the end of the month. To help us plan for this, you may now register projects for the HPC service at http://www.ncl.ac.uk/itservice/research/hpc/hpcregistration/

Registering now will let you tell us about your needs (e.g. for software applications) so that we can plan the work we need to do, and it will help speed up your access to the system.

As always, please keep an eye on the HPC web pages at http://www.ncl.ac.uk/itservice/research/hpc for information about the service and how to use it, and get in touch if you have any queries or feedback.

Update on Rocket’s progress

Over the past month, NUIT have worked with our suppliers to configure Rocket, integrate it into the University’s network and systems, and test its performance. Once NUIT have completed more of the work needed on our side to manage Rocket as a service, we will run a short pilot phase with four volunteers, who will help us iron out any initial problems and assess the system’s performance for a variety of ‘real’ job types.

During the pilot phase we will also start accepting HPC project registrations. When the service is live, all access to Rocket will be controlled by membership of projects, which may be established research projects, taught programmes or other groupings – they need not be funded research programmes. Project registration will be via an online form: see http://www.ncl.ac.uk/itservice/research/hpc for more details.

We hope to start widening access to the cluster in September. Watch this space.

The HPC project enters a new phase

Hardware delivered
The HPC hardware has now been delivered to Newcastle and is being configured by the suppliers, OCF. This is a new phase for the project team, who have been working with OCF on the details of the system configuration, and with the HPC steering group on the overall design of the HPC service and policies for its use.

New staff
The HPC team welcomes Gary Wright, who has joined NUIT as HPC system administrator. Gary will be responsible for the day-to-day administration of the HPC system.

Service name
The HPC service will be known as Rocket, in honour of the iconic steam locomotive that was designed by George and Robert Stephenson and built in Newcastle. The Rocket locomotive was built for speed and brought together a number of innovations, winning the Rainhill Trials and setting the design template for future steam locomotives. The name marks the role of Newcastle and local people, particularly George Stephenson, in the innovation and drive behind the railway revolution, whose impact is still felt around the world.

Service information
Look for the developing web pages on Rocket at http://www.ncl.ac.uk/itservice/research/hpc/

Good news!

The system has been ordered

An order for the new HPC cluster has now been placed with suppliers OCF. Working in partnership with manufacturers Huawei, OCF will provide a cluster with:

500 TB main filestore and 20 TB filestore for home directories

Mellanox EDR (100 Gbit/s) high-speed interconnect

120 compute nodes, each with two Intel Xeon E5-2699 v4 22-core processors (5280 cores in total), of which:

  • 110 ‘standard’ compute nodes each have 128 GB memory and 600 GB local storage
  • 6 ‘medium memory’ nodes each have 256 GB memory and 1.2 TB local storage
  • 4 ‘large memory’ nodes each have 512 GB memory and 8 TB local storage

An additional 2 ‘extra large memory’ nodes each have 1.5 TB memory and 9.6 TB local storage. These two nodes each have four Intel Xeon E7-4830 v4 14-core processors (56 cores per node).

The cluster will run CentOS 7.  Job scheduling and use of the cluster resources will be managed by SLURM.

For programmers, the Intel Parallel Studio XE Cluster Edition and the PGI (Portland) and GNU compilers will be provided.
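
As a rough illustration of how these pieces fit together, a multi-threaded (OpenMP) program could be built with the GNU compiler and run across all 44 cores of a standard node with a SLURM script along the lines of the sketch below. The source file name and the resource requests are placeholders, not confirmed details of the final service.

  #!/bin/bash
  # Illustrative SLURM script for a multi-threaded (OpenMP) job
  # on a single standard node (2 x 22 cores = 44 cores).
  #SBATCH --job-name=omp-example
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=44
  #SBATCH --time=00:10:00

  # Build with the GNU compiler; hello_omp.c is a placeholder source file.
  gcc -O2 -fopenmp hello_omp.c -o hello_omp

  # Match the OpenMP thread count to the cores SLURM has allocated.
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
  ./hello_omp

The Intel and PGI compilers could be used in place of gcc in the same way once the cluster’s software environment is finalised; in each case the script is submitted to SLURM with the sbatch command.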

The system should be commissioned by the beginning of July.

People

The HPC project team has been joined by a new member, Karen Bower, who has taken up the role of HPC research computing analyst. Karen will be working to help shape the new system into an HPC service and to make sure that people have the tools and skills they need to use that service effectively.