Concluding the story

OK, it’s taken me a few months to get around to finishing the story, but it turns out that it definitely wasn’t fewer changes that helped us get through the backlog quicker last September. Looking at the number of changes for the whole month of September, we can see that there were actually more in September 2018 than in September 2017.

Chart showing monthly updates to Grouper groups, by stem, from August 2016 to September 2018

This leads me to conclude that my theory about PSP having less to do is most likely correct.

I don’t dream of Grouper, but if I did …

A couple of weeks ago I said we had nothing to lose by trying out a new Grouper to Active Directory group membership provisioning mechanism. Well, it turns out we actually had a lot to gain. The new solution has worked better than I could’ve dreamt. (For the record, I don’t dream of Grouper; I have been asked!)

Despite another huge number of corporate data changes at the start of September, we have not had any membership changes waiting more than a day to be provisioned. Now, on day 11, we actually have no backlog at all, so PSP is back to “real time” provisioning.

Chart showing monthly updates to Grouper groups, by stem, from March 2016 to September 2018

The fact that we have got through the entire backlog at least a week sooner than last year has surprised me, but I have a theory as to why this might be.

The obvious conclusion, which you could jump to by looking at the chart, is simply that there have not been as many changes for PSP to process this year.

There might be something in that, but I also think the monthly view could be slightly misleading. When broken down by week, you can see that last year’s peak is not that much taller than this year’s – it’s just that we’re only partway through September at the moment.

Chart showing weekly updates to Grouper groups, by stem, from September 2017 to September 2018

My theory is that the highly effective new method of updating AD group memberships from Grouper, which we’ve been using since the end of August, has allowed PSP to run through its backlog far more quickly, because it has actually had less work to do.

The PSP provisioning technology works in a three-step process: ‘calc’, ‘diff’, ‘sync’.

  1. ‘Calc’: First, it calculates how the AD group should look after the change from the change log has been applied.
  2. ‘Diff’: It then works out the difference between how the AD group should look and how it actually looks, and what needs to be done to remove that difference.
  3. ‘Sync’: Finally, it synchronises the group by applying the output from the ‘diff’ step.

This is all relatively time-consuming. By using the new solution to synchronise group memberships before PSP gets around to trying, PSP has less to do as it only has to complete the ‘calc’ and ‘diff’ steps for each change and can therefore race through the change log at a much faster pace.
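The three steps can be illustrated with a toy model in Python. This is not PSP’s actual code. Plain in-memory sets stand in for the Grouper and AD memberships, which PSP really handles over LDAP, and the function names are hypothetical. The model shows why a group that has already been brought up to date costs PSP very little: ‘diff’ comes back empty, so ‘sync’ has nothing to write.

```python
"""Toy model of a calc / diff / sync cycle for one change-log entry.

Assumption: in-memory sets stand in for Grouper and AD group memberships;
this is an illustration, not PSP's implementation.
"""


def calc(grouper_members: set[str]) -> set[str]:
    # 'Calc': how the AD group should look once the change has been applied,
    # i.e. the group's current membership in Grouper.
    return set(grouper_members)


def diff(desired: set[str], actual: set[str]) -> tuple[set[str], set[str]]:
    # 'Diff': members to add to AD and members to remove from AD.
    return desired - actual, actual - desired


def sync(ad_members: set[str], to_add: set[str], to_remove: set[str]) -> int:
    # 'Sync': apply the diff to the AD group in place; return the number of writes.
    ad_members |= to_add
    ad_members -= to_remove
    return len(to_add) + len(to_remove)


def process_change(grouper_members: set[str], ad_members: set[str]) -> int:
    to_add, to_remove = diff(calc(grouper_members), ad_members)
    if not to_add and not to_remove:
        # Already in sync (e.g. updated earlier by another mechanism): nothing to write.
        return 0
    return sync(ad_members, to_add, to_remove)


if __name__ == "__main__":
    grouper = {"alice", "bob", "carol"}
    ad = {"alice", "dave"}
    print(process_change(grouper, ad))  # 3: add bob and carol, remove dave
    print(process_change(grouper, ad))  # 0: AD already matches Grouper
```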

Nothing to lose

As a wise sage so profoundly opined, “when you ain’t got nothing, you got nothing to lose”. Thankfully, unlike Dylan’s unfortunate protagonist, we don’t have nothing but, like Miss Lonely, we do have nothing to lose.*

In case you haven’t yet realised (and I appreciate it might not be too obvious), I’m alluding to plans for lessening the impact of the corporate data changes in Grouper at the nominal changeover of the academic years.

Like a rolling stone gathering no moss, so PSP will carry on regardless. And thus we do not have nothing. But we know from experience that what we do have will be very slow for a while. And so we have nothing to lose by implementing a new, complementary method of synchronising group memberships between Grouper and Active Directory. After extensive testing and a few improvements, we believe we are now in the favourable position of being able to put this new solution into production.

This means that membership changes to existing Grouper groups will be reflected in AD much sooner than we would have otherwise expected. We know that this new method is going to be a lot quicker. What we do not know is exactly how long it will all take. We have not been able to test the sheer volume of group membership changes that will be required. We will endeavour to maintain lines of communication to keep anyone who is interested informed with progress.

We’re not selling any alibis and we’ve got no secrets to conceal, so I think it’s also worth mentioning what this new solution will not do: it will not create any new groups in Active Directory and it will not process changes to group names or descriptions. These will have to wait for PSP to catch up.

Also, as a spin-off from this new solution, I’ve developed a nice little means of synchronising the membership for a specific group. This should prove to be much quicker and more reliable than the existing mechanism for doing this.

————-

*(Sorry, I really shouldn’t write these things late at night.)

A New Hope

With the new academic year fast approaching, we were hoping to be able to avoid a return of the delays in Grouper to Active Directory provisioning we’ve suffered for the last two years. Salvation seemingly lay in the hands of Grouper’s next-generation provisioning technology but, following a saga longer than a pod race and more twisted than Darth Vader’s mind, we’ve concluded that PSP-NG is still not quite production-ready.

But was that our last hope? No, there is another.

I’ve recently begun working on something I’d been thinking about for a while. It’s not a replacement for the PSP technology but I believe it can complement it and significantly alleviate the impact of the inevitable provisioning backlog at the start of September.

Using Talend, the force behind much of the Institutional Data Feed Service, I plan to interrogate the Grouper change log to find out which groups that are provisioned to AD have had membership changes. Then, for each of those groups, I can query the Grouper database for its complete current membership list. After a bit of jiggery-pokery, I can then push the full list of members into the corresponding group in AD.
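The plan can be sketched roughly as follows, in Python rather than Talend. Several pieces are hypothetical stand-ins and not Grouper’s or Talend’s real interfaces: the change-log entry format, the `app:` group names, and the two callbacks (`get_members` for the Grouper database query and `push_members` for the AD update). The sketch shows two points. Several changes to the same group collapse into a single full-membership push. Only groups that already exist in AD are touched, which matches the plan’s limits: no new groups and no renames.

```python
from typing import Callable, Iterable

# Hypothetical change-log entry types that represent membership changes.
MEMBERSHIP_CHANGES = {"membership_add", "membership_delete"}


def groups_needing_sync(change_log: Iterable[dict], provisioned: set[str]) -> set[str]:
    """Distinct groups, already provisioned to AD, that have had membership changes."""
    return {
        entry["group"]
        for entry in change_log
        if entry["type"] in MEMBERSHIP_CHANGES and entry["group"] in provisioned
    }


def sync_changed_groups(
    change_log: Iterable[dict],
    provisioned: set[str],
    get_members: Callable[[str], set[str]],
    push_members: Callable[[str, set[str]], None],
) -> int:
    """Push the full current Grouper membership of each changed group to AD."""
    groups = groups_needing_sync(change_log, provisioned)
    for group in sorted(groups):
        # Replace the whole membership rather than replaying individual adds/deletes.
        push_members(group, get_members(group))
    return len(groups)


if __name__ == "__main__":
    log = [
        {"type": "membership_add", "group": "app:team_a", "member": "alice"},
        {"type": "membership_delete", "group": "app:team_a", "member": "bob"},
        {"type": "membership_add", "group": "app:team_a", "member": "carol"},
        {"type": "group_rename", "group": "app:team_b", "member": None},  # ignored: not a membership change
        {"type": "membership_add", "group": "app:new_group", "member": "dave"},  # ignored: not in AD yet
    ]
    grouper_db = {"app:team_a": {"alice", "carol"}}  # stand-in for the Grouper database query
    ad: dict[str, set[str]] = {}  # stand-in for AD

    def push(group: str, members: set[str]) -> None:
        ad[group] = set(members)

    n = sync_changed_groups(log, {"app:team_a", "app:team_b"}, grouper_db.__getitem__, push)
    print(n, sorted(ad["app:team_a"]))  # 1 ['alice', 'carol']
```

Pushing the complete membership makes each push idempotent: running it twice for the same group leaves AD in the same state, which suits a catch-up job working through a large backlog.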

More testing is required but I’m confident that this will be a good addition to our resistance to the problem; perhaps the most powerful weapon in our arsenal of workarounds.

This is just a prequel; you can expect the next episode before the end of the month, where we will let you know whether or not we are in a position to make this new weapon fully operational.

A delay, you say?

It’s been pointed out to me that the impact of the Grouper to AD provisioning delays is actually not widely understood. I’ll try to sum it up here as explicitly as I can but please feel free to ask if I haven’t explained anything as well as you would’ve liked!

The delay is in provisioning new groups, changes to groups and changes to group memberships from Grouper to AD. Groups and group memberships that have already been provisioned to AD are not affected.

So, what could be affected by this?

Anything that uses AD groups in the ‘GrouperGroups’ OU for access control could be affected. These groups are used directly to control access to things including (but not limited to) shared filestore, mailing lists, wifi, calendars, printers, PCs, software and the new Rocket HPC system (currently in pilot, I believe).

Additionally, AD’s GrouperGroups groups show up as a Shibboleth attribute which can be used to restrict access to any resources protected by the Login Gateway to a specific group of people. Known uses for this include Microsoft Imagine (formerly Dreamspark), some internal websites and some holiday booking systems. There could, of course, be others.

And what is definitely not affected?

There is no delay internal to Grouper so group memberships within Grouper are up to date. Anything relying on Grouper groups directly or via data feeds, such as some features of the mobile app or Chubb access to some buildings, will be fine.

What does all this actually mean?

Firstly, I hope that anyone who has chosen to use (or inherited) Grouper as an access control component of their service knows and understands how Grouper fits into their particular picture; if it includes AD then access to the service could be affected for new users.

It’s something to consider if an end user reports an access issue. For example, if a new member of staff can’t connect to their team’s shared filestore or they’re not receiving emails to a mailing list they should be on then the chances are they are a victim of the delay. If, however, they find that their smartcard won’t let them into Merz Court then that will be caused by something else.

I hope this has helped to clear up the impact of the delay but if not please let us know!

New academic year (and the trouble it brings)

It’s now the middle of September, so I feel I owe you an update on the issues we’re facing around provisioning Grouper group memberships to AD. Since 1st September we’ve not had any “real-time” provisioning to AD. We have, again, offered some workarounds to alleviate the impact of the most urgent cases. I’d like to thank the Operations team for their help with this.

This wasn’t unexpected, of course; it happened last year and we knew it would happen again. Michael sent a couple of warning emails out in advance and I tried to prepare everyone I spoke to.

I’ll try to explain why we see this delay in Grouper to AD provisioning: it’s purely down to the volume of membership changes that occur at the changeover of the academic year (August to September).

Chart showing monthly updates to Grouper groups, by stem, from March 2016 to September 2017

You can see from the chart that there have been even more changes to group memberships this year than last year, meaning that the provisioning delay has been longer. We’re currently only halfway through September but there have already been over a million membership changes! Most of these are in the student Corporate Data, particularly module enrolment groups.

There are two main reasons why there are substantially more changes this year than last: firstly, Grouper usage has increased considerably over the last year; and secondly, the SAgE reorganisation necessitated the addition of many new student Corporate Data groups.

The issue around the volume of changes is further exacerbated by a bug in Grouper’s provisioning technology which means that every change in Grouper must be processed even though it’s only changes in the Applications stem which are actually pushed through to AD.

We were hoping to be able to escape the delays this year by upgrading Grouper to take advantage of its next-generation provisioning technology, which is much faster and also doesn’t suffer from that bug. Unfortunately, after extensive testing and collaboration with the developers, we decided that it was not quite production-ready for deployment in an enterprise environment. I’m very hopeful that we will be able to upgrade before this time next year!

So, onto the backlog … Well, once we’d seen the number of changes to process, estimates started off at well over 20 days. Over the last week, however, we’ve seen processing speeds increase considerably. We believe this is due to the work of ISG in modernising the AD domain controller infrastructure. This means that my current estimate is that the backlog should be cleared sometime on Monday … just in time for registration!

Grouper to AD provisioning backlog

If you’re reading this you’re probably already aware that we have been suffering from a delay in provisioning group memberships from Grouper to AD. Those of you with long memories may recognise that this kind of thing has happened before, at the academic year switchover, in 2016.

First, the good news: this backlog is not as severe as we experienced in September last year. In fact, I’ve updated my chart to prove it! At the time of writing this, I expect the backlog to be cleared completely over the weekend.

Chart showing monthly updates to Grouper groups, by stem, from March 2016 to August 2017

The spike in July 2017 that caused the backlog is due to an unusual level of activity in preparation for the reorganisation of the SAgE faculty on 31st July. It’s also worth remembering that there are only two days of data included in the August bar.

Now for the bad news: we are likely to experience another substantial delay in AD provisioning at the start of September.

We’re still hopeful of being able to upgrade Grouper to use its new provisioning technology and avoid any kind of backlog, but we are fearful that the new technology is not quite production-ready yet. This is something that Michael and I are working on at the moment, with some help from the Grouper developers.

If we can’t solve this problem with an upgrade, we’ll mitigate the impact as best we can. You’ll hear more soon.

Creating new groups

I thought I’d already written about this but it seems I haven’t! (I can’t find it at any rate, which is just as bad.)

This post is for people who create groups in the Applications stem of Grouper and want them to be provisioned to AD or to be available as Shibboleth attributes.

There is a known error in this version of Grouper whereby the automatic provisioning only works if at least one member is added to the group before the provisioning service makes its first attempt to provision to AD; empty groups will not be provisioned.

In practice, this means that you must add a member to a group within about 45 seconds of creation of that group or provisioning (and subsequent updates) will fail for that group.

My tip here is simply to add yourself immediately to any new group you create in the Applications stem. This will ensure that it is picked up by the provisioning process. You can then correct the membership at leisure.

If you do happen to fall foul of this trap, don’t fret; we can fix it for you. Just contact the Service Desk (or log your own ticket in NU Service) explaining what’s happened and please include the full ID path of the group.

I’d also like to take this opportunity to remind you not to use spaces or slashes in group and folder IDs; please replace them with underscores.
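If you script group creation, a small helper can apply this rule before an ID ever reaches Grouper. The following is a minimal Python sketch. The function name is hypothetical, and the helper treats any whitespace and both forward and back slashes as unsafe, which is slightly broader than the rule above.

```python
import re

# Characters to avoid in group and folder IDs: spaces and slashes.
# Assumption: any whitespace and both '/' and '\' are treated as unsafe.
_UNSAFE_ID_CHARS = re.compile(r"[\s/\\]")


def safe_grouper_id(name: str) -> str:
    """Replace each unsafe character with an underscore."""
    return _UNSAFE_ID_CHARS.sub("_", name)


if __name__ == "__main__":
    print(safe_grouper_id("Finance Team/Admins"))  # Finance_Team_Admins
```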

Good news for staff onboarding

One of the most common queries (and, dare I say, criticisms) we get with Grouper is about why a new member of staff is not yet in Grouper*. Expectations of our IT systems and services are so high that everyone expects everything to be set up and available ready for new staff the moment they walk through the door on their first day.

Previously, due to the time at which we received ‘current’ staff data from SAP HR**, we weren’t loading new staff into Grouper until the night after their first day.

Well, not any more!

I’ve rewritten the way we consume SAP HR data so we now identify ‘current’ staff through a different mechanism, without the need to refer to our previous source of this information. This means that we now know, from midnight each night, who is a current member of staff for the forthcoming day. This also applies to internal staff moves; we should always have the correct, current data relating to department, job title, etc. This data is now available to any new data feeds that need up-to-date information on current staff***.
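Working out who is ‘current’ for the forthcoming day comes down to a date-range check on contracts. The following is a minimal Python sketch of the general idea, not the actual job. It rests on three assumptions. Each contract record has a start date and an optional, inclusive end date, and the record layout is hypothetical, not SAP HR’s real schema. The caller passes in the day to check. Where someone has overlapping contracts, the one that started most recently is taken as their current post.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Contract:
    # Hypothetical contract record; not the real SAP HR schema.
    staff_id: str
    department: str
    job_title: str
    start: date
    end: Optional[date]  # None means open-ended


def is_active(contract: Contract, day: date) -> bool:
    # Started on or before `day` and not ended before it (end date inclusive).
    return contract.start <= day and (contract.end is None or day <= contract.end)


def current_staff(contracts: Iterable[Contract], day: date) -> dict[str, Contract]:
    """Map each staff ID to the contract that is active on `day`."""
    current: dict[str, Contract] = {}
    for contract in contracts:
        if not is_active(contract, day):
            continue
        existing = current.get(contract.staff_id)
        # Assumption: with overlapping contracts, the most recently started one wins.
        if existing is None or contract.start > existing.start:
            current[contract.staff_id] = contract
    return current


if __name__ == "__main__":
    first_day = date(2018, 9, 3)
    contracts = [
        Contract("new1", "Maths", "Lecturer", date(2018, 9, 3), None),  # starts on first_day
        Contract("mover", "IT", "Analyst", date(2015, 1, 1), date(2018, 9, 2)),
        Contract("mover", "Physics", "Manager", date(2018, 9, 3), None),  # internal move
    ]
    for staff_id, c in sorted(current_staff(contracts, first_day).items()):
        print(staff_id, c.department, c.job_title)
    # mover Physics Manager
    # new1 Maths Lecturer
```

Because the check takes the target day as a parameter, the same logic serves both new starters and internal moves. A new starter appears on their first day, and a mover shows their new department from the day the new contract begins.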

I’ve rewritten the job to load subjects into Grouper to take advantage of this so now any new members of staff should be in Grouper before they arrive, raring to go, on their first morning. The impact of this can be quite impressive if they happen to be in a department which really takes advantage of the power of Grouper. For example, if you use corporate data to control access to team mailing lists, shared filestores and internal websites (or anything else) then any new staff member will be able to access all of that information from the moment they log in to their shiny new PC.

(And, by the way, let’s not forget that we were already pretty hot here! Getting access to all your resources on your second day is not to be sniffed at; when a certain ex-colleague turned up for his new job at a well-known multinational software company providing open-source software products to the enterprise community, he was asked to bring in a book to read for the rest of the week!)


* I’ve just realised that we’ve never actually documented this “issue”, despite having told many, many people about it many, many times. We did, however, cover it in our ‘Getting to Grips with Grouper’ session.

** Each evening we get the details of staff whose contracts are active that day.

*** I’ll get around to reimplementing the definition of ‘current’ for People Search and NU Service in due course. They are both also using day-old definitions of ‘current’ at the moment. UPDATE: Both People Search and NU Service are now taking advantage of this development.

Grouper performance update

On Monday, the Grouper UI was, at times, unusably slow. This, understandably, was a great inconvenience to several people. For that, I apologise. It was unforeseen but we now understand the reasons and have formulated a plan of action.

Firstly, the reason – why did this happen? Well, the eagle-eyed amongst you may have spotted that Monday was the first day of teaching for the majority of our students. OK, I hear you say, but what does that have to do with Grouper? Well, it’s actually down to the phenomenal popularity of the Newcastle University mobile app. The way the app is currently architected, there is a web service call to the Grouper API every time a student logs in to the app or they refresh their news feed.

Chart showing a large spike in load on the mobile app at start of term.

The chart above shows the spike on Monday, when new students downloaded the app for the first time and returning students logged in to see their timetables. The resulting spike in calls to the Grouper API was too much for the poor little server to handle, with the database process maxing out the CPU.

So, what are we doing about it? Well, we’re not just going to stand here and do nothing but apologise. Our approach is three-pronged (rather like a fancy dessert fork):

  1. We’re going to add more CPU to our Grouper server.
  2. We’re working with Mike, Mike, Andy and Marc to redesign the data architecture behind the app, using purpose-built RESTful web services from IDFS and removing Grouper from the equation.
  3. We’re continuing with our Grouper upgrade plans.