Cloud mqtt outage
Apologies to anyone who had mqtt issues this evening.
It looks (atm) like we had some sort of DoS attack on mqtt authentication.
I will be reviewing stuff, but it caused major internal issues and I may need to decouple the mqtt feed further from our core systems.
This will inevitably make the mqtt more delayed and less reliable.
I will advise you before I make any major changes.
Comments
Sorry to say we have had another attack - I have disabled mqtt so I can get some sleep - I will revisit in the am.
@clivee thanks for confirming I wasn't going mad. I have a process that's always reading my MQTT data.
I read a report on the Home Assistant forum that one user's feed started back up again at 2 a.m. My data has not resumed yet. Can you advise when MQTT should be back up, please?
I really thought I'd messed up. I was playing with the AUTH section of the API trying to get a process of an automatic token renewal working when around 21:25 I was not getting anything back via MQTT. The app was also really struggling and couldn't see my usage or my devices and I thought I'd seriously broken something until I saw this around 23:00 and the panic subsided!
@clivee Thanks for the notification. Does this second attack force your hand over your comments on August 24th:
"I will be reviewing stuff, but it caused major internal issues and I may need to decouple the mqtt feed further from our core systems. This will inevitably make the mqtt more delayed and less reliable."
If so what does this mean to the end users? e.g. removal or delay on things like "instant power"?
'If so what does this mean to the end users? e.g. removal or delay on things like "instant power"?'
There may be another second or so before you get the data
My feed resumed at 09:45
@clivee Thanks for all your efforts on this Clive. I use the API as a backup to MQTT, so was able to get most of the import meter data (just not export meter, like you can with mqtt).
Thanks for your work on this @clivee - I'll keep an eye on this.
@DMoore - What are you using to extract the data via the API? I'd like to implement something similar as a failover, so that if MQTT goes offline I can spin up a node to extract it via the API and publish it back myself to my internal MQTT broker for Home Assistant to pick up. Then when data comes back from Glow via MQTT it will turn off...
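For anyone wanting to sketch the failover idea above, the core of it is a watchdog that notices when the MQTT feed has gone quiet. This is only a minimal sketch: the class name FeedWatchdog, the 120 second timeout and the method names are all hypothetical, and it does not talk to any broker or API itself.

```python
import time


class FeedWatchdog:
    """Tracks when the last MQTT message arrived and flags a silent feed."""

    def __init__(self, timeout_s=120.0, clock=time.monotonic):
        # timeout_s is an assumed value; tune it to your normal update rate.
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def message_received(self):
        # Call this from your MQTT on-message handler.
        self._last_seen = self._clock()

    def should_use_api(self):
        # True once the feed has been silent for longer than the timeout.
        return self._clock() - self._last_seen > self.timeout_s
```

The fallback node would poll should_use_api() on a timer, start pulling from the API while it returns True, and stop again once message_received() has been called by the live feed.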
@mrstreeter I use the API GET/resource/{id}/meterread as I only need this information for usage every five minutes (I calculate the difference between the 1st and 2nd reads to get actual import kWh). It's good for Import, but doesn't support Export Meter Read (yet).
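The "difference between reads" step described above can be sketched roughly like this. It assumes you have already pulled the cumulative import readings out of the API response as plain kWh numbers; the function name and the sample values are made up for illustration and nothing here reflects the actual response format.

```python
def usage_between_reads(reads_kwh):
    """Return the kWh used between each pair of consecutive cumulative reads."""
    deltas = []
    for previous, current in zip(reads_kwh, reads_kwh[1:]):
        if current < previous:
            # A cumulative read should not go backwards; flag it rather than guess.
            deltas.append(None)
        else:
            deltas.append(round(current - previous, 3))
    return deltas


# Illustrative values only, not real meter data.
print(usage_between_reads([1000.0, 1000.25, 1000.75]))
```

Returning None for a read that goes backwards is just one choice; you may prefer to drop or log those samples.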
Thanks Clive for your communication on this (and for the recent email).
Like others I am using the MQTT service as part of my home automation, so it would be helpful to have a clear understanding (when you have it) of how any changes might affect the current access or functionality.
Thanks Clive.
I did narrow it down to the MQTT part of the information chain when I saw the data wasn’t updating in OpenHAB last night but hadn’t tried to troubleshoot it further. Good to know there was a reason behind the outage and that you are looking to prevent recurrence.
Are things offline again? Can't seem to connect to the mqtt server.
no I think that's just you - certainly working for me
Bagh! Yes, all me!
Some numbskull (me again) forgot that he's had his electric turned off on one circuit today (which includes the CAD)...
Doh!
Hahahahahahahahah
I have NEVER done anything like that - no siree ;-)
@clivee Have you rate limited (or similar) the MQTT feeds? I have noticed in the last 48 hours or so that instead of near instant updates, they are now taking on average 30-45 seconds and at worst about 250 seconds.
I have just switched to another near identical MQTT broker (same config as far as I can see), but I'm trying to rule out your end before I start picking it apart this end.
Nope- nothing has changed here.
I've just done a quick couple of captures and got the following:
msg received, seconds since previous
11:10:06
11:10:20 14s
11:10:32 12s
11:10:45 12s
11:10:58 13s
11:11:11 13s
11:11:24 13s
11:11:36 12s
11:11:49 13s
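If you want to produce the same kind of capture yourself, the gaps can be worked out from the logged arrival times. A minimal sketch, assuming whole-second HH:MM:SS timestamps all from the same day (the function name is hypothetical):

```python
from datetime import datetime


def seconds_between(timestamps):
    """Return the gap in whole seconds between consecutive HH:MM:SS timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


# First three arrival times from the capture above.
print(seconds_between(["11:10:06", "11:10:20", "11:10:32"]))  # [14, 12]
```

Because the timestamps are truncated to whole seconds, a gap computed this way can differ by a second from one measured with sub-second timing.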
Thanks to both of you for that.
I'll have a nosy and see what's different this end
Hi - is MQTT still down, or should it be working again? I'm a new user so just need to know if my connection issue is because I'm doing something wrong or not!
mqtt is working fine. It was down for a few hours, weeks ago.
I would suggest the issue is at your end.
Has there been any sort of TLS updates done on the MQTT node?
I can connect OK from a system client (MQTT Explorer on OSX), however I am now getting TLS errors when trying to connect using the eclipse-mosquitto:2.0.12 docker image, which I use to provide a bridge to Home Assistant... Performing the following command within that container now gets the TLS error:
Note: this is using the --insecure flag, which should disable all TLS verification, but it still gets an error. I have tried with the 2.0.12 image, 2.0.10 and 2.0.8 (latest, my last known working, and a few before that).
Noted this stopped working as of ~0100 on 2nd October, which is when data stopped arriving into Home Assistant via the bridge. I was away until today so was unable to debug until now.
I've reverted my mosquitto conf to use port 1883 instead of the TLS port, which has got it working again, but it would be interesting to determine what may have caused it.
Verification of the certificate seems to work as far as I could tell whilst on a planning call with work, dates are OK... unsure at this time.
It will be (probably) the change in root cert for letsencrypt
'https://scotthelme.co.uk/lets-encrypt-old-root-expiration/'
Make sure your root ca store is up to date.
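One rough way to see what a client's CA store looks like, and whether a handshake verifies against it, is a few lines of Python run on the same machine or container. This is a sketch only: the function names are made up, and the host and port you pass to try_handshake are your own assumptions (8883 is the conventional MQTT-over-TLS port, but check what your config actually uses).

```python
import socket
import ssl


def describe_ca_store(cafile=None):
    """Summarise the verification settings and CA count of a default context."""
    context = ssl.create_default_context(cafile=cafile)
    stats = context.cert_store_stats()
    return {
        "verify": context.verify_mode == ssl.CERT_REQUIRED,
        "check_hostname": context.check_hostname,
        # May read 0 where CAs are loaded lazily from a directory.
        "ca_certs": stats["x509_ca"],
    }


def try_handshake(host, port, timeout_s=5.0):
    """Return None if the TLS handshake verifies, else the verification error text."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as err:
        return str(err)
```

If try_handshake returns a "certificate has expired" style message while the server's own certificate dates look fine, that points at the trust chain on the client side rather than the server.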
Yeah, that'll be it for sure!
OK, as this is a mainstream image I'll await updates from the maintainer as opposed to trying to roll my own fix.
I've just rebuilt the eclipse-mosquitto image with a different base alpine image, their latest 3.14.2, and that doesn't fix it... so it doesn't look like alpine have updated their base image with the updated TLS - which should be unlikely as you'd like to think this was picked up... but right now (still on my call) I've not got the focus available to really look at it in depth.
Thanks though - I'll keep my eyes on things, and it's running in non-TLS on 1883 at the moment OK.
Has this failed again? It seems that data stopped around 8pm and there is nothing showing on the Bright App, and I have an IHD.
I too have lost the MQTT feed at 8 p.m. on 12 October
Ditto for me too.
I can confirm this is also not working for me - both MQTT & Bright App.
Same here...last reading via MQTT at 8pm