Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Events no longer sent to application service, and no way to debug #13142

Closed
aggieben opened this issue Jun 30, 2022 · 16 comments
Closed

Events no longer sent to application service, and no way to debug #13142

aggieben opened this issue Jun 30, 2022 · 16 comments
Labels
A-Application-Service Related to AS support T-Other Questions, user support, anything else.

Comments

@aggieben
Copy link

Description

Copying from #5957 (comment)

I'm trying to write an appservice, and when I first started I was getting transaction PUTs. The request would fail (the AS I'm writing is very much WIP), and I'd see logs emitted by the scheduler indicating increased waiting periods for trying again. But now I'm in a state where it doesn't even try, and I haven't been able to figure out how to make it try. #1834 isn't my issue as appservice_stream_position.event_ordering is the same as events.stream_ordering in my case.

After poking around in the DB on my own trying to make sense of it, I no longer have a row in application_service_state (similarly to #5957 (comment)), even though synapse does still recognize it at startup. Restarting synapse hasn't made any difference.

2022-06-27 00:16:34,137 - synapse.config.appservice - 90 - INFO - main - Loaded application service: ApplicationService: {'token': '<redacted>', 'url': 'http://localhost:5000', 'hs_token': '<redacted>', 'sender': '@neofarkas:cit.chat', 'namespaces': {'users': [Namespace(exclusive=True, regex=re.compile('neofarkas.*'))], 'aliases': [], 'rooms': []}, 'id': 'dc863c3be4a6b4eac0d881b2df244920', 'ip_range_whitelist': None, 'supports_ephemeral': False, 'msc3202_transaction_extensions': False, 'protocols': set(), 'rate_limited': True}

I haven't yet even figured out a way to begin troubleshooting, really.

One update from the original comment: I left everything running for a couple days and after I came back to it there was one more attempted PUT to my AS, but it failed and there haven't been any further attempts, and no logs indicate what the problem is.

Steps to reproduce

I have a homeserver that is configured mostly with defaults; no identity service is running, and I don't even understand most of the commented-out elements in the configuration file.

I did create and configure a registration for my appservice, and the homeserver appears to be recognizing the registration YAML correctly.

Homeserver

cit.chat

Synapse Version

1.61.0

Installation Method

Debian packages from packages.matrix.org

Platform

Azure VM, Standard B1ms (1 vcpu, 2 GiB memory)
Ubuntu 20.04

Synapse running under systemd. Current status:

● matrix-synapse.service - Synapse Matrix homeserver
     Loaded: loaded (/lib/systemd/system/matrix-synapse.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-06-28 20:02:08 UTC; 1 day 20h ago
   Main PID: 26621 (python)
      Tasks: 20 (limit: 2289)
     Memory: 353.3M
     CGroup: /system.slice/matrix-synapse.service
             └─26621 /opt/venvs/matrix-synapse/bin/python -m synapse.app.homeserver --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matr>

Jun 28 20:02:05 cit-matrix-chatty matrix-synapse[26615]: Found file in config directory that does not end in '.yaml': '/etc/matrix-synapse/conf.d/slack-regis>
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: This server is configured to use 'matrix.org' as its trusted key server via the
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: 'trusted_key_servers' config option. 'matrix.org' is a good choice for a key
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: server since it is long-lived, stable and trusted. However, some admins may
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: wish to use another server for this purpose.
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: To suppress this warning and continue using 'matrix.org', admins should set
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: 'suppress_key_server_warning' to 'true' in homeserver.yaml.
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: --------------------------------------------------------------------------------
Jun 28 20:02:07 cit-matrix-chatty matrix-synapse[26621]: Config is missing macaroon_secret_key
Jun 28 20:02:08 cit-matrix-chatty systemd[1]: Started Synapse Matrix homeserver.

Relevant log output

2022-06-27 00:16:34,137 - synapse.config.appservice - 90 - INFO - main - Loaded application service: ApplicationService: {'token': '<redacted>', 'url': 'http://localhost:5000', 'hs_token': '<redacted>', 'sender': '@neofarkas:cit.chat', 'namespaces': {'users': [Namespace(exclusive=True, regex=re.compile('neofarkas.*'))], 'aliases': [], 'rooms': []}, 'id': 'dc863c3be4a6b4eac0d881b2df244920', 'ip_range_whitelist': None, 'supports_ephemeral': False, 'msc3202_transaction_extensions': False, 'protocols': set(), 'rate_limited': True}

Anything else that would be useful to know?

Important to remember that the appservice I'm working on is in development, so it's expected that sometimes requests from the homeserver are going to fail.

@anoadragon453
Copy link
Member

Hmm, do you have any rows containing your appservice ID in application_services_txns? This table contains any unsent transactions towards your AS.

If you are sending events that should indeed be forwarded to your appservice, but none are being sent, then my theory is that unsent transactions in this table may be blocking new ones.

Additionally, have you defined any experimental_features options in your homeserver config? There are a few known problems with those related to sending ephemeral events to appservices.

@anoadragon453 anoadragon453 added the X-Needs-Info This issue is blocked awaiting information from the reporter label Jul 4, 2022
@aggieben
Copy link
Author

aggieben commented Jul 4, 2022

I have nothing in application_services_txns at all, nor do I have anything in experimental_features. I didn't have a row in application_services_state either until recently when I inserted a row manually for my appservice, like this:

insert into application_services_state (as_id, "state")
values ('dc863c3be4a6b4eac0d881b2df244920', 'up')

This hasn't caused any change in behavior.

@aggieben
Copy link
Author

aggieben commented Jul 7, 2022

What else can I do to diagnose?

@DMRobertson
Copy link
Contributor

I'd suggest turning Synapse's logging up to DEBUG level, if you haven't already.

@aggieben
Copy link
Author

aggieben commented Jul 8, 2022

Yep, I've had it on DEBUG.

image

@aggieben
Copy link
Author

Any other things I can do? Still stuck.

@squahtx squahtx removed the X-Needs-Info This issue is blocked awaiting information from the reporter label Jul 13, 2022
@DMRobertson DMRobertson added the A-Application-Service Related to AS support label Jul 14, 2022
@DMRobertson
Copy link
Contributor

DMRobertson commented Jul 14, 2022

Can you do the following please?

  1. Restart Synapse with your appservice connected
  2. Create a new event in a room (e.g. send a new message)
  3. Wait for 60 seconds

If you can then send us the logs from the restart up to the end of step 3, we can take a look.

(Given that they're debug logs, they will likely contain sensitive information that you may not want to publish. If so, please email them to davidr@element.io or send over matrix at @dmrobertson:matrix.org.)

@DMRobertson DMRobertson added the X-Needs-Info This issue is blocked awaiting information from the reporter label Jul 14, 2022
@DMRobertson
Copy link
Contributor

Also:

After poking around in the DB

was this just read-only poking, or were you altering/adding/delting rows during your investigation?

@aggieben
Copy link
Author

@DMRobertson mostly read-only. At the very beginning when I started to experience this issue, there was a row in the application_services_state for my appservice, so I deleted it to see if that would cause Synapse to "re-initialize" anything. It apparently did not 😅

So after being stuck for a while longer, I inserted a row there again to see if anything would change, and it didn't. Other poking was read-only.

I'll try the steps you suggested above.

@aggieben
Copy link
Author

Sent you the log and related info in chat. Thanks for your willingness to help!

@DMRobertson
Copy link
Contributor

Thanks for the logs. I'm really not familiar with how application services work, but this looks to be the function which pushes events to ASes. There are no matches for push_bulk in your log sample, but there should be at least one for each push_bulk call from these three branches:

try:
await self.put_json(
uri=uri,
json_body=body,
args={"access_token": service.hs_token},
)
if logger.isEnabledFor(logging.DEBUG):
logger.debug(
"push_bulk to %s succeeded! events=%s",
uri,
[event.get("event_id") for event in events],
)
sent_transactions_counter.labels(service.id).inc()
sent_events_counter.labels(service.id).inc(len(serialized_events))
sent_ephemeral_counter.labels(service.id).inc(len(ephemeral))
sent_todevice_counter.labels(service.id).inc(len(to_device_messages))
return True
except CodeMessageException as e:
logger.warning(
"push_bulk to %s received code=%s msg=%s",
uri,
e.code,
e.msg,
exc_info=logger.isEnabledFor(logging.DEBUG),
)
except Exception as ex:
logger.warning(
"push_bulk to %s threw exception(%s) %s args=%s",
uri,
type(ex).__name__,
ex,
ex.args,
exc_info=logger.isEnabledFor(logging.DEBUG),
)

This is called by send. So we need to understand the conditions under which send is(n't) called.

@DMRobertson
Copy link
Contributor

We can see that Synapse is looking to see if there are any events it should send to ASes:

* 2022-07-14 16:47:12,113 - synapse.storage.txn - 651 - DEBUG - notify_interested_services-0 - [TXN START] {get_new_events_for_appservice-1cb}
* 2022-07-14 16:47:12,114 - synapse.storage.SQL - 371 - DEBUG - notify_interested_services-0 - [SQL] {get_new_events_for_appservice-1cb} SELECT e.stream_ordering, e.event_id FROM events AS e WHERE (SELECT stream_ordering FROM appservice_stream_position)     < e.stream_ordering AND e.* stream_ordering <= ? ORDER BY e.stream_ordering ASC LIMIT ?
* 2022-07-14 16:47:12,114 - synapse.storage.SQL - 379 - DEBUG - notify_interested_services-0 - [SQL values] {get_new_events_for_appservice-1cb} (23437, 100)

Let's assume for now that this does pick up an event to send to your AS. Each such event is only sent to appservices which are "interested" in that event.

async def _get_services_for_event(
self, event: EventBase
) -> List[ApplicationService]:
"""Retrieve a list of application services interested in this event.
Args:
event: The event to check.
Returns:
A list of services interested in this event based on the service regex.
"""
services = self.store.get_app_services()
# we can't use a list comprehension here. Since python 3, list
# comprehensions use a generator internally. This means you can't yield
# inside of a list comprehension anymore.
interested_list = []
for s in services:
if await s.is_interested_in_event(event.event_id, event, self.store):
interested_list.append(s)
return interested_list

The logic which controls whether or not an AS is interested in an event is here.

@cached(num_args=1, cache_context=True)
async def is_interested_in_event(
self,
event_id: str,
event: EventBase,
store: "DataStore",
cache_context: _CacheContext,
) -> bool:
"""Check if this service is interested in this event.
Args:
event_id: The ID of the event to check. This is purely used for simplifying the
caching of calls to this method.
event: The event to check.
store: The datastore to query.
Returns:
True if this service would like to know about this event, otherwise False.
"""
# Check if we're interested in this event's sender by namespace (or if they're the
# sender_localpart user)
if self.is_interested_in_user(event.sender):
return True
# additionally, if this is a membership event, perform the same checks on
# the user it references
if event.type == EventTypes.Member and self.is_interested_in_user(
event.state_key
):
return True
# This will check the datastore, so should be run last
if await self.is_interested_in_room(
event.room_id, store, on_invalidate=cache_context.invalidate
):
return True
return False

@DMRobertson
Copy link
Contributor

In short an AS will only get told about events:

  • if the AS is interested in the event's sender;
  • if the event is a membership change for a user the AS is interested in; or
  • if the event is in a room that the AS is interested in.

From the configuration posted in this issue's description:

2022-06-27 00:16:34,137 - synapse.config.appservice - 90 - INFO - main -
 Loaded application service: ApplicationService: 
{
  'token': '<redacted>',
  'url': 'http://localhost:5000',
  'hs_token': '<redacted>',
  'sender': '@neofarkas:cit.chat',
  'namespaces': {
    'users': [
      Namespace(exclusive=True,
      regex=re.compile(
      'neofarkas.*'
      ))
    ],
    'aliases': [],
    'rooms': []
  },
  'id': 'dc863c3be4a6b4eac0d881b2df244920',
  'ip_range_whitelist': None,
  'supports_ephemeral': False,
  'msc3202_transaction_extensions': False,
  'protocols': set(),
  'rate_limited': True
}

This is an application service which is interested in

  • Its sender user, @neofarkas:cit.chat
  • Any user whose matrix ID matches the regex neofarkas.*. (It claims to have control over this namespace!)
  • No rooms.

So I think you're not seeing events sent to that application service, because your application service hasn't registered an interest in seeing events for anyone whose ID doesn't begin with neofarkas. Try changing the config so that e.g. your app service listens to all users on cit.chat with

namespaces:
  users:
    exclusive: false
    regex: ".*:cit.chat"

and let me know if that helps.

(The docs https://matrix-org.github.io/synapse/latest/application_services.html could probably be improved. It speaks of a group_id which I don't recognise from the source code??!)

@DMRobertson DMRobertson added this to the Revisit: Next Week milestone Jul 17, 2022
@DMRobertson
Copy link
Contributor

@aggieben: Have you managed to make progress here by changing the config?

@aggieben
Copy link
Author

@DMRobertson yep, I updated the config as follows:

namespaces:
  users:
    - exclusive: true
      regex: "neofarkas.*"
  aliases:
    - exclusive: false
      regex: "#test:cit.chat"
  rooms:
    - exclusive: false
      regex: "!HeFbnNRpcFVuWrWLOI:cit.chat"

The alias and rooms regexes both indicate the same #test room. Messages from that channel are successfully posted to my application service now. Also, your suggestion (from DM) of using this query (UPDATE application_services_state SET state = 'up' where as_Id = '<SERVICE_ID>';) to make sure the messages are posted while I'm actively developing was helpful.

@DMRobertson DMRobertson removed this from the Revisit: Next Week milestone Jul 28, 2022
@DMRobertson
Copy link
Contributor

Great, thanks for following up.

@DMRobertson DMRobertson added T-Other Questions, user support, anything else. and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Jul 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Application-Service Related to AS support T-Other Questions, user support, anything else.
Projects
None yet
Development

No branches or pull requests

4 participants