Converge to CloudLaunch? #6

Open
afgane opened this issue Aug 10, 2016 · 32 comments

@afgane

afgane commented Aug 10, 2016

I just came across this repo and was wondering if it would be desirable to channel the effort going into this to the new version of CloudLaunch that has basically the same feature: https://beta.launch.usegalaxy.org/public_appliances

@bgruening
Member

Definitely - can you read my mind? :)
In its current state it's just a test and I'm collecting feedback. I wanted to discuss this first with @tnabtaf before announcing it publicly and maybe moving it into the Galaxy organisation.

I have a few ideas how we can use this information, e.g. for finding training resources (human and metal) for GTN or generating the Public-Galaxy Server wiki page out of it. One explicit aim is to include all Galaxy instances, not only public ones, so we can generate a knowledge map.

If this idea is taking off we can create some mini-ontology to automatically filter this json file for Cloudman, GTN, ELIXIR, de.NBI and so on. I can imagine ELIXIR is interested to see how many Galaxy instances are deployed in Europe.
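A minimal sketch of such a filter, assuming each GeoJSON feature carries a hypothetical `tags` list in its `properties` (the real file may use different keys, and the entries below are made-up examples):

```python
def filter_by_tag(collection, tag):
    """Return a new FeatureCollection keeping only features tagged with `tag`.

    Assumes a hypothetical `properties.tags` list on every feature.
    """
    wanted = tag.lower()
    kept = []
    for feature in collection.get("features", []):
        tags = feature.get("properties", {}).get("tags", [])
        # Case-insensitive match so "gtn" and "GTN" are treated alike.
        if wanted in (t.lower() for t in tags):
            kept.append(feature)
    return {"type": "FeatureCollection", "features": kept}


# Made-up example data, not taken from the real geojson file.
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [10.0, 50.0]},
         "properties": {"name": "Example A", "tags": ["GTN", "de.NBI"]}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-75.0, 40.0]},
         "properties": {"name": "Example B", "tags": ["Cloudman"]}},
    ],
}

gtn_only = filter_by_tag(example, "gtn")
```

Each consumer (Cloudman, GTN, ELIXIR, de.NBI) could then request its own view of the same file.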

I hope I have not produced something redundant here, I was not aware of a geojson collection of already known Galaxy servers.

@afgane
Author

afgane commented Aug 10, 2016

I was hoping we can go the other way and replace the public galaxy servers page with a page like this one that's more descriptive and interactive. In addition to the ideas you listed, I'd really like to see a cross-Galaxy tool search. I keep wanting to chat to @martenson about this idea since he's been working with the search functionality (@martenson thoughts?).

I guess we can ingest data from here into the CloudLaunch at some point so effort wouldn't need to be duplicated. As the features are added to CloudLaunch, the idea is that any user can add their instance by just filling out a form there. I guess we'd then have tags to include public instance, themed instances, instances accessible only to the given user, etc.

@hexylena
Member

hexylena commented Aug 10, 2016

I think GRT is the solution to all of the problems :)

  • Serves as a Galaxy directory
  • Has a list of tools used in participating Galaxies
  • (As of this morning) has code for collecting IP address and galaxy name / description

What more could we want? (Serious question)

By default Galaxy ships with part of a GRT configuration. Cloudlaunch could automatically register an instance for users and automatically apply tags to that instance like "public" or "private", and "cloudlaunch"; maybe even the infrastructure it runs on would be nice to have tagged.

@martenson
Member

martenson commented Aug 10, 2016

@afgane more than a year ago I added this to the API: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/webapps/galaxy/api/tools.py#L67 which allows cheaply querying a remote Galaxy for the presence of a tool. It might be about time that we can assume a decent number of public instances have been updated to have this functionality, so we can build some aggregated search UI.

My idea was that you search on top of the Tool Shed list of tools with fulltext and then with the tool_id(s) you query all known public Galaxies to see where you can run the tool(set of tools).
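A rough sketch of the second step (asking each known Galaxy whether it has a given tool). It assumes the `tool_id` parameter of `GET /api/tools` behaves as in the code linked above and returns an empty result when the tool is absent; the function names and the injectable `fetch` are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url, timeout=10.0):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def galaxies_with_tool(galaxy_urls, tool_id, fetch=http_get_json):
    """Return the subset of `galaxy_urls` that report having `tool_id`.

    Assumption: `GET <base>/api/tools?tool_id=...` yields a non-empty
    result when the tool is installed. Instances that are unreachable or
    answer with an error are silently skipped.
    """
    hits = []
    for base in galaxy_urls:
        url = base.rstrip("/") + "/api/tools?" + urlencode({"tool_id": tool_id})
        try:
            result = fetch(url)
        except Exception:
            continue  # offline, firewalled, or too old to support the query
        if result:
            hits.append(base)
    return hits
```

Passing `fetch` in makes it possible to try the logic without network access, and to swap in a client with caching or rate limiting later.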

GRT takes a slightly different approach, as it plans to fetch all tools and then perform the search locally (right, @erasche ?).

@hrhotz
Collaborator

hrhotz commented Aug 11, 2016

sounds very good, but what about Galaxies behind firewalls?

On Wed, Aug 10, 2016 at 7:57 PM, Eric Rasche notifications@github.com
wrote:

I think GRT is the solution to all of the problems :)

  • Has a list of tools used in participating Galaxies
  • (As of this morning) has code for collecting IP address and galaxy
    name / description

What more could we want?



@bgruening
Member

I was hoping we can go the other way and replace the public galaxy servers page with a page like this one that's more descriptive and interactive.

You have my vote on this, but I don't want to be so disruptive again ;)

In addition to the ideas you listed, I'd really like to see a cross-Galaxy tool search. I keep wanting to chat to @martenson about this idea since he's been working with the search functionality (@martenson thoughts?).

Many thoughts :) but I think this is a different discussion. The entire federation idea is awesome and something we should be heading towards; tool search would be the logical first step, I think.

I am currently trying to push the ELIXIR registry to implement a Galaxy tool search, so if you like I would try to organise a meeting with @joncison and see where we can join forces.

I guess we can ingest data from here into the CloudLaunch at some point so effort wouldn't need to be duplicated. As the features are added to CloudLaunch, the idea is that any user can add their instance by just filling out a form there. I guess we'd then have tags to include public instance, themed instances, instances accessible only to the given user, etc.

Sounds great!

@bgruening
Member

What more could we want? (Serious question)
By default Galaxy ships with part of a GRT configuration. Cloudlaunch could automatically register an instance for users and automatically apply tags to that instance like "public" or "private", and "cloudlaunch"; maybe even the infrastructure it runs on would be nice to have tagged.

I'm sceptical about this. Haven't we learnt that any kind of automatic "call home" feature, no matter how transparent it is, gets a bad reputation?
I don't want to force users/admins to submit data, and they will never be anonymous. Private tools, workflows and so on can tell a lot about your instance. Implementing filter rules can solve this, but do we want to invest so much time here and do we want everyone to configure this so extensively in the end?

For tool-search I don't think a pushing mechanism will work, we create a single-point of failure for such an important feature. I think crawling is the way to go here, Google is quite successful with this I have heard ;) - but again I had the same discussion with @joncison from ELIXIR-registry fame.

Don't get me wrong, I like the original GRT idea and I think for this it is ideally suited. Some big, very public instances can collect data (and communicate this to their users) and we can improve Galaxy with it. I don't see anyone using this in the future, for this we have too many private and highly secure instances (without any internet connection, behind firewalls, etc.).

Registering (manually) pure geo data and saying "Here I am, you cannot see my Galaxy instance but I'm a Galaxy expert, get in touch!" is the lowest barrier we can offer.

Just my 2 cents. I'm completely fine with abandoning this idea in favour of GRT or something else. It was a test and I appreciate the discussion - that was the aim before going public :)

@bgruening
Member

@martenson

My idea was that you search on top of the Tool Shed list of tools with fulltext and then with the tool_id(s) you query all known public Galaxies to see where you can run the tool(set of tools).

We could use the URL from the geojson file to get all public instances. We need to communicate this properly, though.
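As an illustration, pulling those URLs out of the file could look roughly like this. The `url` and `public` property names are assumptions about the geojson schema, and the entries are invented:

```python
def public_instance_urls(collection):
    """Collect the URLs of instances explicitly marked as public.

    Assumes hypothetical `properties.url` and `properties.public` keys.
    Entries without an explicit `public: true` are left out, so a private
    server is never advertised by accident.
    """
    urls = []
    for feature in collection.get("features", []):
        props = feature.get("properties", {})
        if props.get("public") is True and props.get("url"):
            urls.append(props["url"])
    return urls


# Made-up example data.
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [10.0, 50.0]},
         "properties": {"url": "https://public.example.org", "public": True}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [11.0, 51.0]},
         "properties": {"url": "https://internal.example.org", "public": False}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [12.0, 52.0]},
         "properties": {"name": "Expert without a reachable server"}},
    ],
}

urls = public_instance_urls(example)
```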

@hexylena
Member

I'm sceptical about this. Haven't we learnt that any kind of automatic "call home" feature, no matter how transparent it is, gets a bad reputation?
I don't want to force users/admins to submit data, and they will never be anonymous.

Now that we also collect metadata, I'll add a flag for not submitting job/tool data. They shouldn't be forced to submit any data they don't want to, I don't want this either.

It could be that it is because I work on it and am biased in favour of GRT, but it feels much less like an automated call home, and much more like an opt-in registry. Perhaps the language needs to be updated to reflect this better.

Private tools, workflows and so on can tell a lot about your instance. Implementing filter rules can solve this, but do we want to invest so much time here and do we want everyone to configure this so extensively in the end?

Is this really such a concern? That other people learn that I have version X of tool Y? Are tool IDs and names really sensitive data? (Are there any examples of tools that someone really would do anything to not have exposed?)

For tool-search I don't think a pushing mechanism will work, we create a single-point of failure for such an important feature.

You create an SPOF either way. Either with a pulling crawler and its site to display results, or with the point that data is pushed to.

I do not see the difference until we move to a scale with multiple servers and failover/handoff logic. At that scale, neither of them suffer from the SPOF problem, but we are a long ways from that.

I think crawling is the way to go here, Google is quite successful with this I have heard ;) - but again I had the same discussion with @joncison from ELIXIR-registry fame.

A push model additionally gets us around the firewall problem. Internal-to-university galaxies can still advertise their tools (to their own users) despite the rest of the world not being able to access them.

Don't get me wrong, I like the original GRT idea and I think for this it is ideally suited. Some big, very public instances can collect data (and communicate this to their users) and we can improve Galaxy with it. I don't see anyone using this in the future, for this we have too many private and highly secure instances (without any internet connection, behind firewalls, etc.).

I hear what you are saying, that there are high security instances with different priorities. In this case, GRT is no better than the manual galaxy-maps/registry + tool search engine approach here. So, we ignore these cases because they are not relevant to the discussion. For every other galaxy, that is public, that is willing to be indexed, that is willing to have a pin on a map, GRT is a fairly convenient web-form place to do this. (I'll add a pin selector today rather than just lat/lon fields, given how much people struggle with geojson's choices there.)
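For reference, GeoJSON positions are written longitude first, then latitude, which is the opposite of the usual "lat/lon" phrasing. A minimal sketch of turning form input into a Point feature; the function and property names are illustrative, not GRT's actual code:

```python
def galaxy_feature(name, latitude, longitude):
    """Build a GeoJSON Point feature from latitude/longitude form fields."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return {
        "type": "Feature",
        # GeoJSON order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"name": name},
    }


feature = galaxy_feature("Example Galaxy", latitude=48.0, longitude=7.8)
```

The range checks catch some swapped inputs (a "latitude" of 120, say), but not all of them, which is one reason a pin selector is less error-prone than two text fields.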

Registering (manually) pure geo data and saying "Here I am, you cannot see my Galaxy instance but I'm a Galaxy expert, get in touch!" is the lowest barrier we can offer.

Registering just the location of a trainer (and not their associated galaxies) seems to be different from what was originally proposed. Is this in scope?

Speaking of scope, should it be expanded to:

  • include the identity of the admin/trainer?
  • include contact information for them?

Just my 2 cents. I'm completely fine with abandoning this idea in favour of GRT or something else. It was a test and I appreciate the discussion - that was the aim before going public :)

My apologies on this front Björn, for attacking your idea outright, but yes, discussion is what we were going for, hope this did not come off too aggressively/in a mean spirit, not my intention.

@bgruening
Member

I'm sceptical about this. Haven't we learnt that any kind of automatic "call home" feature, no matter how transparent it is, gets a bad reputation?
I don't want to force users/admins to submit data, and they will never be anonymous.

It could be that it is because I work on it and am biased in favour of GRT, but it feels much less like an automated call home, and much more like an opt-in registry. Perhaps the language needs to be updated to reflect this better.

Not sure about the difference ;)

Private tools, workflows and so on can tell a lot about your instance. Implementing filter rules can solve this, but do we want to invest so much time here and do we want everyone to configure this so extensively in the end?

Is this really such a concern? That other people learn that I have version X of tool Y? Are tool IDs and names really sensitive data? (Are there any examples of tools that someone really would do anything to not have exposed?)

Yes. I know of people who strictly don't want to share anything, not even what they are working on, because this can give others a clue about which new technologies they are working on. I'm not allowed to file GitHub issues for tools I will work on and such stuff :(

For tool-search I don't think a pushing mechanism will work, we create a single-point of failure for such an important feature.

You create an SPOF either way. Either with a pulling crawler and its site to display results, or with the point that data is pushed to.

I do not see the difference until we move to a scale with multiple servers and failover/handoff logic. At that scale, neither of them suffer from the SPOF problem, but we are a long ways from that.

Searching on demand (crawling) does not need such a failover; it just searches, no storing needed.

I think crawling is the way to go here, Google is quite successful with this I have heard ;) - but again I had the same discussion with @joncison from ELIXIR-registry fame.

A push model additionally gets us around the firewall problem. Internal-to-university galaxies can still advertise their tools (to their own users) despite the rest of the world not being able to access them.

Now you are talking about local GRTs, aren't you? Then it's the same for local search crawlers.

Don't get me wrong, I like the original GRT idea and I think for this it is ideally suited. Some big, very public instances can collect data (and communicate this to their users) and we can improve Galaxy with it. I don't see anyone using this in the future, for this we have too many private and highly secure instances (without any internet connection, behind firewalls, etc.).

I hear what you are saying, that there are high security instances with different priorities. In this case, GRT is no better than the manual galaxy-maps/registry + tool search engine approach here.

Let's not mix up the search discussion with the "Galaxy registry discussion" here.
Galaxy-maps or the wikipage with the public Galaxy servers are just there to indicate resources (human or metal), no matter if they are private, public, secure, non-secure ....

So, we ignore these cases because they are not relevant to the discussion.

For me they are. As stated in the readme, I really want to register and map instances that are not able to submit data to the public (no matter in which way :))

This project aims to collect all Galaxy servers across the earth. We strictly encourage all Galaxy
Admins to add their server to this map, no matter if the server is public or not. We will use this map
to guide users to you, make people and funding agencies aware of your awesome service, connect
Admins between instances more closely and build a network of potential training instances.

For every other galaxy, that is public, that is willing to be indexed, that is willing to have a pin on a map, GRT is a fairly convenient web-form place to do this. (I'll add a pin selector today rather than just lat/lon fields, given how much people struggle with geojson's choices there.)

I don't think Galaxy should take over the registry part; for this we have the ELIXIR registry, which does way more than the registration of Galaxy instances or tools, and we should support them.

Registering (manually) pure geo data and saying "Here I am, you cannot see my Galaxy instance but I'm a Galaxy expert, get in touch!" is the lowest barrier we can offer.

Registering just the location of a trainer (and not their associated galaxies) seems to be different from what was originally proposed. Is this in scope?

Speaking of scope, should it be expanded to:

  • include the identity of the admin/trainer?
  • include contact information for them?

My initial aim was to collect instances and finally create one giant map of Galaxies :)

@hexylena
Member

It could be that it is because I work on it and am biased in favour of GRT, but it feels much less like an automated call home, and much more like an opt-in registry. Perhaps the language needs to be updated to reflect this better.

Not sure about the difference ;)

The technical difference is non-existent. But clearly you had a negative perception of "automated call-home."

Yes. I know of people who strictly don't want to share anything, not even what they are working on, because this can give others a clue about which new technologies they are working on. I'm not allowed to file GitHub issues for tools I will work on and such stuff :(

I'm sorry to hear that, that is unfortunate. Science should not (have to?) be so secretive.

I think this is solved in both cases, the crawling scenario would only see the public api/tools list, the GRT scenario would use a blacklist.

Searching on demand (crawling) does not need such a failover; it just searches, no storing needed.

Oh, wow, you mean dynamically searching? Whenever someone puts in a query for "bowtie", some service talks to every galaxy in the universe at once to find out who has bowtie? I'm sure the ELIXIR people have plans for this that are hopefully more sophisticated than this.
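The alternative to contacting every Galaxy per query is to crawl periodically and answer searches from a local index. A minimal sketch, assuming a crawler has already collected a mapping from instance URL to tool ids (all names and data here are hypothetical):

```python
from collections import defaultdict


def build_tool_index(tools_by_galaxy):
    """Invert {galaxy_url: [tool_id, ...]} into {tool_id: {galaxy_url, ...}}."""
    index = defaultdict(set)
    for galaxy_url, tool_ids in tools_by_galaxy.items():
        for tool_id in tool_ids:
            index[tool_id].add(galaxy_url)
    return index


def search(index, query):
    """Return matching tool ids and where they run, without any network calls."""
    needle = query.lower()
    return {
        tool_id: sorted(urls)
        for tool_id, urls in index.items()
        if needle in tool_id.lower()
    }


# Made-up crawl result.
crawl = {
    "https://a.example": ["bowtie2", "bwa"],
    "https://b.example": ["bowtie2"],
}
index = build_tool_index(crawl)
hits = search(index, "bowtie")
```

The index is only as fresh as the last crawl, which is the trade-off against on-demand querying.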

A push model additionally gets us around the firewall problem. Internal-to-university galaxies can still advertise their tools (to their own users) despite the rest of the world not being able to access them.

Now you are talking about local GRTs, aren't you? Then it's the same for local search crawlers.

No, I wasn't. If you have internet access + a university firewall + no external access to inside, then GRT would continue to function since you can push metadata out to the central GRT with your galaxy's information.

And if you don't wish to send job logs, GRT supports just registering the name / location.
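To make the opt-in idea concrete, here is a sketch of how a pushed report could be assembled. This is not GRT's actual payload format; the field names, the `send_jobs` flag and the `tool_blacklist` parameter are all assumptions for illustration:

```python
def build_report(name, description, location, jobs=None,
                 send_jobs=False, tool_blacklist=()):
    """Assemble the data an instance agrees to push to a central registry.

    By default only name, description and location are included. Job
    records are added only when `send_jobs` is set, and records for
    blacklisted tool ids are dropped before anything leaves the server.
    """
    report = {"name": name, "description": description, "location": location}
    if send_jobs and jobs:
        blocked = set(tool_blacklist)
        report["jobs"] = [job for job in jobs if job["tool_id"] not in blocked]
    return report


# Made-up job records.
jobs = [
    {"tool_id": "bowtie2", "state": "ok"},
    {"tool_id": "secret_inhouse_tool", "state": "ok"},
]

minimal = build_report("Example Galaxy", "Teaching instance", [7.8, 48.0], jobs=jobs)
full = build_report("Example Galaxy", "Teaching instance", [7.8, 48.0], jobs=jobs,
                    send_jobs=True, tool_blacklist=["secret_inhouse_tool"])
```

Because the instance initiates the connection, this works from behind a firewall that blocks inbound traffic but allows outbound requests.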

Let's not mix up the search discussion with the "Galaxy registry discussion" here.
Galaxy-maps or the wikipage with the public Galaxy servers are just there to indicate resources (human or metal), no matter if they are private, public, secure, non-secure ....

Sorry for this, it comes up with GRT since that encompasses both of these functionalities in one project.

Registering human resources not attached to a Galaxy would be a strong point in favour of a separate project to track human resources, but I am worried that it is the minority case. Most of us are admins + trainers and our training is integrally related to our administration and the presence of a public Galaxy instance on which we train people.

For me they are. As stated in the readme, I really want to register and map instances that are not able to submit data to the public (no matter in which way :))

This project aims to collect all Galaxy servers across the earth. We strictly encourage all Galaxy
Admins to add their server to this map, no matter if the server is public or not. We will use this map
to guide users to you, make people and funding agencies aware of your awesome service, connect
Admins between instances more closely and build a network of potential training instances.

I will be curious to see how popular such a thing would be. My hypothesis is: If their galaxy is not internet connected, how often are they trying to direct people to them as a training resource? I believe that will be a minority of cases. People trying to do training would more likely have some form of public galaxy that their trainees can use, or would not have a galaxy they admin but instead just want to register themselves as a human resource.

GRT supports this: registering, mapping (pub + priv), guiding users (website), central bragging point for their galaxy to funding agencies (badges with "we run #1 most jobs out of all galaxies" (a really important point, imo, GRT has user/job statistics, so it can share your ranking, and you can share that with funding agencies)), and (possibly) connecting admins. Especially if we added the admin's info to the GRT page.

I don't think Galaxy should take over the registry part; for this we have the ELIXIR registry, which does way more than the registration of Galaxy instances or tools, and we should support them.

ELIXIR was just tool registry, or ...? What new features are they adding?

I do think galaxy should handle galaxy registry. It seems very strange to be sending all of our galactic instance metadata to a completely separate organisation, separate funding, to track Galaxies across the universe.

@bgruening
Member

I think this is solved in both cases, the crawling scenario would only see the public api/tools list, the GRT scenario would use a blacklist.
Searching on demand (crawling) does not need such a failover; it just searches, no storing needed.

Oh, wow, you mean dynamically searching? Whenever someone puts in a query for "bowtie", some service talks to every galaxy in the universe at once to find out who has bowtie? I'm sure the ELIXIR people have plans for this that are hopefully more sophisticated than this.

We are mixing things up here. Please let's move the search discussion to a different thread.
ELIXIR registry is a registry storing things with an ontology in a database, searchable by everyone. The Galaxy Federation idea is different imho, and searching should also be different.

This is about registering Galaxy instances, private and public ones, reusable for whatever comes to mind because it is structured in a standard format (geojson).

Now you are talking about local GRTs, aren't you? Then it's the same for local search crawlers.

No, I wasn't. If you have internet access + a university firewall + no external access to inside, then GRT would continue to function since you can push metadata out to the central GRT with your galaxy's information.

Ah got it!

Let's not mix up the search discussion with the "Galaxy registry discussion" here.
Galaxy-maps or the wikipage with the public Galaxy servers are just there to indicate resources (human or metal), no matter if they are private, public, secure, non-secure ....

Sorry for this, it comes up with GRT since that encompasses both of these functionalities in one project.

Back to the SPOF :)

Registering human resources not attached to a Galaxy would be a strong point in favour of a separate project to track human resources, but I am worried that it is the minority case. Most of us are admins + trainers and our training is integrally related to our administration and the presence of a public Galaxy instance on which we train people.

Another argument is that a trainer cannot necessarily configure Galaxy to send this data, or that you need to convince your admin to send data ...

I will be curious to see how popular such a thing would be. My hypothesis is: If their galaxy is not internet connected, how often are they trying to direct people to them as a training resource? I believe that will be a minority of cases. People trying to do training would more likely have some form of public galaxy that their trainees can use, or would not have a galaxy they admin but instead just want to register themselves as a human resource.

I know a few instances that are restricted and not public, yet they advertise it and offer services to others, requiring VPN access etc. But training aside, it's also to state that there is someone with Galaxy experience, maybe an Admin whom I can talk to.

GRT supports this: registering, mapping (pub + priv), guiding users (website), central bragging point for their galaxy to funding agencies (badges with "we run #1 most jobs out of all galaxies" (a really important point, imo, GRT has user/job statistics, so it can share your ranking, and you can share that with funding agencies)), and (possibly) connecting admins. Especially if we added the admin's info to the GRT page.

Eric I'm not arguing against GRT here :) It is a great project and very useful for some use cases.

I don't think Galaxy should take over the registry part; for this we have the ELIXIR registry, which does way more than the registration of Galaxy instances or tools, and we should support them.

ELIXIR was just tool registry, or ...? What new features are they adding?

Workflows, Galaxy instances, they have a large ToDo list afaik.

I do think galaxy should handle galaxy registry. It seems very strange to be sending all of our galactic instance metadata to a completely separate organisation, separate funding, to track Galaxies across the universe.

They register services and tools; Galaxy is one service among many and this fits nicely.
But again we are mixing topics here. Sending _all_ metadata will most likely not happen for many instances for the given reasons. I'm not sure I'm allowed to do this and I consider myself a very liberal maintainer ;). Sending parts of very restricted metadata might happen, but this can also happen with other projects that have dedicated funding for it, with an ontology and people dedicated to this.

Not discussing the search idea here - what counts for me and some kind of map project is to get as many people on board as possible, as easily as possible. I don't see this happening with GRT quickly, especially not if I assume that not everyone will activate this for the given reasons. Do you intend to activate it by default?
GRT could also just import this geojson file and add instances that are not registered and vice-versa :)
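A two-way sync of this kind mostly comes down to finding the entries one side has and the other lacks. A small sketch, assuming both sides can be reduced to lists of records with a `url` key (the record shape and the data are hypothetical):

```python
def normalise(url):
    """Compare URLs loosely: ignore case, surrounding spaces and trailing slashes."""
    return url.strip().rstrip("/").lower()


def missing_from(source, target):
    """Return the records in `source` whose URL does not appear in `target`."""
    known = {normalise(entry["url"]) for entry in target}
    return [entry for entry in source if normalise(entry["url"]) not in known]


# Made-up records standing in for the geojson file and the GRT database.
maps_entries = [
    {"url": "https://a.example/", "name": "A"},
    {"url": "https://b.example", "name": "B"},
]
grt_entries = [
    {"url": "https://A.example", "name": "A"},
    {"url": "https://c.example", "name": "C"},
]

to_add_to_grt = missing_from(maps_entries, grt_entries)
to_add_to_maps = missing_from(grt_entries, maps_entries)
```

Matching on URL will not work for entries that have no reachable server, so those would need another key such as a name or contact.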

@hexylena
Member

We are mixing things up here. Please let's move the search discussion to a different thread.
ELIXIR registry is a registry storing things with an ontology in a database, searchable by everyone. The Galaxy Federation idea is different imho, and searching should also be different.

This is about registering Galaxy instances, private and public ones, reusable for whatever comes to mind because it is structured in a standard format (geojson).

Ok, sure, that's fine. Not talking about federation either, that's completely separate.

Back to the SPOF :)

We have these everywhere. We are not talking about building and deploying infinitely scalable services to AWS + GCP with multi-region failover, why bring this up? Is this such a big concern?

Another argument is that a trainer cannot necessarily configure Galaxy to send this data, or that you need to convince your admin to send data ...

But a trainer can register their galaxy. This does not require admin access to a galaxy, or server access at all. You can register your galaxy with whatever subset of data you want (name, description, location) and not send job logs. This is fine by GRT.

The data would have to be updated manually, but that is no different from galaxy-maps, just through a web interface instead of hand edited files and PRs.

I know a few instances that are restricted and not public, yet they advertise it and offer services to others, requiring VPN access etc. But training aside, it's also to state that there is someone with Galaxy experience, maybe an Admin whom I can talk to.

Very interesting! People do such strange things!

I would argue that this case is covered in GRT, through the "register through the website and do nothing else" case.

Eric I'm not arguing against GRT here :) It is a great project and very useful for some use cases.

I know this Björn, I just strongly believe that GRT completely covers this precise use case here, and possibly Cloudlaunch's as well.

They register services and tools; Galaxy is one service among many and this fits nicely.
But again we are mixing topics here. Sending all metadata will most likely not happen for many instances for the given reasons.

Sure. That's fine, we don't ask for all of their metadata. You opt-in to providing as much as you want during 1) registration, and 2) regular crontab sending, if and only if you are wishing to submit job run logs.

I'm not sure I'm allowed to do this and I consider myself a very liberal maintainer ;). Sending parts of very restricted metadata might happen, but this can also happen with other projects that have dedicated funding for it, with an ontology and people dedicated to this.

Not discussing the search idea here - what counts for me and some kind of map project is to get as many people on board as possible, as easily as possible. I don't see this happening with GRT quickly, especially not if I assume that not everyone will activate this for the given reasons. Do you intend to activate it by default?

This is the current state.

[screenshot]

Again, no activation necessary, it's a website you can sign up at and register your galaxy.

GRT could also just import this geojson file and add instances that are not registered and vice-versa :)

I'm already using it as test-data ;)

@hexylena
Member

Update: https://oc.hx42.org/grt/galaxy/

Internally I've exposed this as an API endpoint in GRT which can show the geojson data for all or one galaxy (depending on which map you wish to place, no need for you to fetch data about the entire world). You could embed these maps or use the GeoJSON any other way that you want, much like you were suggesting.
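The "all or one galaxy" behaviour could be sketched as follows. This is not GRT's code; the record fields (`id`, `name`, `lon`, `lat`) and function names are assumptions:

```python
def to_feature(galaxy):
    """Convert one registry record into a GeoJSON Point feature."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point",
                     "coordinates": [galaxy["lon"], galaxy["lat"]]},
        "properties": {"id": galaxy["id"], "name": galaxy["name"]},
    }


def geojson_for(galaxies, galaxy_id=None):
    """Return a FeatureCollection for every galaxy, or only the one requested."""
    selected = [g for g in galaxies
                if galaxy_id is None or g["id"] == galaxy_id]
    if galaxy_id is not None and not selected:
        raise KeyError(galaxy_id)
    return {"type": "FeatureCollection",
            "features": [to_feature(g) for g in selected]}


# Made-up registry records.
registry = [
    {"id": 1, "name": "Example A", "lon": 10.0, "lat": 50.0},
    {"id": 2, "name": "Example B", "lon": -75.0, "lat": 40.0},
]

world = geojson_for(registry)
single = geojson_for(registry, galaxy_id=2)
```

Returning a FeatureCollection in both cases means a map widget can consume either response without special handling.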

@bgruening
Member

Back to the SPOF :)
We have these everywhere. We are not talking about building and deploying infinitely scalable services to AWS + GCP with multi-region failover, why bring this up? Is this such a big concern?

Since when is "We have/use this everywhere" an excuse to introduce even more of this ;)
For me it always is/was a concern and if we can avoid it, I would try to do so.

But a trainer can register their galaxy. This does not require admin access to a galaxy, or server access at all. You can register your galaxy with whatever subset of data you want (name, description, location) and not send job logs. This is fine by GRT.
The data would have to be updated manually, but that is no different from galaxy-maps, just through a web interface instead of hand edited files and PRs.

I see where you are heading. If you add more and more things to GRT, that's fine and awesome. It's not what I had in mind for GRT - a collection of job metadata. But if you try to get more and more of these features in and support the community, this is great!!!
I will follow this (as all of your projects) and as soon as it gets the momentum that we can use it for training we can kill this little maps-project.

I'm still concerned about the overlap with the ELIXIR project and would like both projects to talk to each other and not replicate work. There is already https://github.com/C3BI-pasteur-fr/ReGaTE, which is very similar to what GRT is now becoming (or always was ... :))

@hexylena
Member

hexylena commented Aug 15, 2016

Since when is "We have/use this everywhere" an excuse to introduce even more of this ;)

Fair point.

For me it always is/was a concern and if we can avoid it, I would try to do so.

Ah, we're treating github as not a SPOF. This is reasonable.

I see where you are heading. If you add more and more things to GRT, that's fine and awesome. It's not what I had in mind for GRT - a collection of job metadata. But if you try to get more and more of these features in and support the community, this is great!!!

This was not my original plan either, but it grew very naturally:

  • GRT collects job logs
    • therefore we are collecting metadata about tools
      • therefore we can use this to display things like popular galaxies / popular tools / make badges
    • and metadata about Galaxies
      • if we're collecting this, we can use it to survey galaxies as well, similar to the admin survey that happens every year or so
      • if we're collecting this, we're also collecting a point of contact for each galaxy, maybe we should collect training contacts. I'll add this feature next week ("Galaxy Training Resources" and their associations with various galaxies / a page for that / etc)
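
As a rough illustration of that chain (job logs, then tool metadata, then popularity), here is a minimal Python sketch. The record fields (`galaxy`, `tool_id`) and the sample values are made up for the example; they are assumptions, not GRT's actual schema.

```python
from collections import Counter

# Hypothetical job log records; the field names are assumptions for
# illustration and do not reflect GRT's real data model.
job_logs = [
    {"galaxy": "galaxy-a", "tool_id": "bowtie2"},
    {"galaxy": "galaxy-a", "tool_id": "bowtie2"},
    {"galaxy": "galaxy-b", "tool_id": "bowtie2"},
    {"galaxy": "galaxy-b", "tool_id": "fastqc"},
]


def popular_tools(logs):
    """Count how many jobs ran each tool, across all galaxies."""
    return Counter(rec["tool_id"] for rec in logs)


def jobs_per_galaxy(logs):
    """Count how many jobs each galaxy reported."""
    return Counter(rec["galaxy"] for rec in logs)


tool_counts = popular_tools(job_logs)
galaxy_counts = jobs_per_galaxy(job_logs)
```

Something like `tool_counts.most_common(10)` could then feed a "popular tools" page or a badge.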

I will follow this (as all of your projects) and as soon as it gets the momentum that we can use it for training we can kill this little maps-project.

So, keep status quo? Sounds good to me. I have a command to import the geojson from this repo, so will continue to do that going forward until we see if GRT reaches momentum/dies.

I'm still concerned about the overlap with the ELIXIR project and would like both projects to talk to each other and not replicate work. There is already https://github.com/C3BI-pasteur-fr/ReGaTE, which is very similar to what GRT is now becoming (or always was ... :))

yeah, that is something else to work out. Thanks for the link.

@hexylena
Member

hexylena commented Aug 15, 2016

@afgane I seem to have scared you off, any follow up comments on what cloudlaunch might wish to do? :)

@bgruening
Member

Ah, we're treating github as not a SPOF. This is reasonable.

Yes we do, as we all have backups, and whatever comes next will have a GitHub importer (see google-code). But I guess this comment was just peeving ;)

@hrhotz
Collaborator

hrhotz commented Aug 16, 2016

@erasche

So, keep status quo? Sounds good to me. I have a command to import the geojson from this repo,
so will continue to do that going forward until we see if GRT reaches momentum/dies.

well, I like the GRT, it is a great idea, but it should be something where people (i.e. "Galaxy Servers") can sign up themselves. So I don't like the idea of your script to import the geojson from this repository and create a "Listing of public Galaxy instances". There are at least two servers on your list which are not public.

@hexylena
Member

@hrhotz you're right, that is misinformation from that bit of text, since it is a listing of public + private* instances.

* private here meaning not open to public registration, but the fact that they exist is public, otherwise they would not have been mentioned in this geojson file.

Would that be preferable? Or would you rather that I do not import those into GRT at all?

@joncison

Just briefly folks (it's ELIXIR deliverable time :-/) I'm including @hmenager who is leading the parts of ELIXIR / bio.tools work concerning Galaxy integration (broadly), including https://github.com/C3BI-pasteur-fr/ReGaTE. We'd be happy to talk more of course, in due course, on how we can play nice with GRT, CloudLaunch etc. Cheers!

@hrhotz
Collaborator

hrhotz commented Aug 17, 2016

@hrhotz you're right, that is misinformation
from that bit of text, since it is a listing of public + private* instances.

* private here meaning not open to public registration, but the fact that
  they exist is public, otherwise they would not have been mentioned in this
  geojson file.

That's ok. Can you please make the corresponding changes on your GRT page -
thanks

Would that be preferable? Or would you rather that I do not import those
into GRT at all?

Well, the geojson file is on github you can do with this file whatever you
want.
However, IMHO, I would prefer if people can sign up on the GRT page
themselves. Or, if you go for an automatic way, provide a reference to the
source.

@afgane
Author

afgane commented Aug 22, 2016

any follow up comments on what cloudlaunch might wish to do?

The general idea behind CloudLaunch is to facilitate access to Galaxy instances (really, any application service), whether they are on a cloud, a laptop, running in a container somewhere, as a public instance or pretty much anything in between. As far as more specific features go, the version being developed continues to allow launching instances on the clouds but it also allows (or will allow) linking to existing instances (on or off the cloud), searching for tools across those instances, sharing of instances, viewing and controlling cloud resources. The idea is that public instances get listed in a similar fashion to how @tnabtaf maintains https://wiki.galaxyproject.org/PublicGalaxyServers but that individuals can also register their own instances at will by logging in and filling out a form. Instance listings won't all need to be public either but can be private or shared, so when someone logs in, they see a list of instances they have access to, i.e. those that are public, have been shared with them or added by them (e.g., launched cloud instances).
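
To make the visibility rule concrete (public, shared with the user, or added by the user), here is a minimal Python sketch. The `Instance` class and its fields are hypothetical, invented for this example; they are not CloudLaunch's actual models.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Hypothetical model; the field names are assumptions for illustration.
    name: str
    owner: str
    public: bool = False
    shared_with: set = field(default_factory=set)


def visible_instances(instances, user):
    """Return instances that are public, owned by `user`, or shared with `user`."""
    return [
        inst for inst in instances
        if inst.public or inst.owner == user or user in inst.shared_with
    ]


instances = [
    Instance("Public Galaxy", owner="admin", public=True),
    Instance("Alice's cloud instance", owner="alice"),
    Instance("Lab Galaxy", owner="bob", shared_with={"alice"}),
    Instance("Bob's private instance", owner="bob"),
]

alice_view = [inst.name for inst in visible_instances(instances, "alice")]
```

With the sample data above, `alice_view` contains the public instance, Alice's own instance and the one Bob shared with her, but not Bob's private one.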

BTW, the name CloudLaunch comes from the original version of the app where the goal was to exclusively launch Galaxy on the Cloud instances on AWS and, later, OpenStack clouds. With that, the name CloudLaunch may imply too much of a cloud-centric view, and we can certainly change the name if it would add to the clarity of the app's purpose. At the same time, with the cloud becoming more omnipresent and Galaxy instances trending toward support for bursting and federation, everything will be coming from some version of the cloud before long.

Linking all of this back to my understanding of the GRT, it feels like CloudLaunch and GRT could really be merged into one project: (1) the app would allow listing of instances (public, private or shared); (2) querying across all of those (either for specific tools or by asking questions: "If I'm mapping a 32 Gb FastQ dataset against a 1Mbp genome, what are the likely minimum/optimal compute requirements"); and (3) launching new instances if a suitable one does not already exist. Comments about that thought?

@hexylena
Member

The general idea behind CloudLaunch is to facilitate access to Galaxy instances (really, any application service), whether they are on a cloud, a laptop, running in a container somewhere, as a public instance or pretty much anything in between.

If this is so, then there might be a branding issue. Edit: ok, you mention this later.

As far as more specific features go, the version being developed continues to allow launching instances on the clouds but it also allows (or will allow) linking to existing instances (on or off the cloud), searching for tools across those instances, sharing of instances, viewing and controlling cloud resources.

I did not know cloudlaunch was picking up the complete feature set of GRT. That is all in the roadmap?

This would mean we have four cross-galaxy tool search efforts? @martenson's, GRT's, ELIXIR's, and cloudlaunch's?

The idea is that public instances get listed in a similar fashion to how @tnabtaf maintains https://wiki.galaxyproject.org/PublicGalaxyServers but that individuals can also register their own instances at will by logging in and filling out a form. Instance listings won't all need to be public either but can be private or shared, so when someone logs in, they see a list of instances they have access to, i.e. those that are public, have been shared with them or added by them (e.g., launched cloud instances).

Yes, this was one of the (recently added) goals of GRT as well, minus the "all instances that they have access to" portion.

BTW, the name CloudLaunch comes from the original version of the app where the goal was to exclusively launch Galaxy on the Cloud instances on AWS and, later, OpenStack clouds. With that, the name CloudLaunch may imply too much of a cloud-centric view, and we can certainly change the name if it would add to the clarity of the app's purpose. At the same time, with the cloud becoming more omnipresent and Galaxy instances trending toward support for bursting and federation, everything will be coming from some version of the cloud before long.

Yes, definitely. Cloud is one of those meaningless business-y terms anyway that has not so much information content.

Linking all of this back to my understanding of the GRT, it feels like CloudLaunch and GRT could really be merged into one project: (1) the app would allow listing of instances (public, private or shared); (2) querying across all of those (either for specific tools or by asking questions: "If I'm mapping a 32 Gb FastQ dataset against a 1Mbp genome, what are the likely minimum/optimal compute requirements"); and (3) launching new instances if a suitable one does not already exist. Comments about that thought?

I'm killing the django frontend to GRT as it is. I was separating it out into a react-js project, but if you want to re-implement the frontend in CloudLaunch and consume our API, I won't say no to that.

But I am amazed that this is all in scope for cloudlaunch. This seems like a lot of functionality that has not been discussed before as a priority/goal for the project? Looking at the roadmap, not a single one of these features that GRT provides, that you discuss being in the interest of cloudlaunch, is on there. galaxyproject/galaxy#1928

@afgane
Author

afgane commented Aug 22, 2016

This would mean we have four cross-galaxy tool search efforts? @martenson's, GRT's, ELIXIR's, and cloudlaunch's?

Seems to be trending that way... I guess things are ripe for something like this. More specifically, I have not looked at the details yet but my understanding of @martenson's effort was that it is a set of API endpoints that would enable external Galaxy tool searches - it just needs a UI. ELIXIR just came up on my radar and prompted this issue. Since GCC and before this discussion, my understanding of the GRT was that it's primarily a job/data collection engine.

As far as the roadmap goes, this got bunched under the "All new CloudLaunch" bullet item (you may remember that the cloud branch of the project got no discussion at the team meeting); I just realized it's not linked there but some of the specifics were outlined in this issue: galaxyproject/cloudlaunch#49. Given that issue prompted no discussion, the idea evolved a bit since then without everything being documented (but those ideas were presented at GCC).

Whatever the service is called, it really seems natural to aggregate instances launched in the cloud with the ones that exist permanently so that users can create their own lists of instances they use and access them from one place. In the long run, I feel that we and the community would be better served if the efforts unify and converge. In the short term though, this would slow things down to figure out the proper architecture and for everyone to get familiarized with it all. As far as the timeline goes, GRT seems to be chugging along; CloudLaunch has not seen any visible development since GCC and will largely be on the back burner for the upcoming month. What would you like to see?

@hexylena
Member

Since GCC and before this discussion, my understanding of the GRT was that it's primarily a job/data collection engine.

It was until this repo opened up and I realised "hey, we've already got infrastructure in every galaxy ≥16.07 for doing this."

As far as the roadmap goes, this got bunched under the "All new CloudLaunch" bullet item (you may remember that the cloud branch of the project got no discussion at the team meeting); I just realized it's not linked there but some of the specifics were outlined in this issue: galaxyproject/cloudlaunch#49. Given that issue prompted no discussion, the idea evolved a bit since then without everything being documented (but those ideas were presented at GCC).

Thanks for linking this, had forgotten about that issue. CloudLaunch needed more attention during that meeting, you're right. I've always paid less attention to Cloudlaunch because I don't directly use clouds for any of my work.

Whatever the service is called, it really seems natural to aggregate instances launched in the cloud with the ones that exist permanently so that users can create their own lists of instances they use and access them from one place.

agreed. This sounds like a nice feature.

In the long run, I feel that we and the community would be better served if the efforts unify and converge.

somewhat agreed.

What would you like to see?

I'm not sure, so I'm going to outline my thoughts as they come to me.

  • I think you have a great plan for cloudlaunch, and the mentioned feature of logging in to the galaxies they have access to sounds really nice, especially for people who launch so many galaxy instances in the cloud. (I'm not one of those people).
  • Both projects have similar architecture, django backend (<3!! thank you for using it in cloud launch), JS frontend. If they need to, they could easily be merged.
  • As it stands, I don't know what message it sends to end users of these things. My goal with GRT originally was just job log analysis. The Galaxy directory grew out of that naturally, and it dovetails nicely, and lets me do badges for galaxies which admins will like for branding and bragging. On that note, I think I got the branding right ("Radio Telescope" peering into galaxies across the universe) on this since it applies to all galaxies.
  • From my (clearly incorrect) original impressions of cloudlaunch, these two projects serve very different niches. I still think this is true to some extent:
    • GRT: job logs, analysis, galaxy directory, tool directory, popularity contest badges.
    • CloudLaunch: launching Galaxies, galaxy directory
  • This leads me to feel that they have different needs and different target audiences, so maybe they shouldn't converge on the frontend.
  • However we do still share the galaxy directory, so maybe the backends could merge. Would this appeal in any way? I feel like it is "polluting" the cloudlaunch database to some extent.
  • So, given the above, and that it's just a REST API, maybe cloudlaunch should consume the data from GRT?
  • Basically, I think I'm not convinced that they should merge completely. Would consuming data from one another make sense?
  • I threw together GRT without a lot of planning or direction. I do this with a lot of projects because I'm one person and being "disruptive" (as much as I dislike that word), is often a shortcut to getting reactions and progress made on items the community cares about. CloudLaunch was much better architected and thought out. But with all of these efforts, maybe we should find time to discuss and figure out the best way forward for everyone involved. We have a number of features, and users could benefit from more inter-linking and interconnectedness in the frontends for these efforts. Do we need all of these things, or is there one super-service which accomplishes everyone's needs? What would that look like?

@afgane
Author

afgane commented Aug 22, 2016

I kind of feel there is really more overlap than there are differences: everything revolves around a directory of Galaxy instances, static or temporary ones. CloudLaunch adds the ability to provision additional ones while GRT aggregates job data. Badges, flavors, search etc. are more of a UI feature, which is enabled by the directory concept and supported by data from the GRT.
So from the end-user perspective, I feel all of these really fit well together to create a one-stop-shop to: discover, access, create, get (job performance) feedback (eg, as separate tabs). Let's see if anyone else chimes in (ping @nuwang) but I think I'm leaning on the merge side. Other than a possibly very large database of job info (which will exist anyhow), what are the drawbacks of merging?

@hexylena
Member

So from the end-user perspective, I feel all of these really fit well together to create a one-stop-shop to: discover, access, create, get (job performance) feedback (eg, as separate tabs)...what are the drawbacks of merging?

While I agree that the data is really related and belongs in a single store...I quite strongly do not think the audiences for those data are the same.

| Use Case | Audience |
|---|---|
| discover/access | majority of end users |
| create | separate subgroup of end users from discover/access |
| get job performance | admins, cluster admins |
| tool usage statistics | tool authors, maybe cluster admins |

(Speculation: those who can create, will, rather than be stuck in a queue with normal users looking to discover/access open galaxies that might have less hardware than the create-users can afford?)

This is why I would hesitate on voting merge, because GRT has a very distinct audience. It would be strange to say "hey, tool devs and cluster admins of your private local galaxy, go to CloudLaunch, the galaxy service to launch cloud images, that's where our job data is." That feels like two completely unrelated things shoved into one project, at least given the history of CloudLaunch.

I somewhat feel that given their separate audiences, that at least the frontends could stay separate with no great loss, and we'd all have the same backend and be able to take advantage of that data.

From the GRT side, until I picked up some of the GTN goals by adding maps, I had zero interest in end users. I only cared about admins and tool devs.
If we shared a backend, I'd strongly consider splitting out the training stuff into a third frontend that the GTN maintains and updates to help link their resources to people who are looking for them. Or maybe cloudlaunch would take that over. But again, for end users looking for training resources, or looking to connect with Galaxies that have the tools they need: most I know think of cloudlaunch as the place where people who are comfortable installing tools, and who have nice grants that give them cloud resources, go for galaxy.

@joncison

Hey folks, a couple of notes explaining the scope of the ELIXIR registry (dev.bio.tools), in case it informs the discussion.

We're focused on "discovery" of tools and services (that means find, understand, compare, select) by providing basic (but supporting quite comprehensive) description of tools. Also "interoperability" as a secondary concern (mostly boiling down to annotation of supported data formats, and providing some information about service endpoints and command-line spec - but not straying deep into the latter - we want to transform / support CWL).

We won't be collating data on tool usage, job performance, popularity, or be a repo for code or job data, but we'd like in due course to expose the results of scientific benchmarking of tools and technical service monitoring (up time etc.) - this is a separate concern within ELIXIR, which bio.tools will expose. bio.tools won't provide facility for running tools either (although a service broker has been mooted within ELIXIR)

The major task for bio.tools in 1st instance is producing - and maintaining (through a distributed curation effort) a high quality and comprehensive set of tool descriptions. By tool I mean all types of application software broadly. Once we have that, we can then link unique tools to the various online services where they can be used, that includes of course Galaxy instances. So information about such servers and the tools they contain is very interesting, hence regate and similar such efforts. Yes, it would be awesome to search for "bowtie" and get basic information about it, including all the places you can run it. Lot of work to get there, though!
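
To illustrate the "search for bowtie and get all the places you can run it" idea, here is a minimal Python sketch of an inverted index from tool name to servers. The server URLs and tool lists are invented for the example, and this is only an assumption about how such a lookup could work, not how bio.tools, GRT or CloudLaunch implement it.

```python
from collections import defaultdict

# Hypothetical data: which tools each Galaxy server offers.
server_tools = {
    "https://galaxy-a.example.org": ["bowtie", "fastqc"],
    "https://galaxy-b.example.org": ["Bowtie", "bwa"],
}


def build_tool_index(servers):
    """Map each lower-cased tool name to the set of servers offering it."""
    index = defaultdict(set)
    for url, tools in servers.items():
        for tool in tools:
            index[tool.lower()].add(url)
    return index


def where_can_i_run(index, tool):
    """Return a sorted list of server URLs that offer `tool`."""
    return sorted(index.get(tool.lower(), set()))


tool_index = build_tool_index(server_tools)
bowtie_servers = where_can_i_run(tool_index, "bowtie")
```

Matching here is by exact, case-insensitive name only; linking to unique tool descriptions, as described above, would need proper identifiers rather than names.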

I think it's OK for different portals serving different needs / audiences, we just want (obviously) to avoid redundant efforts where we can. At the very least, share data and try our best to coordinate.

Best of luck with all the efforts here!

@nuwang
Member

nuwang commented Aug 23, 2016

Took me a bit of time to catch up with the discussion :-) These all seem like very good ideas, with the main issue being the apparent overlap. To what extent can these be broken apart into different, standalone web-services (micro-services if you will), which can then be aggregated as desired?

For example, I think that @erasche makes a good point about not putting too much of this stuff into CloudLaunch, it may expand scope significantly to the point where it becomes hard to manage. Looking at the stated project scope of each, some of this functionality doesn't seem to fit in all that well with either CloudLaunch or GRT. Perhaps the thing to do is to expand on @bgruening's idea and simply create a separate service altogether for this, dedicated to aggregating and filtering available Galaxy servers?

For CloudLaunch specifically, it may make more sense to simply query a remote URL, and fetch a list of available Galaxy Servers. Whether this list comes from a geojson file hosted on Github, or whether it comes from GRT, or from some other service altogether, won't matter so much as long as the data is in a documented JSON response.
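
As a minimal sketch of what consuming such a list could look like, the Python below parses a GeoJSON `FeatureCollection` into a plain list of servers. The sample data and the property names (`name`, `url`, `public`) are assumptions for illustration, not the actual schema of the geojson file in this repo; a real client would fetch the text from a URL instead of using an inline string.

```python
import json

# Made-up sample data; the property names are assumptions, not the
# real schema of the geojson file discussed in this thread.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [7.85, 47.99]},
     "properties": {"name": "Example Galaxy A",
                    "url": "https://a.example.org", "public": true}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-96.33, 30.62]},
     "properties": {"name": "Example Galaxy B",
                    "url": "https://b.example.org", "public": false}}
  ]
}
"""


def list_servers(geojson_text):
    """Turn a GeoJSON FeatureCollection into a list of server dicts."""
    data = json.loads(geojson_text)
    servers = []
    for feature in data.get("features", []):
        props = feature.get("properties", {})
        # GeoJSON stores positions as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][:2]
        servers.append({
            "name": props.get("name"),
            "url": props.get("url"),
            "public": bool(props.get("public", False)),
            "lat": lat,
            "lon": lon,
        })
    return servers


servers = list_servers(SAMPLE_GEOJSON)
public_servers = [s for s in servers if s["public"]]
```

Because the consumer only depends on the JSON shape, the same function would work whether the text came from a file on GitHub, from GRT, or from another service.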

The only complication is that CloudLaunch should also allow people to register a newly launched instance as a publicly available server if they so desire. That makes the idea of a programmable web service with CRUD operations more attractive, as opposed to a simpler geojson file which needs to be manually updated (although I'm certainly not immune to the charms of a geojson file - it's simple and reliable).
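
For comparison with a hand-edited file, here is a minimal in-memory Python sketch of the CRUD operations such a service would expose. The `ServerRegistry` class and its method names are hypothetical, made up for this example; it is not the CloudLaunch or django-rest-framework API, and a real service would add persistence and authentication.

```python
import itertools


class ServerRegistry:
    """Hypothetical in-memory registry of Galaxy servers (illustration only)."""

    def __init__(self):
        self._servers = {}
        self._ids = itertools.count(1)

    def create(self, name, url, public=False):
        server_id = next(self._ids)
        record = {"id": server_id, "name": name, "url": url, "public": public}
        self._servers[server_id] = record
        return record

    def read(self, server_id):
        # Returns None if the id is unknown.
        return self._servers.get(server_id)

    def update(self, server_id, **changes):
        # Raises KeyError if the id is unknown.
        self._servers[server_id].update(changes)
        return self._servers[server_id]

    def delete(self, server_id):
        # Returns True if something was removed, False otherwise.
        return self._servers.pop(server_id, None) is not None

    def list_public(self):
        return [s for s in self._servers.values() if s["public"]]


registry = ServerRegistry()
launched = registry.create("My cloud Galaxy", "https://mine.example.org")
# A user decides to make a newly launched instance publicly listed.
registry.update(launched["id"], public=True)
public_names = [s["name"] for s in registry.list_public()]
```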

Users can be redirected to this third project if they wish to find servers with specific characteristics, tools etc. Alternatively, cloudlaunch could consume the service directly where it makes sense.

As it stands now, CloudLaunch already contains this web service, with a django-rest-framework browsable API, so the code can directly be spun off as a separate project should this route seem more attractive.

However, it's not clear to me to what extent a separate service would benefit GRT, or whether it makes sense to fold everything into GRT etc. etc.

I do think that it's too risky to fold GRT and CloudLaunch into one project - the scope seems too expansive.

@afgane
Author

afgane commented Aug 23, 2016

I appear to be on the thin end here rooting for the merge. Some of my thinking comes from the fact that I see CloudLaunch less and less as just a service for launching Galaxy cloud instances and more as a hub for accessing and discovering services/tools (despite its name, but I've discussed that already above; again, I'm perfectly fine with changing the name). From the cloud perspective, an increasingly large number of new hardware installations are starting to be managed by cloud middleware (e.g., campus clusters). Then, there are academic clouds (e.g., Jetstream, NeCTAR, EGI Federated Cloud), all of which are typically 'free' to use. So I think a growing number of people will want to deploy Galaxy for the Cloud and either launch longer running, shared instances or point their users to a launcher for self-provisioning instances. This would cause the number of running instances to expand and users will probably want to be able to discover those, group them and share them. With that, I feel the launch process will become this seemingly minor thing that happens as a sideline and automatically behind the scenes while the discovery and service groups are the focus for users. For example, a user sees a public instance that has the right toolset but the instance has quotas or other access restrictions. The user can then use the flavor launcher to automatically create a clone of that instance on their own infrastructure (and make it available to the rest of their group or use it for training). Although still a bit far out, to me, these suggest that launch and discovery belong together from the user's perspective. The performance piece comes in to help or, ideally, automatically decide what infrastructure to use for the deployment (both for dedicated cloud instances and for bursting workers on long-running instances).
(Not knowing what the user may use a dedicated instance for, this may be putting the cart before the horse, but I think we could work with that.)

Technically, I feel that if more of us put a joint effort towards a single app, its development will likely move along faster than if we develop three separate apps. Particularly, three apps that are largely based on the same framework. With that, I'd be interested in seeing a single backend with the multiple UI interfaces (ideally, those would use the same technology so components can be interchanged but that may be a stretch).

@hexylena
Member

hexylena commented Sep 5, 2016

Sorry for the delayed reply @afgane

I appear to be on the thin end here rooting for the merge.

It feels like a lot of work for marginal returns. And if the backend of GRT is part of cloudlaunch, you have this 90% orthogonal service as part of the codebase. The vast majority of my PRs to update GRT wouldn't apply to any cloudlaunch devs, and vice versa. That feels weird to me, but that isn't really a quantifiable/useful statement, so let us ignore it.

So I think a growing number of people will want to deploy Galaxy for the Cloud and either launch longer running, shared instances or point their users to a launcher for self-provisioning instances.
For example, a user sees a public instance that has the right toolset but the instance has quotas or other access restrictions.

I can definitely sympathise with this desire, I really do like your vision of this!

Although still a bit far out, to me, these suggest that launch and discovery belong together from the user's perspective.

Sure, from a user perspective this is ideal. And I understand that having the full list of Galaxies within the cloudlaunch codebase would be easier for you to track which a user has access to, more so than the tenuous connections that happen over REST. I can definitely see that.

Technically, I feel that if more of us put a joint effort towards a single app, its development will likely move along faster than if we develop three separate apps.

(Talked to Björn, his comments suggested that just two would make sense, trainers do not want a separate app.)
So, thankfully, down to two apps.

With that, I'd be interested in seeing a single backend with the multiple UI interfaces (ideally, those would use the same technology so components can be interchanged but that may be a stretch).

Ok, I had a look at doing this, and will PR my models because this discussion has gone on long enough, this will get GRT's backend deployed much sooner than waiting on someone to deploy it for me on servers I don't have cli access to, and everything will be happy-enough, I've seen the light ;)
