Removed email. Expect unsorted dump and sort using big-sorter. Added … #3

Merged
merged 3 commits into from
Feb 2, 2024
22 changes: 10 additions & 12 deletions README.md
@@ -14,7 +14,7 @@ Unless required by applicable law or agreed to in writing, software distributed

1. This program downloads the complete list of public INSDC sequence accessions and their last updated dates (if available) for the requested type using the ENA Portal API.
e.g. for sequence:
https://www.ebi.ac.uk/ena/portal/api/search?dataPortal=ena&result=sequence&fields=accession,last_updated
https://www.ebi.ac.uk/ena/browser/api/livelist/sequence?fields=accession,last_updated

2. It then compares the new list against a previous list of the same type and generates 2 lists.
list of new or updated records since the last time
@@ -37,26 +37,24 @@ You need to provide 3 mandatory arguments i.e datatype, previousSnapshot & outpu
3. outputLocation : Local folder where the new complete report and the 2 change lists are to be created. Ensure there's
enough disk space available.

4. email : optionally provide an email address to be notified when the process is complete

5. query : optionally provide a query string to filter the contents of your snapshot. Use the Query page in the
4. query : optionally provide a query string to filter the contents of your snapshot. Use the Query page in the
https://www.ebi.ac.uk/ena/browser/advanced-search wizard to build up the query. This should be the same query for
every execution of the tool for a specific snapshot. e.g. --query=dataclass="CON"

6. downloadData : (Optional) If value is true, the tool will also fetch the data for the new/updated records and save
5. downloadData : (Optional) If value is true, the tool will also fetch the data for the new/updated records and save
them in a .dat file.

7. format : (Optional) Used only if downloadData=true. Request embl flatfile format (default) or fasta format for
6. format : (Optional) Used only if downloadData=true. Request embl flatfile format (default) or fasta format for
downloaded data.

8. annotationOnly : (Optional) Used only if downloadData=true and format=embl. Download only the annotations, excluding
7. annotationOnly : (Optional) Used only if downloadData=true and format=embl. Download only the annotations, excluding
sequence lines.

9. includeParentAccession : (Optional) Valid only for coding & noncoding. If true, get parent_accession also from API and
8. includeParentAccession : (Optional) Valid only for coding & noncoding. If true, get parent_accession also from API and
include in the output list files.

e.g. 1
java -jar [path]/snapshot-change-lister-1.1.0.jar --dataType=CODING --previousSnapshot=[path]/coding_20210701.tsv --outputLocation=[path] --email=email@email.com
java -jar [path]/ena-snapshot-tool-1.3.0.jar --dataType=CODING --previousSnapshot=[path]/coding_20210701.tsv --outputLocation=[path]

If this program were run on 2021-08-03, it would create 3 new files in the outputLocation folder.

@@ -67,7 +65,7 @@ coding_20210803_new-or-updated.tsv
coding_20210803_deleted.tsv

e.g. 2
java -jar [path]/snapshot-change-lister-1.1.0.jar --dataType=SEQUENCE --previousSnapshot=[path]/sequence_20220220.tsv --outputLocation=[path] --query=dataclass="HTG" --downloadData=true --format=embl --annotationOnly=true
java -jar [path]/ena-snapshot-tool-1.3.0.jar --dataType=SEQUENCE --previousSnapshot=[path]/sequence_20220220.tsv --outputLocation=[path] --query=dataclass="HTG" --downloadData=true --format=embl --annotationOnly=true

If this program were run on 2022-02-23, it would create 4 new files in the outputLocation folder.

@@ -82,10 +80,10 @@ sequence_20220223_deleted.tsv

Example for running in LSF:

bsub -n 2 -M 10000 -J coding-snapshot-change-lister -o /path/snapshot-changes/output-20211210.log java -jar snapshot-change-lister-1.0.0.jar --email=email@email.com --dataType=CODING --previousSnapshot=/path/coding_20211028.tsv --outputLocation=/path/
bsub -n 2 -M 10000 -J coding-ena-snapshot-tool -o /path/snapshot-changes/output-20211210.log java -jar ena-snapshot-tool-1.3.0.jar --dataType=CODING --previousSnapshot=/path/coding_20211028.tsv --outputLocation=/path/



# Support

Direct any questions/issues to https://www.ebi.ac.uk/ena/browser/support with snapshot-change-lister in the subject
Direct any questions/issues to https://www.ebi.ac.uk/ena/browser/support with ena-snapshot-tool in the subject
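Step 2 of the README compares the new list against the previous list and writes the new-or-updated and deleted lists. This PR sorts the downloaded dump with big-sorter. Once both snapshots are sorted by accession, the comparison can be done in a single streaming pass, in the same way that the merge step of merge sort walks two sorted inputs. The sketch below is hypothetical and is not the tool's `MainService.compareSnapshots`. It assumes each line is `accession<TAB>last_updated` and that both inputs are sorted with `String.compareTo`. It also uses in-memory lists where the real tool streams files, and it records only the accession for deleted entries.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration: merge-compare two snapshots that are sorted by accession. */
final class SnapshotDiff {

    /** Lines that are new or updated in the latest snapshot, and accessions that disappeared. */
    static final class Result {
        final List<String> newOrUpdated = new ArrayList<>();
        final List<String> deleted = new ArrayList<>();
    }

    /** The accession is assumed to be the text before the first tab. */
    private static String accession(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    /** Both inputs must be sorted by accession with String.compareTo. */
    static Result diff(Iterable<String> previous, Iterable<String> latest) {
        Result r = new Result();
        Iterator<String> prev = previous.iterator();
        Iterator<String> next = latest.iterator();
        String p = prev.hasNext() ? prev.next() : null;
        String n = next.hasNext() ? next.next() : null;
        while (p != null || n != null) {
            int cmp;
            if (p == null) {
                cmp = 1;            // previous exhausted: the rest of latest is new
            } else if (n == null) {
                cmp = -1;           // latest exhausted: the rest of previous is deleted
            } else {
                cmp = accession(p).compareTo(accession(n));
            }
            if (cmp < 0) {          // only in previous: deleted
                r.deleted.add(accession(p));
                p = prev.hasNext() ? prev.next() : null;
            } else if (cmp > 0) {   // only in latest: new
                r.newOrUpdated.add(n);
                n = next.hasNext() ? next.next() : null;
            } else {                // in both: updated if last_updated (the rest of the line) changed
                if (!p.equals(n)) {
                    r.newOrUpdated.add(n);
                }
                p = prev.hasNext() ? prev.next() : null;
                n = next.hasNext() ? next.next() : null;
            }
        }
        return r;
    }
}
```

Sorting whole lines lexicographically gives the same order as sorting by accession alone. This holds because the tab separator sorts below every printable ASCII character. So a plain line sort of the dump is enough for this pass, provided accessions contain only printable characters.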
5 changes: 4 additions & 1 deletion build.gradle
@@ -20,7 +20,7 @@ plugins {
}

group = 'uk.ac.ebi.ena.dcap'
version = '1.2.2'
version = '1.3.0'
sourceCompatibility = '1.8'

configurations {
@@ -35,6 +35,9 @@ repositories {

dependencies {
implementation 'org.apache.commons:commons-lang3:3.4'
implementation group: 'com.fasterxml.jackson.core', name: 'jackson-core', version: '2.16.1'
implementation group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.16.1'
implementation group: 'com.github.davidmoten', name: 'big-sorter', version: '0.1.25'
implementation group: 'org.apache.commons', name: 'commons-collections4', version: '4.4'
implementation group: 'commons-io', name: 'commons-io', version: '2.7'
implementation group: 'org.apache.httpcomponents', name: 'httpclient', version: '4.5.13'
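The new `big-sorter` dependency lets the tool sort an unsorted dump that may be too large for memory. The sketch below does not reproduce big-sorter's API. It is a stdlib-only illustration of the underlying technique, external merge sort: read bounded chunks, sort each chunk in memory, spill the chunks to temporary files, then k-way merge them with a `PriorityQueue`. The class name `ExternalLineSorter` and the chunk-size parameter are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stdlib-only external merge sort of text lines (not the big-sorter API). */
final class ExternalLineSorter {

    /** An open, already-sorted chunk file and the next line it will yield. */
    private static final class Chunk {
        final BufferedReader reader;
        String head;

        Chunk(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    /** Sorts input into output while holding at most maxLinesPerChunk lines in memory. */
    static void sort(Path input, Path output, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        List<BufferedReader> readers = new ArrayList<>();
        try {
            // Phase 1: read bounded chunks, sort each in memory, spill to temp files.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> buffer = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    buffer.add(line);
                    if (buffer.size() >= maxLinesPerChunk) {
                        chunks.add(writeSortedChunk(buffer));
                        buffer.clear();
                    }
                }
                if (!buffer.isEmpty()) {
                    chunks.add(writeSortedChunk(buffer));
                }
            }
            // Phase 2: k-way merge; the heap holds the smallest unread line of each chunk.
            PriorityQueue<Chunk> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
            for (Path p : chunks) {
                BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8);
                readers.add(r);
                Chunk c = new Chunk(r);
                if (c.head != null) {
                    heap.add(c);
                }
            }
            try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
                while (!heap.isEmpty()) {
                    Chunk c = heap.poll();
                    out.write(c.head);
                    out.newLine();
                    c.head = c.reader.readLine();
                    if (c.head != null) {
                        heap.add(c); // re-insert the chunk keyed by its next line
                    }
                }
            }
        } finally {
            // Close readers before deleting, so deletion also works on Windows.
            for (BufferedReader r : readers) {
                r.close();
            }
            for (Path p : chunks) {
                Files.deleteIfExists(p);
            }
        }
    }

    private static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path tmp = Files.createTempFile("snapshot-chunk", ".tsv");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        return tmp;
    }
}
```

This sketch opens every chunk file at once during the merge. A production sorter typically also bounds how many files are merged in one pass.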
2 changes: 1 addition & 1 deletion settings.gradle
@@ -1 +1 @@
rootProject.name = 'snapshot-change-lister'
rootProject.name = 'ena-snapshot-tool'
67 changes: 1 addition & 66 deletions src/main/java/uk/ac/ebi/ena/dcap/scl/MainRunner.java
@@ -18,26 +18,14 @@
import lombok.NoArgsConstructor;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.exception.ExceptionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.mail.javamail.MimeMessagePreparator;
import org.springframework.stereotype.Component;
import uk.ac.ebi.ena.dcap.scl.model.DataType;
import uk.ac.ebi.ena.dcap.scl.model.DiffFiles;
import uk.ac.ebi.ena.dcap.scl.service.MainService;

import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;
import java.io.File;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

@Component
@Slf4j
@@ -55,9 +43,6 @@ public class MainRunner implements CommandLineRunner {
@Value("${outputLocation}")
public String outputLocationPath;

@Value("${email:#{null}}")
public String email;

@Value("${query:#{null}}")
public String query;

@@ -76,60 +61,10 @@ public class MainRunner implements CommandLineRunner {
@Autowired
private MainService mainService;

@Autowired
JavaMailSender mailSender;

@SneakyThrows
@Override
public void run(String... args) {
DataType dataType = DataType.valueOf(dataTypeStr.toUpperCase());
File prevSnapshot = new File(previousSnapshotPath);
assert prevSnapshot.exists();
File outputLocation = new File(outputLocationPath);
assert outputLocation.canWrite();
if (includeParentAccession && !(dataType == DataType.CODING || dataType == DataType.NONCODING)) {
throw new IllegalArgumentException("includeParentAccession can be true only for coding & noncoding");
}

String name = dataType.name().toLowerCase() + "_" + DATE_FORMAT.format(new Date());
try {
File newSnapshot = mainService.writeLatestSnapshot(dataType, outputLocation, name, query,
includeParentAccession);
final DiffFiles diffFiles = mainService.compareSnapshots(prevSnapshot, newSnapshot, outputLocation, name);
if (downloadData) {
mainService.downloadData(diffFiles.getNewOrChangedList(), format, annotationOnly);
}
if (StringUtils.isNotBlank(email)) {
sendMail(email, dataTypeStr + " change lister completed",
"Compared " + prevSnapshot + " & " + newSnapshot + " in " + outputLocation);
}
} catch (Exception e) {
log.error("error:", e);
if (StringUtils.isNotBlank(email)) {
sendMail(email, dataTypeStr + " change lister failed", ExceptionUtils.getStackTrace(e));
}
}
mainService.fetchSnapshotAndCompare(dataTypeStr, previousSnapshotPath, outputLocationPath, query, includeParentAccession, format, annotationOnly, downloadData);
}

public void sendMail(String email, String subject, String body, String... args) throws MessagingException {
MimeMessagePreparator preparator = new MimeMessagePreparator() {

public void prepare(MimeMessage mimeMessage) throws Exception {

mimeMessage.setRecipient(Message.RecipientType.TO,
new InternetAddress(email));
mimeMessage.setFrom(new InternetAddress("datalib@ebi.ac.uk"));
mimeMessage.setText(body
+ System.lineSeparator() + StringUtils.join(args, " "));
mimeMessage.setSubject(subject);
}
};

try {
mailSender.send(preparator);
} catch (Exception ex) {
// simply log it and go on...
log.error("Error sending email", ex);
}
}
}
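After this PR, `run()` delegates everything to `mainService.fetchSnapshotAndCompare(...)`, whose body is not part of this diff. The removed body checked its inputs with `assert prevSnapshot.exists()` and `assert outputLocation.canWrite()`. Java assertions are disabled unless the JVM is started with `-ea`, so under a plain `java -jar` those two checks never ran. The minimal sketch below writes the same checks as explicit exceptions and builds the output base name (`coding_20210803` in the README example). The class name, the reduced enum and the `yyyyMMdd` pattern are assumptions; the pattern is inferred from the README's file names.

```java
import java.io.File;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical sketch of the argument checks from the removed run() body. */
final class SnapshotArgs {

    /** Reduced local copy of DataType, for illustration only. */
    enum DataType { CODING, NONCODING, SEQUENCE }

    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Same rule as the removed code: parent accessions apply only to coding and noncoding. */
    static void checkParentAccessionFlag(DataType type, boolean includeParentAccession) {
        if (includeParentAccession && type != DataType.CODING && type != DataType.NONCODING) {
            throw new IllegalArgumentException("includeParentAccession can be true only for coding & noncoding");
        }
    }

    /** Explicit checks: unlike `assert`, these run even without the -ea JVM flag. */
    static void checkFiles(File previousSnapshot, File outputLocation) {
        if (!previousSnapshot.isFile()) {
            throw new IllegalArgumentException("previousSnapshot not found: " + previousSnapshot);
        }
        if (!outputLocation.isDirectory() || !outputLocation.canWrite()) {
            throw new IllegalArgumentException("outputLocation is not a writable folder: " + outputLocation);
        }
    }

    /** Base name for the output files, e.g. "coding_20210803". */
    static String baseName(DataType type, LocalDate date) {
        return type.name().toLowerCase(Locale.ROOT) + "_" + date.format(DATE);
    }
}
```

`Locale.ROOT` keeps the lower-casing independent of the machine's default locale. The removed code called `toLowerCase()` without a locale.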
@@ -28,11 +28,5 @@ public static void main(String[] args) {
SpringApplication.run(SnapshotChangeListerApplication.class, args);
}

@Bean
public JavaMailSender javaMailService() {
JavaMailSenderImpl javaMailSender = new JavaMailSenderImpl();
javaMailSender.setHost("smtp.ebi.ac.uk");
javaMailSender.setPort(25);
return javaMailSender;
}

}
9 changes: 8 additions & 1 deletion src/main/java/uk/ac/ebi/ena/dcap/scl/model/DataType.java
@@ -16,7 +16,14 @@
package uk.ac.ebi.ena.dcap.scl.model;

public enum DataType {
ANALYSIS,
ASSEMBLY,
SAMPLE,
SEQUENCE,
STUDY,
READ_RUN,
CODING,
NONCODING;
NONCODING,

TLS_SET, TSA_SET, WGS_SET;
}
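After this change, `DataType` has eleven constants, including the new `TLS_SET`, `TSA_SET` and `WGS_SET`. The removed `run()` converted the `--dataType` argument with `DataType.valueOf(dataTypeStr.toUpperCase())`. That call rejects an unknown value with a bare "No enum constant ..." `IllegalArgumentException`. With this many accepted values, a lookup that lists them on failure is easier to use. The sketch below uses a local copy named `DataTypeSketch` and a hypothetical `parse` method; it is not code from the repository.

```java
import java.util.Arrays;
import java.util.Locale;

/** Local copy of the DataType constants, with a hypothetical parse helper. */
enum DataTypeSketch {
    ANALYSIS, ASSEMBLY, SAMPLE, SEQUENCE, STUDY, READ_RUN, CODING, NONCODING,
    TLS_SET, TSA_SET, WGS_SET;

    /** Case-insensitive lookup that names every accepted value when the input is unknown. */
    static DataTypeSketch parse(String s) {
        if (s != null) {
            String key = s.trim().toUpperCase(Locale.ROOT);
            for (DataTypeSketch t : values()) {
                if (t.name().equals(key)) {
                    return t;
                }
            }
        }
        throw new IllegalArgumentException(
                "Unknown dataType '" + s + "'; expected one of " + Arrays.toString(values()));
    }
}
```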
17 changes: 12 additions & 5 deletions src/main/java/uk/ac/ebi/ena/dcap/scl/model/Line.java
@@ -19,6 +19,7 @@
import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;

import java.text.DateFormat;
@@ -27,6 +28,7 @@
@Getter
@AllArgsConstructor
@EqualsAndHashCode
@Slf4j
public class Line {

public static Line POISON = new Line(null, null);
@@ -36,11 +38,16 @@ public class Line {

@SneakyThrows
public static Line of(String s, DateFormat df) {
final String[] split = StringUtils.split(s);
if (split.length == 2) {
return new Line(split[0], df.parse(split[1]));
} else {
return new Line(split[0], null);
try {
final String[] split = StringUtils.split(s);
if (split.length == 2) {
return new Line(split[0], df.parse(split[1]));
} else {
return new Line(split[0], null);
}
} catch (Exception e) {
log.error("Error in line:{}", s, e);
throw e;
}
}
}
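The new try/catch in `Line.of` logs the offending line before rethrowing. Two inputs can reach that handler. The first is a blank or whitespace-only line: `StringUtils.split` returns an empty array, so `split[0]` throws `ArrayIndexOutOfBoundsException`. The second is a date that `DateFormat.parse` cannot read. The alternative sketch below raises an exception that already carries the line, so logging and rethrowing are not needed. It swaps in `java.time.LocalDate` for `DateFormat`, since `LocalDate` is immutable and thread-safe, unlike `SimpleDateFormat`. It also assumes `last_updated` is an ISO `yyyy-MM-dd` date and uses a hypothetical `SnapshotLine` class. Unlike `Line.of`, it rejects lines with more than two columns.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical parser for one snapshot line: an accession and an optional last_updated date. */
final class SnapshotLine {
    final String accession;
    final LocalDate lastUpdated; // null when the date column is absent

    private SnapshotLine(String accession, LocalDate lastUpdated) {
        this.accession = accession;
        this.lastUpdated = lastUpdated;
    }

    static SnapshotLine parse(String s) {
        if (s == null || s.trim().isEmpty()) {
            throw new IllegalArgumentException("Blank snapshot line");
        }
        // Split on runs of whitespace, as StringUtils.split(s) does.
        String[] parts = s.trim().split("\\s+");
        if (parts.length > 2) {
            throw new IllegalArgumentException("Too many columns in line: " + s);
        }
        try {
            LocalDate date = parts.length == 2 ? LocalDate.parse(parts[1]) : null;
            return new SnapshotLine(parts[0], date);
        } catch (DateTimeParseException e) {
            // Keep the original exception as the cause and put the bad line in the message.
            throw new IllegalArgumentException("Bad last_updated in line: " + s, e);
        }
    }
}
```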
69 changes: 69 additions & 0 deletions src/main/java/uk/ac/ebi/ena/dcap/scl/service/CountClient.java
@@ -0,0 +1,69 @@
package uk.ac.ebi.ena.dcap.scl.service;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Component;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

@Component
@Slf4j
public class CountClient {
    static final String PORTAL_API_COUNT_URL = "https://www.ebi.ac.uk/ena/portal/api/count?result=%s&format=json";

    @SneakyThrows
    public long getCountFromResults(String result, String query) {
        // Your JSON array as a string

        // Create ObjectMapper
        ObjectMapper objectMapper = new ObjectMapper();

        // Parse JSON array
        JsonNode objNode = objectMapper.readTree(getJson(PORTAL_API_COUNT_URL, result, query));

        // Iterate through objects in the array
        long codingValue = objNode.get("count").asLong();
        return codingValue;
    }

    @SneakyThrows
    private String getJson(String portalApiResultsUrl, String result, String query) {

        String urlStr = String.format(PORTAL_API_COUNT_URL, result);
        if (StringUtils.isNotBlank(query)) {
            urlStr += "&query=" + query;
        }
        log.info("getting count from:{}", urlStr);
        // Create URL object
        URL url = new URL(urlStr);

        // Open a connection
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();

        // Set the request method to GET
        connection.setRequestMethod("GET");

        // Get the response code
        int responseCode = connection.getResponseCode();

        // Read the response from the input stream
        StringBuilder response;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
            response = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                response.append(line);
            }

        }

        return response.toString();
    }

}
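`CountClient` has three issues as added here:

- It reads the HTTP status into `responseCode` but never checks it.
- It appends the query to the URL without percent-encoding, so characters such as quotes or spaces in a query like `dataclass="CON"` are not escaped.
- It ignores its `portalApiResultsUrl` parameter in favour of the constant.

The stdlib-only sketch below addresses the first two. `CountClientSketch` and its methods are hypothetical. A regex stands in for Jackson only to keep the block dependency-free. The `count` field name comes from `getCountFromResults`; also accepting a quoted number is a defensive assumption. The sketch uses `URLEncoder.encode(String, String)` because the build targets Java 1.8, and the `Charset` overload only arrived in Java 10.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stdlib-only variant of CountClient. */
final class CountClientSketch {
    static final String PORTAL_API_COUNT_URL =
            "https://www.ebi.ac.uk/ena/portal/api/count?result=%s&format=json";

    // Matches "count": 123 or "count": "123"; a stand-in for Jackson's objNode.get("count").asLong().
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*\"?(\\d+)\"?");

    /** Builds the request URL, percent-encoding the query so '=', quotes and spaces are escaped. */
    static String buildCountUrl(String result, String query) throws UnsupportedEncodingException {
        String url = String.format(PORTAL_API_COUNT_URL, result);
        if (query != null && !query.trim().isEmpty()) {
            url += "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8.name());
        }
        return url;
    }

    /** Extracts the count from a JSON body, failing loudly if it is missing. */
    static long parseCount(String json) {
        Matcher m = COUNT.matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("No count in response: " + json);
        }
        return Long.parseLong(m.group(1));
    }

    /** Fetches the count and rejects any non-200 response instead of ignoring the status. */
    static long fetchCount(String result, String query) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(buildCountUrl(result, query)).openConnection();
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " from count endpoint");
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    body.append(line);
                }
            }
            return parseCount(body.toString());
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping `buildCountUrl` and `parseCount` separate from the network call lets them be unit-tested offline.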