Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add new little requested features #875

Merged
merged 9 commits into from
Apr 24, 2024

Conversation

pavanvidem
Copy link
Contributor

  • Flake8 passes (flake8 . --exclude=.venv,.build,planemo_test_env,build --ignore=E501,F403,E402,F999,F405,E712)
  • Local tests pass (py.test hicexplorer --doctest-modules)

@pavanvidem
Copy link
Contributor Author

pavanvidem commented Oct 24, 2023

Added the following features:

  • Allow chicAggregateStatistic.py to to extract the aggregated data from the views.hdf based on differential.hdf or differential_target.bed. Now the BED may have the target name in the 4th column. In that case, the aggregation is done per target.
  • Allow hicCorrectMatrix.py to write filtered out regions to a BED file

@pavanvidem
Copy link
Contributor Author

Tests need to be added

@joachimwolff
Copy link
Collaborator

Please provide test cases and make sure it works with current python and numpy. Thanks!

@pavanvidem
Copy link
Contributor Author

I might have some test data but have to find some time.

@pavanvidem
Copy link
Contributor Author

@joachimwolff @bgruening please review

for line in file.readlines():
if line.startswith('#'):
continue
_line = line.strip().split('\t')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the variable name _line confusing. They are now elements or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent with the remaining project I used _line. The same line of code can be found several times.

@@ -548,6 +551,7 @@ def filter_by_zscore(hic_ma, lower_threshold, upper_threshold, perchr=False):
to avoid introducing bias due to different chromosome numbers

"""
print("filtering by z-score")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove? Or use logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logged

@@ -658,9 +662,22 @@ def main(args=None):
restore_masked_bins=False)

assert matrix_shape == ma.matrix.shape
for idx in outlier_regions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

with open(args.filteredBed, 'w') as f:
for outlier_region in set(outlier_regions):
interval = ma.cut_intervals[outlier_region]
f.write('{}\t{}\t{}\t.\t{}\t.\n'.format(interval[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using an f-string here makes it much easier to read I suspect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used a plain join because the interval should anyways contain 4 elements

# mask filtered regions
ma.maskBins(outlier_regions)
total_filtered_out = set(outlier_regions)
print(outlier_regions, "Bins that are MAD outliers ({:.2f}%) "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

if x != y:
count = sum(1 for a, b in zip(x, y) if a != b)
if count > pDifference:
equal = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you break here the entire loop, correct? than you can also return and remove the equal altogether

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to return true or false. Again copied this part from somewhere in the project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return True, return False ;)

for line in file.readlines():
if line.startswith('#'):
continue
_line = line.strip().split('\t')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as my comment above

try:
chrom, start, end = _line[:4]
except ValueError:
_line = line.strip().split()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a line has less then 3 columns, this exception is raised, and then you are trying the same again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, first split was with tabs. This one with white spaces, if the input BED file is not tab seperated.

chrom, start, end = _line[:4]
except ValueError:
_line = line.strip().split()
chrom, start, end, gene = _line[:4]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the 4 wrong here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should contain 4 columns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a column, its really not clear that you are parsing here two different file-types.

requirements.txt Outdated
@@ -23,6 +23,6 @@ future >= 0.18
tqdm >= 4.66
hyperopt >= 0.2.7
python-graphviz >= 0.20
scikit-learn >= 1.3.1
scikit-learn == 1.3.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pin looks to strict to me, is >=1,3,2,<1.4 better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is no other version between 1.3.2 and 1.4. I switched it to 1.3,1.4

setup.py Outdated
@@ -116,7 +116,7 @@ def checkProgramIsInstalled(self, program, args, where_to_download,
"tqdm >= 4.66",
"hyperopt >= 0.2.7",
"graphviz >= 0.20",
"scikit-learn >= 1.3.1",
"scikit-learn == 1.3.2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here?

@bgruening bgruening merged commit 67a4f37 into deeptools:master Apr 24, 2024
7 checks passed
@lldelisle
Copy link
Collaborator

Hi there,
It seems that this was a new release but:

As a consequence a conda installation of 3.7.4 gives as version 3.7.3...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants