Skip to content

Fix bug for integer compressed images with blanks#16550

Merged
saimn merged 4 commits into
astropy:v6.1.xfrom
rlwastro:rlwastro-compressionFix
Jun 28, 2024
Merged

Fix bug for integer compressed images with blanks#16550
saimn merged 4 commits into
astropy:v6.1.xfrom
rlwastro:rlwastro-compressionFix

Conversation

@rlwastro
Copy link
Copy Markdown
Contributor

@rlwastro rlwastro commented Jun 10, 2024

Description

This pull request is to address a bug that raises an exception when some compressed images are read.

In _tiled_compression.py:decompress_image_data_section(), blanked pixels are assigned the value np.nan. That is incorrect for integer datatypes and raises an exception because np.nan cannot be converted to an integer.

The fix is to use the BLANK keyword from the header. If no BLANK or ZBLANK keyword is found for integer datatypes, the most negative integer (as determined by ZBITPIX) is used as a fill value. That should not happen if pixel-blanking is specified (otherwise why is ZBLANK there at all?). An AstropyUserWarning is issued in that case.

This also fixes a bug in decompress_image_data_section() for the case where BLANK is specified in the header but ZBLANK is not. The FITS standard recommends that BLANK should be used instead of ZBLANK for integer images. For images with BLANK but not ZBLANK, the current code would not work correctly. The new version handles integer compressed images that specify either BLANK or ZBLANK (or both).

In the compressed.py module, integer images with blanks are converted to floats with NaN values. Previously that conversion happened only if BZERO/BSCALE keywords were also specified. The new code does the conversion even if BZERO and BSCALE are not defined. The new behavior is consistent with the handling of these keywords in uncompressed image HDUs.

Test code is included.

Fixes #15236

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@rlwastro rlwastro requested a review from saimn as a code owner June 10, 2024 14:21
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 10, 2024

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to Astropy 👋 and congratulations on your first pull request! 🎉

A project member will respond to you as soon as possible; in the meantime, please have a look over the Checklist for Contributed Code and make sure you've addressed as many of the questions there as possible.

If you feel that this pull request has not been responded to in a timely manner, please send a message directly to the development mailing list. If the issue is urgent or sensitive in nature (e.g., a security vulnerability) please send an e-mail directly to the private e-mail [email protected].

@neutrinoceros
Copy link
Copy Markdown
Contributor

This would supersede #15756 and fix #15236
@rlwastro please make sure formatting is green (you may comment pre-commit.ci autofix here or fix it locally and push), and please include a changelog entry (see docs/changes/README.rst)

@rlwastro rlwastro changed the title Fix bug for integer compressed images Fix bug for integer compressed images with blanks Jun 10, 2024
@rlwastro
Copy link
Copy Markdown
Contributor Author

This would supersede #15756 and fix #15236 @rlwastro please make sure formatting is green (you may comment pre-commit.ci autofix here or fix it locally and push), and please include a changelog entry (see docs/changes/README.rst)

I fixed the formatting and added a changelog entry. Let me know if you see other changes that are required.

I'm working on creating a small compressed image that can be used for testing.

@rlwastro
Copy link
Copy Markdown
Contributor Author

@neutrinoceros I'm testing this fix on various images and am beginning to wonder whether I have the correct fix for this issue.

Currently if we read a FITS file with an uncompressed integer image that has a BLANK keyword in the header, the image gets converted from an integer datatype to a float32, and the blanked pixels are set to np.nan. I could argue with that behavior (perhaps a masked integer array would be better than a float32 array). But that is the way it acts now, and as far as I know people are OK with it.

With my version of the code, a FITS file with a compressed integer image that has a BLANK keyword in the header does not get converted to a float. Instead it is kept as an integer array, with the blank values retained as the special integer value.

I think for consistency with the uncompressed image behavior, this should instead convert the array to float32 and replace the blanks with np.nan. If you agree, I can withdraw this pull request and submit a new one. But let me know if you think it needs more discussion before doing that.

This was a bit tricky to discover because the Pan-STARRS images in the original bug report also have BZERO and BSCALE keywords, and it turns out they do in the end get converted to float32 with np.nan for blanks. So the change I'm proposing will in the end give the same result for the PS images. But I'm pretty sure the new approach would be better for other images.

I do think this is important to fix soon! Currently it simply dies with an exception when one of these images is opened. That is clearly bad.

@neutrinoceros
Copy link
Copy Markdown
Contributor

I think for consistency with the uncompressed image behavior, this should instead convert the array to float32 and replace the blanks with np.nan. If you agree, I can withdraw this pull request and submit a new one. But let me know if you think it needs more discussion before doing that.

I can see both ways as satisfying solutions, albeit non ideal. I think @saimn and/or @astrofrog should weigh in here.

@rlwastro
Copy link
Copy Markdown
Contributor Author

@neutrinoceros I figured out the right place to make this change. The code that applies the BSCALE/BZERO scaling and replaces BLANK values with NaN (in compressed.py:CompImageHDU._scale_data()) was skipped if the BSCALE and BZERO values were omitted. It is now executed if BLANK is specified (even in the absence of BSCALE/BZERO).

I updated the description to reflect these additional changes. This version should have the correct behavior (as similar to uncompressed image HDUs as possible).

There is room for improvement to the handling of errors in compressed image headers. I have not tried to make these changes because the code in header.py is pretty scary. My changes do not make these problems any worse at least!

  • For integer images, if the ZBLANK header keyword is specified but there is no BLANK keyword, ZBLANK should be copied to BLANK (with a warning).
  • For integer images, if neither ZBLANK nor BLANK appears in the header, but there is a ZBLANK column in the table, then BLANK should be set to a default value (with a warning).

@rlwastro
Copy link
Copy Markdown
Contributor Author

@neutrinoceros I made some further improvements in the code that cleaned up some remaining issues with unusual file formats. I believe this patch is complete now.

I added new tests in the test_tiled_compression.py module that I believe should test all the added code. I added some additional (small) test files as well to support those tests. There are still some warnings about code coverage. I'm not sure what the problem is there. I can't tell whether the new tests have actually been run yet or not.

I believe this is very close to complete, so if you or others can help get it through the final stages, that would be very helpful.

@pllim pllim added this to the v6.1.1 milestone Jun 14, 2024
@pllim pllim requested review from Cadair and astrofrog June 14, 2024 09:09
@rlwastro
Copy link
Copy Markdown
Contributor Author

Thanks @pllim ! I fixed some filename typos in my test code, and now the patch coverage is 100% complete. All checks now pass.

@pllim
Copy link
Copy Markdown
Member

pllim commented Jun 14, 2024

Alas, we just updated branch protection rules due to matplotlib bump. Would you mind rebasing? Thanks!

@rlwastro
Copy link
Copy Markdown
Contributor Author

@pllim Ah crap, I think I screwed that up. This is very much out of my comfort zone in git! I'll see if I can fix it. If not I may have to start over.

@pllim
Copy link
Copy Markdown
Member

pllim commented Jun 14, 2024

Maybe try this?

git fetch upstream main
git rebase upstream/main

And see if the history looks better before you force push?

@rlwastro rlwastro force-pushed the rlwastro-compressionFix branch from ecbaf19 to d468389 Compare June 14, 2024 20:45
@rlwastro
Copy link
Copy Markdown
Contributor Author

Whew, thanks! That did the trick. I think things look like they will be OK now. I'm just waiting to confirm the checks are all still OK.

Copy link
Copy Markdown
Contributor

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of, I want to point out this seems like great work to me, though again I'm no authority when it comes to io.fits, but your work is greatly appreciated.

Since you added tests (which is great) alongside some binary files, it would be important to also include the code that was used to generate them if possible. See
#16371 for context.

Comment thread astropy/io/fits/hdu/compressed/compressed.py Outdated
@rlwastro
Copy link
Copy Markdown
Contributor Author

First of, I want to point out this seems like great work to me, though again I'm no authority when it comes to io.fits, but your work is greatly appreciated.

Since you added tests (which is great) alongside some binary files, it would be important to also include the code that was used to generate them if possible. See #16371 for context.

@neutrinoceros @pllim
I can do that, and I agree it seems like a good idea. But I don't know where to put the script. I tried putting it in the data/ directory, but the pre-commit check complains that it needs an __init__.py file. I'm pretty sure we don't want that. I could put it in tests/, but then pytest might try to run it? I poked around in the astropy directories and did not see any obvious place to use for this sort of thing.

I have the code ready to check in but need some guidance before I can complete the process.

@saimn saimn modified the milestones: v6.1.1, v6.1.2 Jun 17, 2024
@rlwastro rlwastro force-pushed the rlwastro-compressionFix branch from 9870dab to 1080b7d Compare June 17, 2024 22:00
@rlwastro rlwastro requested a review from nstarman June 26, 2024 17:52
@rlwastro rlwastro changed the base branch from main to v6.1.x June 26, 2024 17:53
@rlwastro
Copy link
Copy Markdown
Contributor Author

I squashed everything into a single commit and changed the fix to be against the v6.1.x branch. Thanks @neutrinoceros for the helpful tips! I hope everything was done correctly. The files changed look correct. There is a check failure for the change log file. I'm not sure whether that is something I need to fix or not.

Comment thread astropy/io/fits/hdu/compressed/_tiled_compression.py Outdated
rlwastro and others added 3 commits June 26, 2024 14:22
Change the value used for the BLANK keyword to -16384 instead of using
the default value of -32768. That could catch errors in the default
assignment.

Add an additional test using an image with a ZBLANK column plus a BLANK
keyword in the header. That is a standard configuration and is processed
without any warnings.
@pllim pllim added skip-basebranch-check Skip base branch check for direct backports and removed backport-v6.1.x labels Jun 27, 2024
@pllim
Copy link
Copy Markdown
Member

pllim commented Jun 27, 2024

I haven't been following the convo. The failure isn't from change log, @dhomeier . It is because this PR is opened against v6.1.x and not main. If that is intentional, then close and re-open the PR to re-run the checks.

So, this means that this patch is only going into the next v6.1 series and WILL NOT be in v7.0.0. Is that correct?

@dhomeier dhomeier closed this Jun 27, 2024
@dhomeier dhomeier reopened this Jun 27, 2024
@dhomeier
Copy link
Copy Markdown
Contributor

Yes, it is intentional – see #16550 (comment) for the rationale.
Apparently needed reopening to make the pre-CI checks aware of the changed target, thanks!

Copy link
Copy Markdown
Contributor

@saimn saimn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @rlwastro !

@dhomeier
Copy link
Copy Markdown
Contributor

Thank you for your contribution and your patience throughout our review process, @rlwastro! FITS code is something we approach only with great care and caution. ;-)
Thanks also to everyone who helped get this into Astropy 6!

@rlwastro rlwastro deleted the rlwastro-compressionFix branch June 28, 2024 15:22
saimn added a commit to saimn/astropy that referenced this pull request Oct 1, 2024
saimn added a commit to saimn/astropy that referenced this pull request Oct 2, 2024
saimn added a commit that referenced this pull request Oct 10, 2024
Backport #16550 to main: Fix bug for integer compressed images with blanks
kkrutkowski pushed a commit to kkrutkowski/astropy that referenced this pull request Apr 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug io.fits skip-basebranch-check Skip base branch check for direct backports

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New FITS compression code breaks loading Pan-STARRS images

7 participants