Skip to content

Track color directives in the DVI parser and expose the results#31292

Open
MaddieM4 wants to merge 20 commits intomatplotlib:mainfrom
MaddieM4:dvi-read-color
Open

Track color directives in the DVI parser and expose the results#31292
MaddieM4 wants to merge 20 commits intomatplotlib:mainfrom
MaddieM4:dvi-read-color

Conversation

@MaddieM4
Copy link

PR summary

This is part of a series of work originally motivated by #6724, where the long-term goal is for various MPL backends to be able to correctly render (La)Tex strings which have color directives inside them. A lot of the previous conversation on this project has happened on #31234, which was my first attempt at a fix within the existing "use dvipng to render things" paradigm. Little did I know that MPL was pivoting to rendering (La)Tex "in-house" with AGG, instead of using dvipng. So, while #31234 will need a substantial/complete rewrite at some point to work with the new in-house approach, it brought up a pretty important side quest: at the time, the code in matplotlib.dviread could not understand color directives, so it couldn't pass them along to backends.

During the conversation, I offhandedly mentioned it would be nice to break the existing code into a more layered approach, such that DVI op parsing could happen (and be inspected) without any state tracking. I hadn't actually planned to do that, I figured it would make the PR harder to accept, until @anntzer said:

It's probably reasonable to split the Dvi class implementation into a reader part that just yields a list of opcodes, and a second "virtual machine" part that applies them, if that helps.

So, given that unexpected permission, I went for it, and that's why the first half of this commit series is a refactor of the existing code with (hopefully) no behavioral changes. There's probably still a bit more room for improvement, but I find that being able to tap in for debugging with finer granularity is really nice, while still presenting a backwards-compatible high-level interface. I think the biggest win might be the way the Vf class has changed thanks to the options opened up with the new layering. Stuff that used to be some hairy state tracking of packet boundaries, is now a much cleaner "parse the whole packet as a single command, then run the payload through an inner VM" technique.

The second half is extending the existing Text and Box types to support color information, using @anntzer's recommendations for making the new versions backwards-compatible. This honestly ends up being a small part of the total diff, pretty self-contained, and I feel reasonably confident in it after the amount of crash-coursing the DVI format that I've done in the last couple days. Color specials don't really have one syntax to rule them all, it's driver-dependent, and I don't want to restrict people from using color expressions that tie into a specific backend (for example, having a handful of company-branded toner colors to stick to, where the PDF MPL backend would care about having pristine source information) - for that reason, I'm exposing the textual color descriptions verbatim as strings, and leaving it to backends to do further interpretation in their own ways (and outside the scope of this PR).

Please don't be shy about any requests for changes or further documentation, if needed!

AI Disclosure

Don't use it. Not a fan, personally.

PR checklist

@scottshambaugh
Copy link
Contributor

@QuLogic FYI, want this as part of your pending text-overhaul package or can this defer?

@MaddieM4
Copy link
Author

For my part, I'm happy to rebase to wherever this makes the most sense for other people!

As for CI errors, it might take a few round trips to clear everything up, as there are some tools (lookin' at you, mypy) that just don't seem to want to work right on my machine. Based on stub tests, it probably makes sense to make all the .op_foo() methods on Vf and VM private, which is maybe not a bad incentive pressure, much as I like having the ability to cleanly drive those state changes from outside.

Really the only problem I see so far that can't be solved by patiently banging my head against the wall until the automation is happy, is that I have some tests that run against a .dvi file I generated with a simple script, and those tests encode some over-specific font path dependencies which do not exist on OS X. That's going to require a bit more actual understanding and thought, I expect, although I suspect I can get away with using TexManager.make_dvi(...) to get a more locally-appropriate DVI file. The issue is really that there's some global state required to set up a TeX preamble that actually does color, and I'm not sure offhand the cleanest way to do that in the middle of the test suite without, you know, polluting other tests.

@anntzer
Copy link
Contributor

anntzer commented Mar 13, 2026

The issue is really that there's some global state required to set up a TeX preamble that actually does color, and I'm not sure offhand the cleanest way to do that in the middle of the test suite without, you know, polluting other tests.

If I understand your question correctly: rcParams actually get fully reset between each test so setting them in a test is fine (or you can just use mpl.rc_context() to be extra safe).

@anntzer
Copy link
Contributor

anntzer commented Mar 13, 2026

To keep things from ballooning out of control and help review, it may help to split this into two stacked PRs, one with just the refactoring of Dvi into a parsing and an interpreter step (with no behavioral changes), and one with the color handling.

return int.from_bytes(b, "big", signed=signed)


class Ops:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should stay private (at least until there's third-party demand for it being public); ditto for VM. In general we try quite hard to keep as much stuff private as possible, because we don't want to become a general dvi parsing library, and any public API is much more complicated to alter.

@MaddieM4
Copy link
Author

Alright, sounds good. Considering the branch name here, I'll break out the non-behavioral refactoring into a separate PR (with the details kept as private as possible), and then come back to this PR to turn it into a proper sequel. That's also going to be an opportunity to make the git histories cleaner, because wrestling with CI for 10 commits in a row doesn't make for very clear storytelling 6 months later during a git bisect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants