Plotting -- In Papers and Development

TL;DR

Plotting data is a central element of dealing with numerical data. However, the goal of plotting can be very different:

  • In an early phase, the plots might be more exploratory. Large amounts of data have to be plotted in quick succession. Here, packages like pandas and seaborn are very handy.

  • In later stages, e.g. when writing a paper, you plot to tell a story. This requires more control, and detailed adjustments via matplotlib might be necessary.

Plotting for Exploration vs. Plotting for Publication: Understanding Two Very Different Goals

When you work with data, plotting becomes one of the most natural things to do. Looking at plots is just much more enjoyable than staring at columns of numbers. The moment you load a dataset, you begin drawing quick charts—partly out of curiosity, partly to check whether the data even makes sense. Later, when the analysis is complete and you are writing a paper or a report, you return to plotting again, this time with a very different mindset. Although both activities involve turning data into pictures, they serve distinct purposes and require different approaches.

In practice, whether we plot for exploration or for publication changes how we approach the task. In the early stages of a project, visualizations act as a form of thinking: can we trust the simulation, does the data make sense, do we see something interesting? Later, they become a form of communication. The needs of these two phases are so different that understanding the boundary between them makes your work both faster and clearer.

This distinction between exploration and publication already makes one key point clear: store the data. While this may sound obvious, Jupyter notebooks make it all too easy to plot directly from memory. However, if the simulation takes a long time, we do not want to rerun it every time we change the style of a plot. Hence the three magic words: simulate, save, plot.
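A minimal sketch of the simulate, save, plot pattern could look like the following. The function names, the CSV format, and the toy "simulation" (a noisy decay curve) are assumptions made for illustration; a real project would plug in its own simulation and probably its own file format.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def simulate(n_steps: int = 200, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for an expensive simulation: a noisy decay curve."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, n_steps)
    signal = np.exp(-0.3 * t) + 0.02 * rng.standard_normal(n_steps)
    return pd.DataFrame({"t": t, "signal": signal})


def save(df: pd.DataFrame, path: Path) -> None:
    """Write the results to disk so that plotting never needs the simulation."""
    df.to_csv(path, index=False)


def plot(data_path: Path, figure_path: Path) -> None:
    """Read the stored data and draw the figure."""
    df = pd.read_csv(data_path)
    fig, ax = plt.subplots()
    ax.plot(df["t"], df["signal"])
    ax.set_xlabel("t")
    ax.set_ylabel("signal")
    fig.savefig(figure_path)
    plt.close(fig)


# A temporary directory keeps the example self-contained;
# in a real project this would be a data folder next to the scripts.
out_dir = Path(tempfile.mkdtemp())
data_file = out_dir / "results.csv"
figure_file = out_dir / "signal.png"

save(simulate(), data_file)   # expensive step, run once
plot(data_file, figure_file)  # cheap step, rerun whenever the style changes
```

Only the last line has to be repeated when the look of the figure changes; the stored file takes the place of the simulation.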

Plotting for Exploration: Learning What the Data Wants to Tell You

Exploratory plotting is messy (not necessarily on purpose). At this stage, you are not trying to impress anyone; you’re trying to understand your data. You might generate a dozen figures within minutes, each slightly different from the last. Perhaps you filter the data, change the variable on the x-axis, switch scales, or try a smoothing curve. The goal is not to create a perfect plot — it is to obtain new insights.

Packages like pandas and seaborn shine in this phase. They offer quick, high-level plotting functions that let you move fast without writing much code. With a single line, seaborn can show distributions, correlations, or trends and apply helpful statistical transformations automatically. You don’t worry about alignment, font sizes, or precise color choice. The plots are temporary tools, like sketches in a research notebook.

During exploration, you gradually discover the structure of your data: whether variables correlate, whether the distribution is heavy-tailed, whether there are missing values or outliers, whether surprising patterns appear when you group by categories or apply filters.

Plotting for Papers: Communicating With Purpose

Fast-forward to the writing phase: the analysis is done, we know what we want to say in the paper. The relationship to your plots becomes quite different. You are now in a stage of data exploitation — you already know what the data shows, and your task is to convey that information as clearly as possible to someone else. A plot that was “good enough” when you were exploring is suddenly insufficient. Small design choices start to matter: the thickness of a curve, the spacing between tick marks, the color palette, the axis labels, the annotation placement, the size of the figure on the page.

In this phase, I would recommend relying more heavily on matplotlib directly. It is already the underlying engine that draws the plots for libraries like seaborn, but now we use it with finer control. While seaborn can still be useful to get the basic plot right, matplotlib allows for the careful, fine-grained control that publication-ready figures demand. Think about labels in LaTeX font, cropping of plots when exporting, etc. Whatever you produce must be unambiguous, concise, and aesthetically consistent with the rest of the paper. It needs to reproduce accurately when printed in grayscale, fit alongside text in a column, and remain readable at reduced size.
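As a rough sketch of what this finer control can look like, the snippet below sets a few style parameters globally, fixes the figure size, distinguishes the curves by line style so they survive grayscale printing, and crops the exported file. All concrete values (font size, a 3.4 inch figure width, line widths) are assumptions and depend on the target journal.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumed style values; adapt them to the template of the paper.
plt.rcParams.update({
    "font.family": "serif",
    "mathtext.fontset": "cm",  # Computer Modern math labels without a LaTeX installation
    "font.size": 9,
    "axes.linewidth": 0.8,
    "lines.linewidth": 1.2,
})

x = np.linspace(0.0, 2.0 * np.pi, 200)

# Figure size in inches, chosen here to fit a single text column.
fig, ax = plt.subplots(figsize=(3.4, 2.4))

# Same color, different line styles: still distinguishable in grayscale.
ax.plot(x, np.sin(x), color="black", linestyle="-", label=r"$\sin(x)$")
ax.plot(x, np.cos(x), color="black", linestyle="--", label=r"$\cos(x)$")

ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$f(x)$")
ax.set_xticks([0.0, np.pi, 2.0 * np.pi])
ax.set_xticklabels([r"$0$", r"$\pi$", r"$2\pi$"])
ax.legend(frameon=False)

# bbox_inches="tight" crops surrounding whitespace on export.
figure_file = Path(tempfile.mkdtemp()) / "figure.pdf"
fig.savefig(figure_file, bbox_inches="tight")
plt.close(fig)
```

Setting "text.usetex" to True in rcParams would typeset the labels with a real LaTeX installation instead of matplotlib's built-in math renderer, at the cost of an extra dependency.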

The shift from exploration to publication is also a shift in discipline. You now include error bars, confidence intervals, proper units, and descriptive labels. What remains is a figure that carries a single, precise message, easily digestible by the reader.
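To illustrate the point about error bars and labels, here is a small sketch with made-up data: repeated measurements are reduced to a mean and a standard error per setting with pandas and then drawn with matplotlib's errorbar. The column names, the numbers, and the choice of the standard error as the uncertainty are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: 20 repeated measurements at each of 5 settings.
rng = np.random.default_rng(1)
settings = np.repeat(np.arange(1, 6), 20)
values = 2.0 * settings + rng.normal(scale=1.0, size=settings.size)
df = pd.DataFrame({"setting": settings, "value": values})

# Mean and standard error of the mean for every setting.
stats = df.groupby("setting")["value"].agg(["mean", "sem"]).reset_index()

fig, ax = plt.subplots(figsize=(3.4, 2.4))
ax.errorbar(
    stats["setting"],
    stats["mean"],
    yerr=stats["sem"],
    fmt="o",      # markers only, no connecting line
    capsize=3,
    color="black",
)
ax.set_xlabel("Setting (arb. units)")
ax.set_ylabel("Measured value (arb. units)")
fig.tight_layout()
```

Whichever uncertainty is plotted (standard error, standard deviation, a confidence interval), the caption of the figure should say so.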

The difference in practice

In the first phase, we are aiming for many plots in a short time. We do not fine-tune any of them. This can easily be done with seaborn by exploiting the hue, col (for column), or style options of the sns.relplot command. A lot of information can be packed into one plot, or many similar plots can be extracted from a dataset, with very few lines of code.

The pandas data frames that formed the basis of the analysis in the first phase can easily be adapted for phase two. The data can still be loaded as before, but now we are aiming for a single plot. I usually tend to write a single Python file per plot in this phase. For an example, see this repo. These scripts are completely independent of the actual simulation code and will exactly reproduce the plots in the paper, even after years. If I have to get back to that paper, I will know which data to use and how to obtain the plots. It maximizes reproducibility and minimizes time spent guessing which data was used for which plot.

Happy plotting :)

Patrick Emonts
Junior Research Group Leader

My research interests include tensor networks, lattice gauge theories and quantum information.