Monthly Archives: August 2013

Introducing the heatmap

Our original colouring scheme for the generated Paperscape map assigns (mostly) unique colours to the arXiv categories so that they are easy to tell apart. This colour coding shows clearly how authors cite mostly within their own fields, and it also reveals interesting interfaces between the different categories. For example, check out the fields of dark matter (astrophysics meets high energy phenomenology) and dark energy (high energy theory meets general relativity/quantum cosmology meets astrophysics). However, the cost of using colour to code categories is that other features, such as a paper’s age, must be shown in a different way. Specifically, we use brightness to highlight newer papers in this scheme, but with so many different colours present, new regions of papers don’t really shine through.

Enter our new heatmap, which purely shows the age of papers using a colour gradient from dark gray (old) to bright red (new). The heatmap can be activated using the new drop down menu located at the top left of the map. In this new colouring scheme regions of recent activity stand out much more clearly, and new papers that are growing quickly can be easily identified. If you haven’t yet, go visit the Paperscape map to try it.

Comparison of category and heatmap colour schemes

Now for some details. We found that a linear mapping of the arXiv’s paper ages (spanning 23 years) to the chosen colour gradient wasn’t sufficient to highlight recent activity. After trying various mappings, we’ve opted for a Voigt profile with a sigma of 4 years and a gamma of 1/27 inverse years. These values simply represent what we think best distinguishes what’s currently hot from what’s not. We’ll probably continue to tune the heatmap in the future, and your suggestions are very welcome!
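To make the mapping concrete, here is a minimal sketch of how an age could be turned into a heatmap colour with a Voigt profile. Only the sigma and gamma values come from the text above. Everything else is an assumption made for illustration: gamma is taken as the plain number 1/27 on the same axis as the age, the profile is normalised so that a brand new paper has heat 1, the RGB endpoints for "dark gray" and "bright red" are guesses, and the function names are hypothetical. It is not the code Paperscape actually runs.

```python
import math

SIGMA = 4.0        # Gaussian width in years (value quoted in the post)
GAMMA = 1.0 / 27   # Lorentzian parameter (value quoted in the post; units assumed)

# Assumed gradient endpoints; the post only says "dark gray" and "bright red".
OLD_RGB = (64, 64, 64)
NEW_RGB = (255, 0, 0)


def voigt(x, sigma=SIGMA, gamma=GAMMA, steps=2000):
    """Voigt profile: a Gaussian convolved with a Lorentzian.

    Substituting u = gamma * tan(theta) turns the Lorentzian weight into a
    uniform weight in theta, so a midpoint rule over (-pi/2, pi/2) is enough.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = -math.pi / 2 + (i + 0.5) * d_theta
        shift = x - gamma * math.tan(theta)
        total += norm * math.exp(-shift * shift / (2.0 * sigma * sigma))
    return total * d_theta / math.pi


PEAK = voigt(0.0)


def heat(age_years):
    """Heat in (0, 1]: 1 for a paper of age 0, falling off with age."""
    return voigt(age_years) / PEAK


def heat_colour(age_years):
    """Blend linearly from OLD_RGB (heat 0) to NEW_RGB (heat 1)."""
    h = heat(age_years)
    return tuple(round(o + (n - o) * h) for o, n in zip(OLD_RGB, NEW_RGB))


for age in (0, 1, 4, 10, 23):
    print(age, round(heat(age), 4), heat_colour(age))
```

Compared with a linear ramp over 23 years, a profile like this spends most of the gradient on the most recent few years, which is the effect described above.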

Now that the map has two different colour schemes, the question naturally arises of whether there are other interesting ones. It could, for example, be useful to highlight trending papers, i.e. papers whose citation counts are growing quickly, irrespective of their age. If you have any good ideas, please share them!

Searching

Searching is an important part of Paperscape, since it allows you to find papers on the map. When you enter a search term in the box, all papers that match the search have a large white halo drawn around them.

At the moment our search can handle arXiv identifiers (e.g. 1207.7214, hep-ex/9807003), author names (e.g. E.Witten), titles, keywords (the most common words in the title and abstract of a paper), and new papers (those that appeared on the arXiv today, e.g. ?n hep-th).
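As a rough illustration of how a search box might tell these kinds of query apart, here is a small sketch. The regular expressions, the function name classify_query and the returned labels are all assumptions made for this example; they are not Paperscape's actual parser, and the patterns only cover the identifier shapes shown in the examples above.

```python
import re

# Assumed patterns for the two identifier styles in the examples above.
NEW_STYLE_ID = re.compile(r"^\d{4}\.\d{4,5}$")                # like 1207.7214
OLD_STYLE_ID = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}$")    # like hep-ex/9807003


def classify_query(query):
    """Return a (kind, payload) pair describing the query (hypothetical labels)."""
    text = query.strip()
    if NEW_STYLE_ID.match(text) or OLD_STYLE_ID.match(text):
        return ("arxiv_id", text)
    if text.startswith("?n "):
        # "?n hep-th" asks for today's new papers in a category.
        return ("new_papers", text[3:].strip())
    # Anything else is treated as a list of words (authors, titles, keywords).
    return ("words", text.lower().split())


for q in ("1207.7214", "hep-ex/9807003", "?n hep-th", "E.Witten"):
    print(q, "->", classify_query(q))
```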

If you type in a list of words in the search box, we do a “boolean and” search for all those words using the authors and keywords of each paper. This gives decent results in a lot of the common cases. For example, searching for "witten qcd" finds papers written by Witten that are about QCD, and also finds papers written about QCD that mention Witten in the abstract.
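The “boolean and” behaviour can be sketched in a few lines. The paper records below are made up for illustration (the ids, authors and keywords are not real data), and the way author names are reduced to a surname is an assumption; the real index may be built quite differently.

```python
def searchable_terms(paper):
    """Collect the lower-cased keywords and author surnames of one paper."""
    terms = {kw.lower() for kw in paper["keywords"]}
    for author in paper["authors"]:
        # Assumed convention: "E.Witten" is indexed under "witten".
        terms.add(author.split(".")[-1].lower())
    return terms


def boolean_and_search(query, papers):
    """Return ids of papers whose terms contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    return [p["id"] for p in papers
            if all(w in searchable_terms(p) for w in words)]


# Made-up records, for illustration only.
PAPERS = [
    {"id": "paper-a", "authors": ["E.Witten"], "keywords": ["qcd", "baryons"]},
    {"id": "paper-b", "authors": ["A.Nother"], "keywords": ["qcd", "witten"]},
    {"id": "paper-c", "authors": ["E.Witten"], "keywords": ["strings"]},
]

print(boolean_and_search("witten qcd", PAPERS))
```

Here the first record matches through its author, the second through its keywords, and the third is excluded because it has no "qcd" term, which mirrors the two kinds of hit described above.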

It is not at the moment possible to construct your own boolean search phrases. For example "?au witten ?ti qcd" does not work, at least not yet!

We are still developing search. If you have any suggestions for how searching should work, please leave a comment.

Some teething issues

Paperscape has been getting quite a bit of traffic in the past 12 hours. Thanks for your interest!

With all the traffic, we have encountered one mild bug. When you click on a paper, your browser sends the location of the click to our servers, which then return the associated paper id, if one exists at that location. On rare occasions it is possible to request a paper at a (NaN,NaN) location (yes, I know, that’s strange!), and this caused problems when our server tried to look up that location. Consequently, search and clicking on papers were down for a few hours.

It should be fixed now. Please let us know if you run into any problems.
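For the curious, the general shape of a guard against this kind of request looks like the sketch below. It is not our server code: the names paper_at and lookup are hypothetical, and the toy lookup table stands in for whatever spatial index the server really uses.

```python
import math


def paper_at(x, y, lookup):
    """Return the paper id at (x, y), or None if there is none.

    Non-finite coordinates such as NaN or infinity are rejected before the
    (hypothetical) lookup function is ever called.
    """
    if not (math.isfinite(x) and math.isfinite(y)):
        return None
    return lookup(x, y)


# Toy lookup for illustration: exact-match table of made-up locations.
TABLE = {(1.0, 2.0): "paper-a"}


def toy_lookup(x, y):
    return TABLE.get((x, y))


print(paper_at(1.0, 2.0, toy_lookup))
print(paper_at(float("nan"), float("nan"), toy_lookup))
```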

Labelling regions of the map

The labels on the map are generated mostly automatically. When zoomed out, arXiv categories are displayed, and the position of the category label is computed as the average of all papers in that category. As you zoom in, these category labels disappear, and are replaced by individual labels on top of each paper, so long as that paper is “big enough” on screen. The labels for each paper are determined by analysing the title and abstract, looking for common keywords.
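The position of a category label, as described above, is just a mean of paper positions. A minimal sketch follows, assuming each paper is given as a (category, x, y) tuple and that every paper counts equally; the function name and the input format are made up for this example, and the coordinates are not real map data.

```python
from collections import defaultdict


def category_label_positions(papers):
    """Average the (x, y) positions of the papers in each category."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # category -> [sum_x, sum_y, count]
    for category, x, y in papers:
        acc = sums[category]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {cat: (sx / n, sy / n) for cat, (sx, sy, n) in sums.items()}


# Made-up coordinates, for illustration only.
PAPER_POSITIONS = [
    ("hep-th", 0.0, 0.0),
    ("hep-th", 2.0, 4.0),
    ("astro-ph", 10.0, 10.0),
]

print(category_label_positions(PAPER_POSITIONS))
```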

We have now added a third layer to this labelling process: we identify by eye regions of the map that have a definite theme, and give these regions a generic, but not too generic, label. For example, we can clearly identify the “neutrino” area in the north, and the “inflation” area at the interface of hep-th and astro-ph.

These new labels make the transition from arXiv category labels to keyword labels a bit easier to follow, and also allow you to more easily understand where you are on the map.

In the future we plan to implement a more sophisticated way of labelling that transitions smoothly between zoom levels, much like a map of the geographic world. If you have any suggestions for this, please leave us a comment.