
Bibliothèque Municipale d’Epinal


On my first day at Les Imaginales, a pair of librarians came up and invited me to visit the Epinal Library. What I didn’t realize — they may have mentioned it and I just missed it — was that they were giving us a private tour of the rare books room.

Epinal Library Rare Books Room

It was amazing. One of the true highlights of my trip to France. My interpreter Lionel, an author himself, was as awestruck as I was. Especially when they brought out the first book. If I’m remembering right, this was from the 8th century.

8th century religious text

The next one wasn’t quite as old…being from the 9th century. This Gospel of Saint Mark was a youthful 1200 years old.

Gospel of St. Mark: 9th Century

The cover is metal and ivory. I’m not sure what kind of jewels those are. The circular areas on the corners were for holding relics. Here’s a glimpse of the interior:

Gospel of St. Mark: Interior

You can see the full set of photos on Flickr. (Or you may have already seen them on Facebook.) It was such a wonderful experience. My thanks to everyone at Bibliothèque Municipale d’Epinal for their time and generosity.

I’ll end with a map of Michigan from one of the books that was “only” a few centuries old. Michigan sure looked different in the old days…

Map of Michigan


"The genre, its own kind of endurance art, shuns immediacy."

jwz
Insulting headline aside, I kind of love this:

The forgotten joys of the screen saver:

If screen savers still have an eschatological tinge for me, it's also because of their own demise. We no longer need them now, when our phones nudge us at all hours, our inboxes bloat, and dystopian headlines scorch themselves onto our consciousnesses. Our laptops, when we look away from them, have optimized screen protection with a bland and dreamless sleep mode. What we abandoned with the death of screen savers -- themselves testifiers of disuse -- was a culture that could accept walking away from life onscreen.

Might we call the screen saver an artistic ideal? F. T. Marinetti, in 1909, planted the flag of futurism in the art world with the following declaration: "Up to now, literature has extolled a contemplative stillness, rapture, and reverie. We intend to glorify aggressive action, a restive wakefulness, life at the double, the slap and the punching fist." Despite screen savers' frequent tendency towards futurist abstractions, they revel in the stillness, rapture, and reverie Marinetti despised. Their banality approaches sublimity. Of course, we're now used to the heroic nostalgia with which custodians of culture acquire relics from the Internet's own dusty, evanescent museum. As emoticons, computer games, and GIFs are exhumed and then corralled into prestige institutions to be coated with the respectable patina of Art, we marvel at how what once was ubiquitous or clunky can now be considered aesthetically or conceptually profound. But of all the overlooked digital antiques of the computer's youth, perhaps the most thrilling is the screen saver. Visually mesmerizing, intellectually engaging, and nearly decommodified, the best screen savers achieve the virtues of multiple art movements. They even make a damning statement: the faintest human touch breaks their spell. [...]

You can't consume a screen saver in an instant. You can't fast-forward or rewind one. The genre, its own kind of endurance art, shuns immediacy. Fugitives from time, screen savers possess no real beginning or end. Their ouroboric nature is perhaps why preservations on YouTube, whether ten minutes or twelve hours long, tend to evoke disenchantment. Decades ago, stumbling upon a screen saver in a shared living room -- or perhaps finding an entire office full of them at lunchtime, cubicles lambent with workers' judiciously chosen modules -- likely signaled your own solitude. When you're watching one intentionally, that feeling never arrives. [...]

Then there are the slick stock photographs of fjords and aurora borealis so endemic to LCD and plasma, the islands we long to be marooned on. Screen savers depict what we desire -- often with a Ken Burns panning effect. [...]

If the nineties and early aughts were a time of collaborative whimsy for screen savers, our current era treats them as an afterthought. Inspecting my laptop's default modes, which include jubilant penguins, pastoral landscapes, and the cosmos, it becomes apparent that today's screen savers are designed to tranquilize. That's a shame. The screen savers of my youth told me life was full of rapture and reverie, and stillness, too.

Previously, previously.


Is something rotten in the state of social psychology? Part Two: digging through the past


By Alex Fradera

A new paper in the Journal of Personality and Social Psychology has taken a hard look at psychology’s crisis of replication and research quality and we’re covering its findings in two parts.

In Part One, published yesterday, we reported the views of active research psychologists on the state of their field, as surveyed by Matt Motyl and his colleagues at the University of Illinois at Chicago. Researchers reported a cautious optimism: research practices hadn’t been as bad as feared, and are in any case improving.

But is their optimism warranted? After all, several high-profile replication projects have found that, more often than not, re-running previously successful studies produces only null results. But defenders of the state of psychology argue that replications fail for many reasons, including defects in the reproduction and differences in samples, so the implications aren’t settled.

To get closer to the truth, Motyl’s team complemented their survey findings with a forensic analysis of published data, uncovering results that seem to bolster their optimistic position. In Part Two of our coverage, we look at these findings and why they’re already proving controversial.

Motyl and his colleagues used a relatively new type of analysis to assess the quality and honesty of the data found in over 500 previously published papers in social psychology. Their approach is technical, involving weirdly-named statistics conducted upon even more statistics, so it helps to use an analogy: Just as a vegetable garden produces a variety of tomatoes, some bigger than others, some misshapen, some puny and poor for eating, an honestly-conducted body of research should bear a range of fruit in the same way. True experimental effects shouldn’t always come out exactly the same: they should vary in size from experiment to experiment, including instances when the effect is too small to be statistically significant.

These are the sorts of things you can evaluate in a body of research – in this case with the Test for Insufficient Variance, which Motyl’s study used alongside six other indices. When there were too many irregularities in the data, or bizarre regularity like identikit supermarket tomatoes, this suggested to Motyl and his colleagues that questionable research practices may have been used to make the weak results swell up to reach the desired appearance.
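
For readers who want a flavour of how such an index works, here is a rough sketch of the intuition behind the Test for Insufficient Variance, written by us for illustration rather than taken from the paper: convert a set of reported two-sided p-values into z-scores and ask whether they vary as much as honestly reported results should (roughly, with a variance of one or more).

```python
# Rough illustration of the idea behind a Test for Insufficient Variance:
# convert two-sided p-values to z-scores and check whether their variance
# is suspiciously small. The p-values below are invented for illustration,
# and the published test involves more care than this sketch.
import numpy as np
from scipy import stats

p_values = np.array([0.049, 0.041, 0.032, 0.047, 0.038])  # hypothetical
z_scores = stats.norm.isf(p_values / 2)       # two-sided p -> z
observed_var = np.var(z_scores, ddof=1)

# Under honest reporting, z-scores should vary with a variance of roughly 1
# or more; the chance of seeing variance this low can be gauged with a
# chi-square left tail.
k = len(z_scores)
p_insufficient = stats.chi2.cdf((k - 1) * observed_var, df=k - 1)
print(f"variance of z-scores: {observed_var:.3f}")
print(f"left-tail probability of variance this small: {p_insufficient:.3f}")
```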

Crucially, however, the study found that more often than not, the indices showed low levels of anomalies, suggesting research practices are more likely to be acceptable than questionable. This was the case for studies from 2003-4, before the crisis was fully acknowledged, and the researchers found an even better picture for more recent (2013-14) papers. The fruits of the research may have been tampered with from time to time, but there was no case that the entire enterprise was “rotten to the core”.

This optimistic conclusion conflicts with similar analyses performed in the past, but this might be explained by the different approaches of collecting the data – of gathering the fruit, if you will. Past approaches automatically scraped articles for every instance of a statistic, such as every listed p-value. But this is like a bulldozer ripping out a corner of a garden and measuring everything that looks anything like a tomato, including stones and severed gnome-heads. To take just one example, articles will often list p-values for manipulation checks: confirmations that an experimental condition was set up correctly (did participants agree that the violent kung-fu clip was more violent than the video of grass growing?). But these aren’t tests to determine new scientific knowledge, rather – turning to another analogy – the equivalent of a chemist checking their equipment works before running an experiment. So Motyl’s team took a more nuanced approach, reading through every article and picking out by hand only the relevant statistics.

However, all is not rosy in the garden. At their Datacolada blog, “state of science” researchers Joseph Simmons, Leif Nelson, and Uri Simonsohn, have already responded to the new analysis and they’re sceptical. Simmons and co first note the daunting scale of the new enterprise: to correctly identify 1800 relevant test statistics from 500 papers. In an online response, Motyl’s team agreed that yes, it was time consuming, and yes, it required a lot of hands: “there are reasons this paper has many authors: It really took a village,” they said.

But Datacolada sampled some of the statistics that Motyl’s team used in their assessments and they argue that far too many of them were inappropriate, including data from manipulation checks that Motyl’s group had themselves categorised as statistica non grata. To the Datacolada team, this renders the whole enterprise suspect: “We are in no position to say whether their conclusions are right or wrong. But neither are they.” In their response, Motyl’s team make some concessions, but they argue that some of the statistic selection comes down to differences of opinion, and they defend both their overall procedure and the number of coding errors they expect their study to contain. So….

So?

So doing high-quality science isn’t straightforward. Neither is doing high-quality science on the quality of science, nor is gathering everything together to form high-quality conclusions. But if we care about the validity of the more sexy findings in psychology – the amazing powers of power poses to make you physically more confident, how you can hack your happiness simply by changing your face, and how even subtle social signals about age, race or gender can transform how we perform at tasks – we need to care about psychological science itself, how it’s working and how it isn’t. (By the way, those findings I just listed? They’ve all struggled to replicate.)

There are surely ways to improve the methods of this new study – perhaps not coincidentally, Datacolada’s Leif Nelson is running a similar project – but even if the new assessment does include some irrelevant statistics, it will likely be an advance on past analyses that included every irrelevant statistic.

So … the new insights have budged my position on the state of science a little: I’m still worried, but I can see a little more light among the dark. Motyl’s group make the case that social psychology isn’t ruined, that the garden isn’t totally contaminated. I hope so. But it’s not hope on its own that will move our field forward, but research, debate, and making sense of the evidence. After all, psychology is too good to give up on.

The State of Social and Personality Science: Rotten to the Core, Not so Bad, Getting Better, or Getting Worse?

Main image: An illustration from ‘The Family Friend’ published by S.W. Partridge & Co. (London, 1874). Lifeboat men rowing towards a wrecked ship in high seas. (via GettyImages.co.uk under licence)


Alex Fradera (@alexfradera) is Contributing Writer at BPS Research Digest







Was the “crisis” in social psychology really that bad? Have things improved? Part One: the researchers’ perspective


By Alex Fradera

The field of social psychology is reeling from a series of crises that call into question the everyday scientific practices of its researchers. The fuse was lit by statistician John Ioannidis in 2005, in a review that outlined why, thanks particularly to what are now termed “questionable research practices” (QRPs), over half of all published research in the social and medical sciences might be invalid. Kaboom. This shook a large swathe of science, but the fires continue to burn especially fiercely in social and personality psychology, a field that marshalled its response through a 2012 special issue in Perspectives on Psychological Science that brought these concerns fully out in the open, discussing replication failure, publication biases, and how to reshape incentives to improve the field. The fire flared up again in 2015 with the publication of Brian Nosek and the Open Science Collaboration’s high-profile attempt to replicate 100 studies in these fields, which succeeded in only 36 per cent of cases. Meanwhile, and to the field’s credit, efforts to institute better safeguards like registered reports have gathered pace.

So how bad did things get, and have they really improved? A new article in pre-print at the Journal of Personality and Social Psychology tries to tackle the issue from two angles: first by asking active researchers what they think of the past and present state of their field, and how they now go about conducting psychology experiments, and second by analysing features of published research to estimate the prevalence of broken practices more objectively.

The paper comes from a large group of authors at the University of Illinois at Chicago under the guidance of Linda Skitka, a distinguished social psychologist who participated in the creation of the journal Social Psychological and Personality Science and who is on the editorial board of many more social psych journals, and led by Matt Motyl, a social and personality psychologist who has published with Nosek in the past, including on the issue of improving scientific practice.

Psychology research is the air that we breathe at the Digest, making it crucial that we understand its quality. So in this two-part series, we’re going to explore the issues raised in the University of Illinois at Chicago paper, to see if we can make sense of the state of social psychology, beginning in this post with the findings from Motyl et al’s survey of approximately 1,200 social and personality psychologists, from graduate students to full professors, mainly from the US, Europe and Australasia.

Motyl’s team began by asking their participants about the state of the field now as opposed to 10 years ago. On average, participants believed that older research would only replicate in 40 per cent of cases – quite close to Nosek’s figure – but they believed that research being conducted now would have a better rate, about 50 per cent, and that generally the field was improving itself in response to the crisis.

Motyl’s team also canvassed the respondents on a range of questionable research practices, sketchy behaviours like neglecting to report all the measures taken, or quietly dropping experimental conditions from your study. Thanks particularly to work by Joseph Simmons, Leif Nelson, and Uri Simonsohn, we understand just how much these practices compromise the assumptions of scientific significance testing, making it easy to produce false positive results even in the absence of fraudulent intent. In their words, QRPs are not wrong “in the way it’s wrong to jaywalk”, the way that researchers have often implicitly been encouraged to think of them, but “wrong the way it’s wrong to rob a bank.”

Previous surveys of researchers’ own QRP usage have uncovered high levels of admissions, as if the field was rushing to the confession box to purge their sins. Here, Motyl’s team used finer-grained questioning to look at frequency (often a “yes” turned out to be “rarely” or “once”) and justification. In some cases, a researcher’s justification showed that they had misinterpreted the question and that they were actually expressing strong disapproval of the QRP – in fact, this seemed to be the case in virtually all “confessions” of data fabrication. In other cases, the context provided by a justification painted the particular research practice in a completely different light.

For example, consider the seemingly dodgy decision to drop conditions from your study analysis. If your rationale is that the condition didn’t turn out to do what you wanted it to do – in an emotion and memory study, your sad video didn’t produce a sad mood in participants, for instance – it’s actually more problematic to keep what is effectively a bogus condition in your analysis than it is to exclude it (ideally in a principled way, according to a registered procedure). For the new survey, independent judges evaluated all the stated justifications, and felt they legitimised the “questionable” practices in 90 per cent of cases.

Discovering these misunderstandings and justifiable practices littered through the QRP data led Motyl’s team to conclude that pre-explosion psychology practices aren’t as derelict as once feared, although the fact that 70 per cent of respondents said they are now less likely to engage in many of these practices than ten years ago suggests that all was not entirely virtuous back then.

So not perfect, but getting better, is the take within the field: a cautious optimism compared to some dire pronouncements on the state of psychology. In Part  Two, we’ll look at the body of psychological research itself, to see if this optimism is justified.

The State of Social and Personality Science: Rotten to the Core, Not so Bad, Getting Better, or Getting Worse?

Main image: Antique illustration of ‘A nous! A nous!’ by Morlon (via GettyImages.co.uk under licence)

Here is Part Two of our coverage.

Alex Fradera (@alexfradera) is Contributing Writer at BPS Research Digest







Solid Snakes or: How to Take 5 Weeks of Vacation


No matter whether you run a web app, search for gravitational waves, or maintain a backup script: the reliability of your systems makes the difference between sweet dreams and production nightmares at 4am.


Introducing the new goodtables library and goodtables.io


Information is everywhere. There is far more we need to know at any given time than we have the capacity or time to internalize. The real skill, therefore, lies in producing summaries concise enough to save time while still imparting knowledge. Since the 1880s, tabulation has been our go-to method for compacting information, not only to preserve it but also to analyse it and draw meaningful conclusions from it.

Tables, composed of rows and columns of related data, are not always easy to analyze, especially when they run to thousands of rows. Mixed data types, missing values, and ill-suited entries are just a few of the reasons why tabular data in its raw state, often referred to as “dirty” data, can be a nightmare to work with.

Enter goodtables.


The goodtables library

goodtables is a Python library that lets users inspect tabular data, checking it for both structural and schema errors and suggesting plausible fixes, before the data is analysed with other tools. At its most basic level, goodtables highlights general errors in tabular files that would otherwise prevent them from being loaded or parsed.

Since the release of goodtables v0.7 in early 2015, the codebase has evolved, allowing for additional use cases while working with tabular data. Without cutting back on functionality, goodtables v1 has been simplified, and the focus is now on extensible data validation.

Using goodtables

goodtables is still in alpha, so we need to pass the pre-release flag (--pre) to pip to install. With that, installation of goodtables v1 is as easy as pip install goodtables --pre.

The goodtables v1 CLI supports two presets by default: table and datapackage. The table preset allows you to inspect a single tabular file.

Example:

goodtables --json table valid.csv returns a JSON report for the file specifying the error count, source, and validity of the data file among other things.
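
A quick sketch of how that report might be consumed from Python, by shelling out to the CLI command shown above; the report key names used here ("valid", "error-count") are assumptions based on typical goodtables output and may differ in your version.

```python
# A minimal sketch: run the goodtables CLI shown above and read its JSON
# report from Python. The report keys used here ("valid", "error-count")
# are assumptions based on typical goodtables output.
import json
import subprocess

result = subprocess.run(
    ["goodtables", "--json", "table", "valid.csv"],
    capture_output=True,  # requires Python 3.7+
    text=True,
)
report = json.loads(result.stdout)

print("valid:", report.get("valid"))
print("errors:", report.get("error-count"))
```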

The datapackage preset allows you to run checks on datasets aggregated in one container. Data Packages are a format for coalescing data in one ‘container’ before shipping it for use by different people and with different tools.

Example:

goodtables datapackage datapackage.json allows a user to check a data package’s schema, table by table, and gives a detailed report on errors, row count, headers, and overall validity (or lack thereof).
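
If you have not seen one before, here is a minimal, hypothetical datapackage.json descriptor written out from Python; the dataset name and file paths are invented for illustration, and the authoritative list of properties is in the Data Package specification.

```python
# Writes a minimal, hypothetical datapackage.json descriptor. The dataset
# name and CSV paths are invented for illustration; see the Data Package
# specification for the complete list of supported properties.
import json

descriptor = {
    "name": "example-dataset",
    "resources": [
        {"name": "table-one", "path": "table_one.csv"},
        {"name": "table-two", "path": "table_two.csv"},
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)

# The file can then be inspected with: goodtables datapackage datapackage.json
```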

You can try out these commands on your own data or you can use datasets from this folder.

Customization

In addition to general structure and schema checks on tabular files available in v0.7, the goodtables library now allows users to define custom (data source) presets and run custom checks on tabular files. So what is the difference?

While basic schema checks inspect data against the data quality spec, custom_check gives developers leeway to specify acceptable values for data fields, so that any values falling outside the defined rules are flagged as errors.
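
To make the contrast concrete, here is a small standalone sketch of the kind of rule a custom check expresses. It deliberately does not use the goodtables registration API; the field name and range are hypothetical, and the point is simply that anything outside the stated rule gets flagged.

```python
# Hypothetical illustration of custom-check logic (not the goodtables API):
# flag any row where a numeric field falls outside an acceptable range.
def check_value_range(rows, field="age", minimum=0, maximum=120):
    """Yield (row_number, message) pairs for values outside [minimum, maximum]."""
    for row_number, row in enumerate(rows, start=1):
        try:
            value = float(row[field])
        except (KeyError, TypeError, ValueError):
            yield row_number, f"{field!r} is missing or not a number"
            continue
        if not minimum <= value <= maximum:
            yield row_number, f"{field!r} value {value} is outside [{minimum}, {maximum}]"


rows = [{"age": "34"}, {"age": "-5"}, {"age": "abc"}]
for number, message in check_value_range(rows):
    print(f"row {number}: {message}")
```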

custom_preset allows you to define a custom interface to your data storage platform of choice: it tells goodtables where your dataset is held, whether that is on CKAN, Dropbox, or Google Drive.

Any presets outside of the built-in ones above are made possible and registered through a provisional API.

Examples:

  • CKAN custom preset: CKAN is the world’s leading open data platform, developed by Open Knowledge International to help streamline the publishing, sharing, finding, and use of data. Here’s a custom preset that, for example, could help a user run an inspection on datasets from Surrey’s Data Portal, which utilizes CKAN.

  • Dropbox custom preset: Dropbox is one of the most popular file storage and collaboration cloud services in use. It ships with an API that makes it possible for third-party apps to read files stored on Dropbox, as long as a user’s access token is specified. Here’s our goodtables custom preset for Dropbox. Remember to generate an access token by first creating a Dropbox app with full permissions.

  • Google Sheets custom preset: The Google Sheets parser to enable custom preset definition is currently in development. At present, for any data file stored in Google Drive and published on the web, the command goodtables table google_drive_file_url inspects your dataset and checks for validity, or lack thereof.

Validating multiple tables

goodtables also allows users to carry out parallel validation for multi-table datasets. The datapackage preset makes this possible.

Example:

Frictionless Data is a core Open Knowledge International project, and all goodtables work falls under its umbrella. One of the pilots working with Frictionless Data is DM4T, which aims to understand the extent to which Data Package concepts can be applied in the energy sector. The DM4T pilot’s issue tracker lives here, and its Data Package comprises 20 CSV files totalling approximately 6.7 GB.

To inspect DM4T’s energy consumption data collected from 20 households in the UK, run:

goodtables --table-limit 20 datapackage https://s3-eu-west-1.amazonaws.com/frictionlessdata.io/pilots/pilot-dm4t/datapackage.json

In the command above, the --table-limit option allows you to check all 20 tables, since by default goodtables only runs checks on the first ten. You can find plenty of sample Data Packages for use with goodtables in this repository.

So why use GitHub to store data files? At Open Knowledge International, we highly recommend GitHub repositories for dataset storage and work with others to use them.

PRO TIP: When working with datasets hosted on GitHub, such as the country codes Data Package, use the raw file URL with goodtables, since support for GitHub URL resolution is still in development.
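
As a convenience, here is a tiny helper, our own sketch rather than part of goodtables, that rewrites a regular GitHub file URL into its raw form; the example URL is illustrative.

```python
# A small helper (not part of goodtables) that turns a regular GitHub file URL
# into the raw.githubusercontent.com form that goodtables can read directly:
#   https://github.com/<user>/<repo>/blob/<branch>/<path>
#     -> https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>
def github_raw_url(url: str) -> str:
    url = url.replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
    return url.replace("/blob/", "/", 1)


# Hypothetical example URL for illustration:
print(github_raw_url("https://github.com/datasets/country-codes/blob/master/data/country-codes.csv"))
```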

Standards and other enhancements

goodtables v1 also works with our proposed data quality specification, which defines an extensive list of standard tabular data errors. Other enhancements since goodtables v0.7 include:

  1. Breaking out tabulator into its own library. As part of the Frictionless Data framework, tabulator is a Python library developed to provide a consistent interface for stream reading and writing tabular data in whatever format it comes, be it CSV, XML, etc. The library is installable via pip: pip install tabulator (a short usage sketch follows this list).
  2. Close to 100% support for Table Schema, thanks to lots of work on the underlying Python library. The Table Schema Python library allows users to validate a dataset’s schema and, given headers and data, infer a schema as a Python dictionary based on its initial values.
  3. Better CSV parsing, better HTML detection, and fewer false positives.
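
Here is a brief usage sketch of tabulator, based on its documented Stream interface; exact keyword arguments may differ between versions.

```python
# A minimal sketch of stream-reading a CSV with tabulator's Stream interface.
# Details follow the library's documented API and may vary by version.
from tabulator import Stream

with Stream("data.csv", headers=1) as stream:
    print(stream.headers)      # header row, taken from line 1
    for row in stream.iter():  # rows are streamed, not loaded all at once
        print(row)
```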

goodtables.io


Moving forward, at Open Knowledge International we want to streamline the process of data validation and ensure that seamless integration is possible in different publishing workflows. To do so, we are launching a continuous data validation hosted service that builds on top of this suite of Frictionless Data libraries. goodtables.io will provide support for different backends. At this time, it can check datasets hosted on GitHub and in Amazon S3 buckets, automatically running validation against data files every time they are updated and providing a user-friendly report of any issues found.

Try it here: goodtables.io

This kind of continuous feedback allows data publishers to release better, higher quality data and helps ensure that this quality is maintained over time, even if different people publish the data.

Using this dataset on GitHub, here’s sample output from a data validation run on goodtables.io:

illustrating data validation on goodtables.io

Updates to the files in the dataset will trigger a validation check on goodtables.io. As with other projects at Open Knowledge International, the goodtables.io code is open source and contributions are welcome. We hope to build support for additional data storage platforms in the coming months; please let us know which ones to consider in our Gitter chat or on the Frictionless Data forum.
