Excel converts gene names to dates: bigger problem than expected


The problem of Excel silently converting gene names in scientific publications is even greater than previously assumed, a team led by Mandhri Abeysooriya of Deakin University in Australia has found. Almost one in three scientific publications with an attached Excel list of genes contained such errors; an earlier estimate put the share at around 20 percent. Even though the problem has been known for years, there has been no improvement, the researchers warn. Just over a year ago, the Human Genome Organization committee responsible for naming human genes changed dozens of names to remedy the situation.

Mark Ziemann, who is also involved in the current study, had already drawn attention to the problem five years ago. The issue is that Microsoft’s spreadsheet program Excel automatically converts certain alphanumeric gene names to dates without notice. After Microsoft did not react and no other solution emerged, the HUGO Gene Nomenclature Committee (HGNC) officially renamed several dozen genes last year. The MARCH1 gene is now called MARCHF1 (“Membrane Associated Ring-CH-Type Finger 1”), and SEPT1 has become SEPTIN1 (“Septin 1”). In an English-language Excel spreadsheet, the old names MARCH1 and SEPT1 were turned into “1-Mar” and “1-Sep”. In the German version, the behavior can be reproduced with “MÄRZ1”.

To quantify whether awareness of the problem has reduced the number of errors, Abeysooriya, Ziemann and their colleagues analyzed more than 11,000 scientific publications on genetics topics with Excel attachments. The papers appeared in scientific journals between 2016 and 2020. Almost one in three contained such errors; the earlier analysis in 2016 had found an error rate of around 20 percent. The team concedes that the renaming should have reduced the problem in the meantime. It will not disappear, however, partly because only genes from humans, mice and rats were renamed. Genes from other animals could still trigger such conversions. In addition, possible problems with Excel spreadsheets in other languages have not been addressed.
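To make the kind of error concrete, here is a minimal Python sketch of how a gene column could be screened for date-like entries. It is not the method used in the study; the month list, the pattern and the function name `find_date_like` are assumptions chosen for illustration, and the pattern only covers English day-month forms such as “1-Mar”.

```python
import re

# Assumed pattern: day-month strings such as "1-Mar" or "2-Sep", the form
# English Excel displays after converting names like MARCH1 or SEPT2.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)


def find_date_like(names):
    """Return the entries that look like Excel-converted dates."""
    return [name for name in names if DATE_LIKE.match(name.strip())]


# Hypothetical gene column containing two converted entries.
column = ["TP53", "1-Mar", "SEPTIN1", "1-Sep", "BRCA1"]
print(find_date_like(column))
```

A check like this only flags suspicious entries. It cannot recover the original name with certainty, so flagged cells would still have to be compared against the source data.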

The research team does not absolve the software of responsibility, but it is not waiting for Microsoft to react either. Instead, the authors give researchers recommendations for possible countermeasures. Excel is not intended for this job anyway; analyses scripted in Python or R, for example, would be a better fit. That means learning a programming language, but it would pay off in the long run. If a spreadsheet really has to be used, they recommend LibreOffice, as the problem does not occur there. And if you really can’t do without Excel, you need to be especially careful when importing data.
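As an illustration of the scripted approach, the following sketch reads a gene table with Python’s standard `csv` module, which returns every field as plain text and performs no date conversion. The column names and sample data are hypothetical, and the helper `read_gene_table` is only an assumption about how such a script might be organized.

```python
import csv
import io

# Hypothetical file contents; in practice the handle would come from
# open(path, newline="") instead of an in-memory string.
raw = "gene,score\nMARCH1,0.8\nSEPT1,1.2\nTP53,2.5\n"


def read_gene_table(handle):
    """Read a CSV table; the csv module keeps every field as a string."""
    return list(csv.DictReader(handle))


rows = read_gene_table(io.StringIO(raw))
genes = [row["gene"] for row in rows]
print(genes)
```

Because nothing is interpreted automatically, numeric columns such as `score` also arrive as strings and have to be converted explicitly, for example with `float(row["score"])`.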

