Scientists say programmers can be easily identified by the code they write
January 22, 2015
Drexel University PhD student Aylin Caliskan says there's simply no such thing as an anonymous
programmer: your unique coding style can unmask you for all the world to see.
In work that has serious implications for anyone who believes their open source project contributions
are 'anonymous', the researchers found that as many as 95 percent of contributors to
a decent-sized code base can be identified among the group.
That's not a trivial issue, since the researchers cite the case of Iranian coder Saeed Malekpour,
condemned to death for developing a photo upload tool. The sentence was later commuted to life in prison.
With co-authors from the U.S. Army Research Laboratory, the University of Maryland, Princeton and
Germany's University of Goettingen, Caliskan applied machine learning to identify 250 C++ authors
using their coding style on Google Code Jam projects.
From her paper: “Perhaps a programmer has a preference for spaces over tabs, or while loops
over for loops, or more subtly, modular rather than monolithic code. Can elements of coding style be
extracted computationally, and if so, what features are most informative?”
Scraping the work of successful contributors to the Google Code Jam competition, the researchers
discovered that a mere eight training files with 70 lines of code each were enough to identify authors
based on their syntactic, lexical and layout habits.
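To make the idea of computationally extracted style concrete, here is a minimal Python sketch of a few layout and lexical measurements of the kind the quote mentions (tabs versus spaces, for versus while loops). The function name `style_features` and the particular features are illustrative assumptions, not the researchers' actual feature set; it also leaves out the syntactic features the paper relies on, which would require parsing the code.

```python
import re
from collections import Counter


def style_features(source: str) -> dict:
    """Compute a few toy layout/lexical style features from source text.

    Hypothetical illustration only: these features are assumptions chosen
    for the example, not the feature set used in the paper.
    """
    lines = source.splitlines()
    n = max(len(lines), 1)  # avoid division by zero on empty input

    # Layout: how lines are indented and where opening braces are placed.
    tab_indented = sum(1 for ln in lines if ln.startswith("\t"))
    space_indented = sum(1 for ln in lines if ln.startswith(" "))
    brace_alone = sum(1 for ln in lines if ln.strip() == "{")

    # Lexical: naive keyword counts (also matches keywords inside
    # comments and string literals, which a real tool would exclude).
    keywords = Counter(re.findall(r"\b(for|while)\b", source))

    return {
        "tab_indent_ratio": tab_indented / n,
        "space_indent_ratio": space_indented / n,
        "brace_on_own_line_ratio": brace_alone / n,
        "for_count": keywords["for"],
        "while_count": keywords["while"],
        "avg_line_length": sum(len(ln) for ln in lines) / n,
    }


sample = (
    "int main()\n"
    "{\n"
    "\tfor (int i = 0; i < 3; ++i) {\n"
    "\t\twhile (true) break;\n"
    "\t}\n"
    "}\n"
)
features = style_features(sample)
print(features)
```

A vector like this, computed per file, is the sort of input a classifier could be trained on; which classifier and which features work best is exactly the question the paper investigates.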
Moreover, code obfuscation doesn't help hide you: “Our syntactic feature set is impervious to off-the-shelf
code obfuscators, which only change layout and some lexical features,” she added.
The individual programmer's skill is also important to the analysis: “Overall, programmers
who are more advanced and are able to solve more difficult tasks have more distinct coding styles
than programmers that are not as advanced.”
“Programmers with a larger skill set can be identified more easily and with higher confidence. Programmers
reveal more individual coding style in more challenging programming tasks,” she said.
As well as software forensics, including tracking authors of malicious software, the paper
notes that its techniques could also be used to identify plagiarism among computer science students.
Interestingly, the team's research also suggests that the conventional wisdom that concise code is
good code may not hold true.
It turned out that Google Code Jam contestants who successfully completed the most tasks in the competition
wrote longer code than those completing fewer tasks.
The study goes on to say: “Programmers with better skills tend to write longer code to solve
Google Code Jam issues. The mainstream concept is that better programmers write shorter and cleaner
code which contradicts with various line of code statistics”.
Source: Drexel University.