Information Technology News.

Scientists say programmers can be easily identified by the code they write

January 22, 2015

Drexel University PhD student Aylin Caliskan says there is no such thing as an anonymous programmer: your unique coding style can unmask you for all the world to see.

In work that has serious implications for anyone who believes their open source contributions are anonymous, the researchers found that as many as 95 percent of contributors to a decent-sized code base can be identified within the group.

That is not a trivial issue: the researchers cite the case of Iranian coder Saeed Malekpour, who was condemned to death for developing a photo upload tool. The sentence was later commuted to life in prison.

With co-authors from the U.S. Army Research Laboratory, the University of Maryland, Princeton and Germany's University of Goettingen, Caliskan applied machine learning to identify 250 C++ authors from the coding style of their Google Code Jam projects.

From her paper: “Perhaps a programmer has a preference for spaces over tabs, or while loops over for loops, or more subtly, modular rather than monolithic code. Can elements of coding style be extracted computationally, and if so, what features are most informative?”
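As a rough illustration of what extracting style "computationally" can mean, the sketch below measures two of the habits the quote mentions (tabs versus spaces, and while loops versus for loops) in a piece of C++ source text. The function name `style_features` and the feature names are hypothetical and are not taken from the paper, which the article says also relies on syntactic features that this sketch does not attempt.

```python
import re


def style_features(source):
    """Return a few illustrative layout and lexical features of C++ source text.

    Hypothetical helper for illustration only; not the researchers' feature set.
    """
    lines = source.splitlines()
    # Layout habit: of the lines that start with whitespace, how many use a tab?
    indented = [ln for ln in lines if ln[:1] in (" ", "\t")]
    tab_indented = [ln for ln in indented if ln.startswith("\t")]
    # Lexical habit: whole-word keyword counts. This naive regex also counts
    # matches that happen to sit inside comments or string literals.
    for_loops = len(re.findall(r"\bfor\b", source))
    while_loops = len(re.findall(r"\bwhile\b", source))
    loops = for_loops + while_loops
    return {
        "tab_indent_ratio": len(tab_indented) / len(indented) if indented else 0.0,
        "while_loop_ratio": while_loops / loops if loops else 0.0,
        "line_count": len(lines),
    }


sample = (
    "int main() {\n"
    "\tint i = 0;\n"
    "\twhile (i < 3) {\n"
    "\t\ti++;\n"
    "\t}\n"
    "\tfor (;;) break;\n"
    "}\n"
)
features = style_features(sample)
print(features)
```

Ratios rather than raw counts are used here so that files of different lengths can be compared, which is a design choice of this sketch rather than something the article describes.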

Scraping the work of successful contributors to the Google Code Jam competition, the researchers discovered that a mere eight training files with 70 lines of code each were enough to identify authors based on their syntactic, lexical and layout habits.
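The idea of learning an author's habits from a few training files and then attributing an unseen file can be sketched as follows. This is a toy stand-in, not the researchers' method: the article does not say which classifier they used, so the sketch simply averages feature vectors per author and picks the nearest average. The functions `features`, `train` and `attribute`, the author names and the sample snippets are all hypothetical.

```python
import math
import re


def features(source):
    """Turn source text into a small vector of layout and lexical ratios."""
    lines = source.splitlines() or [""]
    n = len(lines)
    tab_lines = sum(1 for ln in lines if ln.startswith("\t"))
    space_lines = sum(1 for ln in lines if ln.startswith(" "))
    whiles = len(re.findall(r"\bwhile\b", source))
    fors = len(re.findall(r"\bfor\b", source))
    # Brace placement: opening braces that sit alone on their own line.
    own_line_braces = sum(1 for ln in lines if ln.strip() == "{")
    return [tab_lines / n, space_lines / n, whiles / n, fors / n, own_line_braces / n]


def train(samples):
    """Average the feature vectors of each author's training files."""
    profiles = {}
    for author, files in samples.items():
        vectors = [features(text) for text in files]
        profiles[author] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return profiles


def attribute(profiles, source):
    """Return the author whose average profile is closest (Euclidean distance)."""
    vec = features(source)
    return min(profiles, key=lambda author: math.dist(profiles[author], vec))


# Made-up training data: one short file per hypothetical author.
samples = {
    "alice": ["int f()\n{\n\tint i = 0;\n\twhile (i < 9)\n\t{\n\t\ti++;\n\t}\n\treturn i;\n}\n"],
    "bob": ["int g() {\n  int s = 0;\n  for (int i = 0; i < 9; i++) {\n    s += i;\n  }\n  return s;\n}\n"],
}
profiles = train(samples)

unknown = "int h()\n{\n\tint k = 5;\n\twhile (k > 0)\n\t{\n\t\tk--;\n\t}\n\treturn k;\n}\n"
guess = attribute(profiles, unknown)
print(guess)
```

With only five surface features this toy would not scale to hundreds of authors; it only shows the train-then-attribute shape of the task.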

Moreover, code obfuscation doesn't help hide you: “Our syntactic feature set is impervious to off-the-shelf code obfuscators, which only change layout and some lexical features,” she added.

The individual programmer's skill is also important to the analysis: “Overall, programmers who are more advanced and are able to solve more difficult tasks have more distinct coding styles than programmers that are not as advanced.”

“Programmers with a larger skill set can be identified more easily and with higher confidence. Programmers reveal more individual coding style in more challenging programming tasks,” she said.

As well as software forensics, including tracking authors of malicious software, the paper notes that its techniques could also be used to identify plagiarism among computer science students.

Interestingly, the team's research also suggests that the conventional wisdom that concise code is good code may not hold true.

It turned out that Google Code Jam contestants who successfully completed the most tasks in the competition wrote longer code than those who completed fewer tasks.

The study goes on to say: “Programmers with better skills tend to write longer code to solve Google Code Jam issues. The mainstream concept is that better programmers write shorter and cleaner code which contradicts with various line of code statistics.”

Source: Drexel University.
