Széchenyi Plan Plus | Government of Hungary. Funded by the European Union. NextGeneration EU.

EN HU
  • Discover
    • News
    • Events
  • Research fields
  • Resources
    • Publications
    • Downloads
    • Brochure
  • About us
  • Partners
  1. Home
  2. Publications
(2021) IEEE ACCESS 2169-3536 9 79089-79104

Large Scale Evaluation of Natural Language Processing Based Test-to-Code Traceability Approaches

doi.org/10.1109/ACCESS.2021.3083923
Abstract

Traceability information can be crucial for software maintenance, testing, automatic program repair, and various other software engineering tasks. Customarily, a vast amount of test code is created for systems to maintain and improve software quality. Today’s test systems may contain tens of thousands of tests. Finding the parts of code tested by each test case is usually a difficult and time-consuming task without the help of the authors of the tests or at least clear naming conventions. Recent test-to-code traceability research has employed various approaches but textual methods as standalone techniques were investigated only marginally. The naming convention approach is a well-regarded method among developers. Besides their often only voluntary use, however, one of its main weaknesses is that it can only identify one-to-one links. With the use of more versatile text-based methods, candidates could be ranked by similarity, thus producing a number of possible connections. Textual methods also have their disadvantages, even machine learning techniques can only provide semantically connected links from the text itself, these can be refined with the incorporation of structural information. In this paper, we investigate the applicability of three text-based methods both as a standalone traceability link recovery technique and regarding their combination possibilities with each other and with naming conventions. The paper presents an extensive evaluation of these techniques using several source code representations and meta-parameter settings on eight real, medium-sized software systems with a combined size of over 1.25 million lines of code. Our results suggest that with suitable settings, text-based approaches can be used for test-to-code traceability purposes, even where naming conventions were not followed.

Authors
András Kicsi
Viktor Csuvik
László Vidács
Institutes

Become a partner

Subscribe to newsletter

Send partnership request

Explore

  • News
  • Events
  • Publications
  • Downloads
  • Partners

Research fields

  • Foundations of AI
  • Human Language Processing
  • Machine perception
  • Medical, Health and Biology
  • Security and Privacy
  • Sensors, IoT and Telecommunications

Contact us

Hungary, H-1111 Budapest,
Kende u. 13-17.

+36 1 279 6000

milab@sztaki.hun-ren.hu

© 2020-2021 Artifical Intelligence National Laboratory, Budapest