SMILESENG

Intl. Summer School on Search- and Machine Learning-based Software Engineering
of building commonly used action sequences, the need for assistive tools to configure GHA, and the latency of communication among jobs through artifacts [2].
Valenzuela-Toledo et al. studied modifications to workflow configurations. By manually inspecting 222 commits to the GHA workflows of 10 software projects, they highlighted the need for tools that support workflow creation and editing, identification of syntax errors, and recommendation of specific common tasks [3].
III. PROPOSED RESEARCH QUESTIONS
To reach the goal of this research, I will focus on answering the following research questions:
RQ1 How is GHA adopted across different programming languages?
Previous studies have looked at the proportion of GHA usage among projects written in different PLs [4], [2], but none has examined whether GHA adoption has any PL-specific characteristics. Consequently, to answer this research question, I want to see how projects make use of GHA by looking at the types of actions they use, the number and kinds of jobs and workflows, the number of steps in each job, and the relations between jobs in a workflow. More specifically, I would like to see how GHA adoption differs between Java projects that already benefit from Maven or Gradle and projects that do not have such build automation tools.
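As a rough illustration of the per-workflow measurements listed above, the following Python sketch assumes a workflow file has already been parsed into a dictionary. The helper name `workflow_metrics` and the example workflow are hypothetical; the keys `jobs`, `steps`, `uses`, `run`, and `needs` follow the GHA workflow syntax.

```python
from typing import Any


def workflow_metrics(workflow: dict[str, Any]) -> dict[str, Any]:
    """Summarize one parsed workflow (hypothetical helper for illustration)."""
    jobs = workflow.get("jobs") or {}
    steps_per_job: dict[str, int] = {}
    actions: list[str] = []
    dependencies: dict[str, list[str]] = {}
    for job_id, job in jobs.items():
        steps = job.get("steps") or []
        steps_per_job[job_id] = len(steps)
        # A step either runs a shell command ("run") or calls an action ("uses").
        actions.extend(step["uses"] for step in steps if "uses" in step)
        # "needs" may be a single job id or a list of job ids.
        needs = job.get("needs") or []
        dependencies[job_id] = [needs] if isinstance(needs, str) else list(needs)
    return {
        "num_jobs": len(jobs),
        "steps_per_job": steps_per_job,
        "actions": actions,
        "dependencies": dependencies,
    }


# Made-up example: a Java build job followed by a dependent deploy job.
example = {
    "jobs": {
        "build": {
            "steps": [
                {"uses": "actions/checkout@v3"},
                {"uses": "actions/setup-java@v3"},
                {"run": "mvn -B verify"},
            ]
        },
        "deploy": {"needs": "build", "steps": [{"run": "echo deploy"}]},
    }
}
metrics = workflow_metrics(example)
print(metrics["num_jobs"], metrics["steps_per_job"], metrics["dependencies"])
```

Aggregating such summaries per PL would be one possible way to compare adoption across languages; the exact set of metrics remains open.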
RQ2 What kinds of tasks are most often automated with GHA?
Related work has only categorized the actions used in projects’ workflows [4], [2]. In contrast, I aim to see which tasks are automated, how they are automated, and whether projects use the same actions for the same purposes. If they do not, what are the reasons?
RQ3 What are the common patterns in GHA workflow configurations?
Common patterns of workflow configuration are of high importance, since they can be recommended to other projects with similar properties. I want to look deeper into these configurations to find such patterns. For example, “actions/checkout” is the most commonly used action among projects [4], [2], because every project needs to check out the changes before running any other action on the repository. Hence, I want to focus on task-level configurations that contain many actions, and on the relations among those actions.
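One simple way to start looking for such patterns is to count which actions co-occur in the same workflow while ignoring ubiquitous ones such as “actions/checkout”. The Python sketch below only illustrates this idea and is not the method of the study; the function names, the `ignore` and `min_support` parameters, and the sample data are all assumptions.

```python
from collections import Counter
from itertools import combinations


def action_name(uses: str) -> str:
    """Drop the version suffix, e.g. 'actions/checkout@v3' -> 'actions/checkout'."""
    return uses.split("@", 1)[0]


def frequent_action_pairs(workflows, ignore=frozenset(), min_support=2):
    """Count unordered pairs of actions that appear together in a workflow.

    `workflows` is a list of lists of `uses` strings, one list per workflow.
    Pairs seen in fewer than `min_support` workflows are dropped.
    """
    pair_counts = Counter()
    for uses_list in workflows:
        names = sorted({action_name(u) for u in uses_list} - set(ignore))
        pair_counts.update(combinations(names, 2))
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_support]


# Made-up sample: the `uses` values of three workflows.
sample = [
    ["actions/checkout@v3", "actions/setup-node@v3", "actions/cache@v3"],
    ["actions/checkout@v2", "actions/setup-node@v3", "actions/cache@v2"],
    ["actions/checkout@v3", "actions/setup-python@v4"],
]
pairs = frequent_action_pairs(sample, ignore={"actions/checkout"})
print(pairs)
```

Pair counting ignores the order of steps and the job structure, so a real analysis of task-level patterns would likely need a richer representation.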
IV. BUILDING THE DATASET
Since this is an empirical study, the first step is to build the initial dataset of selected projects. Similar to Chen et al. [2], I will start with a set of popular projects on GitHub (using the GitHub Search (GHS) dataset [5]) and then keep those that use GHA workflows, by checking whether they have any YAML files in the “.github/workflows” directory.
Based on RQ1, the next step is to choose the top PLs using GHA. Again according to the study of Chen et al. [2], JavaScript, Python, Go, TypeScript, and Java are among the top PLs using GHA, with more than 500 repositories each.
Other steps include cloning the projects, extracting their YAML configuration files, and parsing them to obtain detailed, well-formatted configuration data. This data will be used in the analysis to answer the research questions.
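The extraction step could be sketched as follows, assuming the projects have already been cloned to local directories. The helper `find_workflow_files` is hypothetical; it lists files with a `.yml` or `.yaml` extension in the `.github/workflows` directory of a clone. Parsing is left out here because it would require a third-party YAML library.

```python
import tempfile
from pathlib import Path


def find_workflow_files(repo_root: Path) -> list[Path]:
    """Return the YAML files directly under <repo_root>/.github/workflows."""
    workflows_dir = repo_root / ".github" / "workflows"
    if not workflows_dir.is_dir():
        return []  # the project has no workflow directory
    return sorted(
        p for p in workflows_dir.iterdir()
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )


# Demo on a throwaway directory that mimics a cloned repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".github" / "workflows").mkdir(parents=True)
    (root / ".github" / "workflows" / "ci.yml").write_text("name: CI\n")
    (root / ".github" / "workflows" / "README.md").write_text("notes\n")
    found = [p.name for p in find_workflow_files(root)]
    missing = find_workflow_files(root / "no-such-clone")
print(found)
```

A project for which the function returns an empty list would be excluded from the dataset under this filter.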
V. PRELIMINARY RESULTS
This work is still at the stage of brainstorming and exploring new ideas, so no preliminary results are available yet. Any results obtained before the live talk will be presented at the SMILESENG Summer School.
VI. CONCLUSION
This short paper argued for the importance of studying GitHub Actions (GHA) workflows in order to identify best practices for configuring software workflows on GitHub. It also pointed out the need for assistive tools for software developers who design these workflows.
As a first step towards building such assistive tools and proposing best practices, I presented three research questions and briefly explained the benefits of answering them. I also outlined the first steps of the study's data collection.
I am excited to hear what participants of the SMILESENG Summer School think of this research idea, to receive feedback, and to discuss each step of conducting it.
REFERENCES
[1] M. Golzadeh, A. Decan, and T. Mens, “On the rise and fall of CI services in GitHub,” in 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2022.
[2] T. Chen, Y. Zhang, S. Chen, T. Wang, and Y. Wu, “Let’s supercharge the workflows: An empirical study of github actions,” in 21st IEEE International Conference on Software Quality, Reliability and Security, QRS 2021 - Companion, Hainan, China, December 6-10, 2021. IEEE, 2021, pp. 1–10. [Online]. Available: https://doi.org/10.1109/QRS-C55045.2021.00163
[3] P. Valenzuela-Toledo and A. Bergel, “Evolution of github action workflows,” in 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2022.
[4] T. Kinsman, M. S. Wessel, M. A. Gerosa, and C. Treude, “How do software developers use github actions to automate their workflows?” in 18th IEEE/ACM International Conference on Mining Software Repositories, MSR 2021, Madrid, Spain, May 17-19, 2021. IEEE, 2021, pp. 420–431. [Online]. Available: https://doi.org/10.1109/MSR52588.2021.00054
[5] O. Dabic, E. Aghajani, and G. Bavota, “Sampling projects in GitHub for MSR studies,” in International Conference on Mining Software Repositories (MSR). IEEE, 2021, pp. 560–564.
