# CCLearner ## Folders and Files - CCLearner_Feature -- Generate data for training model - CCLearner_Test -- Detect clone pairs by leveraging training models - CCLearner_Train -- Generate training models - Recall_Query -- SQL scripts for calculating recall rates of different types of clones - Run -- Jar Files and dependencies for easy mode - CCLearner.conf -- Configuration file of CCLearner ## Prerequisite - Ubuntu14.04, JAVA 8 ## BigCloneBench Preparation #### Extract SQL script ``` $ tar -xvzf era_bigclonebench.sql.tar.gz ``` #### Extract raw java files ``` $ tar -xvzf era_bcb_sample.tar.gz ``` #### PostgreSQL installation ``` $ apt-get update $ apt-get install postgresql postgresql-contrib ``` #### Database configuration and data import ``` # Change user $ sudo -i -u postgres # Run PostgreSQL console $ psql # Create dependent roles for BigCloneBench postgres=# CREATE ROLE postgresql; postgres=# CREATE ROLE bigclonebench; # Data dump postgres=# \i /home/cclearner/Desktop/CCLearner/era_bigclonebench.sql # Create another user for use CREATE USER cclearner with PASSWORD 'cclearner'; ALTER ROLE cclearner Superuser; ``` #### pgAdmin installation ``` $ apt-get install pgadmin3 ``` ## Customization To run all the experiments in our paper, the following parameters could be changed. For 1-7, change the path with your own username and directory. 1. source.file.path 2. output.dir 3. feature.file.path 4. model.file.path 5. pos.file.path 6. sim.file.path 7. clones.file.path 8. feature.num 9. feature.name 10. training.iteration 11. training.input.num 12. training.hidden.num (also need to modify the source file in CCLearner_Train) 13. testing.folder (users can reduce the number of testing folders to save time) ## Execution -- Easy Mode (Recommended) By using the default or modified configuration file, go to Run folder and execute the following commands ``` java -jar CCLearner_Feature.jar java -jar CCLearner_Train.jar java -jar CCLearner_Test.jar (may take some time) ``` ## Execution -- Developer Mode To change datasets, more parameters or the source code, open CCLearner_Feature, CCLearner_Train, CCLearner_Test, rebuild and rerun the given project ## Evaluation #### Data import Table "tools_clones" in PostgreSQL is used for data import. It is better to use pgAdmin to truncate table and import csv file into database. 1. Double click server's name to connect server and database 2. Right click "tools_clones" and click "truncate". 3. Right click "tools_clones" and click "import..." (Choose Filename; Format - "csv"; Encoding - "UTF8") #### Calculate recall rate In pgAdmin, click SQL icon on the top menu, choose one query file from Recall_Query folder and execute the query. The numbers of true clones with different types in BigCloneBench for testing are T1(2,383), T2(671), VST3(873), ST3(5,365), MT3(31,413), WT3/4(1,540,513). Recall Rate = Query Result / corresponding number of true clones