Sourcerer/website/tutorial-command.html at master · Mondego/Sourcerer · GitHub - github.com

      <div class="content">
        <h1>Command Line Tutorial</h1>
        <div class="content_item">
          <p>This tutorial is designed to show how to download, compile, install, and use the Sourcerer infrastructure to build a database of Open Source Software. It is designed to be run from the command line, and Eclipse is needed only to provide supporting Java libraries. The requirements for this tutorial are:</p>
          <ul>
            <li>Oracle's Java SE Development Kit 8, <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html" target="_blank">here</a> (it should work with alternatives, such as <a href="http://openjdk.java.net/projects/jdk8/" target="_blank">OpenJDK</a>).</li>
            <li>Apache Ant, <a href="http://ant.apache.org/" target="_blank">here</a>.</li>
            <li>Eclipse Mars, <a href="https://www.eclipse.org/mars/" target="_blank">here</a> (also known to work with the previous version, Luna).</li>
            <li>A MySQL server, <a href="https://www.mysql.com/" target="_blank">here</a>.</li>
            <li><strong>Extractor</strong> Jar executable - this file can only be generated by following the <strong>Eclipse Tutorial</strong>. If you do not want to do that, you can download it <a href="downloads/Extractor_1.0.0.jar" target="_blank">here</a>.</li>
            <li>A test repository, <a href="downloads/test-repo.zip" target="_blank">here</a>.</li>
          </ul>
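          <p>Before going further, it can help to confirm that the command line tools from the list above are reachable. The following is a minimal sketch, not part of Sourcerer: the helper name <strong>missing_tools</strong> is hypothetical, and the command names <strong>java</strong>, <strong>ant</strong>, <strong>git</strong> and <strong>mysql</strong> are assumptions about how the requirements are installed on your system. Eclipse is not checked, since it is usually not on the PATH.</p>

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on the PATH.
# Prints nothing when all of them are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names assumed for the requirements listed above.
missing_tools java ant git mysql
```

          <p>An empty output means every tool was found; otherwise each missing command is printed on its own line.</p>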
          <p>Start by cloning the Sourcerer repository:</p>
          <p class="command-line-block" > [sourcerer-path]$ git clone https://github.com/Mondego/Sourcerer.git </p>
          <p> where <strong class="command-line-text">[sourcerer-path]</strong> is the directory where you cloned Sourcerer.</p>

        <h1>Preparing your Environment</h1>

          <p>You will need to create three folders, as seen below:</p>
          <p class="command-line-block" > ~$ mkdir crawled-projects <br> ~$ mkdir extracted-projects <br> ~$ mkdir db-import-output </p>
          <p>We created these folders in the home directory. This means, for example, that the folder <strong>crawled-projects</strong> is accessed by typing <strong class="cd command-line-text">cd ~/crawled-projects/</strong>. This tutorial uses these paths throughout. You can create the folders anywhere you like, as long as you adjust the commands in this tutorial accordingly.</p>
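          <p>If you prefer a different location, the folder creation can be wrapped in a small helper. This is only a sketch: the function name <strong>make_sourcerer_dirs</strong> is hypothetical, and the base directory defaults to the home directory used in this tutorial.</p>

```shell
#!/bin/sh
# Create the three working folders under a base directory (default: $HOME).
# mkdir -p leaves folders that already exist untouched.
make_sourcerer_dirs() {
    base="${1:-$HOME}"
    for name in crawled-projects extracted-projects db-import-output; do
        mkdir -p "$base/$name" || return 1
    done
}

# Example calls (not executed here):
#   make_sourcerer_dirs                  # folders in the home directory
#   make_sourcerer_dirs /some/other/dir  # folders under a path of your choice
```

          <p>If you pass a base directory, remember to replace <strong class="cd command-line-text">~/</strong> with that directory in the remaining commands.</p>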
          <p>The folder <strong class="cd command-line-text">~/crawled-projects/</strong> should contain Java projects. To start, you can download our <a href="downloads/test-repo.zip" target="_blank">test repository</a> and move the contents to this folder.</p>
          <p>Follow the instructions <a href="https://dev.mysql.com/doc/refman/5.7/en/index.html" target="_blank">here</a> to set up MySQL and create a database. Create a user for the database and give it write permissions. Take note of the <strong>DATABASE-URL</strong> (for example, 127.0.0.1 if the server is running on your machine), the <strong>DATABASE-NAME</strong>, the <strong>DATABASE-USER</strong> and the <strong>DATABASE-PASSWORD</strong>.</p>
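          <p>As a rough illustration of the database setup, the sketch below prints the SQL statements for creating a database and a user with full rights on it. The function name <strong>sourcerer_db_sql</strong> and all argument values are hypothetical, and the statements assume the user connects from <strong>localhost</strong>. Names and passwords containing quote characters are not handled.</p>

```shell
#!/bin/sh
# Print SQL that creates a database and a user with full rights on it.
# Arguments: database name, user name, password (all chosen by you).
# Assumption: the user connects from localhost.
sourcerer_db_sql() {
    db="$1"; user="$2"; pass="$3"
    printf 'CREATE DATABASE IF NOT EXISTS `%s`;\n' "$db"
    printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" "$user" "$pass"
    printf "GRANT ALL PRIVILEGES ON \`%s\`.* TO '%s'@'localhost';\n" "$db" "$user"
}

# Example (not executed here), assuming an administrative account named root:
#   sourcerer_db_sql my_database my_user my_password | mysql -u root -p
```

          <p>Review the printed statements before sending them to the server, and adapt the privileges if your setup calls for something more restrictive.</p>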

        <h1>Compiling Everything</h1>

        <p>Go to <strong class="command-line-text">[sourcerer-path]/bin/</strong>, and you will see a script <strong>build-add.sh</strong>. Run this script, and you will build all the tools available. We will not need all of them for this tutorial, but the process is relatively fast and it does not hurt to do so.</p>

        <h2>Want to know more?</h2>
        <p>The script <strong>build-add.sh</strong> should run on Windows and UNIX systems. If you look inside, you will see that it simply calls a series of <strong>ant</strong> tasks to create executable <strong>Jar</strong> files. You can run these tasks individually to build only the tools you need.</p>

        <h1>Extracting Information</h1>

          <p>We start by extracting information from <strong class="cd command-line-text">~/crawled-projects/</strong> into the intermediate folder <strong class="cd command-line-text">~/extracted-projects/</strong>, which contains information like fully qualified names and metrics about each project.</p>
          <p>The first step is to aggregate the jar files found in <strong class="cd command-line-text">~/crawled-projects/</strong>:</p>
          <p class="command-line-block" > $ cd [sourcerer-path]/bin/dist/ <br> $ java -jar java-repo-tools --aggregate-jar-files --input-repo ~/crawled-projects</p>
          <p>Note that there is now a new folder <strong class="cd command-line-text">~/crawled-projects/Jars/</strong>.</p>

          <p>The extractor is an Eclipse plugin, so to run it the file must be placed in Eclipse's plugins folder. Assuming
          <strong >[eclipse]</strong> is the directory where Eclipse was installed,
          copy <strong >Extractor_1.0.0.jar</strong> (remember, you can download it with the link above)
          into the folder <strong >[eclipse]/plugins/</strong>:</p>
          <p class="command-line-block">$ cp [sourcerer-path]/bin/dist/Extractor_1.0.0.jar [eclipse]/plugins/</p>

          <p>To run the extractor, which is a plugin, you need to use Eclipse's launcher. It can be found in <strong class="cd command-line-text">[eclipse]/plugins/</strong> (that is right, the folder you moved the Extractor to).</p>

          <p>To test the extractor, run</p>
          <p class="command-line-block" > $ java -jar [eclipse]/plugins/org.eclipse.equinox.launcher_[VERSION].jar -consolelog -application Extractor.Extractor</p>
          <p>where <strong class="cd command-line-text">[VERSION]</strong> is the version suffix of the launcher jar in your Eclipse installation (it depends on the version of Eclipse you have installed). You should see a list of parameters for the extractor.</p>
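          <p>Since the launcher file name changes between Eclipse releases, one way to avoid typing it by hand is to look it up with a shell pattern. The sketch below is an illustration only; the function name <strong>find_launcher</strong> is hypothetical, and it simply returns the first jar in the plugins folder whose name starts with <strong>org.eclipse.equinox.launcher_</strong>.</p>

```shell
#!/bin/sh
# Print the path of the Equinox launcher jar inside an Eclipse installation.
# Argument: the Eclipse installation directory ([eclipse] in this tutorial).
# Returns non-zero when no matching jar exists.
find_launcher() {
    for jar in "$1"/plugins/org.eclipse.equinox.launcher_*.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Example (not executed here):
#   java -jar "$(find_launcher /path/to/eclipse)" -consolelog -application Extractor.Extractor
```

          <p>The test for a regular file matters: when nothing matches, the shell leaves the pattern unexpanded, and the function then reports failure instead of printing a path that does not exist.</p>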

          <p>The next steps are to prepare the target repository to receive the libraries, and extract the libraries, the Jars and the projects.</p>

          <p class="warning" >The next steps involve a very large number of pre-processing tasks, as a huge volume of information is extracted for each project. The tasks are, therefore, very slow. Please be patient and be sure each task finishes before starting the next one.</p>

          <p>Prepare the temporary repository to receive the existing libraries:</p>
          <p class="command-line-block" > $ java -jar [eclipse]/plugins/org.eclipse.equinox.launcher_[VERSION].jar -application Extractor.Extractor --add-libraries-to-repo --output-repo ~/extracted-projects --input-repo ~/crawled-projects</p>
          <p>Add the libraries from the crawled projects into the temporary repository:</p>
          <p class="command-line-block" > $ java -jar [eclipse]/plugins/org.eclipse.equinox.launcher_[VERSION].jar -application Extractor.Extractor --extract-libraries --output-repo ~/extracted-projects --input-repo ~/crawled-projects</p>
          <p>Add the jars from the crawled projects into the temporary repository:</p>
          <p class="command-line-block" > $ java -jar [eclipse]/plugins/org.eclipse.equinox.launcher_[VERSION].jar -application Extractor.Extractor --extract-maven-jars --output-repo ~/extracted-projects --input-repo ~/crawled-projects</p>
          <p>Add information about the crawled projects themselves into the temporary repository:</p>
          <p class="command-line-block" > $ java -jar [eclipse]/plugins/org.eclipse.equinox.launcher_[VERSION].jar -application Extractor.Extractor --extract-projects --output-repo ~/extracted-projects --input-repo ~/crawled-projects</p>
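          <p>Because each of the four steps above must finish before the next one starts, they can be chained in a small script that stops at the first failure. This is a sketch under assumptions, not a tool shipped with Sourcerer: the function name <strong>run_extraction</strong> and the <strong>DRY_RUN</strong> variable are hypothetical, and the launcher jar path is passed in as an argument. The extractor flags are the ones used in the commands above.</p>

```shell
#!/bin/sh
# Run the four extraction steps in order and stop at the first one that fails.
# Arguments: launcher jar, input repository, output repository.
# Set DRY_RUN=1 to print the commands instead of running them.
run_extraction() {
    launcher="$1"; input="$2"; output="$3"
    for step in --add-libraries-to-repo --extract-libraries \
                --extract-maven-jars --extract-projects; do
        # Rebuild the positional parameters as the command line for this step.
        set -- java -jar "$launcher" -application Extractor.Extractor \
            "$step" --output-repo "$output" --input-repo "$input"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf '%s\n' "$*"
        else
            "$@" || { printf 'step %s failed\n' "$step" >&2; return 1; }
        fi
    done
}

# Example (not executed here):
#   run_extraction /path/to/launcher.jar ~/crawled-projects ~/extracted-projects
```

          <p>Running it once with <strong class="command-line-text">DRY_RUN=1</strong> is a cheap way to check the paths before committing to the slow extraction.</p>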

          <p>If you inspect the folder <strong class="cd command-line-text">~/extracted-projects/</strong> you can now see a set of folders containing metadata about each of the projects in <strong class="cd command-line-text">~/crawled-projects/</strong>. The next step is to use this intermediate folder to create a relational database.</p>

          <h2>Want to know more?</h2>
          <p>This extraction shows the steps we took when creating the various <strong>Datasets</strong> of Sourcerer. In particular, the raw dataset <strong>sourcerer_repo_2011</strong> contains the same type of information that we find in <strong class="cd command-line-text">~/crawled-projects/</strong>, and the extracted dataset <strong>sourcerer_repo_2011_extracted</strong> contains the same type of information that we find in <strong class="cd command-line-text">~/extracted-projects/</strong>, although on a much larger scale.</p>

        <h1>Creating Database</h1>

          <p>The final step is to create a relational database from the information extracted to <strong class="cd command-line-text">~/extracted-projects/</strong>.</p>

          <p class="warning" >The tasks may be slow, depending on how much data you have. Please be patient and be sure each task finishes before starting the next one.</p>

          <p>Initialize the database:</p>
          <p class="command-line-block" > $ java -jar db-import.jar --database-url jdbc:mysql://[DATABASE-URL]/[DATABASE-NAME] --database-user [DATABASE-USER] --database-password [DATABASE-PASSWORD] --initialize-db</p>
          <p>Import the extracted libraries into the database:</p>
          <p class="command-line-block" > $ java -jar db-import.jar --database-url jdbc:mysql://[DATABASE-URL]/[DATABASE-NAME] --database-user [DATABASE-USER] --database-password [DATABASE-PASSWORD] --add-libraries --input-repo ~/extracted-projects --output ../db-import-output/</p>
          <p>Import the extracted jars into the database:</p>
          <p class="command-line-block" > $ java -jar db-import.jar --database-url jdbc:mysql://[DATABASE-URL]/[DATABASE-NAME] --database-user [DATABASE-USER] --database-password [DATABASE-PASSWORD] --add-jars --input-repo ~/extracted-projects --output ../db-import-output/</p>
          <p>Import the extracted projects into the database:</p>
          <p class="command-line-block" > $ java -jar db-import.jar --database-url jdbc:mysql://[DATABASE-URL]/[DATABASE-NAME] --database-user [DATABASE-USER] --database-password [DATABASE-PASSWORD] --add-projects --input-repo ~/extracted-projects --output ../db-import-output/</p>
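          <p>The four import commands above differ only in their last flags, so they can be generated from the values you saved earlier. The sketch below only prints the command lines for review; it does not run them. The function names <strong>jdbc_url</strong> and <strong>db_import_commands</strong> are hypothetical, and values containing spaces are not handled.</p>

```shell
#!/bin/sh
# Build the JDBC URL passed to db-import.jar from a host and a database name.
jdbc_url() {
    printf 'jdbc:mysql://%s/%s\n' "$1" "$2"
}

# Print the four db-import command lines in the order used above.
# Arguments: host, database, user, password, extracted repo, output folder.
db_import_commands() {
    url="$(jdbc_url "$1" "$2")"
    common="java -jar db-import.jar --database-url $url --database-user $3 --database-password $4"
    # Initialization takes no repository or output arguments.
    printf '%s\n' "$common --initialize-db"
    for step in --add-libraries --add-jars --add-projects; do
        printf '%s\n' "$common $step --input-repo $5 --output $6"
    done
}

# Example (not executed here):
#   db_import_commands 127.0.0.1 my_database my_user my_password ~/extracted-projects ../db-import-output/
```

          <p>Run the printed commands one at a time, waiting for each to finish, as the warning above recommends.</p>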

          <p>Please refer to the <strong>Database</strong> tab for some examples of queries you can run.</p>

          <h2>Want to know more?</h2>
          <p>The creation of the database shows the steps we took when creating the <strong>Database</strong> of Sourcerer. In particular, the extracted dataset <strong>sourcerer_repo_2011_extracted</strong> contains the same type of information that we find in <strong class="cd command-line-text">~/extracted-projects/</strong>, and the database we generated has the same schema and the same type of information found in the Sourcerer database.</p>

          <p class="warning" >Did you have any problems running this tutorial? Do you have a suggestion or a comment? <br> Please reach us at: <span class="reverse">ude.icu@oriebirp</span>.</p>

        </div>
      </div>