SoNeR
SoNeR (short for Social Network Ranker) is software that retrieves documents describing people from the Semantic Web and ranks those people by popularity.
SoNeR supports the following functionality:
- Semantic Web crawling
- FOAF processing
- Identity synonym detection and merging based on an SVM classifier trained on manually labelled data
- User ranking based on PageRank and HITS algorithms
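To give an idea of what the ranking step involves, the following is a minimal sketch of PageRank by power iteration over a tiny "knows" graph. It is not SoNeR's actual implementation: the class name, the adjacency-array representation, the damping factor of 0.85 and the iteration count are all assumptions made for illustration.

```java
import java.util.Arrays;

final class PageRankSketch {
    // Hypothetical helper: links[i] lists the indices of the people that person i links to.
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by nodes without outgoing links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // Dangling rank is spread evenly so the ranks keep summing to 1.
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Person 0 knows 1 and 2, person 1 knows 2, person 2 knows 0, person 3 knows 2.
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In this toy graph person 2 receives the most incoming links and ends up with the highest rank, while person 3, whom nobody links to, ends up with the lowest.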
A demonstration video is available on YouTube: https://www.youtube.com/watch?v=-m5sXEYoagI
Installation
The software can be obtained from the GitHub release page. A Java version capable of supporting JavaFX is required (Java 1.8+). No additional installation or configuration (such as an existing database system) is required.
Quick start
SoNeR comes with a wizard application that combines the above-mentioned functionality.
To start using it, simply run the .jar provided on GitHub.
You will be greeted by the following page:
The application supports the following configuration:
- Download document output directory
- Database settings
- Initial page for crawling and number of pages to download
- Crawling method (breadth-first or depth-first)
To try the application out, simply press "Start", as the default settings should work. Alternatively, you can specify a different initial URL.
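The difference between the two crawling methods comes down to the order in which pending URLs are taken from the frontier. The sketch below illustrates this under stated assumptions: the in-memory link map stands in for actually fetching documents, and the class and method names are hypothetical rather than taken from SoNeR's crawler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CrawlOrderSketch {
    // Hypothetical: 'links' maps a URL to the URLs found in its document.
    static List<String> crawl(Map<String, List<String>> links, String start,
                              int amount, boolean breadthFirst) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        frontier.addLast(start);
        seen.add(start);
        // Stop once 'amount' pages have been visited or nothing is left to visit.
        while (!frontier.isEmpty() && visited.size() < amount) {
            // FIFO order gives breadth-first, LIFO order gives depth-first.
            String url = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            visited.add(url);
            for (String next : links.getOrDefault(url, Collections.<String>emptyList())) {
                if (seen.add(next)) { // skip URLs that were already queued
                    frontier.addLast(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b", "c"));
        links.put("b", Arrays.asList("d"));
        links.put("c", Arrays.asList("e"));
        System.out.println(crawl(links, "a", 10, true));  // breadth-first order
        System.out.println(crawl(links, "a", 10, false)); // depth-first order
    }
}
```

With a small download amount the two methods can retrieve different sets of pages: breadth-first stays close to the initial page, while depth-first follows a single chain of links as far as it goes.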
Advanced
The advanced application configuration is accessible via the Advanced button in the GUI:
The following values are configurable:
- Amount - maximum number of pages that will be downloaded.
- Download folder - folder used to store and load downloaded pages. It is possible to manually download FOAF files, put them in this folder and skip the crawling step entirely. Files that have been previously downloaded won't be deleted when a new download is initiated. We provide an existing dataset, obtainable from the following link, that can be used to reproduce our results.
- Database URL - path to the database. For SQLite databases (default) this should point to a file.
- Database username - used to log in to the database (the user should have full CRUD rights on the database). Not required for SQLite databases.
- Database password - password for the database user.
- Database driver - Java driver used to connect to the database. The supported values are: org.sqlite.JDBC for SQLite, org.postgresql.Driver for PostgreSQL and com.mysql.jdbc.Driver for MySQL.
- Starting step - defines the initial step of the wizard. By default it starts with the Download step, but it's possible to skip it and start at an arbitrary step.
- Automatic next step - can be enabled to automatically proceed to the next step. This is disabled by default so that the output of each step can be viewed.
For details regarding the setup please consult the SoNeR.properties.template file. It is possible to use a properties file to load desired settings by specifying the -p command line option.
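As a rough illustration of how settings from a properties file can be layered over defaults, the sketch below uses the standard java.util.Properties class. The key names (database.driver, download.amount) are hypothetical and chosen only for this example; the actual keys are the ones listed in SoNeR.properties.template, and this is not SoNeR's own loading code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SettingsSketch {
    // Hypothetical key names; consult SoNeR.properties.template for the real ones.
    static Properties parse(String text) {
        Properties defaults = new Properties();
        // SQLite is documented as the default database, so use its driver as a fallback.
        defaults.setProperty("database.driver", "org.sqlite.JDBC");
        Properties settings = new Properties(defaults);
        try {
            // For a real file, a FileReader could be passed to load() instead.
            settings.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = parse("download.amount=100\n");
        System.out.println(settings.getProperty("database.driver"));
        System.out.println(settings.getProperty("download.amount"));
    }
}
```

Values present in the file take precedence, and any key the file omits falls back to its default.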
Command line options
SoNeR supports the following command line options:
- -h, --help print this message
- -m module (suppresses GUI)
- -p properties file override
The module argument specifies one of the three modules that can be processed automatically, and accepts the values: crawler, parser and ranker.
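The following sketch shows one way such options could be parsed and validated in plain Java. It is an assumption-laden illustration, not SoNeR's actual argument handling: the class name, the returned map and the error messages are hypothetical, and it assumes that -m and -p each take the next argument as their value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CliSketch {
    // The three module names documented above.
    static final List<String> MODULES = Arrays.asList("crawler", "parser", "ranker");

    // Returns the recognised options; throws IllegalArgumentException on bad input.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-h") || arg.equals("--help")) {
                options.put("help", "true");
            } else if (arg.equals("-m") || arg.equals("-p")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(arg + " requires a value");
                }
                String value = args[++i]; // consume the option's value
                if (arg.equals("-m") && !MODULES.contains(value)) {
                    throw new IllegalArgumentException("unknown module: " + value);
                }
                options.put(arg.equals("-m") ? "module" : "properties", value);
            } else {
                throw new IllegalArgumentException("unknown option: " + arg);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"-m", "ranker", "-p", "SoNeR.properties"}));
    }
}
```

Rejecting unknown module names early gives the user an immediate error instead of a failure later in the pipeline.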
Development
Maven is required to compile the source.
The dependencies of the project as well as the build configuration can be seen in pom.xml, but no user intervention is necessary, as Maven should automatically download the required packages.
To compile and package the software as an executable .jar with the required libraries bundled in (the same way it was released), run the Maven install phase (mvn install). This should generate a SoNeR-1.1-jar-with-dependencies.jar file in the target folder of the project.
Code documentation
Java code documentation is also provided.
Authors and Contributors
This software was created by @gajop for the Neurocomputing OSP.
Support or Contact
Any issues with this software should either be reported in the issues section of the GitHub repository or emailed directly to gajopetrovic@gmail.com.