AntiPlagiarism.NET searches the Internet for non-unique fragments of the text being checked, determines how unique the text is, highlights the non-unique fragments both in the original document and on the web page where the matches were found, and reports the percentage of matches for every web document checked.
The menu contains all the commands needed to set up and operate the application.
The most frequently used menu commands are duplicated here.
The text editor is a window into which the text to be checked is pasted and where it can be edited.
The text to be checked can come from several sources:
- Text from the clipboard: the copied text is pasted into the text editor window with the Edit / Paste command or with the corresponding hot key combination.
- Text or a Word document file (*.txt, *.doc, *.docx): the text of the file is loaded into the editor window with the File / Open file command.
- Web document from the Internet: the page URL is pasted into the 'Address' field and 'Download the html-code of the page' (right button) is selected to load the source code of the document into the text editor window. Once the download is completed, the text editor switches to `Page` mode automatically.
The text editor has two modes:
- 'Page' - the text is treated as an html document and displayed as in a browser. Editing is not possible in this mode.
- 'Text editor' - the text can be edited.
In addition, the editor can display the so-called canonicalized text in a separate window on the right side of the text editor. Canonicalized text is the original text with all extra whitespace, punctuation marks, html markup and various irrelevant words removed. When two documents are compared, it is their canonicalized versions that are used. By default these versions are not displayed, but the display can be enabled through the menu (View / Show canonicalized text) when needed.
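The canonicalization described above can be illustrated with a minimal Python sketch. It is not the program's actual implementation: the stop-word list, the lowercasing step and the function name `canonicalize` are assumptions made for illustration only.

```python
import re
from html import unescape

# Hypothetical stop-word list; the program's real list of
# "irrelevant words" is not documented here.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to"}


def canonicalize(text):
    """Return the canonicalized word sequence of a text."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop html tags
    text = unescape(text)                  # decode entities such as &amp;
    text = text.lower()                    # assumption: comparison ignores case
    # Extracting runs of word characters discards punctuation
    # and any extra whitespace in one step.
    words = re.findall(r"\w+", text)
    return [w for w in words if w not in STOP_WORDS]


print(canonicalize("<p>The  quick,  brown fox &amp; the dog!</p>"))
```

Two texts that differ only in markup, punctuation or spacing produce the same word sequence, which is why the comparison is run on the canonicalized versions.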
After the check is completed, the fragments found in the original text are highlighted in different colours (Log shows which colour corresponds to which source). A fragment found in several sources is highlighted in yellow.
Page is the integrated browser window.
During the check, messages like "Found XXX % matches at YYY" are logged.
Clicking the link YYY displays the document located at that address on the Page tab, with the fragments that coincide with the original text highlighted with a yellow marker.
This section logs the results of the checks in real time. It also contains diagnostic and error messages.
Detailed logging is disabled by default, but it can be enabled when needed in the program settings (Tools / Preferences / Report / Show detailed information on check progress in Log).
The program works as follows:
- Samples of words are taken from the text being checked and used as search query text (the sample size in words and the number of samples are specified in 'Basic parameters').
- The queries are built (the search engines specified in 'Basic parameters' are used one by one).
- The search engine responses are analyzed and the links to web documents are extracted from them. Only the links that the search engines rank as most relevant, i.e. the top results, are selected (the number of links taken per sample from every search engine response is specified in 'Basic parameters').
- The pages are downloaded from these links.
- The percentage of matches between the text being checked and the downloaded copies is determined with the shingle method (the number of words in a shingle is specified in 'Plagiarism Detection / Words in a shingle').
- The total percentage of matches found on the downloaded pages is calculated.
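The shingle comparison mentioned in the steps above can be sketched as follows. This is a generic illustration of the shingle method, not the program's own code: the function names, the default shingle size of 3 and the exact formula (the share of the checked text's shingles that also occur on the page) are assumptions.

```python
def shingles(words, size):
    """Return the set of all runs of `size` consecutive words."""
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def match_percentage(checked_words, page_words, size=3):
    """Percentage of the checked text's shingles that also occur on the page."""
    checked = shingles(checked_words, size)
    if not checked:
        return 0.0  # text shorter than one shingle: nothing to compare
    page = shingles(page_words, size)
    return 100.0 * len(checked & page) / len(checked)


checked = "one two three four five six".split()
page = "zero one two three four nine".split()
print(match_percentage(checked, page))  # 2 of 4 shingles match: 50.0
```

A larger shingle size makes the comparison stricter, because longer runs of identical words are needed for a match.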
'Change general settings' contains several preset parameter sets for Basic parameters, such as Default, Express and Deep. Saved contains the most recent set of Basic parameters saved by the user.
During the Check for plagiarism operation, the parameter set from Saved is used as the Basic parameters settings.
The Plagiarism detection / Plagiarism limit (%) option specifies the maximum allowable percentage of matches between the original text and any single page downloaded during the check. If this limit is exceeded, the search stops automatically. The default plagiarism limit is 50 %.
If there is no direct connection to the Internet, this tab allows proxy settings to be configured.
Note: in this case the proxy list cannot be used as the auto search protection tool (Preferences / Other / Auto search protection / Use http-proxy list).
Autosave specifies the number of recent reports to be saved automatically in the `Autosave` folder in the program root after every search operation. The reports are named after the completion time of the operation. They can be accessed quickly through the menu item File / Autosaves.
History specifies the number of recent search operations to be saved. It determines which check results remain available in Log.
Log enables detailed logging of the check progress. This option is disabled by default.
Use the alternative downloading mode switches the program to another internal page download mechanism during the check.
Note: Use this mode if there are problems downloading pages with the program (for example, a recurring "Unable to load a page ..." error). The same mode is used when the list of pages on a site is built during 'Website check'.
It specifies a set of URL-addresses / domains to be ignored while checking the text for plagiarism.
Ignore pages within the same domain as checking web site or html-page is useful when the text being checked is a web document from the Internet: for example, when the text was loaded into the Text Editor through the Address field (and has not been edited there), or when the page is checked by 'Website check'. The 'Domain level' parameter determines which domain level in the address of the checked page / site is ignored, together with all its subdomains, during the check. The default domain level is 2. This means that when, for instance, the site http://www.site.com is checked, pages such as http://site.com/page.html, http://www.site.com/page.html, http://sub3.site.com/page.html, http://sub4.sub3.site.com/page.html and http://sub5.sub4.sub3.site.com/page.html are ignored.
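The effect of 'Domain level' can be illustrated with a short Python sketch. The function names are hypothetical and the logic is an assumption based on the example above: the last `level` labels of the host name are compared. The sketch does not treat multi-part public suffixes such as co.uk specially.

```python
from urllib.parse import urlsplit


def domain_at_level(url, level=2):
    """Return the last `level` labels of the host name, e.g. 'site.com'."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-level:])


def same_domain(url, checked_url, level=2):
    """True if `url` belongs to the domain of the checked site at this level."""
    domain = domain_at_level(checked_url, level)
    return bool(domain) and domain_at_level(url, level) == domain


checked = "http://www.site.com"
for url in ("http://site.com/page.html",
            "http://sub4.sub3.site.com/page.html",
            "http://other.com/page.html"):
    print(url, same_domain(url, checked))
```

With level 2 the first two addresses are reported as belonging to the checked site and the third is not. With level 3, only hosts ending in www.site.com would match.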
Ignore addresses from a file specifies individual addresses to be ignored during the check. A plain text file (.txt) must contain the set of addresses, each address on a new line.
An example of such a file is given below:
Ignore domains from a file specifies whole domains to be ignored during the check. A plain text file (.txt) must contain the set of domains, each domain on a new line.
An example of the content of such a file looks like this:
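As an illustration only, the Python sketch below reads two such lists and applies them. The addresses and domains are hypothetical placeholders, the function names are invented for this example, and the rule that an ignored domain also covers its subdomains is an assumption rather than documented behaviour.

```python
from urllib.parse import urlsplit


def parse_ignore_list(content):
    """One entry per line; blank lines and surrounding spaces are skipped."""
    return {line.strip() for line in content.splitlines() if line.strip()}


def is_ignored(url, ignored_addresses, ignored_domains):
    """True if the exact address or the domain of `url` is on an ignore list."""
    if url.strip() in ignored_addresses:
        return True
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d.lower() or host.endswith("." + d.lower())
               for d in ignored_domains)


# Hypothetical file contents, one entry per line.
addresses = parse_ignore_list(
    "http://example.com/page1.html\n"
    "http://example.org/articles/2.html\n"
)
domains = parse_ignore_list("example.net\nexample.edu\n")

print(is_ignored("http://example.com/page1.html", addresses, domains))
print(is_ignored("http://www.example.net/news.html", addresses, domains))
print(is_ignored("http://example.com/other.html", addresses, domains))
```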
Update checks for new versions of the program at startup. If a new version is available, you will be offered to update the program. User settings from the previous version are preserved after the update.
Minimal interval between related queries to a search engine specifies the minimum time interval (seconds) between queries to the same search engine, so that queries cannot be sent to it in quick succession.
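A minimal Python sketch of such a per-engine interval is shown below. It is an illustration under assumptions, not the program's code: the class name `QueryThrottle` and its parameters are hypothetical, and the clock and sleep functions are injectable only so that the behaviour can be tested without real delays.

```python
import time


class QueryThrottle:
    """Enforce a minimum interval between queries to the same search engine."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds
        self.clock = clock
        self.sleep = sleep
        self.last_query = {}              # engine name -> time of last query

    def wait(self, engine):
        """Block until a query to `engine` is allowed, then record it."""
        now = self.clock()
        last = self.last_query.get(engine)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_query[engine] = now


throttle = QueryThrottle(min_interval=0.2)
for engine in ("engine-a", "engine-a", "engine-b"):
    throttle.wait(engine)  # only the second call for engine-a has to wait
    print("query sent to", engine)
```

Each engine is tracked separately, so a query to a different search engine is not delayed.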
Download timeout for a page specifies the maximum time interval (seconds) allowed for downloading a single web document. If a page fails to download within this interval, it is excluded from the list (in detailed logging mode, messages like `download timeout exceeded` are displayed in Log).
Note: If the Internet connection is poor, this value probably needs to be increased.
Maximum number of pages to be simultaneously downloaded
Note: If the Internet connection is poor, this value probably needs to be reduced.
Auto search protection
This option makes sense only if neither antigate.com nor a proxy list is used. In that case, when a search engine asks for a captcha, the text uniqueness check is suspended so that the user can enter the captcha. If `Show captcha` is disabled, the program will not ask for the captcha or attempt to rebuild the same query to the same search engine (the check quality may get worse in this case, so it is better to disable the irrelevant search engines in the list instead).
This option makes sense only if a proxy list is not used. In that case, when a search engine asks for a captcha, the text uniqueness check is suspended and the program sends the captcha to the antigate.com recognition service. Once the recognition is completed, the program continues the operation.
Note: in detailed logging mode, both the captcha itself (a picture) and the result of its recognition by antigate.com are recorded in Log.
Note: The Key parameter can be found in your personal account at http://antigate.com. You need to log in there and add money to your balance.
Use http-proxy list
It configures the list of proxies through which queries to search engines are sent. The Number of page downloading attempts parameter specifies the maximum number of times the same query is sent to the same search engine through different proxies. For instance, if the first attempt fails because the timeout was exceeded or because of auto search protection, the second attempt can be made through another proxy; if that attempt also fails, a third proxy is used.
Note: the number of page downloading attempts is shown in Log in brackets.
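The retry behaviour described above can be sketched in Python as follows. This is only an illustration: the function name `query_through_proxies` is hypothetical, and `send` stands for a caller-supplied function that performs the actual request through one proxy and raises an exception on failure.

```python
def query_through_proxies(query, proxies, max_attempts, send):
    """Try the same query through successive proxies.

    Returns (response, attempt_number) for the first successful attempt
    and raises RuntimeError if every attempt fails.
    """
    errors = []
    for attempt, proxy in enumerate(proxies[:max_attempts], start=1):
        try:
            return send(query, proxy), attempt
        except Exception as exc:  # e.g. timeout or captcha challenge
            errors.append((proxy, exc))
    raise RuntimeError(f"query failed after {len(errors)} attempt(s)")


def demo_send(query, proxy):
    """Stand-in for a real request: only the third proxy 'works'."""
    if proxy != "proxy-c:8080":
        raise TimeoutError("timeout exceeded")
    return "response for " + query


proxies = ["proxy-a:8080", "proxy-b:8080", "proxy-c:8080"]
print(query_through_proxies("sample text", proxies, 3, demo_send))
```

With a limit of 3 attempts the demo succeeds on the third proxy. With a limit of 2 it would raise an error, because both permitted attempts fail.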
Supported proxy types are: http, socks4(a), socks5.
The proxy list is specified by a plain text file (.txt) containing a set of addresses and ports, each address on a separate line. For private proxies, a login and password can also be given.
The file line format is as follows:
The following can be used as an example of the content of such a file:
Besides the common check, the program offers two more options: a batch check, which checks the documents in a specified directory, and a website check, which checks a website or separate pages whose addresses are taken from a specified text file.
The batch check processes all the text and Word documents (*.txt, *.doc, *.docx) in a specified directory, up to the number set in 'Maximum number of documents'. The encoding is selected automatically by default, but it can also be set manually from the pop-up list of encodings.
The check operation is divided into several steps:
- To check a site, first download its pages, either by entering its address in the 'Enter address' field and clicking the 'Download' button, or by clicking the 'Load from file' button and selecting a text file with a list of addresses (each address on a separate line).
Note: before downloading, a download filter can be specified to exclude unsuitable web documents.
- The window listing the downloaded pages has a Selection column that allows any page to be excluded from the check list at this step.
- The check operation itself is started.
Note: The 'Prohibited' show filter displays the pages that have not been downloaded because of the download filter or because their addresses are listed in the robots.txt file (normally located in the root of a site).
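How a page ends up prohibited by robots.txt can be illustrated with Python's standard `urllib.robotparser` module. This is a sketch under assumptions, not the program's code: the helper name `prohibited_by_robots`, the user agent name and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def prohibited_by_robots(robots_txt, url, user_agent="ExampleChecker"):
    """True if the given robots.txt content disallows fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for the site being checked.
rules = "User-agent: *\nDisallow: /private/\n"

print(prohibited_by_robots(rules, "http://www.site.com/private/a.html"))
print(prohibited_by_robots(rules, "http://www.site.com/public/a.html"))
```

Pages under /private/ are reported as prohibited here, while the other page may be downloaded.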