1. What the software does
2. What researchers do
How to use the site (visitors)
1. Browse & search texts
2. Browse & search words
3. Download words to spreadsheet (video)
4. Request researcher access to the site
Tutorial for Contributing Researchers
1. Log in to the site
2. View the 1,000 most frequent words in a language
3. Add a new text (video)
4. Add a new language
5. Update individual word entries (definition, part of speech) (video)
6. Remove words (e.g., proper nouns) from display in frequency results (video)
7. Categorize words into semantic domains
8. Dealing with nonstandard spellings
9. Develop the readability algorithm for your language (video)
10. Back up data on the website (admins only)
What the software does
This website is a collection of texts and words in 15 Filipino languages, with corresponding linguistic information (definition, part of speech, frequency, readability). This data can be used as a multilanguage dictionary using the "search" function, a text leveler for classroom reading materials, and a linguistic tool for understanding the structure of these languages.
Information is added to the site in a standard pattern, shown in this flowchart of how the software works. Briefly, the workflow is as follows:
(1) A researcher uploads a text
(2) The text is stored on the site for historical archiving
(3) The software splits the text into individual words and generates metadata about the text (readability) & words (frequency)
(4) Researchers add further information about the words (definition, part of speech)
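The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual code: the Corpus class, its method names, and the simple regular-expression word splitter are hypothetical, and the sample text is a made-up string.

```python
import re
from collections import Counter


class Corpus:
    """Hypothetical in-memory stand-in for the site's database."""

    def __init__(self):
        self.texts = []             # step 2: archived texts
        self.frequency = Counter()  # step 3: word -> number of occurrences
        self.entries = {}           # step 4: word -> researcher-supplied details

    def upload(self, title, body):
        # Steps 1-3: store the text, split it into words, update frequencies.
        self.texts.append({"title": title, "body": body})
        # Assumption: a word is a run of letters, optionally joined by - or '.
        words = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", body.lower())
        self.frequency.update(words)
        return words

    def annotate(self, word, definition, part_of_speech):
        # Step 4: researchers add what the software cannot work out itself.
        self.entries[word] = {
            "definition": definition,
            "part_of_speech": part_of_speech,
        }


corpus = Corpus()
corpus.upload("Sample", "An misay. An misay ngan an ayam.")
corpus.annotate("misay", "cat", "noun")
print(corpus.frequency.most_common(2))
```

The split between upload and annotate mirrors the division of labor described here: counting is automatic, while definitions and parts of speech come from researchers.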
What researchers do
The site performs all the calculations for text readability, word frequency, and sorting. Researchers provide information that a computer cannot, like definitions for words and parts of speech. They can also assign new words to a semantic domain and provide their English equivalents, so that words in different languages can be associated with each other as a multilingual dictionary. Researchers are also responsible for checking, before adding a text, that it has not already been uploaded by another researcher. They also curate the database by marking proper nouns, words borrowed from English, alternate or nonstandard spellings, and nonsense words in the official frequency list.
Browse & search texts
Visitors may browse listings of all texts archived on the site on the Texts page, but may not view the texts themselves due to copyright limitations. Results may be filtered by genre and language. Researchers can view and edit the texts by clicking on the titles listed on the same page. Researchers will also see data about each text: the number of sentences, the number of words, words per sentence, and the calculated grade level. Generally, texts should not be changed once uploaded, assuming they are authentic. If a text is updated or deleted, however, the site updates the corpus word-frequency data accordingly to maintain accuracy.
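One way to picture how the frequency data stays accurate when a text is deleted is to subtract that text's word counts from the language totals. The sketch below assumes the counts are kept in a collections.Counter. The function names are hypothetical, and the site may implement this differently.

```python
from collections import Counter


def add_text(totals, words):
    """Add one text's word counts to the language totals."""
    totals.update(words)


def remove_text(totals, words):
    """Subtract one text's word counts from the language totals."""
    # A word whose count reaches zero stays in the Counter; whether the
    # site keeps such an entry is not specified here.
    totals.subtract(words)


totals = Counter()
text_a = ["an", "misay", "an", "ayam"]
text_b = ["an", "misay"]
add_text(totals, text_a)
add_text(totals, text_b)
remove_text(totals, text_a)  # deleting text A leaves only text B's counts
print(totals["an"], totals["misay"], totals["ayam"])
```

Under the same assumption, editing a text can be treated as removing the old version and adding the new one.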
Browse & search words
Search for words in any language using the search bar at the top right of most pages. Since some Filipino languages share the same word, the search results indicate in parentheses which language the word comes from. Alternatively, search a single language using the language dropdown bar shown at right. An exact match for the search term will provide the word's part of speech, meaning, sample sentence, and language equivalents (if any), along with related words. If no exact matches are found, the search offers similar words in the database. Authorized researchers may update the word entry from the search results using the "Edit" link that appears.
Visitors and researchers can also browse all words in each language, sorted by frequency, from the Words page. By default, the listings filter out words that have been marked as proper nouns, nonsense words, or English-language borrowings. To show these words, check the appropriate box on the page.
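The default filtering can be illustrated with a short sketch. The Word class, its field names, and the frequency numbers below are invented for the example. Only the two flags correspond to checkboxes on the word form.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    frequency: int
    hidden: bool = False        # "Do not display this word in page results"
    english_loan: bool = False  # "This is an English loan word"


def list_words(words, show_hidden=False, show_english=False):
    """Return words by descending frequency, applying the default filters."""
    visible = [
        w for w in words
        if (show_hidden or not w.hidden)
        and (show_english or not w.english_loan)
    ]
    return sorted(visible, key=lambda w: w.frequency, reverse=True)


words = [
    Word("an", 120),
    Word("Maria", 15, hidden=True),          # proper noun
    Word("computer", 9, english_loan=True),  # English borrowing
    Word("misay", 30),
]
print([w.text for w in list_words(words)])
print([w.text for w in list_words(words, show_hidden=True, show_english=True)])
```

Flagged words stay in the list and keep their counts. The flags only control what is displayed, which is why checking a box brings them back.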
Download words to spreadsheet
On the Words page, you may download page results by clicking the "Export all languages to Spreadsheet" button shown at right. Note: to download words in a single language, first select the language from the dropdown menu. The button will then update to read "Export [selected language] words to spreadsheet". The file will be downloaded to your computer.
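For readers curious how such an export might be produced, here is a minimal sketch. The file format and column names of the real export are not documented on this page, so the CSV layout, the export_words function, and the sample rows are all assumptions.

```python
import csv
import io


def export_words(rows, language=None):
    """Write word rows as CSV text, optionally limited to one language."""
    selected = [r for r in rows if language is None or r["language"] == language]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["word", "language", "frequency"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()


rows = [
    {"word": "misay", "language": "Winaray", "frequency": 30},
    {"word": "iring", "language": "Subuanong Binisaya", "frequency": 25},
]
print(export_words(rows, language="Winaray"))
```

Calling export_words(rows) without a language corresponds to the "all languages" button. Passing a language corresponds to selecting it from the dropdown first.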
Request access to the site
Administrators grant researchers access to the site if they are part of the 3NS Corpora Project. Each researcher is given a login name (typically an email address) and a password. Administrators explicitly grant access to specific languages. Researchers: if you are able to log in but cannot access texts in your assigned language(s), contact the administrator to grant you permission.
Log in to the site
1) Enter the username (usually your email address) and password given to you by site administrators
2) Click "Log in"
3) Once you are logged in, a new item called "Edit Website" will appear in the navigation menu. It allows you to perform research tasks.
If you are unable to log in, contact the administrators to confirm your credentials.
If you are able to log in but are not able to access texts in your assigned language(s), contact the administrators to grant you permission.
Add a new text
Researchers who have been authorized to work in a given language may add new texts to the site. To do this:
1) Go to "Edit Website" > "Add/Edit texts"
2) Browse the listings to make sure that the text you plan to add has not already been added. You may have to browse the actual text, since another researcher might have named it differently.
3) If the text has not been added by another researcher, click "Add a new text"
4) Select the language of the text in the dropdown menu. If you do not see your language in the dropdown menu, contact the administrator to grant you permission
5) Select the text's genre. If the genre does not appear in the list, you may add it by going to "Edit Website" > "Add/Edit genres"
6) Indicate the title, author, and year of publication
7) Paste the actual text into the body field of the form (do not put the title in this area)
8) Click "Add"
Upon adding the text, the website will calculate the text's word count, sentence count, words per sentence, and readability; add any new words in the text to the language word list; and update the words' frequencies.
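The per-text figures can be approximated with a short script. This is a sketch under simple assumptions (a sentence ends at a period, exclamation mark, or question mark, and a word is a run of letters). The site's own rules for splitting sentences and words may differ, and text_statistics is a hypothetical name.

```python
import re


def text_statistics(body):
    """Count sentences and words, and derive words per sentence."""
    # Assumption: ., ! and ? end a sentence; real texts may need more rules.
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    # Assumption: a word is a run of letters, optionally joined by - or '.
    words = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", body)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }


stats = text_statistics("An misay. An misay ngan an ayam!")
print(stats)
```

Because the title is not part of the body, it does not affect these counts. This is one reason the form asks that the title be kept out of the body field.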
Add a new language
Researchers with appropriate authorization may add new languages to the site. To do this:
1) Go to "Edit Website" > "Add/Edit languages"
2) To add a new language, click "Add a new language"
3) To modify an existing language, click on its name in the list
4) Languages may be deleted from the site, but do this with caution. If you delete a language, any texts or words already in the database will remain, but they will be listed as "Uncategorized".
Update individual word entries
When a text is uploaded, the software automatically adds new words, updates word frequency, and generates sample sentences. Researchers with appropriate authorization may add information to individual words. To do this:
1) Depending on which is easier, search for the word or browse to it on the Words page
2) From the search results, click "Edit [word]". From the Words page, click the word itself, underlined as a link.
3) The Word itself and the Language should not be changed. These are generated when texts are uploaded. If the word is misspelled, use the "Standard Spelling" field in the form to indicate the authoritative spelling.
4) Add or update the Definition. It may include more than one definition.
5) The Sample Sentence is generated by the website when a word is first recorded in the language, but may be updated by researchers with a more representative sample.
6) Indicate the Primary Part of Speech and Secondary Part of Speech using the dropdown lists.
7) Provide an English Equivalent, which acts as a key value for associating the word with words in other languages. Words that have the same English equivalent will show up together in search results. For example, searching for "iring" in Subuanong Binisaya will show "misay" (Winaray), "pusa" (Tagalog), and "miyong" (Inabaknon) because each of these entries has "cat" as its English equivalent.
8) Use the Semantic Domain to further associate words into categories. This is not required for the initial project but may be used later on to create a multilanguage semantic dictionary.
9) Check "Do not display this word in page results" if the word is a proper noun, nonsense word, or should not otherwise be shown in the search results
10) Check "This is an English loan word" if it is actually English but was embedded in a non-English language text. Again, the Language of the word should not be changed to "English". Instead, using this checkbox helps track which English loan words are used in which Filipino languages.
11) Click "Update"
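The role of the English Equivalent as a key value (step 7) can be sketched as a lookup table. The data layout and the function names below are hypothetical. The "cat" rows use words from the example in step 7, and the "dog" row is an extra added for contrast.

```python
from collections import defaultdict

# (word, language, English equivalent)
entries = [
    ("iring", "Subuanong Binisaya", "cat"),
    ("misay", "Winaray", "cat"),
    ("miyong", "Inabaknon", "cat"),
    ("ayam", "Winaray", "dog"),  # extra row added for contrast
]


def build_index(entries):
    """Group (word, language) pairs under their English equivalent."""
    index = defaultdict(list)
    for word, language, english in entries:
        index[english.lower()].append((word, language))
    return index


def equivalents(entries, word, language):
    """Return entries in other languages that share the word's English key."""
    index = build_index(entries)
    for w, lang, english in entries:
        if w == word and lang == language:
            return [pair for pair in index[english.lower()] if pair != (w, lang)]
    return []


print(equivalents(entries, "iring", "Subuanong Binisaya"))
```

In a scheme like this, matching is exact: entries keyed "cat" and "cats" would not be linked, so it helps to enter English equivalents consistently.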
Remove words (e.g., proper nouns) from display in frequency results
Words should never be deleted from the corpus. Instead, they should be tagged so that they are hidden from the frequency results. To do this:
1) Find the word by following the steps under "Update individual word entries (definition, part of speech)"
2) Check "Do not display this word in page results" if the word is a proper noun, nonsense word, or should not otherwise be shown in the search results
3) Click "Update"
Dealing with nonstandard spellings
Many texts uploaded will contain spelling variations for words. Nonstandard spellings should not be removed from the corpus. Instead, they should be tagged to indicate nonstandard usage. To do this:
1) Find the nonstandard word by following the steps under "Update individual word entries (definition, part of speech)"
2) Type the standard spelling in the text field indicated "Standard Spelling"
3) Click "Update"
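One possible use of the "Standard Spelling" tag is to combine the counts of spelling variants during analysis while leaving the original entries untouched. The sketch below is an assumption about how the tag could be used, not a description of what the site does. The words and counts are invented and do not reflect any decision about which spelling is standard.

```python
from collections import Counter


def frequency_by_standard_spelling(frequency, standard_spellings):
    """Sum counts of variants under their standard spelling.

    Words without a recorded standard spelling are counted as themselves.
    The input Counter is not modified.
    """
    merged = Counter()
    for word, count in frequency.items():
        merged[standard_spellings.get(word, word)] += count
    return merged


frequency = Counter({"diri": 40, "dire": 6, "balay": 12})
standard_spellings = {"dire": "diri"}  # variant -> standard (illustrative only)

merged = frequency_by_standard_spelling(frequency, standard_spellings)
print(merged["diri"], merged["balay"])
print(frequency["dire"])  # the variant is still recorded in the corpus
```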
Working with the semantic domains
Placing words into semantic domains (e.g., Plants, Animals, Household, Education) is a feature included from waraylanguage.org, a language corpus developed for the Winaray language. It is a way of grouping words other than alphabetical order, and is a key tool in language learning materials.
For the 3NS Corpora Project, the semantic domain feature will not initially be actively maintained, but words may be tagged into semantic domains and then viewed on the Semantic Lists page.
To add a new semantic domain, go to "Edit Website" > "Add/edit semantic domains" (permission granted individually).
To tag a word into a semantic domain, follow the instructions under "Update individual word entries (definition, part of speech)".
Develop the readability algorithm for your language
One key feature of the software is to analyze a text and determine its reading level based on grade. The calculation is done by the software using this basic formula:
Grade level = (sentences_constant*words_per_sentence) + (words_constant*(100-percent_frequent_words)) + 0.839
However, since individual languages are slightly different, researchers may work to improve the formula. To do this:
1) Go to "Edit Website" > "Add/Edit languages"
2) Select the language whose readability algorithm you want to work with.
3) Adjust the "Words constant" by typing a different value in the field provided. A higher value means that the number of difficult words in a text will have a larger impact on the grade level. You may reset this to the default value indicated in the form.
4) Adjust the "Sentences constant" by typing a different value in the field provided. A higher value means that the length of sentences in a text will have a larger impact on the grade level. You may reset this to the default value indicated in the form.
5) Click "Update"
All texts for the language selected will be recalculated using the adjusted formula.
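To see how the two constants affect the result, the formula can be written as a small function. The default values below are placeholders, not the site's defaults: 0.141 and 0.086 are taken from the original Spache readability formula, which has the same shape and the same 0.839 constant. Use the defaults shown in the language form instead. The function name is hypothetical.

```python
def grade_level(words_per_sentence, percent_frequent_words,
                sentences_constant=0.141, words_constant=0.086):
    """Apply the grade-level formula given above.

    The default constants are placeholders (Spache's original values);
    the site's per-language defaults may differ.
    """
    percent_difficult_words = 100 - percent_frequent_words
    return (sentences_constant * words_per_sentence
            + words_constant * percent_difficult_words
            + 0.839)


# A text with 8 words per sentence where 90% of the words are frequent words.
base = grade_level(8, 90)
# Doubling the words constant gives difficult words more weight.
stricter = grade_level(8, 90, words_constant=0.172)
print(round(base, 3), round(stricter, 3))
```

With the placeholder constants, the first call works out to 0.141 × 8 + 0.086 × 10 + 0.839 = 2.827. Doubling the words constant raises the result by 0.86, which is the effect described in step 3.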
Quickly view the 1,000 most frequent words in each language
1) Go to "Edit Website" > "Add/Edit languages"
2) Select the language whose 1,000 frequent words you want to view by clicking on its underlined name (You may only view languages you have been granted permission to work with)
3) At the bottom of the language page, you will see a list of the 1,000 most frequent words, listed in descending order of frequency.
Back up data on the website (admins only)
Because this is a collaborative project involving many researchers, it is a good idea to save periodic snapshots of the website, both to prevent data loss and to restore data if entry mistakes are made. To do this:
1) Go to /backup
2) An authentication window will drop down. Enter the credentials established by the site
3) To save the current version of the website, click "Save"
4) To restore a previous version, choose "Restore to the version saved on ...", type "YES" to confirm you really want to do this, and click "Restore"
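The save and restore steps follow a common snapshot pattern, sketched below. This is a conceptual illustration only: the SnapshotStore class and its methods are hypothetical and do not describe how the /backup page is implemented.

```python
import copy
from datetime import datetime


class SnapshotStore:
    """Hypothetical snapshot store keyed by the time each version was saved."""

    def __init__(self):
        self.snapshots = {}

    def save(self, data, when=None):
        # Keep an independent copy so later edits do not change the snapshot.
        label = (when or datetime.now()).isoformat(timespec="seconds")
        self.snapshots[label] = copy.deepcopy(data)
        return label

    def restore(self, label, confirmation):
        # Mirrors the 'type "YES" to confirm' safeguard in step 4.
        if confirmation != "YES":
            raise ValueError("restore not confirmed")
        return copy.deepcopy(self.snapshots[label])


store = SnapshotStore()
data = {"misay": {"definition": "cat"}}
label = store.save(data, when=datetime(2024, 1, 1, 12, 0, 0))

data["misay"]["definition"] = "dog"  # an entry mistake made after the snapshot
data = store.restore(label, "YES")
print(label, data)
```

Restoring replaces the current data with the saved version, so any correct changes made after the snapshot are lost as well. That is the reason for the explicit confirmation.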