Indonesian (Bahasa Indonesia) has the undeserved reputation of
being a language in which homonymous (and homographic) cognate
pairs that complicate automatic text processing are almost
nonexistent. This paper seeks to dispel this illusion by examining
the main groups of homonymous cognate pairs more closely.
Group I: pairs consisting of a fixed expression (phraseology)
or compound word and the corresponding free word group or phrase
(e.g. _tempat tidur_ 'bed' vs. 'place where [someone] sleeps').
Confusion is mainly caused by the prevailing practice of spelling
compounds as separate words, but is on the other hand reduced by
a tendency to avoid the corresponding free phrases.
Complications can arise in instances where the first components
are proclitics, some of which are joined to the next word in the
spelling and some not. In the former case, both compound and phrase
are spelled as one word (_sebelah_ 'side/adjacent' vs. 'one
side/part'). Analogously, there are instances with an enclitic as
second component (_ialah_ 'be' vs. 'it is he/she [who is...]').
Group II: pairs formed by an attributive verb form (X, X-an,
peN-X, peN-X-an, per-X-an) and a deverbal noun either derived
from the former by conversion, or derived in parallel from the
same verb base (X) with the same affixes. Particular attention
is paid to the syntactic criteria that distinguish the two.
Group III etc.: other instances of ambiguity in the use of
certain means of derivation or inflection, e.g. ke-X-an forming
an incidental passive verb form or an abstract noun (_kedatangan_
'be [unexpectedly] visited' vs. 'arrival'), full reduplication
(X-X) of a noun base to form the plural or derive another noun
(_kuda-kuda_ 'horses' vs. 'trestle'), etc. A particular subgroup
is formed by homonym pairs historically deriving from metonymy
(_dalam_ 'inside' vs. 'deep') or grammaticalization (_baru_
'new' vs. 'just, recently').
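For automatic text processing, the two Group III schemata named above (ke-X-an and full reduplication X-X) can at least be flagged on the surface. The following is a minimal sketch under stated assumptions, not a method taken from the paper: the function name `ambiguity_candidates` and both patterns are hypothetical, and the matching is purely orthographic. It over-generates (any word that merely begins with _ke_ and ends in _an_ is flagged), so a lexicon of bases would be needed in practice, and it does nothing to resolve the ambiguity.

```python
import re

# Hypothetical surface patterns for two of the schemata named above.
# KE_AN: "ke" + base + "an" written as one word (e.g. kedatangan).
# REDUP: a base repeated in full with a hyphen (e.g. kuda-kuda).
KE_AN = re.compile(r"^ke(?P<base>\w+?)an$")
REDUP = re.compile(r"^(?P<base>\w+)-(?P=base)$")


def ambiguity_candidates(token):
    """Return the labels of the schemata a token matches on the surface.

    A match only marks the token as a candidate for ambiguity; deciding
    between the readings would require syntactic context.
    """
    token = token.lower()
    labels = []
    if KE_AN.match(token):
        labels.append("ke-X-an")
    if REDUP.match(token):
        labels.append("X-X")
    return labels


if __name__ == "__main__":
    for word in ("kedatangan", "kuda-kuda", "rumah"):
        print(word, ambiguity_candidates(word))
```

Tokens flagged in this way would still have to be disambiguated by context, for instance with the kind of syntactic criteria mentioned for Group II.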
Finally, there are homographs that are not homonyms, but result
from inadequacies of the orthography. Some non-homonymous homographs
are cognate (_berapa_ /b@rapa/ 'how much/many' vs. /b@r?apa/
'have/contain what'), others are not (_berevolusi_ /b@revolusi/
'experience a revolution' vs. /b@r?evolusi/ 'undergo evolution').
The various kinds of pairs can be roughly classified as 'systematic',
forming an open class, or 'coincidental', comprising a closed or
limited set of instances. Tables are planned for the latter category.