[LUGOS-SLO] Tezaver (Thesaurus)
Robert Ludvik
robert.ludvik at zd-lj.si
Tue Aug 5 11:29:34 CEST 2003
Hello,
Bernard Herman, who leads the OKO project, is currently on vacation.
We could probably present this to him and have a word about it. Or to
someone else?
> Threats: inability to obtain the copyright permissions needed for an
> open-source distribution of the database;
Perhaps this could be solved by having every user who adds to the
database agree to it (that the data can be released as open source)?
LUGOS could contact the authors of the existing dictionaries and ask
them for 'words' (if SDJT or MID or someone with a weightier name, or
all of them together, did the asking, it might be even better :-).
That way the database could be filled for a start.
> could offer a display of examples (concordances). With all of this
> there is, of course, any amount of work, also for diploma and
> doctoral theses.
It would be good to tell this to someone who is more into this scene
:-) A computer scientist/linguist can find a pile of ideas at OOo. One
of them is "Grammar checking":
One of the goals of the Lingucomponent project is to design,
develop, and implement a Grammar checker for English and other
supported Languages.
Summary
This is a "Wish" project. I do not intend to undertake it unless
significant interest and developers decide to help out. If you
have any interest in helping to design, develop, and implement a
Grammar Checker for the OpenOffice.Org project, please send an
e-mail to dev at lingucomponent.openoffice.org identifying yourself,
your skills, your willingness to lead this project, etc.
And more at http://lingucomponent.openoffice.org/
Regards,
--
Robert Ludvik
PS
The latest discussion on dev at lingucomponent.openoffice.org concerns
the current limit of 32000 entries in the database. Kevin is a
co-author of the thesaurus for OOo ("Sander and I just threw some ideas
around over a weekend and I simply coded something up."), but there are
also new proposals and people who will be working on this.
*******************************************************************
The en_US thesaurus only needed under 32000 unique entries, but the
binary format chosen to hold offsets into the table used unsigned
shorts, which can hold up to about 64000 entries.
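A minimal sketch of why a 16-bit offset caps the table size. It uses
Python's struct module, not the actual OOo thesaurus code (whose file
layout is not shown in this thread); the helper name fits_in_format is
made up for illustration. The signed limit is shown only for comparison.

```python
import struct


def fits_in_format(value: int, fmt: str) -> bool:
    """Return True if `value` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# "<H" is a little-endian unsigned 16-bit integer (an "unsigned short"),
# "<h" is the signed 16-bit variant.
UNSIGNED_MAX = 2**16 - 1  # 65535
SIGNED_MAX = 2**15 - 1    # 32767

for label, fmt, limit in (("unsigned", "<H", UNSIGNED_MAX),
                          ("signed", "<h", SIGNED_MAX)):
    # The limit itself fits; one past the limit does not.
    print(label, limit, fits_in_format(limit, fmt),
          fits_in_format(limit + 1, fmt))
```

Any offset beyond the 16-bit limit simply cannot be stored in such a
field, so a larger thesaurus would need wider offsets.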
The current en_US thesaurus code was, and still is, a hack I wrote to
get a thesaurus in place in time for OOo 1.0. It was never meant to be
a model for other languages to use.
If I had to do it all over again I would redesign it with other
languages in mind, support for affixes, support for multiple meanings,
etc. Unfortunately, I did not (Sander and I just threw some ideas
around over a weekend and I simply coded something up).
I was kind of hoping that someone else would come along and design a
much better international thesaurus. Now the French, German, Italian,
and Czech seem to have thesauri (or one in the works), and those
languages probably feel quite constrained by the layout and design I
did originally.
The key here is that if someone else would like to propose a better
design and layout for a thesaurus, I would be happy to create a new
thesaurus component that would interface to that design. I just
haven't had time to develop this properly and hoped that someone else
would pick up the pieces and move forward.
I just got back from vacation and I will check that OOo 1.1 rc3 really
does have the changes in place to support 65535 entries so that others
can use it as well.
Hope this helps,
Kevin
*******************************************************************
The OpenThesaurus website (German thesaurus) indeed has some features
that are lost in the export for OOo (e.g. multiple meanings). But it's
based on a simple relational data model, which should be easy to take
over to OOo if relational data lookup is possible in OOo; a simple
Berkeley DB key/value pair lookup would also work. Currently the data
lookup code for the thesaurus is hand-written AFAIK, which makes it a
bit complicated.
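To make the "simple relational data model" idea concrete, here is a
rough sketch using Python's built-in sqlite3 module. The table and
column names (synset, word, gloss) and the sample words are assumptions
made up for illustration; this is not the actual OpenThesaurus schema.
It shows how a relational lookup keeps multiple meanings apart.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE synset (id INTEGER PRIMARY KEY, gloss TEXT);
        CREATE TABLE word (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL,
            synset_id INTEGER NOT NULL REFERENCES synset(id)
        );
    """)
    conn.executemany(
        "INSERT INTO synset (id, gloss) VALUES (?, ?)",
        [(1, "financial institution"), (2, "edge of a river")],
    )
    conn.executemany(
        "INSERT INTO word (text, synset_id) VALUES (?, ?)",
        [("bank", 1), ("depository", 1),
         ("bank", 2), ("shore", 2), ("riverside", 2)],
    )
    return conn


def lookup(conn: sqlite3.Connection, term: str) -> dict:
    """Map each meaning (gloss) of `term` to the other words sharing it."""
    rows = conn.execute(
        """
        SELECT s.gloss, w2.text
        FROM word AS w1
        JOIN synset AS s ON s.id = w1.synset_id
        JOIN word AS w2 ON w2.synset_id = s.id
        WHERE w1.text = ? AND w2.text <> ?
        ORDER BY s.id, w2.text
        """,
        (term, term),
    ).fetchall()
    meanings = {}
    for gloss, other in rows:
        meanings.setdefault(gloss, []).append(other)
    return meanings


conn = build_demo_db()
print(lookup(conn, "bank"))
```

A flat key/value store could hold the same data by keeping one record
per word, but the grouping by meaning would then have to be encoded
inside the value.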
I will some day add other features to OpenThesaurus, like query
normalization. People can then search for German "ging" and find the
base form, "gehen" (in English that would be: search for "walked" and
find the synset for "walk", because "walked" isn't known). Once these
things are added, OpenThesaurus can be a model for a new OOo
thesaurus. It's easy to understand, you only need to look at the
database structure.
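The query normalization described above could look roughly like the
following sketch. The two dictionaries and the function name
find_synset are hypothetical stand-ins; a real system would need a
proper lemmatizer or a full table of inflected forms, and the sample
synonyms are only illustrative.

```python
from typing import Optional, Tuple

# Hypothetical sample data: synsets keyed by base form.
SYNSETS = {
    "gehen": ["laufen", "schreiten"],
    "walk": ["stroll", "stride"],
}

# Hypothetical table mapping inflected forms to their base form.
BASE_FORMS = {
    "ging": "gehen",
    "walked": "walk",
}


def find_synset(query: str) -> Optional[Tuple[str, list]]:
    """Look up `query` directly, then fall back to its base form.

    Returns (matched headword, synonyms), or None if nothing is found.
    """
    term = query.strip()
    if term in SYNSETS:
        return term, SYNSETS[term]
    base = BASE_FORMS.get(term)
    if base is not None and base in SYNSETS:
        return base, SYNSETS[base]
    return None


print(find_synset("ging"))
print(find_synset("walked"))
print(find_synset("xyzzy"))
```

Trying the exact form first means that words which are themselves
headwords are never rewritten by the fallback.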
Regards
Daniel
> Project: building an open-source Slovenian thesaurus
>
> Connections: Lugos, SDJT, OKO
> Projects: MID, MSZS, EU
>
> Approach: localization of the German OpenThesaurus
> http://thesaurus.kdenews.org/
>
> They say:
>
> Gibt es dieses Projekt auch für anderen Sprachen?
>
> Es ist geplant, das gleiche auch mit anderen Sprachen zu machen,
> sofern für diese ebenfalls noch kein freier Thesaurus zur Verfügung
> steht und sofern sich Muttersprachler der jeweiligen Sprachen finden,
> die sich als Administrator intensiv um ihren Bereich kümmern.
>
> or, according to Google:
>
> It is planned to make the same also with other languages if for this
> likewise still no free thesaurus is available and if to native speaker
> of the respective languages are, which worry intensively as
> administrator about their range.
>
> Strengths of the concept: corrections and additions can be entered in
> a distributed way; a good-quality web platform (php) is already in
> place.
>
> Weaknesses: an initial dictionary has to be written that is attractive
> enough for people to visit at all.
>
> Opportunities: automatic filling of the (initial) database from the
> Amebis thesaurus, SSKJ, other dictionaries, or a corpus.
>
> Threats: inability to obtain the copyright permissions needed for an
> open-source distribution of the database; and the complexity of
> detecting synonyms (and other -nyms) from monolingual/bilingual
> dictionaries, let alone corpora.
>
>
> Tasks:
> 1. transfer and localization of the German OT
> 2. acquisition of sources, conversion, and filling of the database
> 3. acquisition of users (first mailing lists, then newspapers)
> (4. building new modules: concordances, lemmatization, mining)
>
> Sources / Acquisition:
>
> 1. The Amebis general thesaurus - perhaps they would release it (part
> of it?) as open source for a fee; an additional, grass-roots option is
> that they initially put their whole thesaurus into the search
> engine/editor (but not into distributions) - and every entry that some
> user corrects (during the project?) then becomes open source.
>
> 2. Other dictionaries - these are probably terminological thesauri and
> dictionaries of individual fields (links at
> http://nl.ijs.si/sdjt/sdjt-www.html#lex), perhaps the Geodetic
> thesaurus, EvroTerm, Slovar Informatike, the Pametnjakovic (Smarta:)
> database??
> There are more specialized thesauri on the www than one would think:
> http://www.mszs.si/eurydice/term/tez1poj.htm
> Copyright would have to be cleared for each one separately.
>
> 3. SSKJ - it is difficult to obtain a signed document that permits use
> for the purposes of the project; the copyright is shared between ZRC
> SAZU and the authors. An alternative, anarchistic option is not to ask
> for consent but to send ZRC a notice that the project intends to use
> SSKJ to build a thesaurus.
> It is in fact disputable whether building an (open-source) thesaurus
> from SSKJ constitutes copyright infringement at all - what gets
> 'copied' may be extremely small pieces of the dictionary text, which
> often are not even contiguous (buljiti 'to stare' .... gledati 'to
> look').
> Legally speaking, the project has the Fair Use Agreement on its side;
> against it, maybe, the Millennium Act...
> Well, the whole thing is, as they say, "a can of worms"...
> Within SDJT it was said that at least the current state of the
> electronic SSKJ database would be documented; well, nobody has done
> it yet.
>
> 4. Corpora - go hunting for Slovenian texts on the Web? Query words on
> najdi.si? FIDA - copyright.
>
>
> Sources / Use:
>
> There are two targets for including the existing sources:
> OO::XThesaurus and OpenThesaurus;
> for "openoffice thesaurus" google gives me
> com::sun::star::linguistic2::XThesaurus
> Description
> allows for the retrieval of possible meanings for a given word and
> language.
>
> which is very general - are these "meanings" definitions, synonyms,
> hypernyms, hyponyms, and other -nyms? Maybe even translations? All of
> that? It probably depends a lot on which source we would have for
> filling and on how much work would go into extraction - I don't know
> what the Amebis thesaurus contains; the Geodetic one could give
> hypernyms and hyponyms (but, almost by definition, no synonyms);
> EvroTerm perhaps synonyms, if we followed the translations of a given
> word; SSKJ again hypernyms and hyponyms, or definitions? A corpus, if
> it were included, could offer a display of examples (concordances).
> With all of this there is, of course, any amount of work, also for
> diploma and doctoral theses.
>
>
>
> ...
>
> Tomaz Erjavec wrote:
>
>> Hello,
>> I agree with Ales, but I am also quite enthusiastic about
>> OpenThesaurus.
>> At http://nl.ijs.si/et/project/ootezaver.txt I have written down a
>> few starting points (read: ramblings); now we only need to find
>> someone whose mother tongue is Slovenian and who at the same time
>> "which worry intensively as administrator about their range"!
>> Well, seriously, comments are welcome...
>> Regards,
>> Tomaz
>>
>> Ales Kosir writes:
>> > A serious and useful general thesaurus really takes a lot of work
>> > (measured in person-years) if we start from the beginning. So we
>> > have to think about other options, so that we do not start from
>> > nothing.
>> >
>> > Best regards,
>> > Ales
>> >