Gerrit Brüning | 26 Jul 2012 17:34

How to indicate incomplete words (e.g. in false starts)

Dear all,

Is there a way to indicate that a written word is incomplete?
This occurs especially in the case of false starts ("fif", see ch. 
11.3.1.5 Substitutions) and instant revisions ("T", see ch. 11.3.4.7 
Instant Corrections).
We would like to prevent the interpretation of "So[gar]" as a record of 
the word "So" when it comes to creating an index.

Possible solutions:
- To conjecture the missing part of the word (<supplied>) is not always 
possible or desirable.
- <w part="I">So</w> seems incompatible with the definition of @part
(applies to elements "fragmented by some other structural element").

Do you have any suggestions?

Gerrit

-- 
Gerrit Brüning, M.A.
Wissenschaftlicher Mitarbeiter

Historisch-kritische Edition von Goethes Faust | Genetic Edition of Goethe's Faust | faustedition.net

Freies Deutsches Hochstift | Großer Hirschgraben 23-25 | 60311 Frankfurt am Main | Fon +49(0)69-13880-292

Alexey Lavrentev | 27 Jul 2012 15:48

Re: How to indicate incomplete words (e.g. in false starts)

Dear Gerrit,

I don't think there is a "canonical" solution to your
problem. Not many projects combine fine-grained
transcriptions of primary sources with high-quality
indexing of lexical items.

In many cases the most efficient solution would be
to "filter out" all the stuff inside <del> and <sic>
before lexical indexing and applying NLP tools.
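
A minimal sketch of this filtering step, in Python with the standard
library's xml.etree.ElementTree. The function names and the sample line
are invented for illustration, and the \w+ tokenizer is only a
placeholder for whatever tokenization a project actually uses:

```python
import re
import xml.etree.ElementTree as ET

# Elements whose content is left out of the index (per the suggestion above).
SKIP = {"del", "sic"}

def local_name(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]

def visible_text(elem):
    """Collect text, dropping the content of SKIP elements but keeping
    the text that follows them (their tail)."""
    parts = []
    if local_name(elem.tag) not in SKIP:
        parts.append(elem.text or "")
        for child in elem:
            parts.append(visible_text(child))
    parts.append(elem.tail or "")
    return "".join(parts)

def index_tokens(xml_string):
    """Return the word tokens that remain after filtering."""
    root = ET.fromstring(xml_string)
    return re.findall(r"\w+", visible_text(root))

# Hypothetical transcription line with a deleted false start.
sample = "<l><del>So</del> Sogar die Nacht</l>"
print(index_tokens(sample))
```

With this approach the false start "So" never reaches the index, so no
extra markup on the fragment itself is needed.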

If, for some reason, you want to tokenize the false
starts and instant deletions along with the "main
text", I would suggest one of the following solutions:

a) use @type on <w> to indicate that the word is
incomplete, e.g. <w type="incomplete">So</w>

b) use <gap/> inside <w>, e.g.
<w>So<gap reason="false_start"/></w>

Using @part does not seem to be a good idea, as this
attribute is actually intended for linking fragmented
elements.
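
If the fragments are tokenized as in (a) or (b), an indexer can skip
them when collecting lexical items. The following is only a sketch under
assumptions: the function names and the sample line are invented, and
the value "incomplete" is simply the one proposed in (a), not a value
defined by the Guidelines:

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]

def is_incomplete(w):
    """True if <w> is flagged as in option (a) or contains a <gap/>
    as in option (b)."""
    if w.get("type") == "incomplete":
        return True
    return any(local_name(d.tag) == "gap" for d in w.iter() if d is not w)

def complete_words(xml_string):
    """Return the text of every <w> that is not marked as incomplete."""
    root = ET.fromstring(xml_string)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) == "w" and not is_incomplete(elem):
            words.append("".join(elem.itertext()).strip())
    return words

# Hypothetical line using both encodings side by side.
sample = (
    '<l><del><w type="incomplete">So</w></del> <w>Sogar</w> '
    '<del><w>T<gap reason="false_start"/></w></del> <w>Nacht</w></l>'
)
print(complete_words(sample))
```

Either encoding keeps the fragment in the transcription while letting
the index ignore it.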

Best regards,

Alexei
Gerrit Brüning <bruening <at> FAUSTEDITION.DE> wrote:


Gerrit Brüning | 27 Jul 2012 16:16

Re: How to indicate incomplete words (e.g. in false starts)

Dear Alexei,

Thank you for your advice! @type on <w> should suffice for our needs.

Best regards,

Gerrit

On 27.07.2012 15:48, Alexey Lavrentev wrote:
> Dear Gerrit,
>
> I don't think there is a "canonical" solution to your
> problem. Not many projects combine fine-grained
> transcriptions of primary sources with high-quality
> indexing of lexical items.
>
> In many cases the most efficient solution would be
> to "filter out" all the stuff inside <del> and <sic>
> before lexical indexing and applying NLP tools.
>
> If, for some reason, you want to tokenize the false
> starts and instant deletions along with the "main
> text", I would suggest one of the following solutions:
>
> a) use @type on <w> to indicate that the word is
> incomplete, e.g. <w type="incomplete">So</w>
>
> b) use <gap/> inside <w>, e.g.
> <w>So<gap reason="false_start"/></w>
>