public class WikipediaScanner extends Object
| Modifier and Type | Field and Description |
|---|---|
| static int | EOF |
| protected int | fScannerPosition |
| protected char[] | fSource: The corresponding char[] array for the string source |
| protected String | fStringSource: The String of the given raw wiki text |
| protected IWikiModel | fWikiModel |
| Constructor and Description |
|---|
| WikipediaScanner(String src) |
| WikipediaScanner(String src, int position) |
| Modifier and Type | Method and Description |
|---|---|
| static int | findNestedEnd(char[] sourceArray, char startCh, char endChar, int startPosition): Read until the end of a nested block i.e. [[...[[ ]]...]] |
| static int | findNestedEndSingle(char[] sourceArray, char startCh, char endChar, int startPosition): Read until the end of a nested block i.e. {{{...{...{{ }}...}...}}} |
| static int[] | findNestedParamEnd(char[] sourceArray, int startPosition): Find the end of a template parameter declaration or the end of a template declaration. |
| static int | findNestedTemplateEnd(char[] sourceArray, int startPosition) |
| int | getPosition() |
| int | indexEndOfComment(): Get the offset position behind the next closing HTML comment tag (-->). |
| int | indexEndOfNowiki(): Get the offset position behind the next </nowiki> tag. |
| int | indexEndOfTable(): Get the offset position behind the corresponding wiki table closing tag (i.e. \|}). |
| protected int | indexOfUntilNoLetter(char testChar, int fromIndex): Read the characters until no more letters are found or the given testChar is found. |
| protected WikiTagNode | makeTag(int start, int end, ArrayList<NodeAttribute> attributes): Create a tag node based on the current cursor and the one provided. |
| int | nextNewline() |
| int | nextNewlineCell(WPCell cell) |
| protected List<NodeAttribute> | parseAttributes(int start, int end) |
| protected WikiTagNode | parseTag(int start): Parse a tag. |
| protected int | readSpecialWikiTags(int start) |
| protected int | readUntilIgnoreCase(int start, String startString, String endString): Read the characters until the concatenated start and end substring is found. |
| void | setModel(IWikiModel wikiModel) |
| void | setPosition(int newPos) |
| protected static List<String> | splitByChar(char splitChar, char[] srcArray, int currOffset, int endOffset, List<String> resultList, int maxParts): Split the given srcArray character array by the given character. |
| static List<String> | splitByChar(char splitChar, String sourceString, List<String> resultList, int maxParts): Split the given src string by the given character. |
| static List<String> | splitByPipe(char[] srcArray, int currOffset, int endOffset, List<String> resultList): Split the given srcArray character array by pipe symbol (i.e. "\|"). |
| static List<String> | splitByPipe(String sourceString, List<String> resultList): Split the given src string by pipe symbol (i.e. "\|"). |
| static boolean | startsWith(String str, int toffset, String prefix, boolean ignoreCase): Check if a String starts with a specified prefix (optionally case insensitive). |
| WPList | wpList() |
| WPTable | wpTable(ITableOfContent tableOfContentTag): Scan a wikipedia table. |
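As a rough illustration of two of the static helpers summarised above, the following stand-alone sketch re-implements the idea behind findNestedEnd (matching doubled delimiters such as [[ ... ]]) and splitByChar (splitting with a maxParts limit). It is not the Bliki source. The class name ScannerHelperSketch is hypothetical, and several conventions are assumptions made for the example: startPosition points just behind an already consumed opening pair, the return value is the offset behind the matching closing pair, and a null resultList means "create a new list". The real scanner may also treat nested wiki syntax specially when splitting, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

final class ScannerHelperSketch {

    /**
     * Assumed semantics: scanning starts just behind an opening pair
     * (e.g. "[[") and returns the offset behind the matching closing
     * pair, or -1 if the block is never closed.
     */
    static int findNestedEnd(char[] src, char startCh, char endCh, int startPosition) {
        int level = 1; // one opening pair was consumed by the caller
        int pos = startPosition;
        while (pos + 1 < src.length) {
            if (src[pos] == startCh && src[pos + 1] == startCh) {
                level++;      // nested opening pair
                pos += 2;
            } else if (src[pos] == endCh && src[pos + 1] == endCh) {
                level--;      // closing pair
                pos += 2;
                if (level == 0) {
                    return pos;
                }
            } else {
                pos++;
            }
        }
        return -1;
    }

    /**
     * Split source at splitChar. A maxParts below 0 means no limit;
     * otherwise at most maxParts parts are produced and the last part
     * keeps any remaining split characters.
     */
    static List<String> splitByChar(char splitChar, String source,
                                    List<String> resultList, int maxParts) {
        List<String> result = (resultList != null) ? resultList : new ArrayList<>();
        int parts = 0;
        int partStart = 0;
        for (int i = 0; i < source.length(); i++) {
            boolean limitReached = maxParts > 0 && parts == maxParts - 1;
            if (source.charAt(i) == splitChar && !limitReached) {
                result.add(source.substring(partStart, i));
                parts++;
                partStart = i + 1;
            }
        }
        result.add(source.substring(partStart));
        return result;
    }

    public static void main(String[] args) {
        char[] text = "[[a [[b]] c]] tail".toCharArray();
        System.out.println(findNestedEnd(text, '[', ']', 2));   // prints 13
        System.out.println(splitByChar('|', "a|b|c", null, 2)); // prints [a, b|c]
    }
}
```

The level counter is what distinguishes this from a plain search for the first closing pair: the inner ]] only lowers the nesting level, and scanning stops when the level returns to zero.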
public static final int EOF
protected int fScannerPosition
protected IWikiModel fWikiModel
protected final String fStringSource
The String of the given raw wiki text
protected final char[] fSource
The corresponding char[] array for the string source
public WikipediaScanner(String src)
public WikipediaScanner(String src, int position)
public void setModel(IWikiModel wikiModel)
public int getPosition()
public void setPosition(int newPos)
public WPTable wpTable(ITableOfContent tableOfContentTag)
Scan a wikipedia table.
Parameters:
tableOfContentTag -
Returns:
null if no wiki table was found
public WPList wpList()
public int nextNewline()
public int nextNewlineCell(WPCell cell)
public int indexEndOfComment()
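Get the offset position behind the next closing HTML comment tag (-->).
Returns:
-1 if no tag could be found.
public int indexEndOfNowiki()
Get the offset position behind the next </nowiki> tag.
Returns:
-1 if no tag could be found.
public int indexEndOfTable()
Get the offset position behind the corresponding wiki table closing tag (i.e. |}). The scanner detects HTML comment tags, <nowiki> tags and nested wiki table tags (i.e. {|... {|... ...|} ...|}).
Returns:
-1 if no corresponding tag could be found.
public static boolean startsWith(String str, int toffset, String prefix, boolean ignoreCase)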
Check if a String starts with a specified prefix (optionally case insensitive).
Parameters:
str - the String to check, may be null
toffset - the starting offset of the subregion of the String to check
prefix - the prefix to find, may be null
ignoreCase - indicates whether the comparison should ignore case (case insensitive) or not.
Returns:
true if the String starts with the prefix or both null
See Also:
String.startsWith(String)
public static List<String> splitByPipe(String sourceString, List<String> resultList)
Split the given src string by pipe symbol (i.e. "|").
Parameters:
sourceString -
resultList - the list which contains the split strings
public static List<String> splitByPipe(char[] srcArray, int currOffset, int endOffset, List<String> resultList)
Split the given srcArray character array by pipe symbol (i.e. "|").
Parameters:
srcArray - the array to split
currOffset - start position in srcArray
endOffset - end position in srcArray
resultList - the list which contains the split strings
public static List<String> splitByChar(char splitChar, String sourceString, List<String> resultList, int maxParts)
Split the given src string by the given character.
Parameters:
splitChar - the character to split by
sourceString - the string to split
resultList - the list which contains the split strings
maxParts - max number of parts to split the source into (less than 0 for an unlimited number of parts; otherwise only values greater than 0 are allowed)
protected static List<String> splitByChar(char splitChar, char[] srcArray, int currOffset, int endOffset, List<String> resultList, int maxParts)
Split the given srcArray character array by the given character.
Parameters:
splitChar - the character to split by
srcArray - the array to split
currOffset - start position in srcArray
endOffset - end position in srcArray
resultList - the list which contains the split strings
maxParts - max number of parts to split the source into (less than 0 for an unlimited number of parts; otherwise only values greater than 0 are allowed)
public static int findNestedEnd(char[] sourceArray, char startCh, char endChar, int startPosition)
Read until the end of a nested block i.e. [[...[[ ]]...]]
Parameters:
sourceArray -
startCh -
endChar -
startPosition -
Returns:
-1 if not found
public static int findNestedEndSingle(char[] sourceArray, char startCh, char endChar, int startPosition)
Read until the end of a nested block i.e. {{{...{...{{ }}...}...}}}
Parameters:
sourceArray -
startCh -
endChar -
startPosition -
Returns:
-1 if not found
public static int findNestedTemplateEnd(char[] sourceArray, int startPosition)
{{foo:bar|{|}|{}|baz}}.
public static int[] findNestedParamEnd(char[] sourceArray, int startPosition)
Find the end of a template parameter declaration or the end of a template declaration.
Parameters:
sourceArray -
startPosition -
Returns:
If array[0] > 0 the scanner has found the end position of a template parameter declaration. If array[1] > 0 the scanner has found the end position of a template declaration.
protected WikiTagNode parseTag(int start)
Parse a tag.
From the HTML 4.01 Specification, W3C Recommendation 24 December 1999 https://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
3.2.2 Attributes
Elements may have associated properties, called attributes, which may have
values (by default, or set by authors or scripts). Attribute/value pairs
appear before the final ">" of an element's start tag. Any number of
(legal) attribute value pairs, separated by spaces, may appear in an
element's start tag. They may appear in any order.
In this example, the id attribute is set for an H1 element:
<H1 id="section1">
This is an identified heading thanks to the id attribute
</H1>
By default, SGML requires that all attribute values be delimited using
either double quotation marks (ASCII decimal 34) or single quotation marks
(ASCII decimal 39). Single quote marks can be included within the attribute
value when the value is delimited by double quote marks, and vice versa.
Authors may also use numeric character references to represent double
quotes (&#34;) and single quotes (&#39;). For double quotes authors can
also use the character entity reference &quot;.
In certain cases, authors may specify the value of an attribute without any
quotation marks. The attribute value may only contain letters (a-z and
A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46),
underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend
using quotation marks even when it is possible to eliminate them.
Attribute names are always case-insensitive.
Attribute values are generally case-insensitive. The definition of each
attribute in the reference manual indicates whether its value is
case-insensitive.
All the attributes defined by this specification are listed in the
attribute index.
This method uses a state machine with the following states:
The starting point for the various components is stored in an array of
integers that match the initiation point for the states one-for-one, i.e.
bookmarks[0] is where state 0 began, bookmarks[1] is where state 1 began,
etc. Attributes are stored in a Vector having one slot for
each whitespace or attribute/value pair. The first slot is for attribute
name (kind of like a standalone attribute).
Parameters:
start - The position at which to start scanning.
Throws:
ParserException - If a problem occurs reading from the source.
protected List<NodeAttribute> parseAttributes(int start, int end)
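The attribute syntax quoted in the parseTag description (double-quoted, single-quoted and unquoted values, case-insensitive names) can be illustrated with a small stand-alone parser. This is only a sketch under stated assumptions, not the scanner's parseAttributes: the class AttributeSketch and its method are hypothetical, names are normalised to lower case, no whitespace is allowed around '=', and unquoted values simply run to the next whitespace rather than enforcing the character set listed in the specification.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class AttributeSketch {

    /** Parse the attribute part of a start tag, e.g. id="a" width=100. */
    static Map<String, String> parseAttributes(String s) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = s.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace between attribute/value pairs.
            while (i < n && Character.isWhitespace(s.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            // Read the attribute name up to '=' or whitespace.
            int nameStart = i;
            while (i < n && s.charAt(i) != '=' && !Character.isWhitespace(s.charAt(i))) {
                i++;
            }
            // Attribute names are case-insensitive, so store them lower-cased.
            String name = s.substring(nameStart, i).toLowerCase(Locale.ROOT);
            String value = ""; // stand-alone attribute without a value
            if (i < n && s.charAt(i) == '=') {
                i++;
                if (i < n && (s.charAt(i) == '"' || s.charAt(i) == '\'')) {
                    // Quoted value: the other quote character may appear inside.
                    char quote = s.charAt(i++);
                    int valueStart = i;
                    while (i < n && s.charAt(i) != quote) {
                        i++;
                    }
                    value = s.substring(valueStart, i);
                    if (i < n) {
                        i++; // skip the closing quote
                    }
                } else {
                    // Unquoted value: read up to the next whitespace.
                    int valueStart = i;
                    while (i < n && !Character.isWhitespace(s.charAt(i))) {
                        i++;
                    }
                    value = s.substring(valueStart, i);
                }
            }
            attrs.put(name, value);
        }
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(parseAttributes("ID=\"section1\" title='say \"hi\"' width=100 compact"));
    }
}
```

A LinkedHashMap keeps the attributes in source order, which is convenient for inspection even though the specification allows them in any order.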
protected WikiTagNode makeTag(int start, int end, ArrayList<NodeAttribute> attributes)
Create a tag node based on the current cursor and the one provided.
Parameters:
start - The starting point of the node.
end - The ending point of the node.
attributes - The attributes parsed from the tag.
Throws:
ParserException - If the nodefactory creation of the tag node fails.
protected int readSpecialWikiTags(int start)
protected final int readUntilIgnoreCase(int start,
String startString,
String endString)
Read the characters until the concatenated start and end substring is found.
Parameters:
startString - the start string which should be searched in exact case mode
endString - the end string which should be searched in ignore case mode
protected int indexOfUntilNoLetter(char testChar, int fromIndex)
Read the characters until no more letters are found or the given testChar is found. If testChar was found, return the offset position.
Parameters:
testChar - the test character
fromIndex - read from this offset
Returns:
-1 if the character could not be found or no more letter characters were found.
Copyright © 2017 Java Wikipedia API (Bliki engine). All rights reserved.