From: Ben Pfaff
Subject: [Pspp-commits] [SCM] GNU PSPP branch, master, updated. v0.6.1-1932-g9ade26c
Date: Sun, 20 Mar 2011 16:56:36 +0000
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU PSPP".
The branch, master has been updated
via 9ade26c8349b4434008c46cf09bc7473ec743972 (commit)
via afdf3096926b561f4e6511c10fcf73fc6796b9d2 (commit)
via 75a467ed2d32e1adb0c24cf89676cfb48845be98 (commit)
via d3e294c031bb767336435d2f0048994103fcd47a (commit)
via f3668539947d5baed813a4f8436d6cf36abeedd2 (commit)
via c69c407c02121e63bdadf6efe55e4211abd03ad2 (commit)
via 1b3322acf30d531cefe3cdbf7287ec8cde601bcd (commit)
via 9d1d71e732eeed85ca3002b264e1269cdd005a3f (commit)
via f5099c58d17e8f66a74a84918e688ef17936d392 (commit)
via 6d89701ab597b810da249ff0e4e42423e869df66 (commit)
via 9bbbfbc94aead4518e17eb6304451f6ad2ca2db2 (commit)
via 530906aaa19f6c209ca008c8187f7f750a0b1283 (commit)
via 086322fd8c85a303ba6f552950d6f057f2867add (commit)
via 687c1acdbeecd7d0d7fdc4143d444e8b1563b532 (commit)
via 417bac514fb3de900cb12689d8668d4d30a82e3f (commit)
via d8fdf0b4fa919e48397b438e9453d6b82215ff51 (commit)
via ca0a72e321421d02a1fd6df943425eff4bd1a257 (commit)
via 510366c9d99de028f0322e3df01bc813ec60099b (commit)
from c831ad10d7e9d494e5e22ab30306057e81bc52cd (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 9ade26c8349b4434008c46cf09bc7473ec743972
Author: Ben Pfaff <address@hidden>
Date: Sat Mar 19 17:05:47 2011 -0700
lexer: Reimplement for better testability and internationalization.
This commit reimplements PSPP lexical analysis from the ground up.
From a PSPP user's perspective, this should make PSPP more reliable
and make it easier to work with syntax files in non-ASCII encodings.
See the changes to NEWS for more details.
From a developer's perspective, the most visible change may be that
strings within tokens are now always encoded in UTF-8, regardless of
the syntax file's encoding. Many of the changes in this commit are
due to this, especially those to functions that check for valid
identifiers: an identifier in UTF-8 is not necessarily the same length
when encoded in the dictionary's encoding, but limits on identifier
length must be enforced in the dictionary's encoding (otherwise it
might not be possible to write out a valid system file, since the
identifier might not fit in the fixed length fields in such files).
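To illustrate why a length limit has to be checked in the dictionary's
encoding rather than in UTF-8, here is a minimal sketch that assumes the
dictionary encoding is ISO-8859-1. The function names are hypothetical
and are not part of PSPP; a real implementation would have to handle
arbitrary encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of bytes that the UTF-8 string S would occupy in
   ISO-8859-1, or (size_t) -1 if S contains a code point above U+00FF
   or a malformed sequence.  Hypothetical helper, not a PSPP function. */
size_t
latin1_length_of_utf8 (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0;

  assert (s != NULL);
  while (*p != '\0')
    {
      if (*p < 0x80)
        p += 1;                 /* ASCII: one byte in both encodings. */
      else if ((*p == 0xc2 || *p == 0xc3) && (p[1] & 0xc0) == 0x80)
        p += 2;                 /* U+0080...U+00FF: two bytes in UTF-8. */
      else
        return (size_t) -1;     /* Not representable in ISO-8859-1. */
      n++;
    }
  return n;
}

/* Returns nonzero if S, recoded to ISO-8859-1, fits in MAX_LEN bytes. */
int
identifier_fits_latin1 (const char *s, size_t max_len)
{
  size_t n = latin1_length_of_utf8 (s);
  return n != (size_t) -1 && n <= max_len;
}
```

With these helpers, a name such as "données" is 8 bytes long in UTF-8 but
only 7 bytes in ISO-8859-1, so it passes a 7-byte limit that a check on
the UTF-8 form would wrongly reject.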
Another important change concerns special syntax that, before, had to
be handled by the parser providing feedback to the lexer. The lexer is
now sophisticated enough to analyze all PSPP syntax into tokens on its
own, which permitted some other improvements:
- An arbitrary number of tokens of lookahead, up to the end of the
current command, is now supported using lex_next_token() and
related functions.
- Before, some command implementations had a special attribute that
meant that the top-level PSPP command parser would not consume the
final token of the command name (because that token was not
followed by tokenizable syntax). This is no longer necessary and
has been removed.
- Before, each command implementation was responsible for ensuring
that valid command syntax was not followed by trailing garbage,
often by calling lex_end_of_command() as the last step of parsing.
This is no longer necessary; the main command parser will ensure
this for itself.
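The lookahead described in the first item above can be sketched with a
toy lexer. This is only an illustration under simplifying assumptions:
tokens are whitespace-separated words, a lone "." ends a command, and the
token buffer has a fixed size that is never reclaimed. All names are
hypothetical; this is not the interface of lex_next_token().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_TOKENS 64
#define TOY_MAX_TOKEN_LEN 32

struct toy_lexer
  {
    const char *input;          /* Unread input. */
    char tokens[TOY_MAX_TOKENS][TOY_MAX_TOKEN_LEN];
    size_t head;                /* Index of the current token. */
    size_t n;                   /* Number of tokens read so far. */
  };

void
toy_lexer_init (struct toy_lexer *lexer, const char *input)
{
  lexer->input = input;
  lexer->head = 0;
  lexer->n = 0;
}

/* Reads one more token into the buffer.  Returns 0 at end of input or
   when the fixed-size buffer is full, otherwise 1. */
static int
toy_lexer_read (struct toy_lexer *lexer)
{
  const char *p = lexer->input;
  size_t len = 0;

  while (isspace ((unsigned char) *p))
    p++;
  lexer->input = p;
  if (*p == '\0' || lexer->n >= TOY_MAX_TOKENS)
    return 0;

  if (*p == '.')
    len = 1;
  else
    while (p[len] != '\0' && p[len] != '.'
           && !isspace ((unsigned char) p[len])
           && len < TOY_MAX_TOKEN_LEN - 1)
      len++;

  memcpy (lexer->tokens[lexer->n], p, len);
  lexer->tokens[lexer->n][len] = '\0';
  lexer->n++;
  lexer->input = p + len;
  return 1;
}

/* Returns the token AHEAD positions past the current one (0 is the
   current token) without consuming anything.  Lookahead never crosses
   the end of the current command: it returns "." instead.  Returns
   NULL at end of input. */
const char *
toy_lexer_peek (struct toy_lexer *lexer, size_t ahead)
{
  size_t i;

  for (i = lexer->head; ; i++)
    {
      if (i >= lexer->n && !toy_lexer_read (lexer))
        return NULL;
      if (i == lexer->head + ahead || !strcmp (lexer->tokens[i], "."))
        return lexer->tokens[i];
    }
}

/* Consumes the current token, if there is one. */
void
toy_lexer_advance (struct toy_lexer *lexer)
{
  if (toy_lexer_peek (lexer, 0) != NULL)
    lexer->head++;
}
```

Because tokens are read on demand and kept until consumed, a command
parser can inspect several tokens before deciding how to proceed, and
peeking never changes which token is current.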
commit afdf3096926b561f4e6511c10fcf73fc6796b9d2
Author: Ben Pfaff <address@hidden>
Date: Sat Mar 19 16:32:16 2011 -0700
scan: New library for high-level PSPP syntax lexical analysis.
This library converts a stream of segments output by the "segment"
library into PSPP tokens.
commit 75a467ed2d32e1adb0c24cf89676cfb48845be98
Author: Ben Pfaff <address@hidden>
Date: Sat Mar 19 16:30:55 2011 -0700
segment: New library for low-level phase of lexical syntax analysis.
This library provides for a low-level part of lexical analysis for
PSPP syntax, which I call "segmentation". Segmentation accepts a
stream of UTF-8 bytes as input. It outputs a label (a segment type)
for each byte or contiguous sequence of bytes in the input.
The following commit will implement the high-level phase of lexical
analysis, called "scanning", that converts a sequence of segments into
PSPP tokens.
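As a rough illustration of what labeling each byte sequence with a
segment type means, here is a toy segmenter. The segment types and
character classes are assumptions made up for this sketch; the real
library distinguishes many more cases.

```c
#include <assert.h>
#include <stddef.h>

enum toy_segment_type
  {
    TOY_SEG_SPACES,
    TOY_SEG_IDENTIFIER,
    TOY_SEG_NUMBER,
    TOY_SEG_PUNCT
  };

static int
toy_is_space (unsigned char c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

static int
toy_is_digit (unsigned char c)
{
  return c >= '0' && c <= '9';
}

/* Assumption for this sketch: every byte of a non-ASCII UTF-8
   sequence (0x80 and up) is treated as part of an identifier. */
static int
toy_is_id_start (unsigned char c)
{
  return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || c == '@' || c == '#' || c == '$' || c >= 0x80);
}

static int
toy_is_id_part (unsigned char c)
{
  return toy_is_id_start (c) || toy_is_digit (c) || c == '_';
}

/* Labels the segment at the start of the N bytes at INPUT.  Stores its
   type in *TYPE and returns its length in bytes.  N must be positive. */
size_t
toy_segment (const char *input, size_t n, enum toy_segment_type *type)
{
  const unsigned char *p = (const unsigned char *) input;
  size_t len = 1;

  assert (n > 0);
  if (toy_is_space (p[0]))
    {
      *type = TOY_SEG_SPACES;
      while (len < n && toy_is_space (p[len]))
        len++;
    }
  else if (toy_is_digit (p[0]))
    {
      *type = TOY_SEG_NUMBER;
      while (len < n && toy_is_digit (p[len]))
        len++;
    }
  else if (toy_is_id_start (p[0]))
    {
      *type = TOY_SEG_IDENTIFIER;
      while (len < n && toy_is_id_part (p[len]))
        len++;
    }
  else
    *type = TOY_SEG_PUNCT;      /* Any other single byte. */
  return len;
}
```

Calling toy_segment() repeatedly, advancing by the returned length each
time, labels every byte of the input exactly once, which is the property
that lets a later phase turn segments into tokens.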
commit d3e294c031bb767336435d2f0048994103fcd47a
Author: Ben Pfaff <address@hidden>
Date: Sat Mar 19 16:34:53 2011 -0700
u8-istream: New library for reading a text file and recoding to UTF-8.
This new library will be used in an upcoming commit.
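A minimal sketch of the recoding step, assuming the input is
ISO-8859-1 and already in memory. The function name is hypothetical, and
the real library presumably handles files and other encodings, which
this sketch does not.

```c
#include <assert.h>
#include <stddef.h>

/* Recodes the IN_LEN bytes of ISO-8859-1 text at IN into UTF-8 in OUT,
   which has room for OUT_SIZE bytes.  Returns the number of bytes
   written, or (size_t) -1 if OUT is too small.  Does not add a null
   terminator. */
size_t
toy_latin1_to_utf8 (const char *in, size_t in_len,
                    char *out, size_t out_size)
{
  size_t ofs = 0;
  size_t i;

  assert (in != NULL && out != NULL);
  for (i = 0; i < in_len; i++)
    {
      unsigned char c = (unsigned char) in[i];
      size_t need = c < 0x80 ? 1 : 2;

      if (ofs + need > out_size)
        return (size_t) -1;
      if (c < 0x80)
        out[ofs++] = (char) c;
      else
        {
          /* U+0080...U+00FF becomes 110000xx 10xxxxxx. */
          out[ofs++] = (char) (0xc0 | (c >> 6));
          out[ofs++] = (char) (0x80 | (c & 0x3f));
        }
    }
  return ofs;
}
```

For example, the 4 input bytes of "café" in ISO-8859-1 become 5 bytes in
UTF-8, because the accented letter expands to a two-byte sequence.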
commit f3668539947d5baed813a4f8436d6cf36abeedd2
Author: Ben Pfaff <address@hidden>
Date: Sun Mar 20 09:43:42 2011 -0700
encoding-guesser: New library to guess the encoding of a text file.
This will be used by other new libraries in upcoming commits.
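One common ingredient of encoding guessing is checking for a Unicode
byte order mark at the start of the file. The sketch below shows only
that step, with a caller-supplied fallback; it is an assumption about
how a guesser might start, not a description of this library's
heuristics, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Guesses the encoding of the N bytes at DATA_ from a leading byte
   order mark.  Returns FALLBACK if there is none.  The UTF-32LE check
   must come before the UTF-16LE one because the UTF-32LE mark begins
   with the same two bytes. */
const char *
toy_guess_encoding (const void *data_, size_t n, const char *fallback)
{
  const unsigned char *d = data_;

  assert (d != NULL || n == 0);
  if (n >= 4 && d[0] == 0xff && d[1] == 0xfe && d[2] == 0 && d[3] == 0)
    return "UTF-32LE";
  if (n >= 4 && d[0] == 0 && d[1] == 0 && d[2] == 0xfe && d[3] == 0xff)
    return "UTF-32BE";
  if (n >= 3 && d[0] == 0xef && d[1] == 0xbb && d[2] == 0xbf)
    return "UTF-8";
  if (n >= 2 && d[0] == 0xff && d[1] == 0xfe)
    return "UTF-16LE";
  if (n >= 2 && d[0] == 0xfe && d[1] == 0xff)
    return "UTF-16BE";
  return fallback;
}
```

A file without a byte order mark falls through to the fallback, which is
where any further heuristics would have to take over.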
commit c69c407c02121e63bdadf6efe55e4211abd03ad2
Author: Ben Pfaff <address@hidden>
Date: Sat Mar 19 16:20:44 2011 -0700
i18n: New functions and data structure for obtaining encoding info.
For now these functions don't do any caching, but it might make sense to
add caching later if they are called frequently.
commit 1b3322acf30d531cefe3cdbf7287ec8cde601bcd
Author: Ben Pfaff <address@hidden>
Date: Sat Mar 19 14:40:11 2011 -0700
identifier: Rename token_type_to_string() and make a new version.
commit 9d1d71e732eeed85ca3002b264e1269cdd005a3f
Author: Ben Pfaff <address@hidden>
Date: Sun Feb 13 10:43:57 2011 -0800
i18n: New functions for truncating strings in an arbitrary encoding.
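The difficulty with truncation is that cutting at an arbitrary byte can
split a multibyte character. The sketch below handles only UTF-8 and
assumes valid input; the function name is hypothetical, and the
functions added by this commit deal with arbitrary encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest length, no greater than MAX_LEN, at which the
   LEN bytes of valid UTF-8 at S can be cut without splitting a
   character. */
size_t
toy_utf8_truncated_length (const char *s, size_t len, size_t max_len)
{
  assert (s != NULL);
  if (len <= max_len)
    return len;

  /* Back up while the first byte that would be dropped is a
     continuation byte (10xxxxxx), which means that the cut falls in
     the middle of a character. */
  len = max_len;
  while (len > 0 && ((unsigned char) s[len] & 0xc0) == 0x80)
    len--;
  return len;
}
```

For instance, truncating the 5-byte UTF-8 form of "café" to 4 bytes
yields 3 bytes, since the fourth byte alone would be half of a
character.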
commit f5099c58d17e8f66a74a84918e688ef17936d392
Author: Ben Pfaff <address@hidden>
Date: Sat Feb 12 16:37:10 2011 -0800
i18n: New function recode_string_len().
commit 6d89701ab597b810da249ff0e4e42423e869df66
Author: Ben Pfaff <address@hidden>
Date: Sat Dec 11 20:58:32 2010 -0800
i18n: New function uc_name().
commit 9bbbfbc94aead4518e17eb6304451f6ad2ca2db2
Author: Ben Pfaff <address@hidden>
Date: Mon Dec 6 20:50:04 2010 -0800
hash-functions: New function hash_case_bytes().
This is useful for hashing an arbitrary byte sequence case-insensitively.
Obviously most uses would be better off working with Unicode but we aren't
there yet.
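For illustration, a case-insensitive byte hash can be written by folding
ASCII letters to lowercase before mixing each byte. This sketch uses
32-bit FNV-1a, chosen here only for brevity; it is not claimed to be the
algorithm behind hash_case_bytes(), and the name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hashes the N bytes at P_ with 32-bit FNV-1a, treating ASCII
   uppercase and lowercase letters as equal.  Bytes outside ASCII are
   hashed as-is, so non-ASCII letters are not case-folded. */
uint32_t
toy_hash_case_bytes (const void *p_, size_t n)
{
  const unsigned char *p = p_;
  uint32_t hash = 2166136261u;  /* FNV offset basis. */
  size_t i;

  assert (p != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      unsigned char c = p[i];
      if (c >= 'A' && c <= 'Z')
        c += 'a' - 'A';
      hash ^= c;
      hash *= 16777619u;        /* FNV prime. */
    }
  return hash;
}
```

The limitation noted in the commit message shows up directly: only
ASCII letters are folded, so names that differ in the case of a
non-ASCII letter hash differently.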
commit 530906aaa19f6c209ca008c8187f7f750a0b1283
Author: Ben Pfaff <address@hidden>
Date: Wed Mar 9 22:21:11 2011 -0800
str: New functions for checking for and removing string suffixes.
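A sketch of the two operations on ordinary null-terminated C strings.
PSPP's own functions work on its substring and dynamic string types, so
the names and signatures here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if S ends with SUFFIX. */
int
toy_ends_with (const char *s, const char *suffix)
{
  size_t s_len, suffix_len;

  assert (s != NULL && suffix != NULL);
  s_len = strlen (s);
  suffix_len = strlen (suffix);
  return (s_len >= suffix_len
          && !memcmp (s + (s_len - suffix_len), suffix, suffix_len));
}

/* If S ends with SUFFIX, removes it in place and returns 1.
   Otherwise leaves S unchanged and returns 0. */
int
toy_chomp_suffix (char *s, const char *suffix)
{
  if (!toy_ends_with (s, suffix))
    return 0;
  s[strlen (s) - strlen (suffix)] = '\0';
  return 1;
}
```

Removing a whole suffix generalizes removing a single trailing byte: the
single-byte case is the same operation with a one-character suffix.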
commit 086322fd8c85a303ba6f552950d6f057f2867add
Author: Ben Pfaff <address@hidden>
Date: Wed Mar 9 22:10:48 2011 -0800
str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to ds_chomp_byte().
This paves the way for new functions that chomp an entire substring.
commit 687c1acdbeecd7d0d7fdc4143d444e8b1563b532
Author: Ben Pfaff <address@hidden>
Date: Mon Dec 6 20:46:56 2010 -0800
str: New function ss_realloc().
commit 417bac514fb3de900cb12689d8668d4d30a82e3f
Author: Ben Pfaff <address@hidden>
Date: Mon Dec 6 20:54:40 2010 -0800
output: New function text_item_create_nocopy().
commit d8fdf0b4fa919e48397b438e9453d6b82215ff51
Author: Ben Pfaff <address@hidden>
Date: Sat Feb 5 21:10:10 2011 -0800
sys-file-reader: Refactor to clean up character encoding support.
The system file format is unusual in that it does not record the encoding
used by character strings at the beginning or at any fixed place in the
file. Instead, the encoding can be recorded practically anywhere in the
file, and it never precedes all of the actual character strings. This
makes it impossible to interpret the strings that come before it
completely and correctly until it is encountered.
Until now, the system file reader has dealt with this situation by
stuffing uninterpreted character strings into data structures until the
encoding is known, then at that point fetching out the character strings,
reencoding them, and stuffing them back into the data structures. This
does work, but it has the disadvantage that all of the PSPP data
structures have to tolerate character strings with unknown encoding. In
some cases this seems like an ugly situation. For example, arbitrary
variable names have to be supported, even though the syntax for variable
names is circumscribed by the language, because the syntax rules for
variable names cannot be completely and correctly applied to a string that
is in an unknown encoding.
This commit fixes that problem by adopting a new way to read system files.
Each record in the system file dictionary is essentially slurped into
memory as a chunk, then the character encoding is extracted from it, then
the rest of the dictionary is interpreted based on that encoding. The
actual implementation is a little more intricate because the format of
system file records is somewhat non-uniform.
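The approach described above can be sketched as two passes over records
that have already been read into memory. Everything here is a toy
assumption: the record layout, the two record types, and the two
supported encodings are invented for the sketch and do not reflect the
actual system file format or the reader's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_record_type { TOY_REC_VARIABLE, TOY_REC_ENCODING };

struct toy_record
  {
    enum toy_record_type type;
    const char *raw;            /* Undecoded, null-terminated bytes. */
  };

/* Pass 1: finds the encoding among the N records already held in
   memory.  Returns FALLBACK if no record names one. */
const char *
toy_find_encoding (const struct toy_record *records, size_t n,
                   const char *fallback)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (records[i].type == TOY_REC_ENCODING)
      return records[i].raw;
  return fallback;
}

/* Pass 2: decodes RAW from ENCODING into UTF-8 in OUT, which has room
   for OUT_SIZE bytes.  Only "UTF-8" and "ISO-8859-1" are understood.
   Returns 1 on success, 0 on an unknown encoding or lack of space. */
int
toy_decode (const char *raw, const char *encoding,
            char *out, size_t out_size)
{
  int latin1 = !strcmp (encoding, "ISO-8859-1");
  size_t ofs = 0;

  if (!latin1 && strcmp (encoding, "UTF-8"))
    return 0;
  for (; *raw != '\0'; raw++)
    {
      unsigned char c = (unsigned char) *raw;

      /* Conservatively require room for two bytes plus the null. */
      if (ofs + 3 > out_size)
        return 0;
      if (latin1 && c >= 0x80)
        {
          out[ofs++] = (char) (0xc0 | (c >> 6));
          out[ofs++] = (char) (0x80 | (c & 0x3f));
        }
      else
        out[ofs++] = (char) c;
    }
  out[ofs] = '\0';
  return 1;
}
```

The point of the ordering is that no string is interpreted until the
encoding is known, so the rest of the program only ever sees strings in
one known encoding, even when the encoding record comes after the
records that contain those strings.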
commit ca0a72e321421d02a1fd6df943425eff4bd1a257
Author: Ben Pfaff <address@hidden>
Date: Wed Mar 16 21:33:54 2011 -0700
file-name: Do not make output files line-buffered in fn_open().
I don't see any reason to do this. I can't see anything in the commit
log for this file or in OChangeLog that explains why it was done.
commit 510366c9d99de028f0322e3df01bc813ec60099b
Author: Ben Pfaff <address@hidden>
Date: Mon Mar 14 18:19:23 2011 -0700
data-reader: Remove unreachable "return" statements.
-----------------------------------------------------------------------
Summary of changes:
NEWS | 46 +-
Smake | 8 +-
doc/dev/concepts.texi | 25 +-
doc/flow-control.texi | 33 +-
doc/invoking.texi | 20 +-
doc/language.texi | 96 +-
doc/utilities.texi | 69 +-
perl-module/PSPP.xs | 14 +-
perl-module/t/Pspp.t | 4 +-
src/data/automake.mk | 1 +
src/data/dictionary.c | 174 +-
src/data/dictionary.h | 14 +-
src/data/file-handle-def.c | 8 +-
src/data/file-name.c | 11 +-
src/data/gnumeric-reader.h | 8 +-
src/data/identifier.c | 124 +-
src/data/identifier.h | 12 +-
src/data/identifier2.c | 133 ++
src/data/mrset.c | 33 +-
src/data/mrset.h | 7 +-
src/data/por-file-reader.c | 25 +-
src/data/por-file-writer.c | 5 +-
src/data/procedure.c | 19 +
src/data/procedure.h | 4 +-
src/data/sys-file-reader.c | 2084 +++++++++++---------
src/data/sys-file-reader.h | 2 +-
src/data/sys-file-writer.c | 20 +-
src/data/variable.c | 139 +-
src/data/variable.h | 7 +-
src/data/vector.c | 15 +-
src/data/vector.h | 4 +-
src/language/automake.mk | 6 -
src/language/command.c | 158 +-
src/language/command.def | 11 +-
src/language/control/automake.mk | 3 +-
src/language/control/do-if.c | 12 +-
src/language/control/loop.c | 4 +-
src/language/control/repeat.c | 714 +++-----
src/language/control/temporary.c | 6 +-
src/language/data-io/combine-files.c | 21 +-
src/language/data-io/data-list.c | 5 +-
src/language/data-io/data-parser.c | 9 +-
src/language/data-io/data-reader.c | 58 +-
src/language/data-io/file-handle.q | 29 +-
src/language/data-io/get-data.c | 8 +-
src/language/data-io/inpt-pgm.c | 30 +-
src/language/data-io/save-translate.c | 2 +
src/language/data-io/trim.c | 5 +-
src/language/dictionary/apply-dictionary.c | 11 +-
src/language/dictionary/attributes.c | 69 +-
src/language/dictionary/missing-values.c | 44 +-
src/language/dictionary/modify-variables.c | 4 +-
src/language/dictionary/mrsets.c | 27 +-
src/language/dictionary/numeric.c | 12 +-
src/language/dictionary/rename-variables.c | 3 +-
src/language/dictionary/split-file.c | 4 +-
src/language/dictionary/sys-file-info.c | 18 +-
src/language/dictionary/value-labels.c | 22 +-
src/language/dictionary/variable-label.c | 17 +-
src/language/dictionary/vector.c | 13 +-
src/language/dictionary/weight.c | 4 +-
src/language/expressions/parse.c | 34 +-
src/language/expressions/private.h | 5 +-
src/language/lexer/automake.mk | 8 +
src/language/lexer/include-path.c | 89 +
.../temp-file.h => language/lexer/include-path.h} | 16 +-
src/language/lexer/lexer.c | 2143 +++++++++++---------
src/language/lexer/lexer.h | 158 ++-
src/language/lexer/q2c.c | 6 +-
src/language/lexer/scan.c | 596 ++++++
src/language/lexer/scan.h | 93 +
src/language/lexer/segment.c | 1631 +++++++++++++++
src/language/lexer/segment.h | 122 ++
src/language/lexer/token.c | 173 ++
src/language/{stats/friedman.h => lexer/token.h} | 40 +-
src/language/lexer/value-parser.c | 2 +-
src/language/lexer/variable-parser.c | 17 +-
src/language/lexer/variable-parser.h | 10 +-
src/language/prompt.c | 75 -
src/language/stats/aggregate.c | 23 +-
src/language/stats/autorecode.c | 3 +-
src/language/stats/descriptives.c | 23 +-
src/language/stats/flip.c | 8 +-
src/language/stats/frequencies.q | 25 +-
src/language/stats/npar.c | 77 +-
src/language/stats/rank.q | 12 +-
src/language/stats/sort-cases.c | 2 +-
src/language/syntax-file.c | 144 --
src/language/syntax-file.h | 25 -
src/language/syntax-string-source.c | 151 --
src/language/syntax-string-source.h | 33 -
src/language/tests/format-guesser-test.c | 2 +-
src/language/tests/moments-test.c | 2 +-
src/language/tests/paper-size.c | 2 +-
src/language/utilities/cache.c | 4 +-
src/language/utilities/cd.c | 8 +-
src/language/utilities/date.c | 4 +-
src/language/utilities/host.c | 14 +-
src/language/utilities/include.c | 198 +-
src/language/utilities/permissions.c | 13 +-
src/language/utilities/set.q | 13 +-
src/language/utilities/title.c | 92 +-
src/language/xforms/compute.c | 6 +-
src/language/xforms/count.c | 26 +-
src/language/xforms/fail.c | 8 +-
src/language/xforms/recode.c | 45 +-
src/language/xforms/sample.c | 4 +-
src/language/xforms/select-if.c | 2 +-
src/libpspp/automake.mk | 10 +-
src/libpspp/encoding-guesser.c | 289 +++
src/libpspp/encoding-guesser.h | 126 ++
src/libpspp/getl.c | 271 ---
src/libpspp/getl.h | 113 -
src/libpspp/hash-functions.c | 18 +-
src/libpspp/hash-functions.h | 1 +
src/libpspp/i18n.c | 345 ++++-
src/libpspp/i18n.h | 87 +-
src/libpspp/message.c | 118 +-
src/libpspp/message.h | 23 +-
src/libpspp/msg-locator.c | 87 -
src/libpspp/msg-locator.h | 34 -
.../utilities/cache.c => libpspp/prompt.c} | 32 +-
src/{language => libpspp}/prompt.h | 17 +-
src/libpspp/str.c | 54 +-
src/libpspp/str.h | 11 +-
src/libpspp/u8-istream.c | 475 +++++
src/libpspp/u8-istream.h | 45 +
src/output/driver.c | 56 +-
src/output/text-item.c | 12 +-
src/output/text-item.h | 3 +-
src/ui/gui/automake.mk | 2 -
src/ui/gui/comments-dialog.c | 13 +-
src/ui/gui/executor.c | 19 +-
src/ui/gui/executor.h | 4 +-
src/ui/gui/main.c | 39 +-
src/ui/gui/psppire-data-window.c | 16 +-
src/ui/gui/psppire-dict.c | 9 +-
src/ui/gui/psppire-syntax-window.c | 16 +-
src/ui/gui/psppire-syntax-window.h | 1 -
src/ui/gui/psppire-var-store.c | 7 +-
src/ui/gui/psppire.c | 34 +-
src/ui/gui/psppire.h | 6 +-
src/ui/gui/syntax-editor-source.c | 130 --
src/ui/gui/syntax-editor-source.h | 34 -
src/ui/gui/text-data-import-dialog.c | 4 +-
src/ui/source-init-opts.c | 20 +-
src/ui/source-init-opts.h | 4 +-
src/ui/terminal/automake.mk | 13 +-
src/ui/terminal/main.c | 113 +-
src/ui/terminal/msg-ui.c | 41 -
src/ui/terminal/msg-ui.h | 29 -
src/ui/terminal/read-line.h | 31 -
src/ui/terminal/terminal-opts.c | 58 +-
src/ui/terminal/terminal-opts.h | 8 +-
src/ui/terminal/{read-line.c => terminal-reader.c} | 310 ++--
.../repeat.h => ui/terminal/terminal-reader.h} | 11 +-
tests/automake.mk | 43 +
tests/data/data-in.at | 64 +-
tests/data/sys-file-reader.at | 295 +--
tests/dissect-sysfile.c | 8 +-
tests/language/control/do-repeat.at | 101 +-
tests/language/data-io/data-list.at | 6 +-
tests/language/data-io/get.at | 2 -
tests/language/data-io/inpt-pgm.at | 6 +-
tests/language/data-io/print.at | 2 +-
tests/language/dictionary/missing-values.at | 2 +-
tests/language/dictionary/sys-file-info.at | 4 +-
tests/language/expressions/evaluate.at | 9 +-
tests/language/expressions/parse.at | 2 +-
tests/language/lexer/lexer.at | 43 +
tests/language/lexer/q2c.at | 2 +-
tests/language/lexer/scan-test.c | 217 ++
tests/language/lexer/scan.at | 818 ++++++++
tests/language/lexer/segment-test.c | 318 +++
tests/language/lexer/segment.at | 1070 ++++++++++
tests/language/stats/aggregate.at | 4 +-
tests/language/stats/rank.at | 6 +-
tests/language/utilities/insert.at | 29 +-
tests/libpspp/encoding-guesser-test.c | 102 +
tests/libpspp/encoding-guesser.at | 143 ++
tests/libpspp/i18n-test.c | 68 +-
tests/libpspp/i18n.at | 125 +-
tests/libpspp/u8-istream-test.c | 126 ++
tests/libpspp/u8-istream.at | 142 ++
184 files changed, 12208 insertions(+), 5428 deletions(-)
create mode 100644 src/data/identifier2.c
create mode 100644 src/language/lexer/include-path.c
copy src/{libpspp/temp-file.h => language/lexer/include-path.h} (71%)
create mode 100644 src/language/lexer/scan.c
create mode 100644 src/language/lexer/scan.h
create mode 100644 src/language/lexer/segment.c
create mode 100644 src/language/lexer/segment.h
create mode 100644 src/language/lexer/token.c
copy src/language/{stats/friedman.h => lexer/token.h} (51%)
delete mode 100644 src/language/prompt.c
delete mode 100644 src/language/syntax-file.c
delete mode 100644 src/language/syntax-file.h
delete mode 100644 src/language/syntax-string-source.c
delete mode 100644 src/language/syntax-string-source.h
create mode 100644 src/libpspp/encoding-guesser.c
create mode 100644 src/libpspp/encoding-guesser.h
delete mode 100644 src/libpspp/getl.c
delete mode 100644 src/libpspp/getl.h
delete mode 100644 src/libpspp/msg-locator.c
delete mode 100644 src/libpspp/msg-locator.h
copy src/{language/utilities/cache.c => libpspp/prompt.c} (63%)
rename src/{language => libpspp}/prompt.h (69%)
create mode 100644 src/libpspp/u8-istream.c
create mode 100644 src/libpspp/u8-istream.h
delete mode 100644 src/ui/gui/syntax-editor-source.c
delete mode 100644 src/ui/gui/syntax-editor-source.h
delete mode 100644 src/ui/terminal/msg-ui.c
delete mode 100644 src/ui/terminal/msg-ui.h
delete mode 100644 src/ui/terminal/read-line.h
rename src/ui/terminal/{read-line.c => terminal-reader.c} (53%)
rename src/{language/control/repeat.h => ui/terminal/terminal-reader.h} (77%)
create mode 100644 tests/language/lexer/scan-test.c
create mode 100644 tests/language/lexer/scan.at
create mode 100644 tests/language/lexer/segment-test.c
create mode 100644 tests/language/lexer/segment.at
create mode 100644 tests/libpspp/encoding-guesser-test.c
create mode 100644 tests/libpspp/encoding-guesser.at
create mode 100644 tests/libpspp/u8-istream-test.c
create mode 100644 tests/libpspp/u8-istream.at
hooks/post-receive
--
GNU PSPP