Run and prune

To run PQS you need the following:
  • PostgreSQL database server
  • Daml Sandbox or Canton Participant Node as the source of ledger data
  • Any access tokens or TLS certificates required by the above
  • PQS’ scribe.jar or Docker image

Running PQS

The PQS application mostly runs as a long-running process, but it also includes several interactive commands:
Command                                    Description
pipeline ledger postgres-document          Initiate continuous ledger data export
datastore postgres-document schema show    Infer required database schema, display it and quit
datastore postgres-document schema apply   Infer required database schema, apply it to data store and quit
datastore postgres-document prune          Prune transactions to a given offset inclusively and quit
Consult pqs-references-configuration-options for how to configure each command. The PQS pipeline is crash-friendly and restarts automatically. See pqs-ledger-streaming-and-recovery for more details on how PQS recovers from a crash.

Getting help

Exploring commands and parameters is easiest via the --help (and --help-verbose) arguments. For example, if you are running a downloaded .jar file:
$ ./scribe.jar --help
Usage: scribe COMMAND

An efficient ledger data exporting tool

Options:
  -h, --help            Print help information and quit
  -H, --help-verbose    Print help information with extra details and quit
  -v, --version         Print version information and quit

Commands:
  pipeline     Initiate continuous ledger data export
  datastore    Perform operations supporting a certified data store

Run 'scribe COMMAND --help[-verbose]' for more information on a command.
Or similarly, using Docker:
$ docker run -it europe-docker.pkg.dev/da-images/public/docker/participant-query-store:0.6.14 --help
Picked up JAVA_TOOL_OPTIONS: -javaagent:/open-telemetry.jar
Usage: scribe COMMAND

An efficient ledger data exporting tool

Commands:
  pipeline     Initiate continuous ledger data export
  datastore    Perform operations supporting a certified data store

Run 'scribe COMMAND --help[-verbose]' for more information on a command.

History slicing

As described in pqs-ledger-streaming-and-recovery, you can run PQS with --pipeline-ledger-start and --pipeline-ledger-stop to request the slice of ledger history you want. Start and stop offsets are subject to constraints, and PQS fails fast if any of them is violated. You cannot use:
  1. Offsets that are outside ledger history
  2. Pruned offsets or Genesis on pruned ledger
  3. Offsets that lead to a gap in datastore history
  4. Offsets that are before the PQS datastore history
In the above examples:
  • Request represents offsets requested via --pipeline-ledger-start and --pipeline-ledger-stop arguments
  • Participant Node represents the availability of unpruned ledger history in the Participant Node
  • Datastore represents data in the PQS database

Pruning

Pruning ledger data from the database can help reduce storage size and improve query performance by removing old and irrelevant data. PQS provides two approaches to prune ledger data: using the PQS CLI or using the prune_archived_to_offset() SQL function (see pqs-references-sql-api).
The legacy prune_to_offset function has been deprecated since version 3.5.0. It is known to cause deadlocks and introduce performance bottlenecks. It is succeeded by prune_archived_to_offset, which resolves these issues by modifying the pruning logic. Unlike the legacy function, the new variant does not collapse historical transactions. Instead, it leaves all active contracts untouched and retains their associated events and transactions.
Decide on the oldest offset that you will ever need in PQS and set up periodic pruning of data at offsets older than that. This ensures that query performance does not deteriorate over time as the PQS database grows. If you need all data from the beginning of the ledger:
  • estimate the data growth rate, and
  • size your database server to comfortably hold that data.
Calling either the prune CLI command with --prune-mode Force or the PostgreSQL function prune_archived_to_offset() deletes data irrevocably.

Data deletion and changes

Both pruning approaches (CLI and SQL function) share the same behavior in terms of data deletion and changes:
  • Removes archived contracts and their associated create/archive events.
  • Removes exercises and their corresponding exercise events.
  • Removes transactions that no longer reference active contracts.
They also provide the following guarantees:
  • All currently active contracts and their history remain intact.
  • All data (transactions/events/choices/contracts) for transactions with an offset greater than the pruning target remains intact.
The target offset, that is, the offset provided via --prune-target or as the argument to the prune_archived_to_offset() SQL function, identifies the transaction with the highest offset to be affected by the pruning operation.
If the provided offset does not have a transaction associated with it, the effective target offset becomes the oldest offset that succeeds (is greater than) the provided offset.
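The target-offset rule can be illustrated with a small sketch. It is not PQS code: it assumes offsets are fixed-width strings that sort lexicographically in ledger order, and the function name and sample offsets are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def effective_prune_target(requested: str, tx_offsets: Sequence[str]) -> Optional[str]:
    """Return the offset a prune would effectively target.

    Assumptions (not taken from PQS itself): offsets are fixed-width strings
    that compare in ledger order, and tx_offsets is sorted ascending.
    Returns None when no transaction exists at or after the requested offset.
    """
    # First position whose offset is >= requested: an exact match is used
    # as-is, otherwise the next transaction offset after it is chosen.
    index = bisect_left(tx_offsets, requested)
    return tx_offsets[index] if index < len(tx_offsets) else None


offsets = ["000000000000000010", "000000000000000020", "000000000000000030"]
print(effective_prune_target("000000000000000020", offsets))  # exact match
print(effective_prune_target("000000000000000015", offsets))  # falls between two transactions
```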
Pruning is a destructive operation and cannot be undone. If necessary, make sure to back up your data before performing any pruning operations.

Constraints

Some constraints apply to pruning operations (see also pqs-references-pqs-time-model):
  1. The provided target offset must be within the bounds of the contiguous history. If the target offset is outside the bounds, an error is raised.
  2. The pruning target cannot coincide with the latest consistent checkpoint of the contiguous history. If it does, an error is raised.
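As a rough illustration of the two constraints, the following sketch checks a candidate target before any pruning is attempted. It assumes that the contiguous history is bounded by an oldest offset and the latest consistent checkpoint, and that offsets compare as fixed-width strings. The function and parameter names are hypothetical and not part of PQS.

```python
from typing import List


def prune_target_violations(target: str, oldest: str, latest_checkpoint: str) -> List[str]:
    """List the constraints a pruning target would violate (empty if none).

    Assumption: the contiguous history spans oldest..latest_checkpoint and
    offsets are fixed-width strings that compare in ledger order.
    """
    violations = []
    if target < oldest or target > latest_checkpoint:
        violations.append("target is outside the contiguous history")
    if target == latest_checkpoint:
        violations.append("target coincides with the latest consistent checkpoint")
    return violations


print(prune_target_violations("000000000000000020", "000000000000000010", "000000000000000030"))
print(prune_target_violations("000000000000000030", "000000000000000010", "000000000000000030"))
```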

Pruning from the command line

The PQS CLI provides a prune command that allows you to prune the ledger data up to a specified offset, timestamp, or duration. For detailed information on all available options, please run:
$ ./scribe.jar datastore postgres-document prune --help-verbose
To use the prune command, you need to provide a pruning target as an argument. The pruning target can be an offset, a timestamp, or a duration (ISO 8601):
$ ./scribe.jar datastore postgres-document prune --prune-target '<offset>'
By default, the prune command performs a dry run, meaning it displays the effects of the pruning operation without actually deleting any data. To execute the pruning operation, add the --prune-mode Force option:
$ ./scribe.jar datastore postgres-document prune --prune-target '<offset>' --prune-mode Force
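When pruning is driven from a scheduler or script, it can help to make the dry-run default explicit in code. The sketch below only assembles the argument list from the flags shown above. The helper name is hypothetical, and any connection options your deployment needs are deliberately left out.

```python
import shlex
from typing import List


def build_prune_command(target: str, force: bool = False, jar: str = "./scribe.jar") -> List[str]:
    """Assemble the prune command line; a dry run unless force=True."""
    command = [jar, "datastore", "postgres-document", "prune", "--prune-target", target]
    if force:
        # Only this option turns the dry run into an irreversible deletion.
        command += ["--prune-mode", "Force"]
    return command


print(shlex.join(build_prune_command("P30D")))
print(shlex.join(build_prune_command("P30D", force=True)))
```

The resulting list can be passed to subprocess.run once the dry-run output has been reviewed.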

Example with timestamp and duration

In addition to providing an offset as --prune-target, a timestamp or duration can also be used as a pruning cut-off. For example, to prune data older than 30 days (relative to now), you can use the following command:
$ ./scribe.jar datastore postgres-document prune --prune-target P30D
To prune data up to a specific timestamp, you can use the following command:
$ ./scribe.jar datastore postgres-document prune --prune-target 2023-01-30T00:00:00.000Z
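If you prefer to pass an explicit timestamp rather than a duration, the cut-off can be computed ahead of time. This is a minimal sketch using only the Python standard library. The function name is hypothetical, and the output format simply mirrors the timestamp example above.

```python
from datetime import datetime, timedelta, timezone


def prune_timestamp(now: datetime, keep: timedelta) -> str:
    """Format the cut-off (now - keep) as a UTC timestamp with millisecond precision.

    `now` must be timezone-aware.
    """
    cutoff = (now - keep).astimezone(timezone.utc)
    millis = cutoff.microsecond // 1000
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


now = datetime(2023, 3, 1, tzinfo=timezone.utc)
print(prune_timestamp(now, timedelta(days=30)))  # 2023-01-30T00:00:00.000Z
```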

Pruning with SQL function

prune_archived_to_offset() is a SQL function that allows you to prune the ledger data up to a specified offset. It has the same behavior as the datastore postgres-document prune command, but does not feature a dry-run option.
To use prune_archived_to_offset(), you need to provide an offset:
select * from prune_archived_to_offset('<offset>');
The function deletes transactions and updates active contracts as described above. You can use prune_archived_to_offset() in combination with the nearest_offset() function to prune data up to a specific timestamp or interval:
select * from prune_archived_to_offset(nearest_offset('1970-01-01 08:01:00+08' :: timestamp with time zone));
select * from prune_archived_to_offset(nearest_offset('PT2H' :: interval));
select * from prune_archived_to_offset(nearest_offset(interval '3 days'));

Resetting

Reset-to-offset is a manual procedure that deletes all transactions from the PQS database after a given offset. This allows you to restart processing from the offset as if subsequent transactions have never been processed.
Reset is a dangerous, destructive, and permanent procedure that needs to be coordinated within the entire ecosystem and not performed in isolation.
Reset can be useful to perform a point-in-time rollback of the ledger in a range of circumstances. For example, in the event of:
  1. Unexpected new entities - A new scope, such as a Party or template, appears in ledger transactions without coordination. That is, new transactions arrive before PQS has been restarted to make it aware of these new entities.
  2. Ledger roll-back - If a ledger is rolled-back due to the disaster recovery process, you will need to perform a similar roll back with PQS. This is a manual process that requires coordination with the Participant Node.
The procedure:
  • Stop any applications that use the PQS database.
  • Stop the PQS process.
  • Connect to PostgreSQL as an administrator.
  • Prevent PQS database readers from interacting (revoke connect).
  • Terminate any other remaining connections:
    select pg_terminate_backend(pid)
    from pg_stat_activity
    where pid <> pg_backend_pid() and datname = current_database();
    
  • Obtain a summary of the scope of the proposed reset and validate that the intended outcome matches your expectations by performing a dry run:
    select * from validate_reset_offset('0000000000000a8000');
    
  • Implement the destructive changes of removing all transactions after the given offset and adjust internal metadata to allow PQS to resume processing from the supplied offset:
    select * from reset_to_offset('0000000000000a8000');
    
  • Re-enable access for PQS database users (grant connect).
  • Wait for the Participant Node to be available post-repair.
  • Start PQS.
  • Conduct any remedial action required in PQS database consumers, to account for the fact that the ledger appears to be rolled back to the specified offset.
  • Start applications that use the PQS database and resume operation.
The provided target offset must be within the bounds of the contiguous history. If the target offset is outside the bounds, it raises an error.
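The database-side steps of the procedure can be written down as an ordered runbook. The sketch below only generates the SQL text and executes nothing. The database name and reader role are hypothetical placeholders, and revoking connect from a role only has an effect if the privilege was granted to that role rather than inherited from PUBLIC.

```python
from typing import List


def reset_statements(database: str, reader_role: str, offset: str) -> List[str]:
    """Return the SQL steps of the reset procedure in execution order."""

    def ident(name: str) -> str:
        # Quote an identifier, doubling any embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    def literal(value: str) -> str:
        # Quote a string literal, doubling any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    return [
        f"revoke connect on database {ident(database)} from {ident(reader_role)};",
        "select pg_terminate_backend(pid) from pg_stat_activity "
        "where pid <> pg_backend_pid() and datname = current_database();",
        # Dry run: review its output before running the next statement.
        f"select * from validate_reset_offset({literal(offset)});",
        f"select * from reset_to_offset({literal(offset)});",
        f"grant connect on database {ident(database)} to {ident(reader_role)};",
    ]


for statement in reset_statements("pqs", "pqs_reader", "0000000000000a8000"):
    print(statement)
```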

Redacting

The redaction feature enables removal of sensitive or personally identifiable information from contracts and exercises within the PQS database. This operation is particularly useful for complying with privacy regulations and data protection laws, as it enables the permanent removal of contract payloads, contract keys, choice arguments, and choice results. Note that redaction is a destructive operation and once redacted, information cannot be restored. The redaction process involves assigning a redaction_id to a contract or an exercise and nullifying its sensitive data fields. For contracts, the payload and contract_key fields are redacted, while for exercises, the argument and result fields are redacted.

Conditions for redaction

The following conditions apply to contracts and interface views:
  • You cannot redact an active contract
  • A redacted contract cannot be redacted again
There are no restrictions on the redaction of choice exercise events. A redaction operation requires a redaction ID, which is an arbitrary label to identify the redaction and provide information about its reason, and correlate with other systems that coordinate such activity.
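The conditions for contracts can be expressed as a small pre-check that a calling application might run before requesting a redaction. This is an illustrative model only: the Contract class and function name are hypothetical and do not reflect how PQS enforces these rules internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contract:
    contract_id: str
    archived: bool
    redaction_id: Optional[str] = None


def redaction_blocker(contract: Contract) -> Optional[str]:
    """Return the reason a contract cannot be redacted, or None if it can."""
    if not contract.archived:
        return "contract is still active"
    if contract.redaction_id is not None:
        return "contract is already redacted"
    return None


print(redaction_blocker(Contract("c1", archived=False)))
print(redaction_blocker(Contract("c2", archived=True, redaction_id="case-42")))
print(redaction_blocker(Contract("c3", archived=True)))
```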

Examples

Redacting an archived contract

To redact an archived contract, use the redact_contract function by providing the contract_id and a redaction_id. The intent of the redaction_id is to provide a case reference to identify the reason why the redaction has taken place, and it should be set according to organizational policies. This operation nullifies the payload and contract_key of the contract and assigns the redaction_id.
select redact_contract('<contract_id>', '<redaction_id>');
Redaction is applied to the contract and its interface views, if any, and it returns the number of affected entries.

Redacting a choice exercise

To redact an exercise, use the redact_exercise function by providing the event_id of the exercise and a redaction_id. This nullifies the argument and result of the exercise and assigns the redaction_id.
select redact_exercise('<event_id>', '<redaction_id>');

Accessing redaction information

The redaction_id of a contract is exposed as a column in the following functions of the SQL API. The columns payload and contract_key for a redacted contract are NULL.
  • creates(...)
  • archives(...)
  • active(...)
  • lookup_contract(...)
The redaction_id of an exercise event is exposed as a column in the following functions of the SQL API. The columns argument and result for a redacted exercise are NULL:
  • exercises(...)
  • lookup_exercises(...)

Representative package ID support

PQS does not support ingesting contract payloads where the original package ID of a Ledger API create event is not available in the package store of the Participant Node that the PQS instance is connected to. Such a situation can occur during an ACS import procedure on the Participant Node, where the original package ID is replaced by its representative package ID.
If you are using ACS import/export procedures that can replace the original package ID of contracts, please ensure that the original package IDs are also uploaded to the Participant Node’s package store to avoid disruptions in PQS processing.

Observe

This section describes observability features of PQS, which are designed to help you monitor health and performance of the application.

Approach to observability

PQS incorporates OpenTelemetry APIs to provide its observability features. All three sources of signals (traces, metrics, and logs) can be exported to various backends by providing appropriate configuration defined by OpenTelemetry protocols and guidelines. This makes PQS flexible in terms of observability backends, allowing users to choose what fits their needs and established infrastructure without being overly prescriptive. To have PQS emit observability data, an OpenTelemetry Java Agent must be attached to the JVM running PQS. OpenTelemetry’s documentation page on Java Agent Configuration has all the necessary information to get started. As a frequently requested shortcut (metrics only, over a Prometheus exposition endpoint embedded in PQS), the following snippet can help you get started. For more details, refer to the official documentation:
$ export OTEL_SERVICE_NAME=pqs
$ export OTEL_TRACES_EXPORTER=none
$ export OTEL_LOGS_EXPORTER=none
$ export OTEL_METRICS_EXPORTER=prometheus
$ export OTEL_EXPORTER_PROMETHEUS_PORT=9090
$ export JDK_JAVA_OPTIONS="-javaagent:path/to/opentelemetry-javaagent.jar"
$ ./scribe.jar pipeline ledger postgres-document ...
PQS Docker images already come pre-configured this way, but users are free to override these values as they see fit for their environments.

Logging

Log level

Set the log level with --logger-level. Possible values are All, Fatal, Error, Warning, Info (default), Debug, Trace:
--logger-level=Debug

Per-logger log level

Use --logger-mappings to adjust the log level for individual loggers. For example, to remove Netty network traffic from a more detailed overall log:
--logger-mappings-io.netty=Warning \
--logger-mappings-io.grpc.netty=Trace

Log pattern

With --logger-pattern, use one of the predefined patterns, such as Plain (default), Standard (standard format used in DA applications), Structured, or set your own. Check Log Format Configuration for more details. To use your custom format, provide its string representation, such as:
--logger-pattern="%highlight{%fixed{1}{%level}} [%fiberId] %name:%line %highlight{%message} %highlight{%cause} %kvs"

Log format for console output

Use --logger-format to set the log format. Possible values are Plain (default) or Json. These formats can be used for the pipeline command.

Log format for file output

Use --logger-format to set the log format. Possible values are Plain (default), Json, PlainAsync and JsonAsync. They can be used for the interactive commands, such as prune. For PlainAsync and JsonAsync, log entries are written to the destination file asynchronously.

Destination file for file output

Use --logger-destination to set the path to the destination file (default: output.log) for interactive commands, such as prune.

Log format and log pattern combinations

  • Plain / Plain
    00:00:23.737 I [zio-fiber-0] com.digitalasset.scribe.pipeline.pipeline.Impl:34 Starting pipeline on behalf of 'Alice_1::12209982174bbaf1e6283234ab828bcab9b73fbe313315b181134bcae9566d3bbf1b'  application=scribe
    00:00:24.658 I [zio-fiber-0] com.digitalasset.scribe.pipeline.pipeline.Impl:61 Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'  application=scribe
    00:00:25.043 I [zio-fiber-895] com.digitalasset.zio.daml.ledgerapi.package:201 Contract filter inclusive of 1 templates and 0 interfaces  application=scribe
    00:00:25.724 I [zio-fiber-0] com.digitalasset.scribe.pipeline.pipeline.Impl:85 Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'  application=scribe
    
  • Plain / Standard
    component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:38.902+0000 level=INFO correlation_id=tbd description=Starting pipeline on behalf of 'Alice_1::1220c6d22d46d59c8454bd245e5a3bc238e5024d37bfd843dbad6885674f3a9673c5'  scribe=application=scribe
    component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:39.734+0000 level=INFO correlation_id=tbd description=Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'  scribe=application=scribe
    component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:39.982+0000 level=INFO correlation_id=tbd description=Contract filter inclusive of 1 templates and 0 interfaces  scribe=application=scribe
    component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:40.476+0000 level=INFO correlation_id=tbd description=Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'  scribe=application=scribe
    
  • Plain / Custom
    --logger-pattern=%timestamp{yyyy-MM-dd'T'HH:mm:ss} %level %name:%line %highlight{%message} %highlight{%cause} %kvs
    
    2024-01-16T23:55:52 INFO com.digitalasset.scribe.pipeline.pipeline.Impl:34 Starting pipeline on behalf of 'Alice_1::1220444f494b31c0a40c2f393edac3f5900325028c6f810a203a0334cd830ec230c8'  application=scribe
    2024-01-16T23:55:53 INFO com.digitalasset.scribe.pipeline.pipeline.Impl:61 Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'  application=scribe
    2024-01-16T23:55:53 INFO com.digitalasset.zio.daml.ledgerapi.package:201 Contract filter inclusive of 1 templates and 0 interfaces  application=scribe
    2024-01-16T23:55:53 INFO com.digitalasset.scribe.pipeline.pipeline.Impl:85 Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'  application=scribe
    
  • Json / Standard
    {"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:12.537+0000","level":"INFO","correlation_id":"tbd","description":"Starting pipeline on behalf of 'Alice_1::1220f03ed424480ab4487d88230fc033f3910f4cb4492fea68535a5760744b53dabe'","scribe":{"application":"scribe"}}
    {"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:13.551+0000","level":"INFO","correlation_id":"tbd","description":"Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'","scribe":{"application":"scribe"}}
    {"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:13.935+0000","level":"INFO","correlation_id":"tbd","description":"Contract filter inclusive of 1 templates and 0 interfaces","scribe":{"application":"scribe"}}
    {"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:14.659+0000","level":"INFO","correlation_id":"tbd","description":"Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'","scribe":{"application":"scribe"}}
    
  • Json / Structured
    {"timestamp":"2024-01-17T00:08:25+0000","level":"INFO","thread":"zio-fiber-0","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:34","message":"Starting pipeline on behalf of 'Alice_1::122077c6b00e952ff694e2b25b6f5eb9582f815dfe793e2da668b119481a1dd5acdc'","application":"scribe"}
    {"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-0","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:61","message":"Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'","application":"scribe"}
    {"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-882","location":"com.digitalasset.zio.daml.ledgerapi.package:201","message":"Contract filter inclusive of 1 templates and 0 interfaces","application":"scribe"}
    {"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-0","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:85","message":"Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'","application":"scribe"}
    
  • Json / Custom
    --logger-pattern=%label{timestamp}{%timestamp{yyyy-MM-dd'T'HH:mm:ss}} %label{level}{%level} %label{location}{%name:%line} %label{description}{%message} %label{cause}{%cause} %label{scribe}{%kvs}
    
    {"timestamp":"2024-01-17T00:16:31","level":"INFO","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:34","description":"Starting pipeline on behalf of 'Alice_1::1220ee13431ac437d454ea59d622cfc76599e0846a3caf166b4306d47b1bf83944a6'","scribe":{"application":"scribe"}}
    {"timestamp":"2024-01-17T00:16:33","level":"INFO","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:61","description":"Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'","scribe":{"application":"scribe"}}
    {"timestamp":"2024-01-17T00:16:34","level":"INFO","location":"com.digitalasset.zio.daml.ledgerapi.package:201","description":"Contract filter inclusive of 1 templates and 0 interfaces","scribe":{"application":"scribe"}}
    {"timestamp":"2024-01-17T00:16:35","level":"INFO","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:85","description":"Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'","scribe":{"application":"scribe"}}
    
    Notice you need to use %label{your_label}{format} to describe a Json attribute-value pair.
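Json-formatted output lends itself to line-by-line processing. The following sketch filters records by level using only the Python standard library. The helper name is hypothetical, and the sample line is copied from the Json / Structured output above, so the keys it reads (level, timestamp, message) apply to that pattern only.

```python
import json
from typing import Iterable, Iterator


def records_with_level(lines: Iterable[str], level: str) -> Iterator[dict]:
    """Yield parsed log records whose "level" attribute equals the given level."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("level") == level:
            yield record


sample = [
    '{"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-882",'
    '"location":"com.digitalasset.zio.daml.ledgerapi.package:201",'
    '"message":"Contract filter inclusive of 1 templates and 0 interfaces","application":"scribe"}',
]
for record in records_with_level(sample, "INFO"):
    print(record["timestamp"], record["message"])
```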

Application metrics

Assuming PQS exposes metrics as described above, you can access the following metrics at http://localhost:9090/metrics. Each metric is accompanied by # HELP and # TYPE comments, which describe the meaning of the metric and its type, respectively. Some metric types have additional constituent parts exposed as separate metrics. For example, a histogram metric type tracks max, count, sum, and actual ranged buckets as separate time series. Metrics are labeled where it makes sense, providing additional context such as the type of operation or the template/choice involved. Conceptual list of metrics (refer to actual metric names in the Prometheus output):
Type       Name                                       Description
gauge      watermark_ix                               Current watermark index (transaction ordinal number for consistent reads)
counter    pipeline_events_total                      Processed ledger events
histogram  jdbc_conn_use                              Latency of database connections usage
histogram  jdbc_conn_isvalid                          Latency of database connection validation
histogram  jdbc_conn_commit                           Latency of database connection commit
histogram  total_tx_handling_latency                  Total latency of transaction handling in PQS (observed in LAPI to committed in DB)
gauge      tx_lag_from_ledger_wallclock               Lag from ledger (wall-clock delta (in ms) from command completion to receipt by pipeline)
histogram  pipeline_convert_acs_event                 Latency of converting ACS events
histogram  pipeline_convert_transaction               Latency of converting transactions
histogram  pipeline_prepare_batch_latency             Latency of preparing batches of statements
histogram  pipeline_execute_batch_latency             Latency of executing batches of statements
histogram  pipeline_progress_watermark_latency        Latency of watermark progression
histogram  pipeline_wp_acs_events_size                Number of in-flight units of work in pipeline_wp_acs_events wait point
histogram  pipeline_wp_acs_statements_size            Number of in-flight units of work in pipeline_wp_acs_statements wait point
histogram  pipeline_wp_acs_batched_statements_size    Number of in-flight units of work in pipeline_wp_acs_batched_statements wait point
histogram  pipeline_wp_acs_prepared_statements_size   Number of in-flight units of work in pipeline_wp_acs_prepared_statements wait point
histogram  pipeline_wp_events_size                    Number of in-flight units of work in pipeline_wp_events wait point
histogram  pipeline_wp_statements_size                Number of in-flight units of work in pipeline_wp_statements wait point
histogram  pipeline_wp_batched_statements_size        Number of in-flight units of work in pipeline_wp_batched_statements wait point
histogram  pipeline_wp_prepared_statements_size       Number of in-flight units of work in pipeline_wp_prepared_statements wait point
histogram  pipeline_wp_watermarks_size                Number of in-flight units of work in pipeline_wp_watermarks wait point
counter    pipeline_wp_acs_events_total               Number of units of work processed in pipeline_wp_acs_events wait point
counter    pipeline_wp_acs_statements_total           Number of units of work processed in pipeline_wp_acs_statements wait point
counter    pipeline_wp_acs_batched_statements_total   Number of units of work processed in pipeline_wp_acs_batched_statements wait point
counter    pipeline_wp_acs_prepared_statements_total  Number of units of work processed in pipeline_wp_acs_prepared_statements wait point
counter    pipeline_wp_events_total                   Number of units of work processed in pipeline_wp_events wait point
counter    pipeline_wp_statements_total               Number of units of work processed in pipeline_wp_statements wait point
counter    pipeline_wp_batched_statements_total       Number of units of work processed in pipeline_wp_batched_statements wait point
counter    pipeline_wp_prepared_statements_total      Number of units of work processed in pipeline_wp_prepared_statements wait point
counter    pipeline_wp_watermarks_total               Number of units of work processed in pipeline_wp_watermarks wait point
counter    app_restarts_total                         Tracks number of times recoverable failures forced the pipeline to restart
gauge      grpc_up                                    Indicator whether gRPC channel is up and operational
gauge      jdbc_conn_pool_up                          Indicator whether JDBC connection pool is up and operational
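For quick checks without a full Prometheus setup, the exposition text can be parsed directly. The sketch below is a simplified parser, not a complete implementation of the exposition format: it assumes sample lines carry no trailing timestamp, and the metric names in the sample are illustrative, since the actual names in the Prometheus output may differ from the conceptual list above.

```python
from typing import Dict


def parse_samples(exposition: str) -> Dict[str, float]:
    """Map each series (name plus optional labels) to its sample value."""
    samples: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # Assumption: no trailing timestamp, so the value is the last field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


SAMPLE = """\
# HELP watermark_ix Current watermark index
# TYPE watermark_ix gauge
watermark_ix 384.0
# HELP app_restarts_total Recoverable failures that forced a pipeline restart
# TYPE app_restarts_total counter
app_restarts_total 0.0
"""
print(parse_samples(SAMPLE))
```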

Grafana dashboard

Based on the metrics described above, it is possible to build a comprehensive dashboard to monitor PQS. A vendor-supplied Grafana dashboard for PQS can be downloaded from the artifacts repository (see pqs-download). You may want to use it as a starting point for your own.
grafana/v9.4.0/dashboard.json
grafana/v10.4.0/dashboard.json
grafana/v11.0.0/dashboard.json

Health check

The health of the PQS process can be monitored using the health check endpoint /livez. The health check endpoint is available on the configured network interface (--health-address) and TCP port (--health-port). Note the default is 127.0.0.1:8080.
$ curl http://localhost:8080/livez
{"status":"ok"}
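The same check can be scripted, for example as part of a readiness probe. This is a minimal sketch using only the Python standard library. It assumes the default address shown above and treats any response other than HTTP 200 with a status of ok as unhealthy, which is an assumption about failure behavior rather than documented PQS semantics. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def is_live(body: str) -> bool:
    """Return True if a /livez response body reports status "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_livez(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Query the health endpoint; connection errors count as not live."""
    try:
        with urlopen(f"{base_url}/livez", timeout=timeout) as response:
            return response.status == 200 and is_live(response.read().decode("utf-8"))
    except OSError:
        return False


print(is_live('{"status":"ok"}'))
```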

Tracing of pipeline execution

PQS instruments the most critical parts of its operations with tracing to provide insights into the execution flow and performance. Traces can be exported to various OpenTelemetry backends by providing appropriate configuration, for example:
$ export OTEL_TRACES_EXPORTER=otlp
$ export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
$ export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
$ export JDK_JAVA_OPTIONS="-javaagent:path/to/opentelemetry-javaagent.jar"
$ ./scribe.jar pipeline ledger postgres-document ...
The following root spans are emitted by PQS:
Span name                                                              Description
process metadata and schema                                            interactions that happen when PQS starts up and ensures its datastore is ready for operations
initialization routine                                                 interactions that happen when PQS establishes its offset range boundaries (including seeding from ACS if requested) on startup
consume com.daml.ledger.api.v1.TransactionService/GetTransactions      [Daml SDK v2.x] timeline of processing a ledger transaction from delivery over gRPC to its persistence to datastore
consume com.daml.ledger.api.v1.TransactionService/GetTransactionTrees  [Daml SDK v2.x] timeline of processing a ledger transaction from delivery over gRPC to its persistence to datastore
consume com.daml.ledger.api.v2.UpdateService/GetUpdates                [Daml SDK v3.x] timeline of processing a ledger transaction from delivery over gRPC to its persistence to datastore
consume com.daml.ledger.api.v2.UpdateService/GetUpdateTrees            [Daml SDK v3.x] timeline of processing a ledger transaction from delivery over gRPC to its persistence to datastore
execute datastore transaction                                          interactions when a batch of transactions is persisted to the datastore
advance datastore watermark                                            interactions when the latest consecutive watermark is persisted to the datastore
All spans are enriched with contextual information through OpenTelemetry’s attributes and events where appropriate. It is advisable to get to know this contextual data. Because execution is asynchronous and parallel, PQS relies heavily on span links to highlight causal relationships between independent traces. Modern trace visualisation tools leverage this information to provide a usable representation of, and navigation through, the involved traces. Below is an example of causal trace data that spans receipt of a transaction from the Ledger API all the way to it becoming visible through PQS’ SQL API in Postgres.
Span #110
Trace ID       : 042ce1ffa24b34b38472933ac8209d54
Parent ID      :
ID             : d5c0071e1d9bbf76
Name           : consume com.daml.ledger.api.v1.TransactionService/GetTransactionTrees
Kind           : Consumer
Start time     : 2024-11-06 03:16:43.004004 +0000 UTC
End time       : 2024-11-06 03:16:43.004193 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> messaging.operation.name: Str(consume)
     -> messaging.batch.message_count: Int(1)
     -> messaging.destination.name: Str(com.daml.ledger.api.v1.TransactionService/GetTransactionTrees)
     -> messaging.system: Str(canton)
     -> messaging.operation.type: Str(process)

Span #123
Trace ID       : 042ce1ffa24b34b38472933ac8209d54
Parent ID      : d5c0071e1d9bbf76
ID             : 9d60e1f4c42dce76
Name           : export transaction tree
Kind           : Internal
Start time     : 2024-11-06 03:16:43.004134 +0000 UTC
End time       : 2024-11-06 03:16:43.024574 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> daml.effective_at: Str(2024-11-06T03:16:42.827847Z)
     -> daml.command_id: Str(3563113460)
     -> daml.events_count: Int(3)
     -> daml.workflow_id: Empty()
     -> daml.transaction_id: Str(122056219af2a73f913e1c2f0ce4422c156bc9cfdb5e5d49baaee0053bf3787f4a97)
     -> daml.offset: Str(000000000000000261)
Events:
SpanEvent #0
     -> Name: canonicalizing transaction tree
     -> Timestamp: 2024-11-06 03:16:43.004809542 +0000 UTC
SpanEvent #1
     -> Name: canonicalized transaction tree
     -> Timestamp: 2024-11-06 03:16:43.005138375 +0000 UTC
SpanEvent #2
     -> Name: converting canonical transaction to domain model
     -> Timestamp: 2024-11-06 03:16:43.005690625 +0000 UTC
SpanEvent #3
     -> Name: converted canonical transaction to domain model
     -> Timestamp: 2024-11-06 03:16:43.006170917 +0000 UTC
SpanEvent #4
     -> Name: released transaction model into batch
     -> Timestamp: 2024-11-06 03:16:43.015018459 +0000 UTC
SpanEvent #5
     -> Name: prepared SQL statements for transaction model
     -> Timestamp: 2024-11-06 03:16:43.015437 +0000 UTC
SpanEvent #6
     -> Name: flushed transaction model SQL to datastore
     -> Timestamp: 2024-11-06 03:16:43.019356042 +0000 UTC
SpanEvent #7
     -> Name: advanced datastore watermark
     -> Timestamp: 2024-11-06 03:16:43.024570334 +0000 UTC
     -> Attributes::
          -> index: Int(384)
          -> offset: Str(000000000000000261)
Links:
SpanLink #0
     -> Trace ID: 839da768a12333920b709410fb73911a
     -> ID: 276627b6e10f62c5
     -> TraceState:
     -> Attributes::
          -> target: Str(↥ ledger submission)
SpanLink #1
     -> Trace ID: 76c58361d46c08761c37ef5821e8fb78
     -> ID: 6051f05f10af0399
     -> TraceState:
     -> Attributes::
          -> target: Str(↧ persist to datastore)
SpanLink #2
     -> Trace ID: 71e67e2420deeef36ef3efacea6399dc
     -> ID: 161b5911e7a0ec18
     -> TraceState:
     -> Attributes::
          -> target: Str(↧ advance watermark)
Span #115
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      :
ID             : 81f5f42361aa93ee
Name           : execute datastore transaction
Kind           : Internal
Start time     : 2024-11-06 03:16:43.015931 +0000 UTC
End time       : 2024-11-06 03:16:43.020991 +0000 UTC
Status code    : Unset
Status message :

Span #111
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 81f5f42361aa93ee
ID             : 4bf8484e99999c64
Name           : acquire connection
Kind           : Internal
Start time     : 2024-11-06 03:16:43.016475 +0000 UTC
End time       : 2024-11-06 03:16:43.016688 +0000 UTC
Status code    : Unset
Status message :

Span #113
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 81f5f42361aa93ee
ID             : 6051f05f10af0399
Name           : execute batch
Kind           : Internal
Start time     : 2024-11-06 03:16:43.016828 +0000 UTC
End time       : 2024-11-06 03:16:43.019494 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> scribe.batch.models_count: Int(37)
Links:
SpanLink #0
     -> Trace ID: 33736b299a690b885c2314b9b17bde05
     -> ID: aba3d1dd6024ff71
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025c)
          -> target: Str(↥ incoming transaction)
SpanLink #1
     -> Trace ID: 17f3edce9565defd379bf3ab8243f86d
     -> ID: 076afe5b4aac1212
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025d)
          -> target: Str(↥ incoming transaction)
SpanLink #2
     -> Trace ID: 646ae61de95731c7726a6caee2d69ee9
     -> ID: bca9f5c28de74c90
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025e)
          -> target: Str(↥ incoming transaction)
SpanLink #3
     -> Trace ID: 9ebd4d4f288b8b338f4192c0d7ea1b8c
     -> ID: a1d145fa9d76d5b3
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025f)
          -> target: Str(↥ incoming transaction)
SpanLink #4
     -> Trace ID: e0716f968b5019a450da04317ea8f776
     -> ID: a75658ce89441bee
     -> TraceState:
     -> Attributes::
          -> offset: Str(000000000000000260)
          -> target: Str(↥ incoming transaction)
SpanLink #5
     -> Trace ID: 042ce1ffa24b34b38472933ac8209d54
     -> ID: 9d60e1f4c42dce76
     -> TraceState:
     -> Attributes::
          -> offset: Str(000000000000000261)
          -> target: Str(↥ incoming transaction)

Span #112
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 6051f05f10af0399
ID             : 00419239933510fa
Name           : execute SQL
Kind           : Internal
Start time     : 2024-11-06 03:16:43.016855 +0000 UTC
End time       : 2024-11-06 03:16:43.019162 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> scribe.__contracts.rows_count: Int(9)
     -> scribe.__exercises.rows_count: Int(3)
     -> scribe.__events.rows_count: Int(12)
     -> scribe.__archives.rows_count: Int(1)
     -> scribe.__transactions.rows_count: Int(6)

Span #114
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 81f5f42361aa93ee
ID             : 9872ff55adc9e370
Name           : commit transaction
Kind           : Internal
Start time     : 2024-11-06 03:16:43.019916 +0000 UTC
End time       : 2024-11-06 03:16:43.020742 +0000 UTC
Status code    : Unset
Status message :
Span #124
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      :
ID             : 161b5911e7a0ec18
Name           : advance datastore watermark
Kind           : Internal
Start time     : 2024-11-06 03:16:43.021507 +0000 UTC
End time       : 2024-11-06 03:16:43.024872 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> scribe.watermark.offset: Str(000000000000000261)
     -> scribe.watermark.ix: Int(384)
Links:
SpanLink #0
     -> Trace ID: 76c58361d46c08761c37ef5821e8fb78
     -> ID: 6051f05f10af0399
     -> TraceState:
     -> Attributes::
          -> target: Str(↥ persist to datastore)

Span #116
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      : 161b5911e7a0ec18
ID             : 33ab3918ebfe138d
Name           : acquire connection
Kind           : Internal
Start time     : 2024-11-06 03:16:43.022009 +0000 UTC
End time       : 2024-11-06 03:16:43.022222 +0000 UTC
Status code    : Unset
Status message :

Span #6
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      : 161b5911e7a0ec18
ID             : 1a66240dfd597654
Name           : UPDATE scribe.__watermark
Kind           : Client
Start time     : 2024-11-06 03:16:43.022737084 +0000 UTC
End time       : 2024-11-06 03:16:43.023134917 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> db.operation: Str(UPDATE)
     -> db.sql.table: Str(__watermark)
     -> db.name: Str(scribe)
     -> db.connection_string: Str(postgresql://postgres-scribe:5432)
     -> server.address: Str(postgres-scribe)
     -> server.port: Int(5432)
     -> db.user: Str(pguser)
     -> db.statement: Str(update __watermark set "offset" = ?, ix = ?;)
     -> db.system: Str(postgresql)

Span #117
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      : 161b5911e7a0ec18
ID             : f0b24fb074fe41f8
Name           : commit transaction
Kind           : Internal
Start time     : 2024-11-06 03:16:43.023629 +0000 UTC
End time       : 2024-11-06 03:16:43.024157 +0000 UTC
Status code    : Unset
Status message :

Trace context propagation

PQS is an intermediary between a ledger instance and downstream applications that prefer to access data through SQL rather than by streaming it from the Ledger API directly. Despite forming a pipeline between two data storage systems (Canton and PostgreSQL), PQS stores the original ledger transaction’s trace context (see also open-tracing-ledger-api-client) for the purposes of propagation, rather than its own. This allows downstream applications to decide for themselves how to connect to the original submission’s trace (as a child span or as a new trace connected through span links).
select "offset",
       (trace_context).trace_parent,
       (trace_context).trace_state
from transactions limit 1;
       offset       |                      trace_parent                       |   trace_state
--------------------+---------------------------------------------------------+-----------------
 0000000000000000bb | 00-f35923baa38cc520a1fc3aec6771380b-b4cf363cbf5efa6a-01 | foo=bar,baz=qux
Span #85
    Trace ID       : f35923baa38cc520a1fc3aec6771380b
    Parent ID      : d3300bedd4c64511
    ID             : b4cf363cbf5efa6a
    Name           : MessageDispatcher.handle
    Kind           : Internal
    Start time     : 2024-11-05 04:01:40.808 +0000 UTC
    End time       : 2024-11-05 04:01:40.822694083 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> canton.class: Str(com.digitalasset.canton.participant.protocol.EnterpriseMessageDispatcher)
↑↑↑ span context propagated through transaction/tree stream in Ledger API

↓↓↓ following parent's links chain leads us to the root span of original submission
Span #19
    Trace ID       : f35923baa38cc520a1fc3aec6771380b
    Parent ID      :
    ID             : de3aed62b5fb43ce
    Name           : com.daml.ledger.api.v1.CommandService/SubmitAndWaitForTransaction
    Kind           : Server
    Start time     : 2024-11-05 04:01:40.569 +0000 UTC
    End time       : 2024-11-05 04:01:40.866904459 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> rpc.method: Str(SubmitAndWaitForTransaction)
     -> daml.submitter: Str()
     -> rpc.service: Str(com.daml.ledger.api.v1.CommandService)
     -> net.peer.port: Int(38640)
     -> net.transport: Str(ip_tcp)
     -> daml.workflow_id: Str()
     -> daml.command_id: Str(3498760027)
     -> rpc.system: Str(grpc)
     -> net.peer.ip: Str(172.18.0.15)
     -> daml.application_id: Str(appid)
     -> rpc.grpc.status_code: Int(0)
Accessing data stored in PQS’ transactions.trace_context column allows any application to re-create the propagated trace context [4] and use it with its runtime’s instrumentation library.
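As a minimal sketch of that re-creation step, the stored trace_parent and trace_state values can be split into their W3C Trace Context fields before they are handed to an instrumentation library. The helper names and the returned dictionary layout are illustrative assumptions, not part of PQS; only the field layout (version, trace ID, parent span ID, flags) comes from the W3C traceparent format, and the sketch handles the fixed-length version 00 layout only.

```python
import re

# W3C traceparent layout: version "-" trace-id "-" parent-id "-" trace-flags
_TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_trace_parent(trace_parent: str) -> dict:
    """Split a traceparent string into its fields (hypothetical helper)."""
    match = _TRACEPARENT.match(trace_parent.strip())
    if match is None:
        raise ValueError(f"not a valid traceparent value: {trace_parent!r}")
    fields = match.groupdict()
    # An all-zero trace ID or span ID is invalid per the W3C specification.
    if set(fields["trace_id"]) == {"0"} or set(fields["span_id"]) == {"0"}:
        raise ValueError("trace ID and span ID must not be all zeros")
    # Bit 0 of the flags byte is the "sampled" flag.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields


def parse_trace_state(trace_state: str) -> dict:
    """Split a tracestate string such as 'foo=bar,baz=qux' into key/value pairs."""
    pairs = (item.split("=", 1) for item in trace_state.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}
```

Applied to the row shown above, the parsed trace ID is f35923baa38cc520a1fc3aec6771380b and the span ID is b4cf363cbf5efa6a, which are the Trace ID and ID of Span #85 in the listing.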

Diagnostics

PQS is capable of exporting diagnostic telemetry snapshots. This data export archive contains essential troubleshooting information such as:
  • application thread dumps (over a period of time)
  • application metrics (over a period of time)
To retrieve this archive, connect to the diagnostics socket, for example with the netcat tool:
$ nc localhost 9091 > health-dump.zip
$ unzip health-dump.zip
Archive:  health-dump.zip
  inflating: metrics.openmetrics
  inflating: threads-20250307-105606.zip
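Where netcat is not available, the same retrieval can be sketched in a few lines of Python. This assumes, as the netcat example suggests, that the diagnostics socket starts sending the archive as soon as a client connects and closes the connection when it is done; the function names are hypothetical.

```python
import socket


def fetch_diagnostics(host: str, port: int, timeout: float = 10.0) -> bytes:
    """Read everything the diagnostics socket sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # an empty read means the peer closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def save_diagnostics(host: str, port: int, path: str) -> int:
    """Fetch the archive and write it to `path`; returns the number of bytes written."""
    archive = fetch_diagnostics(host, port)
    with open(path, "wb") as out:
        out.write(archive)
    return len(archive)
```

Calling save_diagnostics("localhost", 9091, "health-dump.zip") mirrors the netcat command above; the port is whatever the diagnostics socket is actually bound to.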
The table below lists the available configuration sources with priority decreasing from left to right:
| System property | Environment variable | Default value | Description |
| --- | --- | --- | --- |
| da.diagnostics.enabled | DA_DIAGNOSTICS_ENABLED | true | Enables/disables diagnostics data collection and exposition |
| da.diagnostics.host | DA_DIAGNOSTICS_HOST | 127.0.0.1 | Hostname or IP address to use for binding the exposition socket |
| da.diagnostics.port | DA_DIAGNOSTICS_PORT | 0 | Port to use for binding the exposition socket (0 = random port) |
| da.diagnostics.dump.path | DA_DIAGNOSTICS_DUMP_PATH | <empty> | Directory to write to on graceful shutdown (path needs to be an existing writable directory) |
| da.diagnostics.metrics.interval | DA_DIAGNOSTICS_METRICS_INTERVAL | PT10S | Metrics collection interval in ISO 8601 format |
| da.diagnostics.metrics.buffer.size | DA_DIAGNOSTICS_METRICS_BUFFER_SIZE | 60 | Quantity of samples to store for each monitored metric (rolling window) |
| da.diagnostics.metrics.tags | DA_DIAGNOSTICS_METRICS_TAGS | <empty> | Comma-separated list of additional labels to enrich each metric with during exposition (for example, job=myapp,env=staging,deployed=20250101) |
| da.diagnostics.threads.interval | DA_DIAGNOSTICS_THREADS_INTERVAL | PT1M | Thread dumps collection interval in ISO 8601 format |
| da.diagnostics.threads.buffer.size | DA_DIAGNOSTICS_THREADS_BUFFER_SIZE | 10 | Quantity of thread dumps to store (rolling window) |

Recover

PQS is designed to operate as a long-running process which uses these principles to enhance availability:
  • Redundancy involves running multiple instances of PQS in parallel to ensure that the system remains available even if one instance fails.
  • Retry involves healing from transient and recoverable failures without shutting down the process or requiring operator intervention.
  • Recovery entails reconciling the current state of the ledger with already exported data in the datastore after a cold start, and continuing from the latest checkpoint.

High availability

Multiple isolated instances of PQS can be instantiated without any cross-dependency. This allows for an active-active high availability clustering model. Please note that different instances might not be at the same offset due to different processing rates and general network non-determinism. PQS’ SQL API provides capabilities to deal with this ‘eventual consistency’ model, to ensure that readers have at least ‘repeatable read’ consistency. See validate_offset_exists() in pqs-references-offset-management for more details.

Retries

PQS’ pipeline command is a unidirectional streaming process that heavily relies on the availability of its source and target dependencies. When PQS encounters an error, it attempts to recover by restarting its internal engine, if the error is designated as recoverable:
  • gRPC [1] (white-listed; retries if):
    • CANCELLED
    • DEADLINE_EXCEEDED
    • NOT_FOUND
    • PERMISSION_DENIED
    • RESOURCE_EXHAUSTED
    • FAILED_PRECONDITION
    • ABORTED
    • INTERNAL
    • UNAVAILABLE
    • DATA_LOSS
    • UNAUTHENTICATED
  • JDBC [2] (black-listed; retries unless):
    • INVALID_PARAMETER_TYPE
    • PROTOCOL_VIOLATION
    • NOT_IMPLEMENTED
    • INVALID_PARAMETER_VALUE
    • SYNTAX_ERROR
    • UNDEFINED_COLUMN
    • UNDEFINED_OBJECT
    • UNDEFINED_TABLE
    • UNDEFINED_FUNCTION
    • NUMERIC_CONSTANT_OUT_OF_RANGE
    • NUMERIC_VALUE_OUT_OF_RANGE
    • DATA_TYPE_MISMATCH
    • INVALID_NAME
    • CANNOT_COERCE
    • UNEXPECTED_ERROR
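The difference between the two lists can be sketched as follows: gRPC errors are retried only when their status code appears in the list, while JDBC errors are retried unless their error state appears in the list. The function name and the string-based representation of codes are assumptions made for illustration; PQS' internal classification logic is not shown in this document.

```python
# Codes copied from the lists above.
RETRYABLE_GRPC = frozenset({
    "CANCELLED", "DEADLINE_EXCEEDED", "NOT_FOUND", "PERMISSION_DENIED",
    "RESOURCE_EXHAUSTED", "FAILED_PRECONDITION", "ABORTED", "INTERNAL",
    "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
})
NON_RETRYABLE_JDBC = frozenset({
    "INVALID_PARAMETER_TYPE", "PROTOCOL_VIOLATION", "NOT_IMPLEMENTED",
    "INVALID_PARAMETER_VALUE", "SYNTAX_ERROR", "UNDEFINED_COLUMN",
    "UNDEFINED_OBJECT", "UNDEFINED_TABLE", "UNDEFINED_FUNCTION",
    "NUMERIC_CONSTANT_OUT_OF_RANGE", "NUMERIC_VALUE_OUT_OF_RANGE",
    "DATA_TYPE_MISMATCH", "INVALID_NAME", "CANNOT_COERCE", "UNEXPECTED_ERROR",
})


def is_recoverable(source: str, code: str) -> bool:
    """Classify an error as recoverable (hypothetical helper)."""
    if source == "grpc":
        # Allow-list: retry only the listed status codes.
        return code in RETRYABLE_GRPC
    if source == "jdbc":
        # Deny-list: retry everything except the listed error states.
        return code not in NON_RETRYABLE_JDBC
    raise ValueError(f"unknown error source: {source!r}")
```

Under this reading, a refused database connection is retried because it is not on the JDBC list, whereas a gRPC INVALID_ARGUMENT is not retried because it is absent from the gRPC list.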

Configuration

The following pqs-references-configuration-options are available to control the retry behavior of PQS:
--retry-backoff-base string      Base time (ISO 8601) for backoff retry strategy (default: PT1S)
--retry-backoff-cap string       Max duration (ISO 8601) between attempts (default: PT1M)
--retry-backoff-factor double    Factor for backoff retry strategy (default: 2.0)
--retry-counter-attempts int     Max attempts before giving up (optional)
--retry-counter-reset string     Reset retry counters after period (ISO 8601) of stability (default: PT10M)
--retry-counter-duration string  Time limit (ISO 8601) before giving up (optional)
The --retry-backoff-* settings control the periodicity of retries and the maximum duration between attempts. The --retry-counter-attempts and --retry-counter-duration settings control the maximum instability tolerated before shutting down. The --retry-counter-reset setting controls the period of stability after which the retry counters are reset across the board.
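To get a feel for how the backoff settings interact, the sketch below computes a capped exponential schedule from a base, a factor and a cap. It assumes the common formula delay = min(cap, base × factor^n); the exact formula PQS applies (for example, whether jitter is added) is not stated here, so treat the numbers as indicative only.

```python
from datetime import timedelta


def backoff_delays(base: timedelta, factor: float, cap: timedelta, attempts: int) -> list:
    """Delay in seconds before each retry: base * factor**n, never exceeding cap."""
    delays = []
    for n in range(attempts):
        delay = base.total_seconds() * (factor ** n)
        delays.append(min(delay, cap.total_seconds()))
    return delays


# Defaults from the option list above: PT1S base, factor 2.0, PT1M cap.
schedule = backoff_delays(timedelta(seconds=1), 2.0, timedelta(minutes=1), 8)
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With the default values the delay doubles until the seventh attempt, where the cap of one minute takes over.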

Logging

While PQS recovers, the following log messages are emitted to indicate the progress of the recovery:
12:52:26.753 I [zio-fiber-257] com.digitalasset.scribe.appversion.package:14 scribe, version: UNSPECIFIED  application=scribe
12:52:16.725 I [zio-fiber-0] com.digitalasset.scribe.pipeline.Retry.retryRecoverable:48 Recoverable GRPC exception. Attempt 1, unstable for 0 seconds. Remaining attempts: 42. Remaining time: 10 minutes. Exception in thread "zio-fiber-" java.lang.Throwable: Recoverable GRPC exception.
    Suppressed: io.grpc.StatusException: UNAVAILABLE: io exception
        Suppressed: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/[0:0:0:0:0:0:0:1]:6865
            Suppressed: java.net.ConnectException: Connection refused application=scribe
12:52:29.007 I [zio-fiber-0] com.digitalasset.scribe.pipeline.Retry.retryRecoverable:48 Recoverable GRPC exception. Attempt 2, unstable for 12 seconds. Remaining attempts: 41. Remaining time: 9 minutes 47 seconds. Exception in thread "zio-fiber-" java.lang.Throwable: Recoverable GRPC exception.
    Suppressed: io.grpc.StatusException: UNAVAILABLE: io exception
        Suppressed: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/[0:0:0:0:0:0:0:1]:6865
            Suppressed: java.net.ConnectException: Connection refused application=scribe
12:52:51.237 I [zio-fiber-0] com.digitalasset.scribe.pipeline.Retry.retryRecoverable:48 Recoverable GRPC exception. Attempt 3, unstable for 34 seconds. Remaining attempts: 40. Remaining time: 9 minutes 25 seconds. Exception in thread "zio-fiber-" java.lang.Throwable: Recoverable GRPC exception.
    Suppressed: io.grpc.StatusException: UNAVAILABLE: io exception
        Suppressed: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/[0:0:0:0:0:0:0:1]:6865
            Suppressed: java.net.ConnectException: Connection refused application=scribe
12:53:33.473 I [zio-fiber-0] com.digitalasset.scribe.pipeline.Retry.retryRecoverable:48 Recoverable GRPC exception. Attempt 4, unstable for 1 minute 16 seconds. Remaining attempts: 39. Remaining time: 8 minutes 43 seconds. Exception in thread "zio-fiber-" java.lang.Throwable: Recoverable GRPC exception.
    Suppressed: io.grpc.StatusException: UNAVAILABLE: io exception
        Suppressed: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/[0:0:0:0:0:0:0:1]:6865
            Suppressed: java.net.ConnectException: Connection refused application=scribe
12:54:36.328 I [zio-fiber-0] com.digitalasset.scribe.pipeline.Retry.retryRecoverable:48 Recoverable JDBC exception. Attempt 5, unstable for 2 minutes 19 seconds. Remaining attempts: 38. Remaining time: 7 minutes 40 seconds. Exception in thread "zio-fiber-" java.lang.Throwable: Recoverable JDBC exception.
    Suppressed: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
        Suppressed: java.net.ConnectException: Connection refused application=scribe

Metrics

The following metrics are available to monitor stability of PQS’ dependencies. See pqs-application-metrics for more details on general observability:
### TYPE app_restarts_total counter
### HELP app_restarts_total Number of total app restarts due to recoverable errors
app_restarts_total{,exception="Recoverable GRPC exception."} 5.0

### TYPE grpc_up gauge
### HELP grpc_up Grpc channel is up
grpc_up{} 1.0

### TYPE jdbc_conn_pool_up gauge
### HELP jdbc_conn_pool_up JDBC connection pool is up
jdbc_conn_pool_up{} 1.0
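A monitoring script could read these values with a small parser such as the sketch below. It is a simplification rather than a full OpenMetrics parser: it assumes one sample per line without a trailing timestamp, drops labels, and keeps only the last sample seen for each metric name. The function names are hypothetical.

```python
def parse_metrics(text: str) -> dict:
    """Map each sample's metric name (labels dropped) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and TYPE/HELP comments
            continue
        # The value follows the last space; label values may contain spaces.
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        samples[name] = float(value)
    return samples


def dependencies_up(samples: dict) -> bool:
    """True when both the gRPC channel and the JDBC connection pool report as up."""
    return samples.get("grpc_up") == 1.0 and samples.get("jdbc_conn_pool_up") == 1.0
```

Fed the sample output above, parse_metrics yields app_restarts_total 5.0 with both gauges at 1.0, so dependencies_up returns True.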

Retry counters reset

If PQS encounters network unavailability, it increments its retry counters with each attempt. These counters are reset only after a period of stability, as defined by --retry-counter-reset. As a result, during prolonged periods of intermittent failures that alternate with brief periods of normal operation, PQS maintains a cautious stance regarding the stability of the overall system. For example, with the setting --retry-counter-reset PT5M the following timeline illustrates how the retry works:
time -->       1:00            5:00               10:00
                v               v                   v
operation:  ====xx=x====x=======x========================
                ^               ^                   ^
                A               B                   C

x - a failure causing retry happens
= - operating normally
In the timeline above, intermittent failures start at point A, and each retry attempt contributes to the increase of the overall backoff schedule. Consequently, each subsequent retry allows more time for the system to recover. This schedule does not reset to its initial values until after the configured period of stability is reached following the last failure (point B), such as after operating without any failures for 5 minutes (point C).
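The counting behaviour in the timeline can be modelled with a short simulation. This is a simplified sketch: it assumes stability is measured from the most recent failure, and the failure timestamps below are invented to resemble the diagram rather than taken from a real run.

```python
def attempt_numbers(failure_times, reset_after):
    """Return the retry attempt number assigned to each failure.

    failure_times: ascending failure timestamps in seconds.
    reset_after:   seconds of stability after which the counter is reset.
    """
    attempts = []
    counter = 0
    last_failure = None
    for moment in failure_times:
        if last_failure is not None and moment - last_failure >= reset_after:
            counter = 0  # stable for long enough: start counting afresh
        counter += 1
        attempts.append(counter)
        last_failure = moment
    return attempts


# Failures between 1:00 (A) and 5:00 (B), then one more at 10:30, after point C.
failures = [60, 75, 105, 180, 300, 630]
print(attempt_numbers(failures, reset_after=300))  # [1, 2, 3, 4, 5, 1]
```

The counter keeps growing through the intermittent failures between A and B, and only the failure that occurs after five failure-free minutes is treated as a first attempt again.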

Exit codes

PQS terminates with the following exit codes:
  • 0: Normal termination
  • 1: Termination due to an unrecoverable error, or because all retry attempts for recoverable errors have been exhausted

Ledger streaming & recovery

On (re-)start, PQS determines the last saved checkpoint and continues incremental processing from that point onward. PQS is able to start and finish at prescribed ledger offsets, specified via command-line arguments. In many scenarios --pipeline-ledger-start Oldest --pipeline-ledger-stop Never is the most appropriate configuration, both for the initial population of all available history and for resumption/recovery processing.
Start offset meanings:
| Value | Meaning |
| --- | --- |
| Genesis | Commence from the first offset of the ledger, failing if not available. |
| Oldest | Resume processing, or start from the oldest available offset of the ledger (if the datastore is empty). |
| Latest | Resume processing, or start from the latest available offset of the ledger (if the datastore is empty). |
| <offset> | Offset from which to start processing, terminating if it does not match the state of the datastore. |
Stop offset meanings:
| Value | Meaning |
| --- | --- |
| Latest | Process until reaching the latest available offset of the ledger, then terminate. |
| Never | Keep processing and never terminate. |
| <offset> | Process until reaching this offset, then terminate. |
If the ledger has been pruned beyond the offset specified in --pipeline-ledger-start, PQS fails to start. For more details see pqs-history-slicing.

Secure

The PQS application is a client to backend services (ledger and database). As such, it needs to respect the security settings mandated by those services, namely TLS and authentication:

TLS

Your server-side components (Canton and PostgreSQL) may require TLS to be used. Please refer to their documentation for instructions. Once configured, pass the appropriate values to the dedicated parameters:
$ ./scribe.jar pipeline ledger postgres-document \
     --source-ledger-tls-cert /path/to/ledger.crt \
     --source-ledger-tls-key /path/to/ledger.pem \
     --source-ledger-tls-cafile /path/to/ledger.crt \
     --target-postgres-tls-cert /path/to/postgres.crt \
     --target-postgres-tls-key /path/to/postgres.der \
     --target-postgres-tls-cafile /path/to/postgres.crt \
     --target-postgres-tls-mode VerifyFull

Ledger authentication

To run PQS with authentication, enable it via --source-ledger-auth OAuth. PQS uses the OAuth 2.0 Client Credentials flow [1].
$ ./scribe.jar pipeline ledger postgres-document \
     --source-ledger-auth OAuth \
     --pipeline-oauth-clientid my_client_id \
     --pipeline-oauth-clientsecret deadbeef \
     --pipeline-oauth-cafile ca.crt \
     --pipeline-oauth-endpoint https://my-auth-server/token
If your issuer is OIDC compliant, you can specify the issuer instead of the token URL.
$ ./scribe.jar pipeline ledger postgres-document \
     --source-ledger-auth OAuth \
     --pipeline-oauth-clientid my_client_id \
     --pipeline-oauth-clientsecret deadbeef \
     --pipeline-oauth-cafile ca.crt \
     --pipeline-oauth-issuer https://my-auth-server
PQS uses the supplied client credentials (clientid and clientsecret) to access the token endpoint (endpoint) of the OAuth service of your choice. The optional cafile parameter is a path to the Certification Authority certificate used to access the token endpoint. If cafile is not set, the Java TrustStore is used.
Please make sure you have configured your Participant Node to use authorization (see ledger-api-jwt-configuration) and an authorization server to accept your client credentials for grant_type=client_credentials and scope=daml_ledger_api.
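For testing an authorization server before pointing PQS at it, the token request can be reproduced by hand. The sketch below builds (but does not send) a standard OAuth 2.0 client-credentials request as described in RFC 6749 section 4.4. It is not PQS' implementation: the function name is hypothetical, and sending the credentials in the form body is an assumption, since some servers expect HTTP Basic authentication instead.

```python
import urllib.parse
import urllib.request


def build_token_request(endpoint, client_id, client_secret, scope="daml_ledger_api"):
    """Build an RFC 6749 client-credentials token request without sending it."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope is not None:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values taken from the command-line examples above.
request = build_token_request("https://my-auth-server/token", "my_client_id", "deadbeef")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would send it; a successful response is a JSON document containing an access_token field.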

Audience-based token

For Audience-Based Tokens use the --pipeline-oauth-parameters-audience parameter:
$ ./scribe.jar pipeline ledger postgres-document \
    --source-ledger-auth OAuth \
    --pipeline-oauth-clientid my_client_id \
    --pipeline-oauth-clientsecret deadbeef \
    --pipeline-oauth-cafile ca.crt \
    --pipeline-oauth-endpoint https://my-auth-server/token \
    --pipeline-oauth-scope None \
    --pipeline-oauth-parameters-audience https://daml.com/jwt/aud/participant/my_participant_id

Scope-based token

For Scope-Based Tokens use the --pipeline-oauth-scope parameter:
$ ./scribe.jar pipeline ledger postgres-document \
     --source-ledger-auth OAuth \
     --pipeline-oauth-clientid my_client_id \
     --pipeline-oauth-clientsecret deadbeef \
     --pipeline-oauth-cafile ca.crt \
     --pipeline-oauth-endpoint https://my-auth-server/token \
     --pipeline-oauth-scope myScope \
     --pipeline-oauth-parameters-audience https://daml.com/jwt/aud/participant/my_participant_id
The default value of the --pipeline-oauth-scope parameter is daml_ledger_api. The Ledger API requires daml_ledger_api in the list of scopes unless a custom target scope is configured.

Custom Daml claims tokens

PQS authenticates as a user defined through the User Identity Management feature of Canton. Consequently, Custom Daml Claims Access Tokens are not supported. An audience-based or scope-based token must be used instead.

Static access token

Alternatively, you can configure PQS to use a static access token (meaning it is not refreshed) using the --pipeline-oauth-accesstoken parameter:
$ ./scribe.jar pipeline ledger postgres-document \
     --source-ledger-auth OAuth \
     --pipeline-oauth-accesstoken my_access_token

Ledger API users and Daml parties

PQS connects to a Participant Node (via the Ledger API) as a user defined through the User Identity Management feature of Canton. PQS obtains its user identity by providing an OAuth token of that user. After authenticating, the Participant Node has the authorization information to know what Daml Party data the user is allowed to access. By default, PQS subscribes to data for all parties available to PQS’ authenticated user. However, this scope can be limited via the --pipeline-filter-parties filter parameter (see pqs-party-filtering).
Keep in mind that a PQS database instance might contain data for multiple Daml parties. It is therefore essential to ensure that queries are always scoped to the relevant parties, to avoid data leaks:
-- partyA needs to be signatory on the contract
select * from active() where signatories @> '{partyA}';

-- partyB needs to be observer on the contract
select * from active() where observers @> '{partyB}';

-- partyC can be either signatory or observer on the contract
select a.* from active() a where a.stakeholders @> '{partyC}';
-- `stakeholders` is really a function which can be called like a "virtual" column
select a.* from active() a where stakeholders(a) @> '{partyC}';
select a.* from active() a where stakeholders(a.*) @> '{partyC}';

-- partyD needs to be witness on the divulgence
select * from creates() where divulged_only and witnesses @> '{partyD}';

Token expiry

JWT tokens [2] have an expiration time. PQS has a mechanism to automatically request a new access token from the Auth Server before the old access token expires. To set when PQS should request a new access token, use --pipeline-oauth-preemptexpiry (default “PT1M”, one minute), meaning: request a new access token one minute before the current access token expires. This new access token is used for any future Ledger API calls.
However, for streaming calls such as GetUpdates the access token is part of the request that initiates the streaming. Canton versions prior to 2.9 terminate the stream with error PERMISSION_DENIED as soon as the old access token expires, to prevent streaming forever based on the old access token. Versions 2.9+ fail with code ABORTED and description ACCESS_TOKEN_EXPIRED, and PQS resumes streaming from the offset of the last successfully processed transaction.
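The effect of the pre-emptive expiry setting can be sketched by reading the exp claim of a JWT and subtracting the configured margin. This is an illustration only: the function names are hypothetical, the signature is deliberately not verified, and PQS may derive the expiry differently (for example from the expires_in field of the token response).

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def token_expiry(jwt: str) -> datetime:
    """Read the 'exp' claim from a JWT payload WITHOUT verifying the signature."""
    payload_b64 = jwt.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)


def refresh_at(jwt: str, preempt_expiry: timedelta = timedelta(minutes=1)) -> datetime:
    """Moment at which a replacement token should be requested."""
    return token_expiry(jwt) - preempt_expiry
```

With the default margin of one minute, a token whose exp claim is 12:00:00 UTC would be renewed at 11:59:00 UTC.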

Forward proxy

If PQS runs in a network that requires a forward proxy to reach external OAuth endpoints (for example Azure AD or Okta), configure the proxy using CLI flags:
$ ./scribe.jar pipeline ledger postgres-document \
     --source-ledger-auth OAuth \
     --pipeline-oauth-clientid my_client_id \
     --pipeline-oauth-clientsecret deadbeef \
     --pipeline-oauth-endpoint https://login.microsoftonline.com/tenant/oauth2/v2.0/token \
     --pipeline-oauth-proxy-url http://proxy.corp.example.com:8080 \
     --pipeline-oauth-proxy-user proxyuser \
     --pipeline-oauth-proxy-password proxypass
Or via environment variables:
SCRIBE_PIPELINE_OAUTH_PROXY_URL=http://proxy.corp.example.com:8080
SCRIBE_PIPELINE_OAUTH_PROXY_USER=proxyuser        # optional
SCRIBE_PIPELINE_OAUTH_PROXY_PASSWORD=proxypass     # optional
The proxy is used for both OIDC discovery and token acquisition requests. It does not affect the gRPC connection to the ledger API or the PostgreSQL connection. If your infrastructure uses HTTPS_PROXY environment variables, set SCRIBE_PIPELINE_OAUTH_PROXY_URL to the same value.
If PQS runs inside an Istio service mesh, the Istio sidecar intercepts outbound TCP connections. For proxy connectivity to work, either create an Istio ServiceEntry for the proxy host, or exclude the proxy port from sidecar interception using the pod annotation traffic.sidecar.istio.io/excludeOutboundPorts.

PostgreSQL authentication

To authenticate to PostgreSQL, use dedicated parameters when launching the pipeline:
$ ./scribe.jar pipeline ledger postgres-document \
     --target-postgres-password "${YOUR_DB_PASSWORD}" \
     --target-postgres-username "${YOUR_DB_USER}"

Hardening recommendations

Use TLS: Always use TLS to encrypt data transmitted to and from the Participant Node and the PostgreSQL datastore. This is especially important when dealing with sensitive information. Ensure that only secure TLS versions are used (for example TLS 1.2+) and that strong cipher suites are configured. Use client authentication to ensure that only trusted clients can connect to the Participant Node and PostgreSQL datastore, so that network-level security is not overly relied upon.
Logging: Ensure that logging is configured (see pqs-logging) to avoid logging sensitive information. This includes transaction details and metadata (e.g. size) that are revealed at the TRACE and DEBUG levels. These log levels should be used with caution, and only in a controlled environment. In production, we recommend using the INFO or WARN levels.
Ledger Authorization: Follow the principle of least privilege when granting access to the Participant Node Ledger API User that PQS uses:
  • Only canReadAs authorization, limited to the parties that it requires; OR
  • Only readAsAnyParty authorization, if PQS is used as a participant-wide service and needs access to all Party data.
  • No canActAs authorization (to submit commands). PQS has no capability to submit commands to the Canton Ledger API.
  • No admin access to the Canton Ledger API.
Database Access: The datastore contains all ledger information obtained from the Participant Node. Ensure that database users are tightly controlled and granted only the minimum required privileges:
  • Operational user: SQL Insert/Update/Delete/Copy, so that PQS can maintain the datastore contents. No DDL rights should be in place, since PQS does not need to change the database schema.
  • Other users: No write access should be granted to any other user. Excessive reading by other clients (leading to overload) should be avoided, to ensure that PQS has sufficient resources to operate.
  • Admin user: PQS needs to be able to apply schema changes to the database when deploying a new version containing database changes. This should be a separate user with the minimum required privileges to perform these operations. Also, redaction operations performed by an administrator require Select/Update rights to the database.
Network Security: Ensure that the network security is configured to restrict access to only essential connections:
  • Using firewalls to restrict access to PostgreSQL database, Participant Node and auth server.
  • Using firewalls to restrict access to PQS. Even though it does not listen for any client connections, it has listening TCP ports for health and diagnostic purposes. By default, health and diagnostic ports are only accessible from localhost. Before changing this configuration, ensure that all hosts granted network-level access are necessary and trusted, to mitigate the risk of exploitation or exposure of sensitive information.
Runtime Environment: Ensure that the runtime environment is secure:
  • Keeping the operating system and all software up to date with security patches.
  • Using a firewall to restrict access to the PQS process (and associated PostgreSQL datastore).
  • Monitoring logs for any suspicious activity, as well as errors and warnings.
  • Validate that the runtime enforces least-privilege principles and contains only the intended tools. For the Java runtime environment (JRE), we recommend the following:
    • Use a minimal JRE (not a JDK), for example Amazon Corretto or Azul Zulu, to reduce the attack surface.
    • Consider allowing only the jdk.attach module in your chosen JRE. This enables the running process to produce more accurate stack traces when diagnostics are extracted.
    • Use a security manager to restrict the infrastructure permissions of the PQS process, if possible.
  • Use a containerized environment (for example Docker) to isolate the PQS process from the host system and other processes.
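For the containerized case, the sketch below assembles a `docker run` command with common Docker hardening flags and prints it for review instead of executing it. The image name is the one used earlier on this page. The function name, the UID:GID value and the choice of flags are assumptions to validate against your environment; in particular, confirm that the image runs correctly with a read-only root filesystem and a non-root user. Connection and credential options are omitted (see pqs-references-configuration-options).

```shell
#!/usr/bin/env bash
# Image name as used in the Docker example earlier on this page.
PQS_IMAGE="europe-docker.pkg.dev/da-images/public/docker/participant-query-store:0.6.14"

# Print (not run) a hardened docker command line for the given PQS arguments.
# usage: pqs_hardened_run_cmd PQS_ARGS...
pqs_hardened_run_cmd() {
  local args=(
    docker run --rm
    --read-only                       # immutable root filesystem
    --tmpfs /tmp                      # writable scratch space only
    --cap-drop ALL                    # drop all Linux capabilities
    --security-opt no-new-privileges  # block privilege escalation
    --user 65534:65534                # non-root UID:GID (assumed value)
    "$PQS_IMAGE"
    "$@"
  )
  printf '%s ' "${args[@]}"
  printf '\n'
}
```

For example, `pqs_hardened_run_cmd pipeline ledger postgres-document` prints the full command line so that it can be reviewed and then adapted to your deployment tooling.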
Observability: Ensure that the environment is monitored for logs, metrics, and alerts on events of interest (see pqs-observe):
  • The PQS log is monitored for errors and warnings, so that these do not go unnoticed.
  • The PQS runtime is monitored for disk, memory and other JRE concerns such as heap usage and garbage collection rates.
  • The PQS health endpoint is polled regularly to verify availability.
  • PostgreSQL and Participant Node servers are similarly monitored.
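Regular polling of the health endpoint can be sketched as a probe with retries, so that a single slow response does not raise an alert. The function names are hypothetical, and the health URL is deliberately left as a variable because it depends on how the health port is configured in your deployment.

```shell
#!/usr/bin/env bash
# Run a probe command up to ATTEMPTS times; succeed on the first success.
# usage: probe_with_retries ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
probe_with_retries() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Wait between attempts, but not after the final failure.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "probe failed after ${attempts} attempts: $*" >&2
  return 1
}

# Probe an HTTP endpoint; fails on connection errors, timeouts and
# HTTP error statuses.
http_probe() {
  curl --fail --silent --show-error --max-time 5 --output /dev/null "$1"
}
```

A scheduler or monitoring agent could then run `probe_with_retries 3 2 http_probe "$PQS_HEALTH_URL"` and raise an alert on a non-zero exit status, where `PQS_HEALTH_URL` is set to the health endpoint of your PQS instance.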
Database Backups: Ensure that database backups are encrypted and stored securely. This is especially important when dealing with sensitive information. Back up the database regularly, and test the backup and restore process to confirm that it works as expected.
Data Retention: Ensure that a data retention policy is in place and enforced. This includes regular purging of old data (e.g. PQS pruning, see pqs-pruning), and ensuring that sensitive information is redacted or deleted as required by your organization’s policies and regulations.
Software Updates: Ensure that the PQS software is kept up to date with the latest security patches and updates. This includes both the PQS software itself and any dependencies it may have.

Footnotes

  1. https://en.wikipedia.org/wiki/ISO_8601#Durations
  2. https://zio.dev/zio-logging/formatting-log-records/#log-format-configuration
  3. https://opentelemetry.io/docs/specs/otel/overview/#links-between-spans
  4. https://www.w3.org/TR/trace-context/