Helps users retrieve pre-built datasets from specific datamarts, with the capacity to supply values to some default filters, such as dates and diseases. In addition to retrieving data, user can perform preliminary self-diagnostics to troubleshoot issues, return query instead of data, and even supply filters outside the predefined default filters.
This function retains the legacy form, which can be invoked by providing
arguments with phrdw_
prefix, such as phrdw_datamart_connection
and
phrdw_datamart
.
The modern form, on the other hand, is invoked by a simple mart
and/or
type
, which streamlines the process, allows additional features, and
avoids redundant inputs such as phrdw_datamart
in both
connect_to_phrdw()
and get_phrdw_data()
.
Usage
get_phrdw_data(
phrdw_datamart_connection = NULL,
phrdw_datamart = NULL,
dataset_name = NULL,
query_start_date = NULL,
query_end_date = NULL,
include_patient_identifiers = F,
include_indigenous_identifiers = F,
retrieve_system_ids = "Yes",
disease = NULL,
surveillance_condition = NULL,
classification = NULL,
surveillance_region_ha = NULL,
infection_group = NULL,
ordering_provider_ha = NULL,
lis_status = NULL,
episode_status = NULL,
test_type = NULL,
episode_testing_pattern = NULL,
testing_region_ha = NULL,
case_status = NULL,
case_source = NULL,
ucd_3_char_code = NULL,
ccd_3_char_code = NULL,
residential_location_ha = NULL,
death_location_ha = NULL,
mart = NULL,
type = c("prod", "su", "sa")[1],
.head = NULL,
.check_params = F,
.return_query = F,
.return_data = !(.return_query || isTRUE(.check_params) || is.character(.check_params)),
.clean_data = F,
.query_df = NULL,
.query_str = NULL,
.cte = F,
...
)
Arguments
- phrdw_datamart_connection
-
Legacy function design: supply a connection object created by
connect_to_phrdw()
. Recommend usingmart
andtype
instead for flexibility (seeconnect_to_phrdw()
). The function takes care of connecting to the appropriate PHRDW database and disconnect after performing the requested data filtering and retrieving. - phrdw_datamart
-
Legacy mart designations provided by previous package authors. This backward-compatibility is meant to minimize changes on the user end. The
stable
approach is to referencemart
andtype
. - dataset_name
The name of the pre-built dataset to retrieve.
- query_start_date
Start date of your dataset. Can accept character date. Defaults to
NULL
, which indicates no lower bound.- query_end_date
End date of your dataset. Can accept character date. Defaults to
NULL
, which indicates no upper bound.- include_patient_identifiers
Whether to include patient identifier information. Accepts Boolean values. Defaults to
FALSE
. Note: user needs to have access, otherwise data restriction may return unintended results.- include_indigenous_identifiers
Whether to include indigenous identifier information. Accepts Boolean values. Defaults to
FALSE
. Note: user needs to have access, otherwise data restriction may return unintended results.- retrieve_system_ids
Whether to include systems IDs in the dataset. Currently only applies to
Enteric
datamart. Defaults to legacy value "Yes", but can accept Boolean, and no longer case-specific.- disease
Optional. Character vector of diseases. Only applicable to some datasets.
- surveillance_condition
Optional. Character vector of surveillance conditions. Only for applicable datasets.
- classification
Optional. Character vector of classifications. Only for applicable datasets.
- surveillance_region_ha
Optional. Character vector of Health Region Authorities where the Patient lives/lived. Only for applicable datasets.
- infection_group
Optional. Character vector of infection groups. Only for applicable datasets.
- ordering_provider_ha
Optional. Character vector of Health Region Authorities where the Ordering Provide resides. Only for applicable datasets.
- lis_status
Optional. Character vector of case level statuses about the LIS data in a Case. Only for applicable datasets.
- episode_status
Optional. Character vector of episode statuses from the LIS result processing and rule engine. Only for applicable datasets.
- test_type
Optional. Character vector of the types of tests. Only for applicable datasets.
- episode_testing_pattern
Optional. Character vector of testing patterns. Only for applicable datasets.
- testing_region_ha
Optional. Character vector of testing Health Region Authorities. Only for applicable datasets.
- case_status
Optional. Character vector of case status. Only for applicable datasets.
- case_source
Optional. Character vector of case status. Only for applicable datasets.
- ucd_3_char_code
Optional. Character vector of UCD 3-character codes. Only for applicable CDI datasets.
- ccd_3_char_code
Optional. Character vector of CCD 3-character codes. Only for applicable CDI datasets.
- residential_location_ha
Optional. Character vector of BC Health Authorities associated with decedant's usual residence. Only for applicable CDI datasets.
- death_location_ha
Optional. Character vector of BC Health Authorities associated with decedant's location of death. Only for applicable CDI datasets.
- mart
-
Provide an appropriate mart name (non-case specific). Must be one of "CDI", "CD", "Respiratory", "Enteric", "STIBBI", and "VPD". Non case-sensitive.
- type
-
Provide an appropriate mart type (non-case specific). Must be one of "prod" (default), "su", or "sa". Non case-sensitive. See
Details
. - .head
-
Optional. Single integer vector to indicate how many rows from the top to return. Note:
tail
is not supported on database backends. - .check_params
-
Can accept Boolean or character values.
Boolean
TRUE
will return general info of the dataset: For MDX queries: dimensions, hierarchies, and levels if possible, and hierarchies for default filters. For SQL queries: column names as they are in the source tables, and what they are renamed to, and columns for default filters.On the other hand, user can supply hierarchy or column names retrieved from the above as character vector, and a list of the cardinal levels will return. Useful to check for typos, or available names to filter for.
- .return_query
Boolean value. Whether to return query or not.
- .return_data
Boolean value. Whether to return data or not.
- .clean_data
Boolean value. Whether to attempt cleaning the dates in data or not.
- .query_df
-
Accepts a named list of a single element, where names can either be
sql
orolap
, and the element is adata.frame
object similar in structure tophrdwRdata:::list_query_info$olap
. User is responsible for syntax validity and compatibility (ie. OLAP or SQL, appropriate access, etc) of the query. - .query_str
-
Accepts a named list of a single element, where names can either be
sql
orolap
, and the element is acharacter
vector of query. User is responsible for syntax validity and compatibility (ie. OLAP or SQL, appropriate access, etc) of the query. - .cte
-
Experimental support for common table expressions (CTEs). Defaults to
FALSE
. This is the equivalent ofpipe
, which in essence allows writing subqueries in the order in which they are evaluated. Supported indplyr::show_query()
,dplyr::compute()
, anddplyr::collect()
. This is one of the ways to optimize SQL execution: using CTEsWITH
instead of subqueries. - ...
User can supply named vector: names being the column or hierarchy name, and elements of the vector being the value to filter for. See
Details
.
Details
List of values to supply
phrdw_datamart
. Case sensitive.
CDI
: Chronic Disease & Injury; links data from Vital Statistics death records and census-based socio-economic data.CD Mart
: Communicable Diseases; contains communicable disease public health investigation data from the Panorama public health system.Enteric
: Enteric; links data from the Panorama public health system and the Sunquest laboratory information system at PHSA.Respiratory
: Respiratory diseases; includes data from the Sunquest laboratory information system at PHSA.STIBBI
: Sexually Transmitted Blood Borne Infections; links data from the Panorama public health system, the Sunquest laboratory information system at PHSA, STIIS, HAISYS, and legacy laboratory systems.VPD
: Vaccine Preventable Disease; links data from the Panorama public health system and the Sunquest laboratory information system at PHSA.TAT
: TBD.Enteric SU
: UAT server of Enteric.STIBBI SU
: UAT server of STIBBI.STIBBI SA
: PROD copy/Staging server of STIBBI.VPD SU
: UAT server of VPD.
Using
mart
and type
is preferred. They are not case-sensitive and
more readable.
For now, PHRDW data architecture is either data warehouse/relational
table or data cubes, depending on which mart. Connection to data
warehouse returns an odbc::dbConnect()
connection object, whereas
connection to data cube returns an OLAP_Conn
object, which is
just a character string under the hood that will be executed by
a back-end C routine.
This means that connection to data warehouse allows for memory-efficient
tools like dbplyr
where data is read lazily rather than loaded
into memory.
Users can supply their own connection string using .conn_str
. To
distinguish between the different architectures, this parameter needs to
be named list, as either sql
or cube
. See Examples
.
Examples
if (FALSE) { # \dontrun{
# The bare minimum required user inputs are `mart` and `dataset_name`.
# User input is simpler and not case-sensitive.
get_phrdw_data(
mart = 'cd',
dataset_name = 'investigation'
)
# Unlike the legacy approach, which would require more user input.
# User need to be wary of spelling and case.
# User also need to provide dates, even if the intent is to retrieve
# the entire dataset. Without date boundaries, unexpected output may return.
# Dates would need to go beyond what exist in the database, but user would not
# have known this info.
get_phrdw_data(
phrdw_datamart_connection = connect_to_phrdw(phrdw_datamart = 'CD Mart'),
phrdw_datamart = 'CD Mart',
dataset_name = 'Investigation',
query_start_date = '1900-01-01',
query_end_date = Sys.Date()
)
# Incorrect spelling will generate helpful message to assist user.
get_phrdw_data(
mart = 'dc',
dataset_name = 'investigation'
)
get_phrdw_data(
mart = 'stibbi',
dataset_name = 'investigations'
)
# Filter
## Some arguments are reserved for filters. These remain unchanged from before.
## They are case- and spelling-sensitive.
get_phrdw_data(
mart = 'cd',
dataset_name = 'investigation',
disease = 'Anthrax'
)
get_phrdw_data(
phrdw_datamart_connection = connect_to_phrdw(phrdw_datamart = 'CD Mart'),
phrdw_datamart = 'CD Mart',
dataset_name = 'Investigation',
query_start_date = '1900-01-01',
query_end_date = Sys.Date(),
disease = 'Anthrax'
)
get_phrdw_data(
mart = 'stibbi',
dataset_name = 'investigation',
disease = 'Chlamydia'
)
get_phrdw_data(
phrdw_datamart_connection = connect_to_phrdw(phrdw_datamart = 'STIBBI'),
phrdw_datamart = 'STIBBI',
dataset_name = 'Investigation',
query_start_date = '1900-01-01',
query_end_date = Sys.Date(),
disease = 'Chlamydia'
)
# Troubleshooting
## User can opt to return query instead of data.
get_phrdw_data(
mart = 'cd',
dataset_name = 'investigation',
.return_query = T
)
# User can use `.check_params` to examine basic dataset metainfo such as
# column names and what's being used to filter by default.
# Note: the output format may look different depending on data source being
# RDBMS or OLAP (ie. relational DB or data cube).
get_phrdw_data(
mart = 'cd',
dataset_name = 'investigation',
.check_params = T
)
get_phrdw_data(
mart = 'stibbi',
dataset_name = 'investigation',
.check_params = T
)
# When `.check_params` is supplied with the name of the column (or
# hierarchy), it will return (nominal) levels.
# Use with caution: it can return dates and street addresses and that would be
# a long list.
get_phrdw_data(
mart = 'cd',
dataset_name = 'investigation',
.check_params = 'source_system'
)
get_phrdw_data(
mart = 'stibbi',
dataset_name = 'investigation',
.check_params = 'Source System'
)
# Customize filtering (only available in stable usage)
## Knowing columns and levels can be helpful, as these values can be passed
## into the function for customized filtering:
## 1. Enclose the column name with backticks.
## 2. Pass the value to filter.
get_phrdw_data(
mart = 'cd',
dataset_name = 'investigation',
`source_system` = 'EMR'
)
get_phrdw_data(
mart = 'stibbi',
dataset_name = 'investigation',
`Source System` = 'EMR'
)
} # }