Details
-
Type:
Task
-
Status: Open
-
Priority:
Major
-
Resolution: Unresolved
-
Fix Version/s: None
-
Component/s: Storage Engine - Connect
-
Labels:None
Description
As far as I understand, there is no way to tell CONNECT that a special character in a file represents NULL.
Please, take a look at this example. It's an open dataset from Italian government, and the same format is used for several other datasets:
http://www.dati.salute.gov.it/imgs/C_17_dataset_7_download_itemDownload0_upFile.CSV
The column DATAFINEVALIDITA is a DDMMYYYY date, and '-' represents a NULL value.
If I may suggest, a new column option would be a great solution:
col_name DATE null_value = '-',
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
This is a general issue I have been already thinking about. What CONNECT currently does,
when a column is nullable, is to regard character columns as NULL when represented by an empty string
and to regard numeric columns as NULL when their value is 0. The case of DATE column is unclear.
Currently they are treated as numeric (dates are internally represented like in C by the number of
second since 01-01-1970) Unfortunately, even invalid dates are automatically set to 01-01-1970,
their value is not 0 because added or substracted a time shift corresponding to their GMT setting.
This why they are not set to NULL in this case.
I am not yet ready to implement a general solution. Should the NULL_CHAR option be a column or
table option, the NULL representation be a single character or a string or a numeric value?
These questions are still open.
Meanwhile, to take care of your problem, I have updated the DATE data type for invalid dates
to be regarded as NULL whatever GMT setting is applied. Here is the CREATE TABLE I used:
Note the FIELD_FORMAT='D,' to take care of the comma used as decimal separator and
the QUOTED=0 option needed because two lines in the file have their INDIRIZZO column quoted.
The query:
now returns: