Sequence generation for Parallel Processing [message #662782] |
Fri, 12 May 2017 03:07 |
lokimisc
Messages: 101 Registered: February 2008
|
Senior Member |
|
|
Hi,
We have etl process where our system will read txt files, transform the data and generates output files which are in text format.
Entire process is automated which is triggered by shell script ( creates separate sessions every time when called) through schedule job.
Format of output file is abc_mmddyy_seq where seq is for a particular day it can be from 000 to 999 and next day it will start from 001.
Eg: abc_010117_001, abc_010117_002; abc_020117_001
We have table called "sequence_data" which will just hold the latest sequence for that day using which we are getting next sequence (autonomous procedure)
Everything was working perfectly fine but recently while processing multiple files i.e calling simultaneously more than 10 input files, 2 of the output files got same sequence number reason because 1 file had huge data and while it was taking time for transformation and was about generate output file other smaller file completed transformation process, read same sequence number and generated output file.
So could you please advise what is the best approach so that we don't get same output file.
Thanks
Lokesh
|
|
|
|
|
|
Re: Sequence generation for Parallel Processing [message #662872 is a reply to message #662866] |
Sun, 14 May 2017 08:33 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
lokimisc wrote on Sun, 14 May 2017 06:56If I use SELECT FOR UPDATE then while parallel processing if another process try to read same row and if it is locked by first session then it will throw exception which I need to avoid.
Not sure what exception are you talking about. My understanding is you have autonomous procedure that:
a) Selects for update date & number columns.
b) if date = trunc(sysdate) then update number = number + 1, return that number + 1 and commit
c) if date != trunc(sysdate) then update date = trunc(sysdate), number = 1, return 1 and commit
Therefore, if multiple sessions call procedure for sequence number they will wait (select for update will wait) since WAIT is default for FOR UPDATE. What you want is optimize that autonomous procedure to work as fast as possible to avoid waits caused by FOR UPDATE locks.
In fact, you don't need SELECT FOR UPDATE. All you need is:
UPDATE SEQUENCE_TABLE
SET SEQ_DT = TRUNC(SYSDATE),
SEQ_VAL = CASE SEQ_DT
WHEN TRUNC(SYSDATE) THEN SEQ_VAL + 1
ELSE 1
END
RETURNING SEQ_VAL
INTO V_SEQ_VAL;
SY.
|
|
|
|
|
|
Re: Sequence generation for Parallel Processing [message #663094 is a reply to message #662906] |
Mon, 22 May 2017 06:22 |
Bill B
Messages: 1971 Registered: December 2004
|
Senior Member |
|
|
personally I would use the following
create sequence my_table_seq start with 0 increment by 1;
then every night at midnight use the scheduler to run the following code
declare
l_val number;
begin
execute immediate
'select my_table_seq.nextval from dual' INTO l_val;
execute immediate
'alter sequence my_table_seq increment by -' || l_val ||
' minvalue 0';
execute immediate
'select my_table_seq.nextval from dual' INTO l_val;
execute immediate
'alter sequence my_table_seq increment by 1 minvalue 0';
end;
/
all done. Guaranteed that the sequence number at the end of your file will always be unique because your using a sequence.
I also agree with the previous comment. If you name your files
abc_20170101_001
then you can easily find a specific date range of tables by simply sorting the table names.
[Updated on: Mon, 22 May 2017 06:25] Report message to a moderator
|
|
|
Re: Sequence generation for Parallel Processing [message #663253 is a reply to message #663094] |
Sat, 27 May 2017 02:18 |
lokimisc
Messages: 101 Registered: February 2008
|
Senior Member |
|
|
Thank You all for your inputs.
Here is the procedure written to handle parallel processing sequence.
create or replace PROCEDURE prc_get_nxt_seq(po_seq OUT NUMBER)
IS
PRAGMA autonomous_transaction;
l_lock_name CONSTANT VARCHAR2(20) := 'EMP_SEQ';
l_sleep_sec CONSTANT NUMBER := 1;
l_lock_handle VARCHAR2(100);
l_lock_id NUMBER;
l_lock_release NUMBER;
l_seq NUMBER;
l_record_count NUMBER;
BEGIN
--dbms_output.put_line( sysdate||': StartSerialProcess()' );
--// create a lock handle
dbms_lock.allocate_unique( lockname => l_lock_name,
lockHandle => l_lock_handle,
expiration_secs => 0
);
--// attempt to lock in order to start process, else
--// loop and wait for the lock to become available
--// for us to use.
LOOP
--// attempt a lock
l_lock_id := dbms_lock.request( lockhandle => l_lock_handle,
timeout => 1
);
EXIT
WHEN l_lock_id != 1; --// lock was successful
--// locking failed - another process is busy, we
--// need to wait
--// retrying the lock)
dbms_lock.sleep( l_sleep_sec );
END LOOP;
--// we have made the lock and can now continue to process...
SELECT COUNT(1)
INTO l_record_count
FROM system_data
WHERE NAME ='EMP_SEQ' ;
IF l_record_count < 1
THEN
INSERT INTO SYSTEM_DATA
(
name,
value,
mod_date
)
SELECT 'EMP_SEQ', 0 ,TRUNC(SYSDATE) FROM dual;
END IF;
UPDATE system_data
SET mod_date = TRUNC(SYSDATE),
mod_user = USER ,
value = CASE TRUNC(mod_date)
WHEN TRUNC(SYSDATE) THEN value + 1
ELSE 1
END
WHERE name = 'EMP_SEQ'
RETURNING value
INTO l_seq;
commit;
po_seq := l_seq;
--dbms_lock.sleep( 5 ); --// pretend processsing for 5 secs for testing purpose;
--// when processing is completed, we need to release the
--// the lock handle
l_lock_release := dbms_lock.release( l_lock_handle );
EXCEPTION WHEN OTHERS
THEN
l_lock_release := dbms_lock.release( l_lock_handle );
END prc_get_nxt_seq;
Thanks and Regards,
Lokesh
|
|
|
Re: Sequence generation for Parallel Processing [message #663269 is a reply to message #663253] |
Sat, 27 May 2017 05:50 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Why do you need DBMS_LOCK on top of oracle built-in locking. Here it causes nothing but performance degradation. You want that autonomous transaction to complete as fast as possible, so code must assume row with NAME ='EMP_SEQ' is there already. And TRUNC on MOD_DATE shouldn't be needed - we must assume initial row had MOD_DATE with time of 00:00:00. Also, you should add p_name parameter instead of hardcoding 'EMP_SEQ'. Then your will be able to use same code for all sequences. And why procedure and not function? Function would allow you setting sequence value right in INSERT statement.
SY.
|
|
|
|
Re: Sequence generation for Parallel Processing [message #663280 is a reply to message #663270] |
Sun, 28 May 2017 02:32 |
lokimisc
Messages: 101 Registered: February 2008
|
Senior Member |
|
|
Thanks Solomon for your response.
Randomly same sequence was used by two processes when time differences between them was very less (actually it is not reproducible) so to avoid such scenario dbms_lock is used.
Also I will incorporate your inputs.
Thanks all for helping to solve the problem.
Regards,
Lokesh
|
|
|
Re: Sequence generation for Parallel Processing [message #663281 is a reply to message #663280] |
Sun, 28 May 2017 05:45 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
What scenario DBMS_LOCK will avoid that standard locking wouldn't? If multiple session request sequence value at the same/about the same time only one will update - all others will wait (they will be queued up). So the goal is to minimize sequence value generation transaction time while DBMS_LOCK simply prolongs it.
SY.
|
|
|
Re: Sequence generation for Parallel Processing [message #663282 is a reply to message #663270] |
Sun, 28 May 2017 05:50 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
BlackSwan wrote on Sat, 27 May 2017 06:59EXCEPTION handler is bad bug waiting to bite you
http://www.orafaq.com/wiki/WHEN_OTHERS
No, this is one of the few cases when WHEN OTHERS is, pretty-much, a must. Otherwise all sessions needing sequence value will be locked out if exception is raised after DBMS_LOCK lock is placed. We have to release the lock regardless of exception. However, you are correct in a sense we shouldn't be masking exceptions so exception handler shold issue RAISE after releasing the lock.
SY.
|
|
|
Re: Sequence generation for Parallel Processing [message #663433 is a reply to message #663282] |
Fri, 02 June 2017 13:51 |
Bill B
Messages: 1971 Registered: December 2004
|
Senior Member |
|
|
If you really don't want to use an oracle sequence then do the following
create table file_locks (current_date date,the_seq number);
CREATE OR REPLACE FUNCTION Bld_file_name (P_file_prefix IN VARCHAR2)
RETURN VARCHAR2
IS
PRAGMA AUTONOMOUS_TRANSACTION;
V_the_seq NUMBER;
V_current_date DATE;
BEGIN
SELECT CURRENT_DATE, The_seq
INTO V_current_date, V_the_seq
FROM File_locks
FOR UPDATE;
-- check to see if we need to restart the sequence
IF V_current_date < TRUNC (SYSDATE)
THEN
UPDATE File_locks
SET CURRENT_DATE = TRUNC (SYSDATE), The_seq = 0;
V_the_seq := 0;
END IF;
-- increment the sequence
UPDATE File_locks
SET The_seq = The_seq + 1;
COMMIT;
RETURN P_file_prefix
|| '_'
|| TO_CHAR (SYSDATE, 'YYYYMMDD')
|| '_'
|| TO_CHAR (V_the_seq + 1, 'fm999999000');
END Bld_file_name;
Every invocation of the code will return a unique file name and the function does it's own locking of the table.
[Updated on: Fri, 02 June 2017 14:03] Report message to a moderator
|
|
|
Re: Sequence generation for Parallel Processing [message #663439 is a reply to message #663433] |
Fri, 02 June 2017 14:28 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
The goal is to make procedure as fast as possible so instead of doing select + if + update + possible second update we can do single update. But main issue is PL/SQL is procedural, not transactional therefore SYSDATE will be calculated each time it is called. So you could end up with wrong results if update was before midnight and return after midnight. You have to make it transactional and not just to itself but also to code it might be called from. Also TO_CHAR (V_the_seq + 1, '000') will produce leading space, so format should be 'fm000'. Anyway, something like:
CREATE OR REPLACE
FUNCTION Bld_file_name(
P_FILE_PREFIX IN VARCHAR2,
P_CURRENT_DATE IN DATE
)
RETURN VARCHAR2
IS
V_THE_SEQ NUMBER:
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
UPDATE File_locks
SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
THE_SEQ = CASE CURRENT_DATE
WHEN TRUNC(P_CURRENT_DATE) THEN THE_SEQ + 1
ELSE 1
END
RETURNING THE_SEQ
INTO V_THE_SEQ;
COMMIT;
RETURN P_FILE_PREFIX ||
TO_CHAR(P_CURRENT_DATE,'_YYYYMMDD_') ||
TO_CHAR(V_THE_SEQ,'FM000');
END Bld_file_name;
/
SY.
[Updated on: Fri, 02 June 2017 14:35] Report message to a moderator
|
|
|
Re: Sequence generation for Parallel Processing [message #663440 is a reply to message #663439] |
Fri, 02 June 2017 14:34 |
Bill B
Messages: 1971 Registered: December 2004
|
Senior Member |
|
|
Yes except for one change.
CREATE OR REPLACE
FUNCTION Bld_file_name(
P_FILE_PREFIX IN VARCHAR2,
P_CURRENT_DATE IN DATE
)
RETURN VARCHAR2
IS
V_THE_SEQ NUMBER:
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
UPDATE File_locks
SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
THE_SEQ = CASE CURRENT_DATE
WHEN TRUNC(CURRENT_DATE) THEN THE_SEQ + 1
ELSE 1
END
RETURNING THE_SEQ
INTO V_THE_SEQ;
COMMIT;
RETURN P_FILE_PREFIX ||
TO_CHAR(P_CURRENT_DATE,'_YYYYMMDD_') ||
TO_CHAR(V_THE_SEQ,'FM999999000');
END Bld_file_name;
By increasing the format mask for v_the_seq to FM999999000, any sequence number under 1000 will have left padded with zero's if somehow the sequence gets bigger then 999 it won't return ###, it will expand out the sequence to the necessary length. Probably will never be necessary but always code for the edge conditions.
Also shouldn't it be
UPDATE File_locks
SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
THE_SEQ = CASE CURRENT_DATE
WHEN TRUNC(P_CURRENT_DATE) THEN THE_SEQ + 1
ELSE 1
END
RETURNING THE_SEQ
INTO V_THE_SEQ;
because the set will not use the modified current_date in the same update .
[Updated on: Fri, 02 June 2017 14:41] Report message to a moderator
|
|
|
Re: Sequence generation for Parallel Processing [message #663442 is a reply to message #663440] |
Fri, 02 June 2017 14:45 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Well, it could be you are masking one issue with another. If daily sequence number count must not exceed 999 I'd rather change V_THE_SEQ to NUMBER(3) to get exception. If daily count isn't limited then I'd rely on implicit conversion:
RETURN P_FILE_PREFIX ||
TO_CHAR(P_CURRENT_DATE,'_YYYYMMDD_') ||
V_THE_SEQ;
SY.
|
|
|
Re: Sequence generation for Parallel Processing [message #663444 is a reply to message #663442] |
Fri, 02 June 2017 14:56 |
Bill B
Messages: 1971 Registered: December 2004
|
Senior Member |
|
|
I just noticed a problem when I was checking out Solomon's code. I used a reserve word in my table and code CURRENT_DATE is reserved and the column name needs to be changed or you can do. Sorry about that
UPDATE File_locks
SET CURRENT_DATE = TRUNC(P_CURRENT_DATE),
THE_SEQ = CASE "CURRENT_DATE"
WHEN TRUNC(P_CURRENT_DATE) THEN THE_SEQ + 1
ELSE 1
END
RETURNING THE_SEQ
INTO V_THE_SEQ;
[Updated on: Fri, 02 June 2017 14:57] Report message to a moderator
|
|
|
Re: Sequence generation for Parallel Processing [message #663445 is a reply to message #663444] |
Fri, 02 June 2017 15:05 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
I agree it isn't good choice for column name, although keyword CURRENT_DATE is not a reserved in SQL:
select * from V$RESERVED_WORDS where keyword = 'CURRENT_DATE'
/
KEYWORD LENGTH RESERVED RES_TYPE RES_ATTR RES_SEMI DUPLICATE
------------ ------ -------- -------- -------- -------- ---------
CURRENT_DATE 12 N N N N N
SY.
|
|
|
|