Engineering Updates
Date | 22 August 2022 |
---|---|
Current Status | Resolved |
Ticket | SVCENG-606 |
Title | Rosalia website cannot be opened |
User Impact | The website took more than 1 minute to open and could not be used by customers |
Root Cause | The site was briefly inaccessible because of a kernel-side update that added a table in odoo_conf, related to the kernel V2 pipeline. A webservice restart was required because a process stalled while the kernel was updating the table |
Fix Performed | Restart webservice |
Future Prevention | None. This is an inherent risk of active development. We already have an automated run in place that checks server speed every 3 hours. |
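
The automated speed check mentioned in this entry is not described on this page. The following is a minimal sketch of what such a check could look like, assuming a plain HTTP GET and a 60-second threshold taken from the user impact above. The names `timed_fetch` and `classify` are hypothetical, and the 3-hour schedule is assumed to be handled by an external scheduler.

```python
import time
import urllib.request

# Assumption: mirrors the "more than 1 minute" impact recorded in this entry.
SLOW_THRESHOLD_SECONDS = 60.0


def timed_fetch(url, timeout=90.0):
    """Return how many seconds it took to download the page at `url`."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def classify(elapsed_seconds, threshold=SLOW_THRESHOLD_SECONDS):
    """Label a measurement as "slow" when it exceeds the threshold, else "ok"."""
    return "slow" if elapsed_seconds > threshold else "ok"
```

A scheduled job could call `classify(timed_fetch(url))` for the site URL and raise an alert on `"slow"`; a request that exceeds `timeout` raises an exception, which the job would also need to treat as a failure.
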
Date | 9 April 2022 |
---|---|
Current Status | Resolved |
Ticket | SVCENG-282 |
Title | GT06 Device Offline |
User Impact | GT06 devices were offline |
Root Cause | We run a scheduled service restart every day because our kernel is unstable. This morning's restart did not finish within the allotted time (6 minutes). |
Fix Performed | Started the service manually. |
Future Prevention | Increase the service restart timeout from 6 to 10 minutes, which is certain to be safe. |
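
How the daily restart is actually implemented is not recorded here. As a rough sketch of the idea, assuming the restart is a single command that either finishes within the limit or is abandoned, a 10-minute timeout could be enforced as follows. The function name `restart_service` and the shape of `command` are hypothetical.

```python
import subprocess

# The raised limit from this entry: 10 minutes instead of 6.
RESTART_TIMEOUT_SECONDS = 10 * 60


def restart_service(command, timeout=RESTART_TIMEOUT_SECONDS):
    """Run a restart command.

    Returns True only if the command exits with status 0 within `timeout`
    seconds. On timeout the child process is killed and False is returned,
    so the caller can alert someone instead of leaving the service down.
    """
    try:
        completed = subprocess.run(command, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0
```

Returning `False` on timeout, rather than failing silently, is what would let a monitoring step notice an unfinished restart before devices go offline.
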
Date | 26 March 2022 |
---|---|
Current Status | Resolved |
Ticket |
|
Title | System Offline for 2 Hours, 2AM-4AM Monday 28 Mar 2022. |
User Impact | Total outage |
Root Cause | The server needs to be upgraded to handle DisHub requests |
Fix Performed | - |
Future Prevention | - |
Date | 23 March 2022 |
---|---|
Current Status | Resolved |
Severity | High |
Ticket | SVCENG-242 SVCENG-244 |
Title | Many Units Offline |
User Impact | About 30-50% of the fleet was offline, and users could not make full use of the System |
Root Cause | Code update by the Kernel Team to address the 5-minute data lag. It had been tested and worked in the QC environment, but it did not work in Production: while the backlog was being processed, the many “backlog” messages were handled first and new messages were queued behind them, so devices appeared to be offline. |
Fix Performed |
|
Future Prevention | We are looking for a way to simulate 5000 devices in our QC environment, because issues often cannot be replicated in QC since the number of devices there is small.
|
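
No simulation approach had been chosen when this entry was written. The sketch below only illustrates one possible shape for it: many lightweight fake devices running concurrently in a single process with `asyncio`. Everything here is an assumption, including the names `simulated_device` and `run_simulation`; the in-memory queue is a stand-in for the real ingest endpoint, and the device protocol is not modelled.

```python
import asyncio


async def simulated_device(device_id, queue, messages, interval):
    """Emit `messages` fake reports for one device, pausing `interval` seconds between them."""
    for seq in range(messages):
        # Stand-in for sending a real protocol packet to the QC server.
        await queue.put((device_id, seq))
        await asyncio.sleep(interval)


async def run_simulation(device_count=5000, messages=3, interval=0.0):
    """Run `device_count` fake devices concurrently and return the number of reports produced."""
    queue = asyncio.Queue()
    devices = [
        simulated_device(device_id, queue, messages, interval)
        for device_id in range(device_count)
    ]
    await asyncio.gather(*devices)
    return queue.qsize()
```

For example, `asyncio.run(run_simulation(device_count=5000, messages=3))` produces 15000 queued reports. To reproduce a backlog scenario like the one in this entry, the reports would need to go to the actual QC ingest service rather than an in-memory queue.
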
Date | 10 March 2022 |
---|---|
Current Status | Resolved |
Severity | High |
Ticket | ENG-1317 (Resolved) |
Title | DB Writing Bottleneck |
User Impact | Live view data is lagging by 0-5 minutes. If this continues, a system crash or data loss is possible. |
Root Cause | On 18 Feb 2022, live view was updated to accommodate an NCR request (on behalf of Rosalia), resulting in a 40% increase in overall CPU usage across our infrastructure |
Fix Performed |
|
Future Prevention | We will need to discuss with the Founder how we are going to operate. We cannot continue to accommodate requests without paying a price in cost or performance.
|
Date | 9 March 2022 |
---|---|
Current Status | Resolved |
Severity | High |
Ticket | SERVICE-206 (Resolved) |
Title | GT06N Devices offline for 12 hours |
User Impact | Loss of data between 9 March 2022 18:00 - 10 March 2022 06:00, for GT06N devices |
Root Cause | An Engineer made a code change that resulted in an uncaught error |
Fix Performed | Code was fixed by the Engineer |
Future Prevention |
|
Date | 07 April 2022 |
---|---|
|
|
|
|