
Solutions by Business Requirement
Solve your systems management needs one requirement at a time with CCSS Solutions by Business Requirement.
Click the links below for common system issues and the solutions available to optimize system performance and increase data center efficiency.
Problem: Our operators missed a message that occurred out of hours, informing them that a cache battery was nearing end of life. As a result, the battery failed and all the data began writing directly to disk. Had this happened during operating hours we would have noticed the system degradation and the impact on users, but because it was late at night, the system crashed. We lost critical information because the last data collection was stored in cache. It was the worst-case scenario and must never happen again.
Solution: This can be avoided in the future by using cache battery monitoring in QSystem Monitor (QSM). This critical functionality gives operators a fast means of checking the battery life on a networked system at any time. Operators can view all batteries on all I/O adapters and on all partitions or systems from a single window. This map view of all cache battery information lets QSM users coordinate the replacement of several low batteries at once rather than arranging a replacement visit for each individual battery, saving time and preventing unnecessary system downtime.
To protect system data, QMessage Monitor will alert staff members in real time when a battery is nearing the end of its life. Alerts are sent via user-defined escalation methods, including email and mobile phone alerts.
A further benefit of this functionality is that it eliminates the need to manually check battery life in the sensitive area of System Service Tools (SST), thereby protecting sites from unnecessary security violations.
Problem: It is becoming increasingly difficult to justify our necessary machine upgrades to management in an independent and objective way.
Solution: Predictive projections based on historical resource usage offer a visual representation of an upgrade requirement that can be easily understood by non-technical management teams. For example, QSystem Monitor can produce a predictive graph based on transactions per hour that lets management easily see that escalating resource use is driven by a greater throughput of transactions from a static number of users or, in another instance, that sustained transaction rates are increasing due to a rapidly growing user base.
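The idea behind a projection from historical usage can be illustrated with a short sketch. This is not how QSystem Monitor computes its graphs; it assumes a simple least-squares straight line through evenly spaced samples, and the function names `fit_linear_trend` and `periods_until` are hypothetical.

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit a least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept). Assumes evenly spaced samples.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def periods_until(values, capacity):
    """Periods after the last sample until the trend line reaches capacity.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = fit_linear_trend(values)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Illustrative data only: monthly peak transactions per hour.
history = [100, 200, 300, 400]
print(periods_until(history, capacity=1000))
```

A straight line is the simplest possible model; real usage with seasonal peaks would need something more careful, but the output (periods remaining before a capacity limit) is the kind of figure a non-technical audience can act on.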
Problem: Is there any way to determine the likelihood that future instances of managing an important group of jobs, such as QZDASOINIT, will impact our CPU and by how much?
Solution: QSystem Monitor (QSM) utilizes any history of this type of problem in the predictive projection summary. QSM produces a valuable guideline as to how the important group of jobs might perform in the future and quantifies their associated CPU usage. This is particularly helpful in determining the likely impact on users who, if disrupted, could directly and negatively impact business processes.
Problem: The nature of our business means we process hundreds of thousands of payment card transactions each year and are required to have an effective way to monitor for PCI DSS (Payment Card Industry Data Security Standard) compliance auditors.
Solution: QMessage Monitor (QMM) can help businesses set up, maintain and monitor their important audit journals on a 24/7 basis. To meet compliance regulations, companies must prove to auditors that the systems running the credit card applications are adequately monitored. QMM produces a report that goes into a spool file which can be saved and reviewed each day. Users can also monitor these important journals from their desktop in real-time and receive QMM alerts should any out of the ordinary situations occur.
Problem: We want to free up resource on our system but can’t delete audit journal messages because Sarbanes-Oxley guidelines require them to be available at all times.
Solution: Using QMessage Monitor’s command, MMARCLOG, audit journal messages can be identified as a separate member, saved to tape and deleted from the system, freeing valuable resource. The result is an ‘on-demand’ data environment that is incredibly easy to manage without the substantial storage expense of purchasing additional disk.
Problem: We have a manual operational checklist that should be completed each day. When we are really busy, the tasks on the checklist are sidelined. Ultimately, this results in more time-consuming problems. Incomplete checklists are virtually useless as an analysis tool. Is there a better solution?
Solution: Automating routine checklists (by creating a system message for each task) with QMessage Monitor (QMM) is a fast way to ensure team members respond and are accountable in the same way they are accountable for urgent system messages. QMM sends real-time prompts to individuals or groups which can be escalated to a number of communication devices. This not only creates better operational efficiency but ensures that compliance with industry standards such as SOX, HIPAA or PCI is maintained on a daily basis.
Problem: We defined members in the history log some time ago but now have an immediate requirement for additional disk space. We might need these members for analysis so we’d prefer not to delete them. What can we do?
Solution: Use the command MMARCRMV in QMessage Monitor to delete previously defined members within the history log. This can act as a ‘mini-purge’ and immediately free up valuable disk space. Members can then be added back to the file for analysis, if required.
Problem: What can we do when we need to immediately free up auxiliary storage?
Solution: Use the MONRGZPF command, available through the disk usage inquiry in QSystem Monitor’s (QSM) disk module. This command allows managers to easily reorganize the files that have the largest number (or percentage) of deleted records. As a result, they free up disk space and eliminate the need for additional and immediate disk spend. Large IBM i environments could be carrying over a million deleted records that take up space on the machine. By using this QSM feature, managers can quickly identify costly repositories and reclaim their auxiliary storage.
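To show what ranking files by deleted records looks like, here is a minimal sketch. It does not call MONRGZPF or any QSM interface; the `FileStats` type, the `reorg_candidates` function and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    active_records: int
    deleted_records: int

    @property
    def deleted_pct(self) -> float:
        """Deleted records as a percentage of all record slots in the file."""
        total = self.active_records + self.deleted_records
        return 100.0 * self.deleted_records / total if total else 0.0


def reorg_candidates(files, by="count", top=3):
    """Return the files with the most deleted records, by count or percent."""
    if by == "count":
        key = lambda f: f.deleted_records
    elif by == "percent":
        key = lambda f: f.deleted_pct
    else:
        raise ValueError("by must be 'count' or 'percent'")
    return sorted(files, key=key, reverse=True)[:top]


# Illustrative figures only.
files = [
    FileStats("ORDERS", active_records=900_000, deleted_records=100_000),
    FileStats("AUDIT", active_records=1_000, deleted_records=9_000),
    FileStats("ITEMS", active_records=500, deleted_records=0),
]
for f in reorg_candidates(files, by="percent"):
    print(f.name, f.deleted_records, round(f.deleted_pct, 1))
```

The two orderings answer different questions: ranking by count finds the files that will return the most space, while ranking by percentage finds the files that are mostly dead weight.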
Problem: How can we get the disk benefits of a purge without interrupting our system monitoring?
Solution: The purge-whilst-active feature within QSystem Monitor’s history module means important data purges can be made without the need to end the product. This means IT managers never have to compromise their real-time monitoring in order to free up valuable additional disk through the MONPURGE command. Because the command is broken down into two separate elements, a purge of data and a subsequent reorganization of the files, managers can first delete records (which essentially flags them for deletion) and then reorganize the files at a less busy time on the system.
Problem: We’ve identified a number of security violations and need to produce a ‘document of proof’ to support a network-wide change in policy.
Solution: This task can be carried out immediately using QMessage Monitor’s MMLOGINQ command. The resulting hard copy report can be shared with management teams and external agencies, such as auditors, legal bodies or other authorities as circumstances dictate.
Problem: A security hole was detected too late when a malicious outbound client request occurred. How can we protect ourselves from a similar situation in the future?
Solution: QMessage Monitor (QMM) offers real-time FTP monitoring for outbound requests on IBM i systems. This is an enormous benefit for users, e.g. support staff, who have to allow FTP to be open. With this functionality, they can audit the actual commands that are run both to and from the system. All requests can trigger a user-defined escalation procedure in QMM that will alert staff so they can investigate the request itself and who is responsible. Furthermore, QMM provides a full audit trail of these actions which can act as a valuable ‘proof tool’ should fraudulent use be detected.
Problem: Our firewall became inactive just prior to giving a third party access to the system for troubleshooting. Thankfully, there was no malicious intent on behalf of the third party, but we had no idea we were so exposed. Could we have known about the firewall?
Solution: Managers can guard against this type of scenario by using flexible network monitoring to ensure that critical elements are active 24/7. In this case, that element would be the organization’s firewall. QSystem Monitor can notify managers in real time if their firewall becomes inactive. As a result, they can take immediate steps to limit any breach of security, ensuring data is not compromised. This flexible monitoring can be extended to any device with an IP address including other IBM i servers, specified PCs (e.g. the CEO’s), mainframes, and other servers or routers.
Problem: Due to a recent company merger, we have a number of systems that are now managed by a staff member in a remote location. It’s important that we migrate our system messaging rules to these new systems, but our management team is already stretched for time. Is there a time efficient solution for this process?
Solution: Using the auto-replies feature in QMessage Monitor, operators who sub-set their systems into defined groups (for example, based on geographical location) can set up message rules based on those groups. This saves managers considerable administration time because they can create a single record for the group rather than separate auto-replies for each system.
Problem: We could save considerable sums of money by bringing our machines and operations in-house. This would eliminate the cost of managing operations through a third party. However, we want to be sure the move increases our own resource efficiency to a ‘lights out’ status.
Solution: The CCSS suite of automated solutions, QSystem Monitor, QMessage Monitor and QRemote Control, makes an in-house move an opportunity to further enhance automated operations for maximum performance, resource efficiency and sustainability of a ‘lights out’ status. With the systems in-house, managers have hands-on access not only to automate operating tasks, but also to resolve recurring problems. This creates a cleaner system with far fewer events requiring analysis.
Problem: Certain jobs are experiencing very poor performance, but the cause is not obvious.
Solution: The issue here could be memory pool performance. A high rate of faulting in a specific pool indicates there are either too many jobs or insufficient memory in that pool. Without dedicated memory pool performance monitors, this situation typically leads to a lengthy investigation process. Operators would have to identify the memory pool where jobs are running and then determine how many jobs are in the pool to isolate the problem. With QSystem Monitor’s memory pool performance monitors, operators could immediately discover these types of issues and react before they impact users.
Problem: A user is complaining about delays but there seems to be no corresponding system issues. This makes it hard to repair the problem.
Solution: The user may be experiencing substantial delays because their application program is trying to access an object that is in use by another job. It feels like a delay to the user, but in fact, they are locked out. It is very difficult to diagnose lock wait status. So for this problem, QSystem Monitor (QSM) has a dedicated monitor to supply the average lock wait time per transaction for interactive users. If average times are exceeded, QSM immediately alerts managers, pre-empting complaints to the help desk.
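The check described above comes down to a single ratio compared against a threshold. The sketch below is an illustration under assumptions, not QSM's implementation: the function names, the millisecond unit and the sample figures are hypothetical.

```python
def average_lock_wait(total_lock_wait_ms, transactions):
    """Average lock wait per transaction over one sampling interval."""
    if transactions == 0:
        return 0.0
    return total_lock_wait_ms / transactions


def lock_wait_exceeded(total_lock_wait_ms, transactions, threshold_ms):
    """True when the interval's average lock wait is above the threshold."""
    return average_lock_wait(total_lock_wait_ms, transactions) > threshold_ms


# Illustrative interval: 120 interactive transactions, 30 seconds of lock wait.
if lock_wait_exceeded(30_000, 120, threshold_ms=200):
    print("alert: average lock wait above threshold")
```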
Problem: A user logged on to the IBM i server then switched off his PC, believing he was ‘logged out’ when he was not. As a result, the user entered the system under the generic QUSER profile, consuming resource. There are hundreds of QUSERs on any given system (let alone the entire network), so the task to hunt down the particular culprit became a painfully drawn out and expensive process. How can we quickly identify these users in the future?
Solution: In this example, managers could simply set up QSystem Monitor’s MONCHKJCP. This check monitors all QUSERs in a particular subsystem. If this particular problem arises, operators can automatically take action without further impact on resource. MONCHKJCP checks the CPU usage of the job, then takes the appropriate action, e.g. hold the job, lower its priority, or take no action. In this case, the action would be determined by user-defined levels of CPU usage. To specify the check further, managers can include or exclude generics and users.
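The decision this kind of check makes can be sketched as follows. This is not MONCHKJCP's actual logic or parameter set; the function names, the default CPU levels and the use of shell-style wildcards to stand in for generic names are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def is_monitored(user, include=("QUSER",), exclude=()):
    """Decide whether a job's user profile falls under the check.

    Patterns may be exact names or generic names such as "QUS*".
    Exclusions take precedence over inclusions.
    """
    if any(fnmatchcase(user, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(user, pattern) for pattern in include)


def choose_action(cpu_pct, lower_priority_at=20.0, hold_at=50.0):
    """Map a job's CPU usage to an action using user-defined levels."""
    if cpu_pct >= hold_at:
        return "hold"
    if cpu_pct >= lower_priority_at:
        return "lower_priority"
    return "none"


# Illustrative jobs only: (user profile, CPU percentage).
for user, cpu in [("QUSER", 75.0), ("QUSER", 30.0), ("QSECOFR", 90.0)]:
    if is_monitored(user):
        print(user, cpu, choose_action(cpu))
```

Keeping the user filter separate from the CPU levels means each can be tuned independently, which mirrors the description above of narrowing the check by including or excluding generics and users.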
Problem: Our payroll file, which is in a third party application, was held up on the system and subsequently didn’t process on time. We only became aware of it on Monday morning, when employees demanded an explanation as to why they hadn’t been paid. How can we ensure this never happens again?
Solution: Important physical files in third party applications (such as payroll) can be monitored in real-time with QSystem Monitor. If the same circumstances arose, IT managers would be alerted to the problem through a dedicated series of escalations (pager, email, mobile phone), giving them ample time to resolve the issue without it ever impacting employees.
Problem: Following a security audit, some small changes were made on the system including revoked access for a particular file. This change caused a job to loop and generate spool files which sent the auxiliary storage soaring. As a last resort, we were forced to purchase additional disk to give us the necessary investigation time. Surely there’s a better way to deal with these types of issues.
Solution: QSystem Monitor’s (QSM) real-time DASD growth monitoring capabilities are ideal for combating the problems raised in this scenario. Firstly, QSM’s combination of real-time alerts and thresholds would have highlighted the escalating situation before it reached any kind of crisis level. Secondly, managers would have been able to radically reduce investigation time because they would have immediately seen the offending, looping job. More importantly, they could resolve the issue without endlessly searching subsystems. Finally, this resolution would have eliminated any need to purchase more DASD for the system.
Problem: Temporary storage issues are always a race against the clock for us. We know there is an issue but finding it is a lengthy process. How can we get from point A (detecting there is a problem) to point B (resolving it) in the fastest possible time frame?
Solution: If temporary storage problems have the potential to cause a lot of damage in a short amount of time, the simplest solution is to monitor temporary storage. This makes issues visible and accessible for prevention. A dedicated group monitor, which can be a single bar on QSystem Monitor’s GUI, will flash red to alert operators when thresholds are exceeded. By clicking on the temporary storage bar, users can see the individual subsystems and the total MB usage of each one. Operators can then immediately create a new group monitor to detect all the jobs running in that subsystem and view their individual temporary storage usage. Jobs that reside in that subsystem can be easily viewed and sorted into high usage order for immediate problem identification. This entire process pinpoints problems in a couple of minutes, ensuring operators can be proactive about their temporary storage issues.
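The drill-down described above (total per subsystem, then jobs within the offending subsystem in high usage order) can be sketched in a few lines. The data layout, function names and figures here are hypothetical and are not taken from QSystem Monitor.

```python
from collections import defaultdict


def usage_by_subsystem(jobs):
    """Total temporary storage (MB) per subsystem."""
    totals = defaultdict(int)
    for job in jobs:
        totals[job["subsystem"]] += job["temp_mb"]
    return dict(totals)


def over_threshold(totals, threshold_mb):
    """Subsystems whose total exceeds the threshold, highest first."""
    flagged = [name for name, mb in totals.items() if mb > threshold_mb]
    return sorted(flagged, key=lambda name: totals[name], reverse=True)


def top_jobs(jobs, subsystem):
    """Jobs in one subsystem, sorted into high usage order."""
    in_subsystem = [j for j in jobs if j["subsystem"] == subsystem]
    return sorted(in_subsystem, key=lambda j: j["temp_mb"], reverse=True)


# Illustrative snapshot only.
jobs = [
    {"name": "NIGHTLY", "subsystem": "QBATCH", "temp_mb": 4200},
    {"name": "REPORTS", "subsystem": "QBATCH", "temp_mb": 300},
    {"name": "DSP01", "subsystem": "QINTER", "temp_mb": 40},
]
totals = usage_by_subsystem(jobs)
for subsystem in over_threshold(totals, threshold_mb=1000):
    worst = top_jobs(jobs, subsystem)[0]
    print(subsystem, totals[subsystem], worst["name"], worst["temp_mb"])
```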
Problem: Our High Availability (HA) solution switches over when CPU reaches a defined high percentage. Will your monitoring and automation software switch over at the same time? Our concern is we could be replicating the same problem on another system without ever knowing about it.
Solution: Yes, this potential shortcoming in the HA theory is addressed by CCSS solutions on two levels. Firstly, if the cause of the high CPU is a system problem that is rapidly escalating, QSystem Monitor will issue a number of individual monitors, alerts and warnings on the primary server before CPU reaches critical levels. This early notification minimizes the likelihood of HA switchovers caused by system issues. Secondly, if these multiple warnings are ignored for some reason (for example, in the event of a natural disaster that affects the data center), the automation software, QMessage Monitor (QMM), will switch to the new primary box at the same time as the HA switch. This gives managers time to resolve the issue that originally caused the problem (such as the escalating CPU caused by a looping job). When the primary system is operational again, QMM switches back over automatically. Even in this most extreme and unlikely example, monitoring is uninterrupted.
Problem: We’ve been accruing an unacceptable number of financial penalties because our SLAs are breached when HA software that relies on Journal Receivers, like the Audit Journal, is compromised by rogue jobs. The financial impact is extensive in each instance as we’re obligated to make DASD available until the issue is resolved. Extra investigation time (sometimes overtime) is usually required to find the cause of the problem. How can we guard against this unnecessary expense?
Solution: This problem can be addressed in several ways. Firstly, the root cause, i.e. the rogue job (be it looping, inactive or generating high CPU) can be identified in real-time using QSystem Monitor’s job monitoring functionality. With all the necessary information at operators’ fingertips, investigation time is minimal. Secondly, audit journal entries and the status of journal receivers can be monitored in real-time for immediate problem resolution with QMessage Monitor. Thirdly, managers in this particular situation have the far more economical option of running the receivers to tape while the problem is being resolved. This avoids unnecessary and expensive use of DASD. By meeting these challenges, QSystem Monitor and QMessage Monitor help your team greatly reduce or, better yet, eliminate the instances of SLA penalties.
Problem: In the past, one of our IOPs was being utilized more than others. Consequently, our users complained of poor response time, but the IOP problem was not immediately obvious to our systems team. We need more system visibility for these areas. Issues like this take too much investigation time.
Solution: Multiple jobs that are sharing insufficient main memory can increase I/O and CPU usage. Automated monitoring of the number of non-database page faults allows QSystem Monitor to alert operators to the situation so they can assign more memory to the jobs. This decreases unproductive use of disk and CPU resources and, therefore, prevents the associated problems of diminished throughput and longer response times for users. Alternatively, this could be a system configuration issue. Without instant visibility of IOPs as offered in CCSS solutions, numerous hours could be wasted investigating possible causes.
Problem: A user ran our important ‘end-of-day’ job under the wrong profile which did not have the correct level of authorization for the job to complete. The job failed and the subsystem ended, making it unavailable to other users. The next day our help desk was bombarded with calls from users asking why they couldn’t access their work.
Solution: A simple job status monitor assigned to the important ‘end-of-day’ job can resolve this issue in real time. Managers with QSystem Monitor (QSM) can pinpoint the issue, restart the subsystem and resubmit the job under the correct profile, all in a matter of minutes. With QSM, the impact on users would have been zero and operations would have carried on normally the following day.
Business Requirement: Multi-Platform Monitoring and Management that’s Cost-Effective and Easy to Implement.
Solution: ‘up.time,’ easy-to-use and cost-effective systems management software to manage, measure, and monitor Physical, Virtual, and Cloud assets, servers, applications, resources, and services from a unified dashboard across many platforms and multiple datacenters. uptime has over 700 customers in more than 32 countries worldwide.
Trial: Download an up.time 30-Day Risk-free Trial, with full support included
Watch: up.time Quick Product Tour
Learn More: up.time Multi-Platform Systems Management

